- 09 8月, 2013 6 次提交
-
-
由 Tejun Heo 提交于
cgroup is currently in the process of transitioning to using css (cgroup_subsys_state) as the primary handle instead of cgroup in subsystem API. For hierarchy iterators, this is beneficial because * In most cases, css is the only thing subsystems care about anyway. * On the planned unified hierarchy, iterations for different subsystems will need to skip over different subtrees of the hierarchy depending on which subsystems are enabled on each cgroup. Passing around css makes it unnecessary to explicitly specify the subsystem in question as css is intersection between cgroup and subsystem * For the planned unified hierarchy, css's would need to be created and destroyed dynamically independent from cgroup hierarchy. Having cgroup core manage css iteration makes enforcing deref rules a lot easier. Most subsystem conversions are straight-forward. Noteworthy changes are * blkio: cgroup_to_blkcg() is no longer used. Removed. * freezer: cgroup_freezer() is no longer used. Removed. * devices: cgroup_to_devcgroup() is no longer used. Removed. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup. Please see the previous commit which converts the subsystem methods for rationale. This patch converts all cftype file operations to take @css instead of @cgroup. cftypes for the cgroup core files don't have their subsytem pointer set. These will automatically use the dummy_css added by the previous patch and can be converted the same way. Most subsystem conversions are straight forwards but there are some interesting ones. * freezer: update_if_frozen() is also converted to take @css instead of @cgroup for consistency. This will make the code look simpler too once iterators are converted to use css. * memory/vmpressure: mem_cgroup_from_css() needs to be exported to vmpressure while mem_cgroup_from_cont() can be made static. Updated accordingly. * cpu: cgroup_tg() doesn't have any user left. Removed. * cpuacct: cgroup_ca() doesn't have any user left. Removed. * hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left. Removed. * net_cls: cgrp_cls_state() doesn't have any user left. Removed. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>
-
由 Tejun Heo 提交于
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup * in subsystem implementations for the following reasons. * With unified hierarchy, subsystems will be dynamically bound and unbound from cgroups and thus css's (cgroup_subsys_state) may be created and destroyed dynamically over the lifetime of a cgroup, which is different from the current state where all css's are allocated and destroyed together with the associated cgroup. This in turn means that cgroup_css() should be synchronized and may return NULL, making it more cumbersome to use. * Differing levels of per-subsystem granularity in the unified hierarchy means that the task and descendant iterators should behave differently depending on the specific subsystem the iteration is being performed for. * In majority of the cases, subsystems only care about its part in the cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods often obtain the matching css pointer from the cgroup and don't bother with the cgroup pointer itself. Passing around css fits much better. This patch converts all cgroup_subsys methods to take @css instead of @cgroup. The conversions are mostly straight-forward. A few noteworthy changes are * ->css_alloc() now takes css of the parent cgroup rather than the pointer to the new cgroup as the css for the new cgroup doesn't exist yet. Knowing the parent css is enough for all the existing subsystems. * In kernel/cgroup.c::offline_css(), unnecessary open coded css dereference is replaced with local variable access. This patch shouldn't cause any behavior differences. v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced with local variable @css as suggested by Li Zefan. Rebased on top of new for-3.12 which includes for-3.11-fixes so that ->css_free() invocation added by da0a12ca ("cgroup: fix a leak when percpu_ref_init() fails") is converted too. Suggested by Li Zefan. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>
-
由 Tejun Heo 提交于
Currently, controllers have to explicitly follow the cgroup hierarchy to find the parent of a given css. cgroup is moving towards using cgroup_subsys_state as the main controller interface construct, so let's provide a way to climb the hierarchy using just csses. This patch implements css_parent() which, given a css, returns its parent. The function is guarnateed to valid non-NULL parent css as long as the target css is not at the top of the hierarchy. freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices are converted to use css_parent() instead of accessing cgroup->parent directly. * __parent_ca() is dropped from cpuacct and its usage is replaced with parent_ca(). The only difference between the two was NULL test on cgroup->parent which is now embedded in css_parent() making the distinction moot. Note that eventually a css->parent field will be added to css and the NULL check in css_parent() will go away. This patch shouldn't cause any behavior differences. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com>
-
由 Tejun Heo 提交于
css (cgroup_subsys_state) is usually embedded in a subsys specific data structure. Subsystems either use container_of() directly to cast from css to such data structure or has an accessor function wrapping such cast. As cgroup as whole is moving towards using css as the main interface handle, add and update such accessors to ease dealing with css's. All accessors explicitly handle NULL input and return NULL in those cases. While this looks like an extra branch in the code, as all controllers specific data structures have css as the first field, the casting doesn't involve any offsetting and the compiler can trivially optimize out the branch. * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such accessor. Added. * memory, hugetlb and devices already had one but didn't explicitly handle NULL input. Updated. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com>
-
由 Tejun Heo 提交于
The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp large portion of cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com>
-
- 24 5月, 2013 1 次提交
-
-
由 Tejun Heo 提交于
During a config change, propagate_exception() needs to traverse the subtree to update config on the subtree. Because such config updates need to allocate memory, it couldn't directly use cgroup_for_each_descendant_pre() which required the whole iteration to be contained in a single RCU read critical section. To work around the limitation, propagate_exception() built a linked list of descendant cgroups while read-locking RCU and then walked the list afterwards, which is safe as the whole iteration is protected by devcgroup_mutex. This works but is cumbersome. With the recent updates, cgroup iterators now allow dropping RCU read lock while iteration is in progress making this workaround no longer necessary. This patch replaces dev_cgroup->propagate_pending list and get_online_devcg() with direct cgroup_for_each_descendant_pre() walk. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Aristeu Rozanski <aris@redhat.com> Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz>
-
- 19 4月, 2013 1 次提交
-
-
由 Rami Rosen 提交于
In devcgroup_css_alloc(), there is no longer need for parent_cgroup. bd2953eb("devcg: propagate local changes down the hierarchy") made the variable parent_cgroup redundant. This patch removes parent_cgroup from devcgroup_css_alloc(). Signed-off-by: NRami Rosen <ramirose@gmail.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 08 4月, 2013 1 次提交
-
-
由 Tejun Heo 提交于
bd2953eb ("devcg: propagate local changes down the hierarchy") implemented proper hierarchy support. Remove the broken tag. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com>
-
- 20 3月, 2013 4 次提交
-
-
由 Aristeu Rozanski 提交于
This patch makes exception changes to propagate down in hierarchy respecting when possible local exceptions. New exceptions allowing additional access to devices won't be propagated, but it'll be possible to add an exception to access all of part of the newly allowed device(s). New exceptions disallowing access to devices will be propagated down and the local group's exceptions will be revalidated for the new situation. Example: A / \ B group behavior exceptions A allow "b 8:* rwm", "c 116:1 rw" B deny "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm" If a new exception is added to group A: # echo "c 116:* r" > A/devices.deny it'll propagate down and after revalidating B's local exceptions, the exception "c 116:2 rwm" will be removed. In case parent's exceptions change and local exceptions are not allowed anymore, they'll be deleted. v7: - do not allow behavior change when the cgroup has children - update documentation v6: fixed issues pointed by Serge Hallyn - only copy parent's exceptions while propagating behavior if the local behavior is different - while propagating exceptions, do not clear and copy parent's: it'd be against the premise we don't propagate access to more devices v5: fixed issues pointed by Serge Hallyn - updated documentation - not propagating when an exception is written to devices.allow - when propagating a new behavior, clean the local exceptions list if they're for a different behavior v4: fixed issues pointed by Tejun Heo - separated function to walk the tree and collect valid propagation targets v3: fixed issues pointed by Tejun Heo - update documentation - move css_online/css_offline changes to a new patch - use cgroup_for_each_descendant_pre() instead of own descendant walk - move exception_copy rework to a separared patch - move exception_clean rework to a separated patch v2: fixed issues pointed by Tejun Heo - instead of keeping the local settings that won't apply anymore, remove them Cc: Tejun Heo <tj@kernel.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Aristeu Rozanski 提交于
Allocate resources and change behavior only when online. This is needed in order to determine if a node is suitable for hierarchy propagation or if it's being removed. Locking: Both functions take devcgroup_mutex to make changes to device_cgroup structure. Hierarchy propagation will also take devcgroup_mutex before walking the tree while walking the tree itself is protected by rcu lock. Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Cc: Tejun Heo <tj@kernel.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Aristeu Rozanski 提交于
Currently may_access() is only able to verify if an exception is valid for the current cgroup, which has the same behavior. With hierarchy, it'll be also used to verify if a cgroup local exception is valid towards its cgroup parent, which might have different behavior. v2: - updated patch description - rebased on top of a new patch to expand the may_access() logic to make it more clear - fixed argument description order in may_access() Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Cc: Tejun Heo <tj@kernel.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Aristeu Rozanski 提交于
In order to make the next patch more clear, expand may_access() logic. v2: may_access() returns bool now Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Cc: Tejun Heo <tj@kernel.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 22 2月, 2013 1 次提交
-
-
由 Jerry Snitselaar 提交于
Commit 103a197c ("security/device_cgroup: lock assert fails in dev_exception_clean()") grabs devcgroup_mutex to fix assert failure, but a mutex can't be grabbed in rcu callback. Since there shouldn't be any other references when css_free is called, mutex isn't needed for list cleanup in devcgroup_css_free(). Signed-off-by: NJerry Snitselaar <jerry.snitselaar@oracle.com> Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 1月, 2013 1 次提交
-
-
由 Jerry Snitselaar 提交于
devcgroup_css_free() calls dev_exception_clean() without the devcgroup_mutex being locked. Shutting down a kvm virt was giving me the following trace: [36280.732764] ------------[ cut here ]------------ [36280.732778] WARNING: at /home/snits/dev/linux/security/device_cgroup.c:172 dev_exception_clean+0xa9/0xc0() [36280.732782] Hardware name: Studio XPS 8100 [36280.732785] Modules linked in: xt_REDIRECT fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle bridge stp llc nf_conntrack_ipv4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter it87 hwmon_vid xt_state nf_conntrack ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq coretemp snd_seq_device crc32c_intel snd_pcm snd_page_alloc snd_timer snd broadcom tg3 serio_raw i7core_edac edac_core ptp pps_core lpc_ich pcspkr mfd_core soundcore microcode i2c_i801 nfsd auth_rpcgss nfs_acl lockd vhost_net sunrpc tun macvtap macvlan kvm_intel kvm uinput binfmt_misc autofs4 usb_storage firewire_ohci firewire_core crc_itu_t radeon drm_kms_helper ttm [36280.732921] Pid: 933, comm: libvirtd Tainted: G W 3.8.0-rc3-00307-g4c217de #1 [36280.732922] Call Trace: [36280.732927] [<ffffffff81044303>] warn_slowpath_common+0x93/0xc0 [36280.732930] [<ffffffff8104434a>] warn_slowpath_null+0x1a/0x20 [36280.732932] [<ffffffff812deaf9>] dev_exception_clean+0xa9/0xc0 [36280.732934] [<ffffffff812deb2a>] devcgroup_css_free+0x1a/0x30 [36280.732938] [<ffffffff810ccd76>] cgroup_diput+0x76/0x210 [36280.732941] [<ffffffff8119eac0>] d_delete+0x120/0x180 [36280.732943] [<ffffffff81195cff>] vfs_rmdir+0xef/0x130 [36280.732945] [<ffffffff81195e47>] do_rmdir+0x107/0x1c0 [36280.732949] [<ffffffff8132d17e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [36280.732951] [<ffffffff81198646>] sys_rmdir+0x16/0x20 [36280.732954] [<ffffffff8173bd82>] system_call_fastpath+0x16/0x1b [36280.732956] ---[ end trace ca39dced899a7d9f ]--- Signed-off-by: NJerry Snitselaar <jerry.snitselaar@oracle.com> Cc: stable@kernel.org Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
- 20 11月, 2012 1 次提交
-
-
由 Tejun Heo 提交于
Rename cgroup_subsys css lifetime related callbacks to better describe what their roles are. Also, update documentation. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com>
-
- 07 11月, 2012 2 次提交
-
-
由 Tejun Heo 提交于
device_cgroup uses RCU safe ->exceptions list which is write-protected by devcgroup_mutex and has had some issues using locking correctly. Add lockdep asserts to utility functions so that future errors can be easily detected. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Li Zefan <lizefan@huawei.com>
-
由 Tejun Heo 提交于
dev_cgroup->exceptions is protected with devcgroup_mutex for writes and RCU for reads; however, RCU usage isn't correct. * dev_exception_clean() doesn't use RCU variant of list_del() and kfree(). The function can race with may_access() and may_access() may end up dereferencing already freed memory. Use list_del_rcu() and kfree_rcu() instead. * may_access() may be called only with RCU read locked but doesn't use RCU safe traversal over ->exceptions. Use list_for_each_entry_rcu(). Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com> Cc: stable@vger.kernel.org Cc: Aristeu Rozanski <aris@redhat.com> Cc: Li Zefan <lizefan@huawei.com>
-
- 06 11月, 2012 1 次提交
-
-
由 Aristeu Rozanski 提交于
In 4cef7299 ("device_cgroup: add proper checking when changing default behavior") the cgroup parent usage is unchecked. root will not have a parent and trying to use device.{allow,deny} will cause problems. For some reason my stressing scripts didn't test the root directory so I didn't catch it on my regular tests. Signed-off-by: NAristeu Rozanski <aris@redhat.com> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 26 10月, 2012 4 次提交
-
-
由 Aristeu Rozanski 提交于
Before changing a group's default behavior to ALLOW, we must check if its parent's behavior is also ALLOW. Signed-off-by: NAristeu Rozanski <aris@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aristeu Rozanski 提交于
Convert the code to use kstrtou32() instead of simple_strtoul() which is deprecated. The real size of the variables are u32, so use kstrtou32 instead of kstrtoul Signed-off-by: NAristeu Rozanski <aris@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aristeu Rozanski 提交于
This was done in a v2 patch but v1 ended up being committed. The variable name is less confusing and stores the default behavior when no matching exception exists. Signed-off-by: NAristeu Rozanski <aris@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jiri Slaby 提交于
Commit ad676077 ("device_cgroup: convert device_cgroup internally to policy + exceptions") removed rcu locks which are needed in task_devcgroup called in this chain: devcgroup_inode_mknod OR __devcgroup_inode_permission -> __devcgroup_inode_permission -> task_devcgroup -> task_subsys_state -> task_subsys_state_check. Change the code so that task_devcgroup is safely called with rcu read lock held. =============================== [ INFO: suspicious RCU usage. ] 3.6.0-rc5-next-20120913+ #42 Not tainted ------------------------------- include/linux/cgroup.h:553 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by kdevtmpfs/23: #0: (sb_writers){.+.+.+}, at: [<ffffffff8116873f>] mnt_want_write+0x1f/0x50 #1: (&sb->s_type->i_mutex_key#3/1){+.+.+.}, at: [<ffffffff811558af>] kern_path_create+0x7f/0x170 stack backtrace: Pid: 23, comm: kdevtmpfs Not tainted 3.6.0-rc5-next-20120913+ #42 Call Trace: lockdep_rcu_suspicious+0xfd/0x130 devcgroup_inode_mknod+0x19d/0x240 vfs_mknod+0x71/0xf0 handle_create.isra.2+0x72/0x200 devtmpfsd+0x114/0x140 ? handle_create.isra.2+0x200/0x200 kthread+0xd6/0xe0 kernel_thread_helper+0x4/0x10 Signed-off-by: NJiri Slaby <jslaby@suse.cz> Cc: Dave Jones <davej@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 10月, 2012 4 次提交
-
-
由 Aristeu Rozanski 提交于
This patch replaces the "whitelist" usage in the code and comments and replace them by exception list related information. Signed-off-by: NAristeu Rozanski <aris@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NSerge E. Hallyn <serge.hallyn@canonical.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aristeu Rozanski 提交于
The original model of device_cgroup is having a whitelist where all the allowed devices are listed. The problem with this approach is that is impossible to have the case of allowing everything but few devices. The reason for that lies in the way the whitelist is handled internally: since there's only a whitelist, the "all devices" entry would have to be removed and replaced by the entire list of possible devices but the ones that are being denied. Since dev_t is 32 bits long, representing the allowed devices as a bitfield is not memory efficient. This patch replaces the "whitelist" by a "exceptions" list and the default policy is kept as "deny_all" variable in dev_cgroup structure. The current interface determines that whenever "a" is written to devices.allow or devices.deny, the entry masking all devices will be added or removed, respectively. This behavior is kept and it's what will determine the default policy: # cat devices.list a *:* rwm # echo a >devices.deny # cat devices.list # echo a >devices.allow # cat devices.list a *:* rwm The interface is also preserved. For example, if one wants to block only access to /dev/null: # ls -l /dev/null crw-rw-rw- 1 root root 1, 3 Jul 24 16:17 /dev/null # echo a >devices.allow # echo "c 1:3 rwm" >devices.deny # cat /dev/null cat: /dev/null: Operation not permitted # echo >/dev/null bash: /dev/null: Operation not permitted mknod /tmp/null c 1 3 mknod: `/tmp/null': Operation not permitted # echo "c 1:3 r" >devices.allow # cat /dev/null # echo >/dev/null bash: /dev/null: Operation not permitted mknod /tmp/null c 1 3 mknod: `/tmp/null': Operation not permitted # echo "c 1:3 rw" >devices.allow # echo >/dev/null # cat /dev/null # mknod /tmp/null c 1 3 mknod: `/tmp/null': Operation not permitted # echo "c 1:3 rwm" >devices.allow # echo >/dev/null # cat /dev/null # mknod /tmp/null c 1 3 # Note that I didn't rename the functions/variables in this patch, but in the next one to make reviewing easier. Signed-off-by: NAristeu Rozanski <aris@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NSerge E. Hallyn <serge.hallyn@canonical.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aristeu Rozanski 提交于
This function cleans all the items in a whitelist and will be used by the next patches. Signed-off-by: NAristeu Rozanski <aris@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NSerge E. Hallyn <serge.hallyn@canonical.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aristeu Rozanski 提交于
deny_all will determine if the default policy is to deny all device access unless for the ones in the exception list. This variable will be used in the next patches to convert device_cgroup internally into a default policy + rules. Signed-off-by: NAristeu Rozanski <aris@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NSerge E. Hallyn <serge.hallyn@canonical.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 9月, 2012 1 次提交
-
-
由 Tejun Heo 提交于
Currently, cgroup hierarchy support is a mess. cpu related subsystems behave correctly - configuration, accounting and control on a parent properly cover its children. blkio and freezer completely ignore hierarchy and treat all cgroups as if they're directly under the root cgroup. Others show yet different behaviors. These differing interpretations of cgroup hierarchy make using cgroup confusing and it impossible to co-mount controllers into the same hierarchy and obtain sane behavior. Eventually, we want full hierarchy support from all subsystems and probably a unified hierarchy. Users using separate hierarchies expecting completely different behaviors depending on the mounted subsystem is deterimental to making any progress on this front. This patch adds cgroup_subsys.broken_hierarchy and sets it to %true for controllers which are lacking in hierarchy support. The goal of this patch is two-fold. * Move users away from using hierarchy on currently non-hierarchical subsystems, so that implementing proper hierarchy support on those doesn't surprise them. * Keep track of which controllers are broken how and nudge the subsystems to implement proper hierarchy support. For now, start with a single warning message. We can whine louder later on. v2: Fixed a typo spotted by Michal. Warning message updated. v3: Updated memcg part so that it doesn't generate warning in the cases where .use_hierarchy=false doesn't make the behavior different from root.use_hierarchy=true. Fixed a typo spotted by Glauber. v4: Check ->broken_hierarchy after cgroup creation is complete so that ->create() can affect the result per Michal. Dropped unnecessary memcg root handling per Michal. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NSerge E. Hallyn <serue@us.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
- 02 4月, 2012 1 次提交
-
-
由 Tejun Heo 提交于
Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio, net_cls and device controllers to use the new cftype based interface. Termination entry is added to cftype arrays and populate callbacks are replaced with cgroup_subsys->base_cftypes initializations. This is functionally identical transformation. There shouldn't be any visible behavior change. memcg is rather special and will be converted separately. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <paul@paulmenage.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Vivek Goyal <vgoyal@redhat.com>
-
- 03 2月, 2012 1 次提交
-
-
由 Li Zefan 提交于
The argument is not used at all, and it's not necessary, because a specific callback handler of course knows which subsys it belongs to. Now only ->pupulate() takes this argument, because the handlers of this callback always call cgroup_add_file()/cgroup_add_files(). So we reduce a few lines of code, though the shrinking of object size is minimal. 16 files changed, 113 insertions(+), 162 deletions(-) text data bss dec hex filename 5486240 656987 7039960 13183187 c928d3 vmlinux.o.orig 5486170 656987 7039960 13183117 c9288d vmlinux.o Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 13 12月, 2011 1 次提交
-
-
由 Tejun Heo 提交于
Currently, there's no way to pass multiple tasks to cgroup_subsys methods necessitating the need for separate per-process and per-task methods. This patch introduces cgroup_taskset which can be used to pass multiple tasks and their associated cgroups to cgroup_subsys methods. Three methods - can_attach(), cancel_attach() and attach() - are converted to use cgroup_taskset. This unifies passed parameters so that all methods have access to all information. Conversions in this patchset are identical and don't introduce any behavior change. -v2: documentation updated as per Paul Menage's suggestion. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NPaul Menage <paul@paulmenage.org> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: James Morris <jmorris@namei.org>
-
- 21 7月, 2011 1 次提交
-
-
由 Lai Jiangshan 提交于
The rcu callback whitelist_item_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(whitelist_item_free). Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NJames Morris <jmorris@namei.org> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
- 20 6月, 2011 1 次提交
-
-
由 Al Viro 提交于
inode_permission() calls devcgroup_inode_permission() and almost all such calls are _not_ for device nodes; let's at least keep the common path straight... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 27 5月, 2011 1 次提交
-
-
由 Ben Blum 提交于
Add cgroup subsystem callbacks for per-thread attachment in atomic contexts Add can_attach_task(), pre_attach(), and attach_task() as new callbacks for cgroups's subsystem interface. Unlike can_attach and attach, these are for per-thread operations, to be called potentially many times when attaching an entire threadgroup. Also, the old "bool threadgroup" interface is removed, as replaced by this. All subsystems are modified for the new interface - of note is cpuset, which requires from/to nodemasks for attach to be globally scoped (though per-cpuset would work too) to persist from its pre_attach to attach_task and attach. This is a pre-patch for cgroup-procs-writable.patch. Signed-off-by: NBen Blum <bblum@andrew.cmu.edu> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Reviewed-by: NPaul Menage <menage@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 4月, 2010 1 次提交
-
-
由 Justin P. Mattock 提交于
Whitespace coding style fixes. Signed-off-by: NJustin P. Mattock <justinmattock@gmail.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 30 3月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: NTejun Heo <tj@kernel.org> Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
-
- 24 9月, 2009 1 次提交
-
-
由 Ben Blum 提交于
Alter the ss->can_attach and ss->attach functions to be able to deal with a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a pre-patch to cgroup-procs-writable.patch.) Currently, new mode of the attach function can only tell the subsystem about the old cgroup of the threadgroup leader. No subsystem currently needs that information for each thread that's being moved, but if one were to be added (for example, one that counts tasks within a group) this bit would need to be reworked a bit to tell the subsystem the right information. [hidave.darkstar@gmail.com: fix build] Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NMatt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 6月, 2009 1 次提交
-
-
由 Li Zefan 提交于
While walking through the whitelist, if the DEV_ALL item is found, no more check is needed. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 4月, 2009 1 次提交
-
-
由 Li Zefan 提交于
There is nothing special that has to be protected by cgroup_lock, so introduce devcgroup_mtuex for it's own use. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 1月, 2009 1 次提交
-
-
由 Serge E. Hallyn 提交于
The devcgroup_inode_permission() hook in the devices whitelist cgroup has always bypassed access checks on fifos. But the mknod hook did not. The devices whitelist is only about block and char devices, and fifos can't even be added to the whitelist, so fifos can't be created at all except by tasks which have 'a' in their whitelist (meaning they have access to all devices). Fix the behavior by bypassing access checks to mkfifo. Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: James Morris <jmorris@namei.org> Reported-by: NDaniel Lezcano <dlezcano@fr.ibm.com> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-