- 13 2月, 2014 1 次提交
-
-
由 Tejun Heo 提交于
If !NULL, @skip_css makes cgroup_taskset_for_each() skip the matching css. The intention of the interface is to make it easy to skip css's (cgroup_subsys_states) which already match the migration target; however, this is entirely unnecessary as migration taskset doesn't include tasks which are already in the target cgroup. Drop @skip_css from cgroup_taskset_for_each(). Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Daniel Borkmann <dborkman@redhat.com>
-
- 08 2月, 2014 3 次提交
-
-
由 Tejun Heo 提交于
cgroup_subsys is a bit messier than it needs to be. * The name of a subsys can be different from its internal identifier defined in cgroup_subsys.h. Most subsystems use the matching name but three - cpu, memory and perf_event - use different ones. * cgroup_subsys_id enums are postfixed with _subsys_id and each cgroup_subsys is postfixed with _subsys. cgroup.h is widely included throughout various subsystems, it doesn't and shouldn't have claim on such generic names which don't have any qualifier indicating that they belong to cgroup. * cgroup_subsys->subsys_id should always equal the matching cgroup_subsys_id enum; however, we require each controller to initialize it and then BUG if they don't match, which is a bit silly. This patch cleans up cgroup_subsys names and initialization by doing the followings. * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each cgroup_subsys with _cgrp_subsys. * With the above, renaming subsys identifiers to match the userland visible names doesn't cause any naming conflicts. All non-matching identifiers are renamed to match the official names. cpu_cgroup -> cpu mem_cgroup -> memory perf -> perf_event * controllers no longer need to initialize ->subsys_id and ->name. They're generated in cgroup core and set automatically during boot. * Redundant cgroup_subsys declarations removed. * While updating BUG_ON()s in cgroup_init_early(), convert them to WARN()s. BUGging that early during boot is stupid - the kernel can't print anything, even through serial console and the trap handler doesn't even link stack frame properly for back-tracing. This patch doesn't introduce any behavior changes. v2: Rebased on top of fe1217c4 ("net: net_cls: move cgroupfs classid handling into core"). Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: N"David S. Miller" <davem@davemloft.net> Acked-by: N"Rafael J. Wysocki" <rjw@rjwysocki.net> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NPeter Zijlstra <peterz@infradead.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Acked-by: NIngo Molnar <mingo@redhat.com> Acked-by: NLi Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Thomas Graf <tgraf@suug.ch>
-
由 Tejun Heo 提交于
With module supported dropped from net_prio, no controller is using cgroup module support. None of actual resource controllers can be built as a module and we aren't gonna add new controllers which don't control resources. This patch drops module support from cgroup. * cgroup_[un]load_subsys() and cgroup_subsys->module removed. * As there's no point in distinguishing IS_BUILTIN() and IS_MODULE(), cgroup_subsys.h now uses IS_ENABLED() directly. * enum cgroup_subsys_id now exactly matches the list of enabled controllers as ordered in cgroup_subsys.h. * cgroup_subsys[] is now a contiguously occupied array. Size specification is no longer necessary and dropped. * for_each_builtin_subsys() is removed and for_each_subsys() is updated to not require any locking. * module ref handling is removed from rebind_subsystems(). * Module related comments dropped. v2: Rebased on top of fe1217c4 ("net: net_cls: move cgroupfs classid handling into core"). v3: Added {} around the if (need_forkexit_callback) block in cgroup_post_fork() for readability as suggested by Li. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com>
-
由 Tejun Heo 提交于
net_prio is the only cgroup which is allowed to be built as a module. The savings from allowing one controller to be built as a module are tiny especially given that cgroup module support itself adds quite a bit of complexity. Given that none of other controllers has much chance of being made a module and that we're unlikely to add new modular controllers, the added complexity is simply not justifiable. As a first step to drop cgroup module support, this patch changes the config option to bool from tristate and drops module related code from it. Also, while an earlier commit fe1217c4 ("net: net_cls: move cgroupfs classid handling into core") dropped module support from net_cls cgroup, it retained a call to cgroup_load_subsys(), which is noop for built-in controllers. Drop it along with init_netclassid_cgroup(). v2: Removed modular version of task_netprioidx() in include/net/netprio_cgroup.h as suggested by Li Zefan. v3: Rebased on top of fe1217c4 ("net: net_cls: move cgroupfs classid handling into core"). net_cls cgroup part is mostly dropped except for removal of init_netclassid_cgroup(). Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: N"David S. Miller" <davem@davemloft.net> Acked-by: NLi Zefan <lizefan@huawei.com> Cc: Thomas Graf <tgraf@suug.ch>
-
- 11 12月, 2013 1 次提交
-
-
由 stephen hemminger 提交于
Various spelling fixes in networking stack Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2013 2 次提交
-
-
由 Tejun Heo 提交于
In preparation of conversion to kernfs, cgroup file handling is updated so that it can be easily mapped to kernfs. This patch replaces cftype->read_seq_string() with cftype->seq_show() which is not limited to single_open() operation and will map directcly to kernfs seq_file interface. The conversions are mechanical. As ->seq_show() doesn't have @css and @cft, the functions which make use of them are converted to use seq_css() and seq_cft() respectively. In several occassions, e.f. if it has seq_string in its name, the function name is updated to fit the new method better. This patch does not introduce any behavior changes. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NAristeu Rozanski <arozansk@redhat.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: NLi Zefan <lizefan@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Neil Horman <nhorman@tuxdriver.com>
-
由 Tejun Heo 提交于
In preparation of conversion to kernfs, cgroup file handling is being consolidated so that it can be easily mapped to the seq_file based interface of kernfs. cftype->read_map() doesn't add any value and being replaced with ->read_seq_string(). Update read_priomap() to use ->read_seq_string() instead. This patch doesn't make any visible behavior changes. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: NLi Zefan <lizefan@huawei.com>
-
- 09 10月, 2013 1 次提交
-
-
由 Gao feng 提交于
Since the tasks have been migrated to the cgroup, there is no need to call task_netprioidx to get task's cgroup id. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 8月, 2013 5 次提交
-
-
由 Tejun Heo 提交于
cgroup is in the process of converting to css (cgroup_subsys_state) from cgroup as the principal subsystem interface handle. This is mostly to prepare for the unified hierarchy support where css's will be created and destroyed dynamically but also helps cleaning up subsystem implementations as css is usually what they are interested in anyway. cgroup_taskset which is used by the subsystem attach methods is the last cgroup subsystem API which isn't using css as the handle. Update cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp. The conversions are pretty mechanical. One exception is cpuset::cgroup_cs(), which lost its last user and got removed. This patch shouldn't introduce any functional changes. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org>
-
由 Tejun Heo 提交于
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup. Please see the previous commit which converts the subsystem methods for rationale. This patch converts all cftype file operations to take @css instead of @cgroup. cftypes for the cgroup core files don't have their subsytem pointer set. These will automatically use the dummy_css added by the previous patch and can be converted the same way. Most subsystem conversions are straight forwards but there are some interesting ones. * freezer: update_if_frozen() is also converted to take @css instead of @cgroup for consistency. This will make the code look simpler too once iterators are converted to use css. * memory/vmpressure: mem_cgroup_from_css() needs to be exported to vmpressure while mem_cgroup_from_cont() can be made static. Updated accordingly. * cpu: cgroup_tg() doesn't have any user left. Removed. * cpuacct: cgroup_ca() doesn't have any user left. Removed. * hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left. Removed. * net_cls: cgrp_cls_state() doesn't have any user left. Removed. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>
-
由 Tejun Heo 提交于
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup * in subsystem implementations for the following reasons. * With unified hierarchy, subsystems will be dynamically bound and unbound from cgroups and thus css's (cgroup_subsys_state) may be created and destroyed dynamically over the lifetime of a cgroup, which is different from the current state where all css's are allocated and destroyed together with the associated cgroup. This in turn means that cgroup_css() should be synchronized and may return NULL, making it more cumbersome to use. * Differing levels of per-subsystem granularity in the unified hierarchy means that the task and descendant iterators should behave differently depending on the specific subsystem the iteration is being performed for. * In majority of the cases, subsystems only care about its part in the cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods often obtain the matching css pointer from the cgroup and don't bother with the cgroup pointer itself. Passing around css fits much better. This patch converts all cgroup_subsys methods to take @css instead of @cgroup. The conversions are mostly straight-forward. A few noteworthy changes are * ->css_alloc() now takes css of the parent cgroup rather than the pointer to the new cgroup as the css for the new cgroup doesn't exist yet. Knowing the parent css is enough for all the existing subsystems. * In kernel/cgroup.c::offline_css(), unnecessary open coded css dereference is replaced with local variable access. This patch shouldn't cause any behavior differences. v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced with local variable @css as suggested by Li Zefan. Rebased on top of new for-3.12 which includes for-3.11-fixes so that ->css_free() invocation added by da0a12ca ("cgroup: fix a leak when percpu_ref_init() fails") is converted too. Suggested by Li Zefan. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>
-
由 Tejun Heo 提交于
cgroup controller API will be converted to primarily use struct cgroup_subsys_state instead of struct cgroup. In preparation, make the internal functions of netprio_cgroup pass around @css instead of @cgrp. While at it, kill struct cgroup_netprio_state which only contained struct cgroup_subsys_state without serving any purpose. All functions are converted to deal with @css directly. This patch shouldn't cause any behavior differences. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tejun Heo 提交于
The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp large portion of cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com>
-
- 29 5月, 2013 1 次提交
-
-
由 Jiri Pirko 提交于
So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: NJiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 4月, 2013 1 次提交
-
-
由 David S. Miller 提交于
This reverts commit 763eff57. It causes build regressions, as per Stephen Rothwell: ==================== After merging the final tree, today's linux-next build (powerpc allyesconfig) failed like this: net/core/netprio_cgroup.c:250:29: error: static declaration of 'net_prio_subsys' follows non-static declaration include/linux/cgroup_subsys.h:71:1: note: previous declaration of 'net_prio_subsys' was here ==================== Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 4月, 2013 1 次提交
-
-
由 stephen hemminger 提交于
Minor sparse warning Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 2月, 2013 1 次提交
-
-
由 Joe Perches 提交于
alloc failures already get standardized OOM messages and a dump_stack. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 11月, 2012 6 次提交
-
-
由 Tejun Heo 提交于
Inherit netprio configuration from ->css_online(), allow nesting and remove .broken_hierarchy marking. This makes netprio_cgroup's behavior match netcls_cgroup's. Note that this patch changes userland-visible behavior. Nesting is allowed and the first level cgroups below the root cgroup behave differently - they inherit priorities from the root cgroup on creation instead of starting with 0. This is unfortunate but not doing so is much crazier. Signed-off-by: NTejun Heo <tj@kernel.org> Tested-and-Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tejun Heo 提交于
Introduce two helpers - netprio_prio() and netprio_set_prio() - which hide the details of priomap access and expansion. This will help implementing hierarchy support. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tejun Heo 提交于
With priomap expansion no longer depending on knowing max id allocated, netprio_cgroup can use cgroup->id insted of cs->prioidx. Drop prioidx alloc/free logic and convert all uses to cgroup->id. * In cgrp_css_alloc(), parent->id test is moved above @cs allocation to simplify error path. * In cgrp_css_free(), @cs assignment is made initialization. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tejun Heo 提交于
netprio kept track of the highest prioidx allocated and resized priomaps accordingly when necessary. This makes it necessary to keep track of prioidx allocation and may end up resizing on every new prioidx. Update extend_netdev_table() such that it takes @target_idx which the priomap should be able to accomodate. If the priomap is large enough, nothing happens; otherwise, the size is doubled until @target_idx can be accomodated. This makes max_prioidx and write_update_netdev_table() unnecessary. write_priomap() now calls extend_netdev_table() directly. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tejun Heo 提交于
The function is about to go through a rewrite. In preparation, shorten the variable names so that we don't repeat "priomap" so often. This patch is cosmetic. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tejun Heo 提交于
sscanf() doesn't bite. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 11月, 2012 1 次提交
-
-
由 Tejun Heo 提交于
Rename cgroup_subsys css lifetime related callbacks to better describe what their roles are. Also, update documentation. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com>
-
- 26 10月, 2012 1 次提交
-
-
由 Daniel Wagner 提交于
net_prio_attach() is only access via cgroup_subsys callbacks, therefore we can reduce the visibility of this function. Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Tejun Heo <tj@kernel.org> Cc: <netdev@vger.kernel.org> Cc: <cgroups@vger.kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 9月, 2012 1 次提交
-
-
由 Al Viro 提交于
iterates through the opened files in given descriptor table, calling a supplied function; we stop once non-zero is returned. Callback gets struct file *, descriptor number and const void * argument passed to iterator. It is called with files->file_lock held, so it is not allowed to block. tty_io, netprio_cgroup and selinux flush_unauthorized_files() converted to its use. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 9月, 2012 2 次提交
-
-
由 Tejun Heo 提交于
Currently, cgroup hierarchy support is a mess. cpu related subsystems behave correctly - configuration, accounting and control on a parent properly cover its children. blkio and freezer completely ignore hierarchy and treat all cgroups as if they're directly under the root cgroup. Others show yet different behaviors. These differing interpretations of cgroup hierarchy make using cgroup confusing and it impossible to co-mount controllers into the same hierarchy and obtain sane behavior. Eventually, we want full hierarchy support from all subsystems and probably a unified hierarchy. Users using separate hierarchies expecting completely different behaviors depending on the mounted subsystem is deterimental to making any progress on this front. This patch adds cgroup_subsys.broken_hierarchy and sets it to %true for controllers which are lacking in hierarchy support. The goal of this patch is two-fold. * Move users away from using hierarchy on currently non-hierarchical subsystems, so that implementing proper hierarchy support on those doesn't surprise them. * Keep track of which controllers are broken how and nudge the subsystems to implement proper hierarchy support. For now, start with a single warning message. We can whine louder later on. v2: Fixed a typo spotted by Michal. Warning message updated. v3: Updated memcg part so that it doesn't generate warning in the cases where .use_hierarchy=false doesn't make the behavior different from root.use_hierarchy=true. Fixed a typo spotted by Glauber. v4: Check ->broken_hierarchy after cgroup creation is complete so that ->create() can affect the result per Michal. Dropped unnecessary memcg root handling per Michal. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NSerge E. Hallyn <serue@us.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
由 Daniel Wagner 提交于
WARNING: With this change it is impossible to load external built controllers anymore. In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is set, corresponding subsys_id should also be a constant. Up to now, net_prio_subsys_id and net_cls_subsys_id would be of the type int and the value would be assigned during runtime. By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN to IS_ENABLED, all *_subsys_id will have constant value. That means we need to remove all the code which assumes a value can be assigned to net_prio_subsys_id and net_cls_subsys_id. A close look is necessary on the RCU part which was introduces by following patch: commit f8451725 Author: Herbert Xu <herbert@gondor.apana.org.au> Mon May 24 09:12:34 2010 Committer: David S. Miller <davem@davemloft.net> Mon May 24 09:12:34 2010 cls_cgroup: Store classid in struct sock Tis code was added to init_cgroup_cls() /* We can't use rcu_assign_pointer because this is an int. */ smp_wmb(); net_cls_subsys_id = net_cls_subsys.subsys_id; respectively to exit_cgroup_cls() net_cls_subsys_id = -1; synchronize_rcu(); and in module version of task_cls_classid() rcu_read_lock(); id = rcu_dereference(net_cls_subsys_id); if (id >= 0) classid = container_of(task_subsys_state(p, id), struct cgroup_cls_state, css)->classid; rcu_read_unlock(); Without an explicit explaination why the RCU part is needed. (The rcu_deference was fixed by exchanging it to rcu_derefence_index_check() in a later commit, but that is a minor detail.) So here is my pondering why it was introduced and why it safe to remove it now. Note that this code was copied over to net_prio the reasoning holds for that subsystem too. The idea behind the RCU use for net_cls_subsys_id is to make sure we get a valid pointer back from task_subsys_state(). task_subsys_state() is just blindly accessing the subsys array and returning the pointer. Obviously, passing in -1 as id into task_subsys_state() returns an invalid value (out of lower bound). So this code makes sure that only after module is loaded and the subsystem registered, the id is assigned. Before unregistering the module all old readers must have left the critical section. This is done by assigning -1 to the id and issuing a synchronized_rcu(). Any new readers wont call task_subsys_state() anymore and therefore it is safe to unregister the subsystem. The new code relies on the same trick, but it looks at the subsys pointer return by task_subsys_state() (remember the id is constant and therefore we allways have a valid index into the subsys array). No precautions need to be taken during module loading module. Eventually, all CPUs will get a valid pointer back from task_subsys_state() because rebind_subsystem() which is called after the module init() function will assigned subsys[net_cls_subsys_id] the newly loaded module subsystem pointer. When the subsystem is about to be removed, rebind_subsystem() will called before the module exit() function. In this case, rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer and then it calls synchronize_rcu(). All old readers have left by then the critical section. Any new reader wont access the subsystem anymore. At this point we are safe to unregister the subsystem. No synchronize_rcu() call is needed. Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org
-
- 14 9月, 2012 2 次提交
-
-
由 Srivatsa S. Bhat 提交于
Replace the current (inefficient) for-loop with memcpy, to copy priomap. Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Srivatsa S. Bhat 提交于
The update_netdev_tables() function appears to be unnecessary, since the write_update_netdev_table() function will adjust the priomaps as and when required anyway. So drop the usage of update_netdev_tables() entirely. Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 8月, 2012 2 次提交
-
-
由 John Fastabend 提交于
A race exists where creating cgroups and also updating the priomap may result in losing a priomap update. This is because priomap writers are not protected by rtnl_lock. Move priority writer into rtnl_lock()/rtnl_unlock(). CC: Neil Horman <nhorman@tuxdriver.com> Reported-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
Add lock to prevent a race with a file closing and also remove useless and ugly sscanf code. The extra code was never needed and the case it supposedly protected against is in fact handled correctly by sock_from_file as pointed out by Al Viro. CC: Neil Horman <nhorman@tuxdriver.com> Reported-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 7月, 2012 1 次提交
-
-
由 John Fastabend 提交于
Instead of updating the sk_cgrp_prioidx struct field on every send this only updates the field when a task is moved via cgroup infrastructure. This allows sockets that may be used by a kernel worker thread to be managed. For example in the iscsi case today a user can put iscsid in a netprio cgroup and control traffic will be sent with the correct sk_cgrp_prioidx value set but as soon as data is sent the kernel worker thread isssues a send and sk_cgrp_prioidx is updated with the kernel worker threads value which is the default case. It seems more correct to only update the field when the user explicitly sets it via control group infrastructure. This allows the users to manage sockets that may be used with other threads. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 7月, 2012 1 次提交
-
-
由 Gao feng 提交于
there are some out of bound accesses in netprio cgroup. now before accessing the dev->priomap.priomap array,we only check if the dev->priomap exist.and because we don't want to see additional bound checkings in fast path, so we should make sure that dev->priomap is null or array size of dev->priomap.priomap is equal to max_prioidx + 1; so in write_priomap logic,we should call extend_netdev_table when dev->priomap is null and dev->priomap.priomap_len < max_len. and in cgrp_create->update_netdev_tables logic,we should call extend_netdev_table only when dev->priomap exist and dev->priomap.priomap_len < max_len. and it's not needed to call update_netdev_tables in write_priomap, we can only allocate the net device's priomap which we change through net_prio.ifpriomap. this patch also add a return value for update_netdev_tables & extend_netdev_table, so when new_priomap is allocated failed, write_priomap will stop to access the priomap,and return -ENOMEM back to the userspace to tell the user what happend. Change From v3: 1. add rtnl protect when reading max_prioidx in write_priomap. 2. only call extend_netdev_table when map->priomap_len < max_len, this will make sure array size of dev->map->priomap always bigger than any prioidx. 3. add a function write_update_netdev_table to make codes clear. Change From v2: 1. protect extend_netdev_table by RTNL. 2. when extend_netdev_table failed,call dev_put to reduce device's refcount. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 7月, 2012 2 次提交
-
-
由 Liu Bo 提交于
poitner -> pointer. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
由 Eric Dumazet 提交于
dev->priomap is allocated by extend_netdev_table() called from update_netdev_tables(). And this is only called if write_priomap() is called. But if write_priomap() is not called, it seems we can have out of bounds accesses in cgrp_destroy(), read_priomap() & skb_update_prio() With help from Gao Feng Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 7月, 2012 1 次提交
-
-
由 Gao feng 提交于
we set max_prioidx to the first zero bit index of prioidx_map in function get_prioidx. So when we delete the low index netprio cgroup and adding a new netprio cgroup again,the max_prioidx will be set to the low index. when we set the high index cgroup's net_prio.ifpriomap,the function write_priomap will call update_netdev_tables to alloc memory which size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1), so the size of array that map->priomap point to is max_prioidx +1, which is low than what we actually need. fix this by adding check in get_prioidx,only set max_prioidx when max_prioidx low than the new prioidx. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 5月, 2012 1 次提交
-
-
由 Joe Perches 提交于
Use the current logging style. This enables use of dynamic debugging as well. Convert printk(KERN_<LEVEL> to pr_<level>. Add pr_fmt. Remove embedded prefixes, use %s, __func__ instead. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 4月, 2012 2 次提交
-
-
由 Tejun Heo 提交于
Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio, net_cls and device controllers to use the new cftype based interface. Termination entry is added to cftype arrays and populate callbacks are replaced with cgroup_subsys->base_cftypes initializations. This is functionally identical transformation. There shouldn't be any visible behavior change. memcg is rather special and will be converted separately. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <paul@paulmenage.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Vivek Goyal <vgoyal@redhat.com>
-
由 Tejun Heo 提交于
blk-cgroup, netprio_cgroup, cls_cgroup and tcp_memcontrol unnecessarily define cftype array and cgroup_subsys structures at the top of the file, which is unconventional and necessiates forward declaration of methods. This patch relocates those below the definitions of the methods and removes the forward declarations. Note that forward declaration of tcp_files[] is added in tcp_memcontrol.c for tcp_init_cgroup(). This will be removed soon by another patch. This patch doesn't introduce any functional change. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
-