- 24 4月, 2020 2 次提交
-
-
由 Yihao Wu 提交于
to #26424323 Since we concern idle, let's take idle as the center state. And omit transition between other stats. Below is the state transition graph: sleep->deque +-----------+ cpumask +-------+ exit->deque +-------+ |ineffective|-------- | idle | <-----------|running| +-----------+ +-------+ +-------+ ^ | unthrtl child -> deque | | wake -> deque | |thrtl chlid -> enque migrate -> deque | |migrate -> enque | v +-------+ | steal | +-------+ We conclude idle state condition as: !se->on_rq && !my_q->throttled && cpu allowed. From this graph and condition, we can hook (de|en)queue_task_fair update_cpumasks_hier, (un|)throttle_cfs_rq to account idle state. In the hooked functions, we also check the conditions, to avoid accounting unwanted cpu clocks. Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com> Signed-off-by: NShanpei Chen <shanpeic@linux.alibaba.com> Acked-by: NMichael Wang <yun.wang@linux.alibaba.com>
-
由 Xunlei Pang 提交于
to #26424323 It's relatively easy to maintain nr_uninterruptible in scheduler compared to doing it in cpuacct, we assume that "cpu,cpuacct" are bound together, so that it can be used for per-cgroup load. This will be needed to calculate per-cgroup load average later. Reviewed-by: NMichael Wang <yun.wang@linux.alibaba.com> Signed-off-by: NXunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
-
- 18 3月, 2020 2 次提交
-
-
由 Xu Yu 提交于
Since commit e0205ae40f12 ("mm: memcontrol: use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks()") made mem_cgroup_scan_tasks() to check only one thread from each thread group, we can make cgroup_subsys_state::nr_tasks to record only the thread group leader, i.e., process, instead of thread(s). Furthermore, this renames cgroup_subsys_state::nr_tasks to cgroup_subsys_state::nr_procs. Fixes: f061cd88 ("alinux: kernel: cgroup: account number of tasks in the css and its descendants") Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
由 Wenwei Tao 提交于
Account number of the tasks in the css and its descendants, this is prepared for the incoming memcg priority patch. In memcg priority oom, we will select victim cgroup which has victim tasks in it. We need to know whether the memcg and its descendants have tasks before the selection can move on. Signed-off-by: NWenwei Tao <wenwei.tao@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
- 17 1月, 2020 1 次提交
-
-
由 Xunlei Pang 提交于
Export "cpu|io|memory.pressure" under cgroup v1 "cpuacct" subsystem. Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
- 27 12月, 2019 7 次提交
-
-
由 Tejun Heo 提交于
commit 38cf3a687f5827fcfc81cbc433ef5822693a49c1 upstream. a5e112e6424a ("cgroup: add cgroup_parse_float()") accidentally added cgroup_parse_float() inside CONFIG_SYSFS block. Move it outside so that it doesn't cause failures on !CONFIG_SYSFS builds. Signed-off-by: NTejun Heo <tj@kernel.org> Fixes: a5e112e6424a ("cgroup: add cgroup_parse_float()") Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Tejun Heo 提交于
commit a5e112e6424adb77d953eac20e6936b952fd6b32 upstream. cgroup already uses floating point for percent[ile] numbers and there are several controllers which want to take them as input. Add a generic parse helper to handle inputs. Update the interface convention documentation about the use of percentage numbers. While at it, also clarify the default time unit. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Dan Schatzberg 提交于
commit df5ba5be7425e1df296d40c5f37a39d98ec666a2 upstream. Pressure metrics are already recorded and exposed in procfs for the entire system, but any tool which monitors cgroup pressure has to special case the root cgroup to read from procfs. This patch exposes the already recorded pressure metrics on the root cgroup. Link: http://lkml.kernel.org/r/20190510174938.3361741-1-dschatzberg@fb.comSigned-off-by: NDan Schatzberg <dschatzberg@fb.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Suren Baghdasaryan 提交于
commit 0e94682b73bfa6c44c98af7a26771c9c08c055d5 upstream. Psi monitor aims to provide a low-latency short-term pressure detection mechanism configurable by users. It allows users to monitor psi metrics growth and trigger events whenever a metric raises above user-defined threshold within user-defined time window. Time window and threshold are both expressed in usecs. Multiple psi resources with different thresholds and window sizes can be monitored concurrently. Psi monitors activate when system enters stall state for the monitored psi metric and deactivate upon exit from the stall state. While system is in the stall state psi signal growth is monitored at a rate of 10 times per tracking window. Min window size is 500ms, therefore the min monitoring interval is 50ms. Max window size is 10s with monitoring interval of 1s. When activated psi monitor stays active for at least the duration of one tracking window to avoid repeated activations/deactivations when psi signal is bouncing. Notifications to the users are rate-limited to one per tracking window. Link: http://lkml.kernel.org/r/20190319235619.260832-8-surenb@google.comSigned-off-by: NSuren Baghdasaryan <surenb@google.com> Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Johannes Weiner 提交于
commit dc50537bdd1a0804fa2cbc990565ee9a944e66fa upstream. Cgroup has a standardized poll/notification mechanism for waking all pollers on all fds when a filesystem node changes. To allow polling for custom events, add a .poll callback that can override the default. This is in preparation for pollable cgroup pressure files which have per-fd trigger configurations. Link: http://lkml.kernel.org/r/20190124211518.244221-3-surenb@google.comSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NSuren Baghdasaryan <surenb@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Johannes Weiner 提交于
commit 2ce7135adc9ad081aa3c49744144376ac74fea60 upstream. On a system that executes multiple cgrouped jobs and independent workloads, we don't just care about the health of the overall system, but also that of individual jobs, so that we can ensure individual job health, fairness between jobs, or prioritize some jobs over others. This patch implements pressure stall tracking for cgroups. In kernels with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure, and io.pressure files that track aggregate pressure stall times for only the tasks inside the cgroup. Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NDaniel Drake <drake@endlessm.com> Tested-by: NSuren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> [Joseph: fix apply conflicts in cgroup_create()] Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com> Conflicts: kernel/cgroup/cgroup.c
-
由 Jiufei Xue 提交于
Here we add a global radix tree to link memcg and blkcg that the user attach the tasks to when using cgroup v1, which is used for writeback cgroup. Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
- 18 12月, 2019 1 次提交
-
-
由 Aleksa Sarai 提交于
commit a713af394cf382a30dd28a1015cbe572f1b9ca75 upstream. Because pids->limit can be changed concurrently (but we don't want to take a lock because it would be needlessly expensive), use atomic64_ts instead. Fixes: commit 49b786ea ("cgroup: implement the PIDs subsystem") Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: NAleksa Sarai <cyphar@cyphar.com> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 09 8月, 2019 4 次提交
-
-
由 Tejun Heo 提交于
commit c596687a008b579c503afb7a64fcacc7270fae9e upstream. While adding handling for dying task group leaders c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations") added an inverted cset skip condition to css_task_iter_advance_css_set(). It should skip cset if it's completely empty but was incorrectly testing for the inverse condition for the dying_tasks list. Fix it. Signed-off-by: NTejun Heo <tj@kernel.org> Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations") Reported-by: syzbot+d4bba5ccd4f9a2a68681@syzkaller.appspotmail.com Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Tejun Heo 提交于
commit cee0c33c546a93957a52ae9ab6bebadbee765ec5 upstream. b636fd38dc40 ("cgroup: Implement css_task_iter_skip()") introduced css_task_iter_skip() which is used to fix task iterations skipping dying threadgroup leaders with live threads. Skipping is implemented as a subportion of full advancing but css_task_iter_next() forgot to fully advance a skipped iterator before determining the next task to visit causing it to return invalid task pointers. Fix it by making css_task_iter_next() fully advance the iterator if it has been skipped since the previous iteration. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: syzbot Link: http://lkml.kernel.org/r/00000000000097025d058a7fd785@google.com Fixes: b636fd38dc40 ("cgroup: Implement css_task_iter_skip()") Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Tejun Heo 提交于
commit c03cd7738a83b13739f00546166969342c8ff014 upstream. CSS_TASK_ITER_PROCS currently iterates live group leaders; however, this means that a process with dying leader and live threads will be skipped. IOW, cgroup.procs might be empty while cgroup.threads isn't, which is confusing to say the least. Fix it by making cset track dying tasks and include dying leaders with live threads in PROCS iteration. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-and-tested-by: NTopi Miettinen <toiwoton@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Tejun Heo 提交于
commit b636fd38dc40113f853337a7d2a6885ad23b8811 upstream. When a task is moved out of a cset, task iterators pointing to the task are advanced using the normal css_task_iter_advance() call. This is fine but we'll be tracking dying tasks on csets and thus moving tasks from cset->tasks to (to be added) cset->dying_tasks. When we remove a task from cset->tasks, if we advance the iterators, they may move over to the next cset before we had the chance to add the task back on the dying list, which can allow the task to escape iteration. This patch separates out skipping from advancing. Skipping only moves the affected iterators to the next pointer rather than fully advancing it and the following advancing will recognize that the cursor has already been moved forward and do the rest of advancing. This ensures that when a task moves from one list to another in its cset, as long as it moves in the right direction, it's always visible to iteration. This doesn't cause any visible behavior changes. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 10 7月, 2019 1 次提交
-
-
由 Joel Savitz 提交于
[ Upstream commit d477f8c202d1f0d4791ab1263ca7657bbe5cf79e ] In the case that a process is constrained by taskset(1) (i.e. sched_setaffinity(2)) to a subset of available cpus, and all of those are subsequently offlined, the scheduler will set tsk->cpus_allowed to the current value of task_cs(tsk)->effective_cpus. This is done via a call to do_set_cpus_allowed() in the context of cpuset_cpus_allowed_fallback() made by the scheduler when this case is detected. This is the only call made to cpuset_cpus_allowed_fallback() in the latest mainline kernel. However, this is not sane behavior. I will demonstrate this on a system running the latest upstream kernel with the following initial configuration: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,fffffff Cpus_allowed_list: 0-63 (Where cpus 32-63 are provided via smt.) If we limit our current shell process to cpu2 only and then offline it and reonline it: # taskset -p 4 $$ pid 2272's current affinity mask: ffffffffffffffff pid 2272's new affinity mask: 4 # echo off > /sys/devices/system/cpu/cpu2/online # dmesg | tail -3 [ 2195.866089] process 2272 (bash) no longer affine to cpu2 [ 2195.872700] IRQ 114: no longer affine to CPU2 [ 2195.879128] smpboot: CPU 2 is now offline # echo on > /sys/devices/system/cpu/cpu2/online # dmesg | tail -1 [ 2617.043572] smpboot: Booting Node 0 Processor 2 APIC 0x4 We see that our current process now has an affinity mask containing every cpu available on the system _except_ the one we originally constrained it to: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,fffffffb Cpus_allowed_list: 0-1,3-63 This is not sane behavior, as the scheduler can now not only place the process on previously forbidden cpus, it can't even schedule it on the cpu it was originally constrained to! Other cases result in even more exotic affinity masks. Take for instance a process with an affinity mask containing only cpus provided by smt at the moment that smt is toggled, in a configuration such as the following: # taskset -p f000000000 $$ # grep -i cpu /proc/$$/status Cpus_allowed: 000000f0,00000000 Cpus_allowed_list: 36-39 A double toggle of smt results in the following behavior: # echo off > /sys/devices/system/cpu/smt/control # echo on > /sys/devices/system/cpu/smt/control # grep -i cpus /proc/$$/status Cpus_allowed: ffffff00,ffffffff Cpus_allowed_list: 0-31,40-63 This is even less sane than the previous case, as the new affinity mask excludes all smt-provided cpus with ids less than those that were previously in the affinity mask, as well as those that were actually in the mask. With this patch applied, both of these cases end in the following state: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,ffffffff Cpus_allowed_list: 0-63 The original policy is discarded. Though not ideal, it is the simplest way to restore sanity to this fallback case without reinventing the cpuset wheel that rolls down the kernel just fine in cgroup v2. A user who wishes for the previous affinity mask to be restored in this fallback case can use that mechanism instead. This patch modifies scheduler behavior by instead resetting the mask to task_cs(tsk)->cpus_allowed by default, and cpu_possible mask in legacy mode. I tested the cases above on both modes. Note that the scheduler uses this fallback mechanism if and only if _every_ other valid avenue has been traveled, and it is the last resort before calling BUG(). Suggested-by: NWaiman Long <longman@redhat.com> Suggested-by: NPhil Auld <pauld@redhat.com> Signed-off-by: NJoel Savitz <jsavitz@redhat.com> Acked-by: NPhil Auld <pauld@redhat.com> Acked-by: NWaiman Long <longman@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 31 5月, 2019 1 次提交
-
-
由 Roman Gushchin 提交于
[ Upstream commit 4dcabece4c3a9f9522127be12cc12cc120399b2f ] The number of descendant cgroups and the number of dying descendant cgroups are currently synchronized using the cgroup_mutex. The number of descendant cgroups will be required by the cgroup v2 freezer, which will use it to determine if a cgroup is frozen (depending on total number of descendants and number of frozen descendants). It's not always acceptable to grab the cgroup_mutex, especially from quite hot paths (e.g. exit()). To avoid this, let's additionally synchronize these counters using the css_set_lock. So, it's safe to read these counters with either cgroup_mutex or css_set_lock locked, and for changing both locks should be acquired. Signed-off-by: NRoman Gushchin <guro@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: kernel-team@fb.com Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 06 4月, 2019 2 次提交
-
-
由 Oleg Nesterov 提交于
[ Upstream commit 51bee5abeab2058ea5813c5615d6197a23dbf041 ] The only user of cgroup_subsys->free() callback is pids_cgrp_subsys which needs pids_free() to uncharge the pid. However, ->free() is called from __put_task_struct()->cgroup_free() and this is too late. Even the trivial program which does for (;;) { int pid = fork(); assert(pid >= 0); if (pid) wait(NULL); else exit(0); } can run out of limits because release_task()->call_rcu(delayed_put_task_struct) implies an RCU gp after the task/pid goes away and before the final put(). Test-case: mkdir -p /tmp/CG mount -t cgroup2 none /tmp/CG echo '+pids' > /tmp/CG/cgroup.subtree_control mkdir /tmp/CG/PID echo 2 > /tmp/CG/PID/pids.max perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' & echo $! > /tmp/CG/PID/cgroup.procs Without this patch the forking process fails soon after migration. Rename cgroup_subsys->free() to cgroup_subsys->release() and move the callsite into the new helper, cgroup_release(), called by release_task() which actually frees the pid(s). Reported-by: NHerton R. Krzesinski <hkrzesin@redhat.com> Reported-by: NJan Stancek <jstancek@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Tejun Heo 提交于
[ Upstream commit b4ff1b44bcd384d22fcbac6ebaf9cc0d33debe50 ] cgroup_rstat_cpu_pop_updated() is used to traverse the updated cgroups on flush. While it was only visiting updated ones in the subtree, it was visiting @root unconditionally. We can easily check whether @root is updated or not by looking at its ->updated_next just as with the cgroups in the subtree. * Remove the unnecessary cgroup_parent() test. The system root cgroup is never updated and thus its ->updated_next is always NULL. No need to test whether cgroup_parent() exists in addition to ->updated_next. * Terminate traverse if ->updated_next is NULL. This can only happen for subtree @root and there's no reason to visit it if it's not marked updated. This reduces cpu consumption when reading a lot of rstat backed files. In a micro benchmark reading stat from ~1600 cgroups, the sys time was lowered by >40%. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 24 3月, 2019 1 次提交
-
-
由 Al Viro 提交于
commit 399504e21a10be16dd1408ba0147367d9d82a10c upstream. same story as with last May fixes in sysfs (7b745a4e "unfuck sysfs_mount()"); new_sb is left uninitialized in case of early errors in kernfs_mount_ns() and papering over it by treating any error from kernfs_mount_ns() as equivalent to !new_ns ends up conflating the cases when objects had never been transferred to a superblock with ones when that has happened and resulting new superblock had been dropped. Easily fixed (same way as in sysfs case). Additionally, there's a superblock leak on kernfs_node_dentry() failure *and* a dentry leak inside kernfs_node_dentry() itself - the latter on probably impossible errors, but the former not impossible to trigger (as the matter of fact, injecting allocation failures at that point *does* trigger it). Cc: stable@kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 13 2月, 2019 1 次提交
-
-
由 Ondrej Mosnacek 提交于
[ Upstream commit e250d91d65750a0c0c62483ac4f9f357e7317617 ] This fixes the case where all mount options specified are consumed by an LSM and all that's left is an empty string. In this case cgroupfs should accept the string and not fail. How to reproduce (with SELinux enabled): # umount /sys/fs/cgroup/unified # mount -o context=system_u:object_r:cgroup_t:s0 -t cgroup2 cgroup2 /sys/fs/cgroup/unified mount: /sys/fs/cgroup/unified: wrong fs type, bad option, bad superblock on cgroup2, missing codepage or helper program, or other error. # dmesg | tail -n 1 [ 31.575952] cgroup: cgroup2: unknown option "" Fixes: 67e9c74b ("cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type") [NOTE: should apply on top of commit 5136f636 ("cgroup: implement "nsdelegate" mount option"), older versions need manual rebase] Suggested-by: NStephen Smalley <sds@tycho.nsa.gov> Signed-off-by: NOndrej Mosnacek <omosnace@redhat.com> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 10 1月, 2019 1 次提交
-
-
由 Tejun Heo 提交于
commit e9d81a1bc2c48ea9782e3e8b53875f419766ef47 upstream. CSS_TASK_ITER_PROCS implements process-only iteration by making css_task_iter_advance() skip tasks which aren't threadgroup leaders; however, when an iteration is started css_task_iter_start() calls the inner helper function css_task_iter_advance_css_set() instead of css_task_iter_advance(). As the helper doesn't have the skip logic, when the first task to visit is a non-leader thread, it doesn't get skipped correctly as shown in the following example. # ps -L 2030 PID LWP TTY STAT TIME COMMAND 2030 2030 pts/0 Sl+ 0:00 ./test-thread 2030 2031 pts/0 Sl+ 0:00 ./test-thread # mkdir -p /sys/fs/cgroup/x/a/b # echo threaded > /sys/fs/cgroup/x/a/cgroup.type # echo threaded > /sys/fs/cgroup/x/a/b/cgroup.type # echo 2030 > /sys/fs/cgroup/x/a/cgroup.procs # cat /sys/fs/cgroup/x/a/cgroup.threads 2030 2031 # cat /sys/fs/cgroup/x/cgroup.procs 2030 # echo 2030 > /sys/fs/cgroup/x/a/b/cgroup.threads # cat /sys/fs/cgroup/x/cgroup.procs 2031 2030 The last read of cgroup.procs is incorrectly showing non-leader 2031 in cgroup.procs output. This can be fixed by updating css_task_iter_advance() to handle the first advance and css_task_iters_tart() to call css_task_iter_advance() instead of the inner helper. After the fix, the same commands result in the following (correct) result: # ps -L 2062 PID LWP TTY STAT TIME COMMAND 2062 2062 pts/0 Sl+ 0:00 ./test-thread 2062 2063 pts/0 Sl+ 0:00 ./test-thread # mkdir -p /sys/fs/cgroup/x/a/b # echo threaded > /sys/fs/cgroup/x/a/cgroup.type # echo threaded > /sys/fs/cgroup/x/a/b/cgroup.type # echo 2062 > /sys/fs/cgroup/x/a/cgroup.procs # cat /sys/fs/cgroup/x/a/cgroup.threads 2062 2063 # cat /sys/fs/cgroup/x/cgroup.procs 2062 # echo 2062 > /sys/fs/cgroup/x/a/b/cgroup.threads # cat /sys/fs/cgroup/x/cgroup.procs 2062 Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: N"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Fixes: 8cfd8147 ("cgroup: implement cgroup v2 thread support") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 05 10月, 2018 1 次提交
-
-
由 Tejun Heo 提交于
A cgroup which is already a threaded domain may be converted into a threaded cgroup if the prerequisite conditions are met. When this happens, all threaded descendant should also have their ->dom_cgrp updated to the new threaded domain cgroup. Unfortunately, this propagation was missing leading to the following failure. # cd /sys/fs/cgroup/unified # cat cgroup.subtree_control # show that no controllers are enabled # mkdir -p mycgrp/a/b/c # echo threaded > mycgrp/a/b/cgroup.type At this point, the hierarchy looks as follows: mycgrp [d] a [dt] b [t] c [inv] Now let's make node "a" threaded (and thus "mycgrp" s made "domain threaded"): # echo threaded > mycgrp/a/cgroup.type By this point, we now have a hierarchy that looks as follows: mycgrp [dt] a [t] b [t] c [inv] But, when we try to convert the node "c" from "domain invalid" to "threaded", we get ENOTSUP on the write(): # echo threaded > mycgrp/a/b/c/cgroup.type sh: echo: write error: Operation not supported This patch fixes the problem by * Moving the opencoded ->dom_cgrp save and restoration in cgroup_enable_threaded() into cgroup_{save|restore}_control() so that mulitple cgroups can be handled. * Updating all threaded descendants' ->dom_cgrp to point to the new dom_cgrp when enabling threaded mode. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-and-tested-by: N"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Reported-by: NAmin Jamali <ajamali@pivotal.io> Reported-by: NJoao De Almeida Pereira <jpereira@pivotal.io> Link: https://lore.kernel.org/r/CAKgNAkhHYCMn74TCNiMJ=ccLd7DcmXSbvw3CbZ1YREeG7iJM5g@mail.gmail.com Fixes: 454000ad ("cgroup: introduce cgroup->dom_cgrp and threaded css_set handling") Cc: stable@vger.kernel.org # v4.14+
-
- 21 7月, 2018 1 次提交
-
-
由 Dmitry Torokhov 提交于
This change allows creating kernfs files and directories with arbitrary uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments. The "simple" kernfs_create_file() and kernfs_create_dir() are left alone and always create objects belonging to the global root. When creating symlinks ownership (uid/gid) is taken from the target kernfs object. Co-Developed-by: NTyler Hicks <tyhicks@canonical.com> Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: NTyler Hicks <tyhicks@canonical.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 7月, 2018 1 次提交
-
-
由 Steven Rostedt (VMware) 提交于
It is unwise to take spin locks from the handlers of trace events. Mainly, because they can introduce lockups, because it introduces locks in places that are normally not tested. Worse yet, because trace events are tucked away in the include/trace/events/ directory, locks that are taken there are forgotten about. As a general rule, I tell people never to take any locks in a trace event handler. Several cgroup trace event handlers call cgroup_path() which eventually takes the kernfs_rename_lock spinlock. This injects the spinlock in the code without people realizing it. It also can cause issues for the PREEMPT_RT patch, as the spinlock becomes a mutex, and the trace event handlers are called with preemption disabled. By moving the calculation of the cgroup_path() out of the trace event handlers and into a macro (surrounded by a trace_cgroup_##type##_enabled()), then we could place the cgroup_path into a string, and pass that to the trace event. Not only does this remove the taking of the spinlock out of the trace event handler, but it also means that the cgroup_path() only needs to be called once (it is currently called twice, once to get the length to reserver the buffer for, and once again to get the path itself. Now it only needs to be done once. Reported-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 16 6月, 2018 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
As we move stuff around, some doc references are broken. Fix some of them via this script: ./scripts/documentation-file-ref-check --fix Manually checked if the produced result is valid, removing a few false-positives. Acked-by: NTakashi Iwai <tiwai@suse.de> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Acked-by: NStephen Boyd <sboyd@kernel.org> Acked-by: NCharles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: NColy Li <colyli@suse.de> Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: NJonathan Corbet <corbet@lwn.net>
-
- 13 6月, 2018 2 次提交
-
-
由 Kees Cook 提交于
The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(char) * COUNT + COUNT , ...) | vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) | vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) | vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 07 6月, 2018 1 次提交
-
-
由 Kees Cook 提交于
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This patch makes the changes for kmalloc()-family (and kvmalloc()-family) uses. It was done via automatic conversion with manual review for the "CHECKME" non-standard cases noted below, using the following Coccinelle script: // pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len * // sizeof *pkey_cache->table, GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // Same pattern, but can't trivially locate the trailing element name, // or variable name. @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; expression SOMETHING, COUNT, ELEMENT; @@ - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP) + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP) Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 24 5月, 2018 1 次提交
-
-
由 Tejun Heo 提交于
cgroup_enable_task_cg_lists() incorrectly nests non-irq-safe tasklist_lock inside irq-safe css_set_lock triggering the following lockdep warning. WARNING: possible irq lock inversion dependency detected 4.17.0-rc1-00027-gb37d049 #6 Not tainted -------------------------------------------------------- systemd/1 just changed the state of lock: 00000000fe57773b (css_set_lock){..-.}, at: cgroup_free+0xf2/0x12a but this lock took another, SOFTIRQ-unsafe lock in the past: (tasklist_lock){.+.+} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(tasklist_lock); local_irq_disable(); lock(css_set_lock); lock(tasklist_lock); <Interrupt> lock(css_set_lock); *** DEADLOCK *** The condition is highly unlikely to actually happen especially given that the path is executed only once per boot. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NBoqun Feng <boqun.feng@gmail.com>
-
- 16 5月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 08 5月, 2018 1 次提交
-
-
由 Andy Shevchenko 提交于
The new helper returns index of the matching string in an array. We are going to use it here. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 27 4月, 2018 6 次提交
-
-
由 Tejun Heo 提交于
cgroup_rstat_updated() ensures that the cgroup's rstat is linked to the parent. If there's no parent, it never gets linked and the function ends up grabbing and releasing the cgroup_rstat_lock each time for no reason which can be expensive. This hasn't been a problem till now because nobody was calling the function for the root cgroup but rstat is gonna be exposed to controllers and use cases, so let's get ready. Make cgroup_rstat_updated() an no-op for the root cgroup. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
cgroup_rstat_updated() has a small race window where an updated signaling can race with flush and could be lost till the next update. This wasn't a problem for the existing usages, but we plan to use rstat to track counters which need to be accurate. This patch plugs the race window by synchronizing cgroup_rstat_updated() and flush path with memory barriers around cgroup_rstat_cpu->updated_next pointer. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
This patch adds cgroup_subsys->css_rstat_flush(). If a subsystem has this callback, its csses are linked on cgrp->css_rstat_list and rstat will call the function whenever the associated cgroup is flushed. Flush is also performed when such csses are released so that residual counts aren't lost. Combined with the rstat API previous patches factored out, this allows controllers to plug into rstat to manage their statistics in a scalable way. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Currently, rstat flush path is protected with a mutex which is fine as all the existing users are from interface file show path. However, rstat is being generalized for use by controllers and flushing from atomic contexts will be necessary. This patch replaces cgroup_rstat_mutex with a spinlock and adds a irq-safe flush function - cgroup_rstat_flush_irqsafe(). Explicit yield handling is added to the flush path so that other flush functions can yield to other threads and flushers. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
cgroup_rstat is being generalized so that controllers can use it too. This patch factors out and exposes the following interface functions. * cgroup_rstat_updated(): Renamed from cgroup_rstat_cpu_updated() for consistency. * cgroup_rstat_flush_hold/release(): Factored out from base stat implementation. * cgroup_rstat_flush(): Verbatim expose. While at it, drop assert on cgroup_rstat_mutex in cgroup_base_stat_flush() as it crosses layers and make a minor comment update. v2: Added EXPORT_SYMBOL_GPL(cgroup_rstat_updated) to fix a build bug. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Currently, rstat.c has rstat and base stat implementations intermixed. Collect base stat implementation at the end of the file. Also, reorder the prototypes. This patch doesn't make any functional changes. Signed-off-by: NTejun Heo <tj@kernel.org>
-