- 04 Jul 2013, 10 commits
-
-
Committed by Oleg Nesterov
Cleanup and preparation for the next changes. Move the "if (clone_flags & CLONE_THREAD)" code down under "if (likely(p->pid))" and turn it into the "else" branch. This makes the process/thread initialization more symmetrical and removes one check.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Eric Paris
When a task attempts to violate the RLIMIT_NPROC limit, we check whether the task is sufficiently privileged. The check first looks at CAP_SYS_ADMIN, then CAP_SYS_RESOURCE, then whether the task has uid=0. As a result, tasks that would be allowed by the uid=0 check are first checked against the security subsystem. The security subsystem then audits a denial for sys_admin and sys_resource, after which the task passes the uid=0 check anyway. This patch rearranges the code to check uid=0 first, since if that passes we shouldn't hit the security system at all. We then check sys_resource, since it is the smallest capability that solves the problem. Lastly we check the catch-all CAP_SYS_ADMIN; we don't want to require this capability in many places, since it is so powerful. This eliminates many of the false-positive/needless denial messages we get when a root task tries to violate the nproc limit. (Note that kthreads count against root, so on a sufficiently large machine we can actually exceed the default limits before any userspace tasks are launched.)
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
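Here is a minimal userspace model that makes the effect of the reordering concrete. It is not the kernel code. `struct task_model`, `capable_model()` and the `audit_denials` counter are hypothetical stand-ins, and a false `cap_*` field means the security subsystem denies (and audits) that capability check. The point is that C's short-circuit `||` never evaluates checks that come after one that passes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter: each denied capability check is "audited". */
static int audit_denials;

/* Hypothetical task state; cap_* = whether the LSM would grant the cap. */
struct task_model {
	unsigned int uid;
	bool cap_sys_resource;
	bool cap_sys_admin;
};

/* Models a security-subsystem capability check that audits denials. */
static bool capable_model(bool granted)
{
	if (!granted)
		audit_denials++;
	return granted;
}

/* Old order: CAP_SYS_ADMIN, then CAP_SYS_RESOURCE, then uid == 0. */
static bool may_exceed_nproc_old(const struct task_model *t)
{
	return capable_model(t->cap_sys_admin) ||
	       capable_model(t->cap_sys_resource) ||
	       t->uid == 0;
}

/* New order: uid == 0 first, then CAP_SYS_RESOURCE, then CAP_SYS_ADMIN. */
static bool may_exceed_nproc_new(const struct task_model *t)
{
	return t->uid == 0 ||
	       capable_model(t->cap_sys_resource) ||
	       capable_model(t->cap_sys_admin);
}
```

For a root task whose capability checks the LSM denies, the old order logs two needless denials before succeeding, and the new order logs none.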
-
Committed by Oleg Nesterov
Move __set_special_pids() from exit.c to sys.c, close to its single caller, and make it static. Also rename it to set_special_pids(); another helper with this name has gone away.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Oleg Nesterov
call_usermodehelper_exec() does nothing but return success if path[0] == 0. The only user that needs this strange feature is request_module(). It can check modprobe_path[0] itself, as other users do when they want to detect the "disabled by admin" case. Kill it. Not only does it look strange, it can also confuse other callers. This also allows us to revert 264b83c0 ("usermodehelper: check subprocess_info->path != NULL"), since do_execve(NULL) is safe.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
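Here is a small sketch of the caller-side check described above, under stated assumptions. `modprobe_path_model`, `run_usermode_helper_model()` and `request_module_model()` are hypothetical names, and returning 0 for the disabled case is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the admin-configurable helper path. */
static char modprobe_path_model[256] = "/sbin/modprobe";
static int helper_spawns;

/* Hypothetical helper runner: it no longer special-cases an empty path. */
static int run_usermode_helper_model(const char *path)
{
	(void)path;
	helper_spawns++;	/* would spawn the helper here */
	return 0;
}

/* The caller detects the "disabled by admin" case on its own. */
static int request_module_model(const char *name)
{
	(void)name;
	if (!modprobe_path_model[0])
		return 0;	/* disabled: nothing to do (assumed return value) */
	return run_usermode_helper_model(modprobe_path_model);
}
```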
-
Committed by Andrey Vagin
crtools uses parasite code to dump processes. The parasite code is injected into a process with the help of PTRACE_SEIZE. Currently crtools blocks signals from the parasite code. If a process has pending signals, crtools waits while the process handles them. This method is not suitable for stopped tasks. A stopped task can have a few pending signals. When we try to execute the parasite code, we need to drop SIGSTOP, but all other signals must remain pending, because the state of processes must not change during checkpointing. This patch adds two ptrace commands to set/get the signal-blocked mask. I think gdb can use these commands too. [akpm@linux-foundation.org: be consistent with brace layout]
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Mathias Krause
When invalid input is written to 'debug/kprobes/enabled', it is silently ignored. Even worse, when an empty string is written to this file, the outcome is purely random, because the switch statement makes its decision based on the value of an uninitialized stack variable. Fix this by treating invalid/empty input as an error and returning -EINVAL.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
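Here is a userspace model of the fixed write handler. The buffer size and the accepted characters ('y'/'Y'/'1' to enable, 'n'/'N'/'0' to disable) are assumptions for illustration, and `write_enabled_model()` is a hypothetical name. The key point is that the buffer is always NUL-terminated, so `buf[0]` is defined even for an empty write, and any unrecognized input yields -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

static bool kprobes_enabled_model = true;

/* Copy at most sizeof(buf) - 1 bytes, terminate, then dispatch on buf[0]. */
static ssize_t write_enabled_model(const char *user_buf, size_t count)
{
	char buf[32];
	size_t buf_size = count < sizeof(buf) - 1 ? count : sizeof(buf) - 1;

	memcpy(buf, user_buf, buf_size);
	buf[buf_size] = '\0';	/* empty input now gives buf[0] == '\0' */

	switch (buf[0]) {
	case 'y': case 'Y': case '1':
		kprobes_enabled_model = true;
		break;
	case 'n': case 'N': case '0':
		kprobes_enabled_model = false;
		break;
	default:
		return -EINVAL;	/* invalid or empty input is an error */
	}
	return (ssize_t)count;
}
```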
-
Committed by Oleg Nesterov
Change do_sysinfo() to use get_monotonic_boottime() instead of do_posix_clock_monotonic_gettime() + monotonic_to_bootbased().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Tomas Janousek <tjanouse@redhat.com>
Cc: Tomas Smetana <tsmetana@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
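get_monotonic_boottime() is kernel-internal, but userspace has a direct analog. On Linux, `clock_gettime(CLOCK_BOOTTIME, ...)` returns monotonic time, including time spent in suspend, in a single call. The sketch below uses it to compute a sysinfo-style uptime. The helper names are hypothetical, and rounding leftover nanoseconds up to a full second is an assumption about how the uptime is reported.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/* Round up to whole seconds (assumption: any nonzero ns counts as one more). */
static long uptime_seconds(const struct timespec *ts)
{
	return (long)ts->tv_sec + (ts->tv_nsec ? 1 : 0);
}

/* One clock read that already includes suspend time, analogous to
 * get_monotonic_boottime() replacing the two-step conversion. */
static int read_boot_uptime(long *out)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0)
		return -1;
	*out = uptime_seconds(&ts);
	return 0;
}
```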
-
Committed by liguang
If LINUX_REBOOT_CMD_HALT fails, the "cannot halt" message stays on the same line as the next message, so append a '\n'.
Signed-off-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Kees Cook
Calling kthread_run() with a single name parameter causes the name to be handled as a format string. Many callers pass potentially dynamic string content, so use "%s" in those cases to avoid any potential accidents.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
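The hazard is the usual printf-style one. In this userspace sketch, `set_name_model()` is a hypothetical printf-style setter that stands in for the name handling behind kthread_run(). Passing dynamic content through a literal "%s" copies it verbatim, even if it happens to contain `%` sequences.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical printf-style name setter; the attribute lets the compiler
 * check format strings at call sites. */
static void set_name_model(char *dst, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

static void set_name_model(char *dst, size_t len, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(dst, len, fmt, ap);
	va_end(ap);
}
```

A call such as `set_name_model(name, sizeof(name), "%s", dynamic)` is safe for any `dynamic`. Passing `dynamic` as the format itself would make any `%` in it read nonexistent arguments.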
-
Committed by Jiang Liu
The global variable num_physpages is scheduled to be removed, so use totalram_pages instead of num_physpages at runtime.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 30 Jun 2013, 2 commits
-
-
Committed by Tejun Heo
0ce6cba3 ("cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options") only updated the remount path, but CGRP_ROOT_SUBSYS_BOUND should also be ignored when comparing options while mounting an existing hierarchy. Without sane_behavior, an option mismatch triggers a warning but doesn't fail the mount, so this only causes a spurious warning message. Fix it by comparing only the CGRP_ROOT_OPTION_MASK bits of the new and existing root options.
Signed-off-by: Tejun Heo <tj@kernel.org>
-
Committed by Mathieu Desnoyers
This __put_user() could be used by unprivileged processes to write into kernel memory. The issue is that even if copy_siginfo_to_user() fails, its error code is not checked before __put_user() is executed. Luckily, ptrace_peek_siginfo() was added within the 3.10-rc cycle, so it has not hit a stable release yet.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
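Here is a minimal model of the ordering bug. `checked_copy_model()` plays the role of a copy routine that validates the destination, and `unchecked_store_model()` plays the role of __put_user(), which does not. Both are hypothetical. The `ret |=` shape of the buggy version is an assumption used for illustration, since the text only says the error was not checked first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Counts stores that would have been performed without validation. */
static int unchecked_stores;

/* Validating copy: fails with -EFAULT for a bad destination. */
static int checked_copy_model(bool addr_ok)
{
	return addr_ok ? 0 : -EFAULT;
}

/* Non-validating store, like __put_user(): trusts its caller. */
static int unchecked_store_model(void)
{
	unchecked_stores++;
	return 0;
}

/* Buggy shape: the store runs even when the validating copy failed. */
static int peek_buggy(bool addr_ok)
{
	int ret = checked_copy_model(addr_ok);

	ret |= unchecked_store_model();
	return ret;
}

/* Fixed shape: the store only runs after a successful copy. */
static int peek_fixed(bool addr_ok)
{
	int ret = checked_copy_model(addr_ok);

	if (!ret)
		ret = unchecked_store_model();
	return ret;
}
```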
-
- 28 Jun 2013, 8 commits
-
-
Committed by Davidlohr Bueso
Use the already defined macro to pass the function return address.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1367347569.1784.3.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Alex Shi
Now that we use the runnable load avg in sched balancing, we don't need to keep it under CONFIG_FAIR_GROUP_SCHED. Also align the code style to #ifdef instead of #if defined() and reorder the tg output info.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Cc: pjt@google.com
Cc: kamalesh@linux.vnet.ibm.com
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1372417835-4698-1-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
-
Committed by Fabio Estevam
When building imx_v6_v7_defconfig with the imx-drm drivers selected as modules, we get the following build errors:
    ERROR: "irq_gc_mask_clr_bit" [drivers/staging/imx-drm/ipu-v3/imx-ipu-v3.ko] undefined!
    ERROR: "irq_gc_mask_set_bit" [drivers/staging/imx-drm/ipu-v3/imx-ipu-v3.ko] undefined!
    ERROR: "irq_gc_ack_set_bit" [drivers/staging/imx-drm/ipu-v3/imx-ipu-v3.ko] undefined!
Export the required functions to avoid this problem.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: shawn.guo@linaro.org
Cc: kernel@pengutronix.de
Link: http://lkml.kernel.org/r/1372389789-7048-1-git-send-email-festevam@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Ben Hutchings
Commit 02725e74 ('genirq: Use irq_get/put functions') inadvertently changed can_request_irq() to return 0 for IRQs that have no action. This causes pcibios_lookup_irq() to select only IRQs that already have an action with IRQF_SHARED set, or to fail if there are none. Change can_request_irq() to return 1 for IRQs that have no action (if the first two conditions are met).
Reported-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Tested-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is> (against 3.2)
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: 709647@bugs.debian.org
Cc: stable@vger.kernel.org # 2.6.39+
Link: http://bugs.debian.org/709647
Link: http://lkml.kernel.org/r/1372383630.23847.40.camel@deadeye.wl.decadent.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
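Here is the intended predicate, modeled in userspace. `struct irq_desc_model`, its two boolean fields standing in for "the first two conditions", and `IRQF_SHARED_MODEL` are all hypothetical. Only the final test reflects what the commit describes: the IRQ has no action yet, or both the new request and the existing action agree to share.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IRQF_SHARED_MODEL 0x80UL	/* hypothetical flag value */

struct irqaction_model {
	unsigned long flags;
};

struct irq_desc_model {
	bool exists;		/* assumed first condition: descriptor found */
	bool can_request;	/* assumed second condition: requestable */
	const struct irqaction_model *action;	/* NULL = no handler yet */
};

static int can_request_irq_model(const struct irq_desc_model *desc,
				 unsigned long irqflags)
{
	if (!desc->exists || !desc->can_request)
		return 0;
	/* A free IRQ, or both sides agree to share it. */
	if (!desc->action ||
	    (irqflags & desc->action->flags & IRQF_SHARED_MODEL))
		return 1;
	return 0;
}
```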
-
Committed by Kamalesh Babulal
This patch changes the format string's width so that all statistics under /proc/<PID>/sched are aligned with the longest struct sched_statistic member name.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/20130627165005.GA15583@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Tejun Heo
1672d040 ("cgroup: fix cgroupfs_root early destruction path") introduced CGRP_ROOT_SUBSYS_BOUND, which marks completion of subsys binding on a new root; however, this broke remounts. cgroup_remount() doesn't allow changing root options via remount, and CGRP_ROOT_SUBSYS_BOUND, which is set on all fully initialized roots, makes the function reject all remounts. Fix it by putting the options part in the lower 16 bits of root->flags and masking the comparisons. While at it, make cgroup_remount() emit an error message explaining why it rejects a remount request, so that it's less of a mystery.
Signed-off-by: Tejun Heo <tj@kernel.org>
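Here is a sketch of the masked comparison. The constant names come from the commit. The concrete values (options in bits 0 to 15, CGRP_ROOT_SUBSYS_BOUND at bit 16) follow the "lower 16 bits" description, and `OPT_NOPREFIX_MODEL` is a hypothetical option bit.

```c
#include <assert.h>
#include <stdbool.h>

/* User-visible options live in the low 16 bits; internal state above. */
#define CGRP_ROOT_OPTION_MASK	((1UL << 16) - 1)
#define CGRP_ROOT_SUBSYS_BOUND	(1UL << 16)

#define OPT_NOPREFIX_MODEL	(1UL << 1)	/* hypothetical option bit */

/* Compare only the option bits, ignoring internal state flags. */
static bool root_options_match(unsigned long root_flags, unsigned long requested)
{
	return (root_flags & CGRP_ROOT_OPTION_MASK) ==
	       (requested & CGRP_ROOT_OPTION_MASK);
}
```

Without the mask, a fully initialized root never compares equal to the requested options, because it always carries CGRP_ROOT_SUBSYS_BOUND.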
-
Committed by Tejun Heo
eb178d06 ("cgroup: grab cgroup_mutex in drop_parsed_module_refcounts()") made drop_parsed_module_refcounts() grab cgroup_mutex to keep the lockdep assertion in for_each_subsys() happy. Unfortunately, cgroup_remount() calls the function while holding cgroup_mutex in its failure path, leading to the following deadlock.
    # mount -t cgroup -o remount,memory,blkio cgroup blkio
    cgroup: option changes via remount are deprecated (pid=525 comm=mount)
    =============================================
    [ INFO: possible recursive locking detected ]
    3.10.0-rc4-work+ #1 Not tainted
    ---------------------------------------------
    mount/525 is trying to acquire lock:
     (cgroup_mutex){+.+.+.}, at: [<ffffffff8110a3e1>] drop_parsed_module_refcounts+0x21/0xb0
    but task is already holding lock:
     (cgroup_mutex){+.+.+.}, at: [<ffffffff8110e4e1>] cgroup_remount+0x51/0x200
    other info that might help us debug this:
     Possible unsafe locking scenario:
           CPU0
           ----
      lock(cgroup_mutex);
      lock(cgroup_mutex);
     *** DEADLOCK ***
     May be due to missing lock nesting notation
    4 locks held by mount/525:
     #0: (&type->s_umount_key#30){+.+...}, at: [<ffffffff811e9a0d>] do_mount+0x2bd/0xa30
     #1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff8110e4d3>] cgroup_remount+0x43/0x200
     #2: (cgroup_mutex){+.+.+.}, at: [<ffffffff8110e4e1>] cgroup_remount+0x51/0x200
     #3: (cgroup_root_mutex){+.+.+.}, at: [<ffffffff8110e4ef>] cgroup_remount+0x5f/0x200
    stack backtrace:
    CPU: 2 PID: 525 Comm: mount Not tainted 3.10.0-rc4-work+ #1
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
     ffffffff829651f0 ffff88000ec2fc28 ffffffff81c24bb1 ffff88000ec2fce8
     ffffffff810f420d 0000000000000006 0000000000000001 0000000000000056
     ffff8800153b4640 ffff880000000000 ffffffff81c2e468 ffff8800153b4640
    Call Trace:
     [<ffffffff81c24bb1>] dump_stack+0x19/0x1b
     [<ffffffff810f420d>] __lock_acquire+0x15dd/0x1e60
     [<ffffffff810f531c>] lock_acquire+0x9c/0x1f0
     [<ffffffff81c2a805>] mutex_lock_nested+0x65/0x410
     [<ffffffff8110a3e1>] drop_parsed_module_refcounts+0x21/0xb0
     [<ffffffff8110e63e>] cgroup_remount+0x1ae/0x200
     [<ffffffff811c9bb2>] do_remount_sb+0x82/0x190
     [<ffffffff811e9d41>] do_mount+0x5f1/0xa30
     [<ffffffff811ea203>] SyS_mount+0x83/0xc0
     [<ffffffff81c2fb82>] system_call_fastpath+0x16/0x1b
Fix it by moving the drop_parsed_module_refcounts() invocation outside cgroup_mutex.
Signed-off-by: Tejun Heo <tj@kernel.org>
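Here is the shape of the bug and the fix, modeled with a toy non-recursive lock that reports a recursive acquisition instead of hanging. All names are hypothetical stand-ins. `drop_refs_model()` takes the lock itself, like drop_parsed_module_refcounts(), and the two `remount_fail_*()` functions model cgroup_remount()'s failure path before and after the change.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: returns false instead of self-deadlocking. */
struct toy_mutex {
	bool held;
};

static bool toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return false;	/* recursive acquisition: would deadlock */
	m->held = true;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = false;
}

static struct toy_mutex cgroup_mutex_model;

/* Stand-in for a helper that grabs the mutex itself. */
static bool drop_refs_model(void)
{
	if (!toy_lock(&cgroup_mutex_model))
		return false;
	toy_unlock(&cgroup_mutex_model);
	return true;
}

/* Before: the helper is called with the mutex still held. */
static bool remount_fail_buggy(void)
{
	bool ok;

	toy_lock(&cgroup_mutex_model);
	ok = drop_refs_model();
	toy_unlock(&cgroup_mutex_model);
	return ok;
}

/* After: release the mutex first, then call the helper. */
static bool remount_fail_fixed(void)
{
	toy_lock(&cgroup_mutex_model);
	toy_unlock(&cgroup_mutex_model);
	return drop_refs_model();
}
```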
-
- 27 Jun 2013, 18 commits
-
-
Committed by Kamalesh Babulal
Fix the spelling of 'calling' in the description of se flags in enqueue_entity().
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/20130627055418.GA18582@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Kamalesh Babulal
At present we print per-entity load-tracking statistics for the cfs_rq of cgroups/runqueues. Since per-task statistics are maintained, they can be used to see the contribution a task makes to its parent cfs_rq. This patch adds per-task load-tracking statistics to /proc/<PID>/sched.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130625080336.GA20175@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
Based-on-patch-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-14-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
Since no one uses it.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Paul Turner <pjt@google.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-13-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
As with the runnable_load_avg and blocked_load_avg variables, a long type is enough for removed_load on both 64-bit and 32-bit machines. This avoids the expensive atomic64 operations on 32-bit machines.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Paul Turner <pjt@google.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-12-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
Since tg->load_avg is smaller than tg->load_weight, we don't need an atomic64_t variable for load_avg on 32-bit machines. The same reasoning applies to cfs_rq->tg_load_contrib. The atomic_long_t/unsigned long types are more efficient and convenient for them.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-11-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
The 'u64 runnable_load_avg, blocked_load_avg' fields in the cfs_rq struct are smaller than the 'unsigned long' cfs_rq->load.weight, so we don't need u64 variables to describe them. unsigned long is more efficient and convenient.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Paul Turner <pjt@google.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-10-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
Besides the places where the runnable load average is already used in the background, move_tasks() is also a key function in load balancing. We need to consider the runnable load average there too, so that the load comparison is apples to apples. Morten caught a u64 division bug on ARM, thanks!
Thanks-to: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-8-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
These are the base values in load balancing. Update them with the rq runnable load average, so that load balancing takes the runnable load avg into account naturally. We also tried including blocked_load_avg as cpu load in balancing, but that caused a 6% kbuild performance drop on every Intel machine, and aim7/oltp drops on some 4-socket machines. Adding blocked_load_avg only into get_rq_runable_load still makes hackbench drop a little on NHM EX.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-7-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
To get the latest runnable info, we need to do this cpuload update after task_tick.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-6-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
A woken, migrated task calls __synchronize_entity_decay(se) in migrate_task_rq_fair(). It then needs to set `se->avg.last_runnable_update -= (-se->avg.decay_count) << 20' before update_entity_load_avg(), so that the sleep time is not counted twice in se.avg.load_avg_contrib by both __synchronize_entity_decay() and update_entity_load_avg(). However, if the sleeping task is woken up on the same cpu, it misses the last_runnable_update adjustment before update_entity_load_avg(se, 0, 1), so the sleep time is counted in both functions. So we need to remove this double sleep-time accounting. Paul also contributed some code comments in this commit.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-5-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
We need to initialize se.avg.{decay_count, load_avg_contrib} for a newly forked task. Otherwise, random values in these variables cause a mess when a new task is enqueued:
    enqueue_task_fair
        enqueue_entity
            enqueue_entity_load_avg
and cause fork-balancing imbalance due to an incorrect load_avg_contrib. Furthermore, Morten Rasmussen noticed that some tasks were not launched immediately after being created. So Paul and Peter suggested giving a new task a starting runnable avg time equal to sched_slice().
PeterZ said:
> So the 'problem' is that our running avg is a 'floating' average; ie. it
> decays with time. Now we have to guess about the future of our newly
> spawned task -- something that is nigh impossible seeing these CPU
> vendors keep refusing to implement the crystal ball instruction.
>
> So there's two asymptotic cases we want to deal well with; 1) the case
> where the newly spawned program will be 'nearly' idle for its lifetime;
> and 2) the case where its cpu-bound.
>
> Since we have to guess, we'll go for worst case and assume its
> cpu-bound; now we don't want to make the avg so heavy adjusting to the
> near-idle case takes forever. We want to be able to quickly adjust and
> lower our running avg.
>
> Now we also don't want to make our avg too light, such that it gets
> decremented just for the new task not having had a chance to run yet --
> even if when it would run, it would be more cpu-bound than not.
>
> So what we do is we make the initial avg of the same duration as that we
> guess it takes to run each task on the system at least once -- aka
> sched_slice().
>
> Of course we can defeat this with wakeup/fork bombs, but in the 'normal'
> case it should be good enough.
Paul also contributed most of the code comments in this commit.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Paul Turner <pjt@google.com>
[peterz; added explanation of sched_slice() usage]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-4-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
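Here is a rough sketch of the initialization, under assumptions. The struct is a hypothetical subset of the per-entity averages, and the contribution formula (`sum * weight / (period + 1)`) is an assumed form of how load_avg_contrib is derived. Seeding both the runnable sum and the period with the slice models PeterZ's "assume it is cpu-bound for one sched_slice()" guess, so the initial contribution is close to the task's full weight instead of random.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of per-entity load-tracking state. */
struct sched_avg_model {
	uint32_t runnable_avg_sum;
	uint32_t runnable_avg_period;
	unsigned long load_avg_contrib;
	uint64_t decay_count;
};

/* Assumed form: weight scaled by the runnable fraction of the period. */
static unsigned long entity_contrib(const struct sched_avg_model *a,
				    unsigned long weight)
{
	return (unsigned long)((uint64_t)a->runnable_avg_sum * weight /
			       (a->runnable_avg_period + 1));
}

/* New task: start as if it had been runnable for one whole slice. */
static void init_new_task_avg(struct sched_avg_model *a, uint32_t slice,
			      unsigned long weight)
{
	a->decay_count = 0;
	a->runnable_avg_sum = slice;
	a->runnable_avg_period = slice;
	a->load_avg_contrib = entity_contrib(a, weight);
}
```

With a weight of 1024 and a slice of 6000, the initial contribution is 6144000 / 6001, or 1023 after integer division.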
-
Committed by Alex Shi
The following 2 variables are only used under CONFIG_SMP, so it's better to move their definitions under CONFIG_SMP too:
    atomic64_t load_avg;
    atomic_t runnable_avg;
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-3-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alex Shi
Remove the CONFIG_FAIR_GROUP_SCHED guard that covers the runnable info, so that we can use the runnable load variables. Also remove 2 CONFIG_FAIR_GROUP_SCHED settings that were not in the reverted patch (they were introduced in 9ee474f5) but also need to be reverted.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51CA76A3.3050207@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Tejun Heo
kernel/cgroup.c still has places where an RCU pointer is set and accessed directly without going through RCU_INIT_POINTER() or rcu_dereference_protected(). These are all properly protected accesses, so nothing is broken, but they lead to spurious sparse RCU address space warnings. Substitute the direct accesses with RCU_INIT_POINTER() and rcu_dereference_protected(). Note that %true is specified as the extra condition for all dereference updates. This isn't ideal, as all it does is suppress the warning without actually policing synchronization rules; however, most of these are scheduled to be removed pretty soon along with css_id itself, so there's no reason to be more elaborate. Combined with the previous changes, this removes all RCU-related sparse warnings from cgroup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Li Zefan <lizefan@huawei.com>
-
Committed by Tejun Heo
There are several places in kernel/cgroup.c where task->cgroups is accessed and modified without going through proper RCU accessors. None is broken, as they're all lock-protected accesses; however, they still trigger sparse RCU address space warnings.
* Consistently use task_css_set() for task->cgroups dereferencing.
* Use RCU_INIT_POINTER() to clear task->cgroups to &init_css_set on exit.
* Remove the unnecessary rcu_dereference_raw() from the cset->subsys[] dereference in cgroup_exit().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Li Zefan <lizefan@huawei.com>
-
Committed by Tejun Heo
This isn't strictly necessary, as all subsystems specified in @subsys_mask are guaranteed to be pinned; however, it does spuriously trigger a lockdep warning. Let's grab cgroup_mutex around it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
-
Committed by Tejun Heo
cgroupfs_root used to have ->actual_subsys_mask in addition to ->subsys_mask. a8a648c4 ("cgroup: remove cgroup->actual_subsys_mask") removed it, noting that the subsys_mask is essentially temporary and doesn't belong in cgroupfs_root. However, the patch made it impossible to tell whether a cgroupfs_root actually has the subsystems bound or just has the bits set. This leads to the following BUG when trying to mount with subsystems that are already mounted elsewhere.
    kernel BUG at kernel/cgroup.c:1038!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    ...
    CPU: 1 PID: 7973 Comm: mount Tainted: G W 3.10.0-rc7-next-20130625-sasha-00011-g1c1dc0e #1105
    task: ffff880fc0ae8000 ti: ffff880fc0b9a000 task.ti: ffff880fc0b9a000
    RIP: 0010:[<ffffffff81249b29>] [<ffffffff81249b29>] rebind_subsystems+0x409/0x5f0
    ...
    Call Trace:
     [<ffffffff8124bd4f>] cgroup_kill_sb+0xff/0x210
     [<ffffffff813d21af>] deactivate_locked_super+0x4f/0x90
     [<ffffffff8124f3b3>] cgroup_mount+0x673/0x6e0
     [<ffffffff81257169>] cpuset_mount+0xd9/0x110
     [<ffffffff813d2580>] mount_fs+0xb0/0x2d0
     [<ffffffff81404afd>] vfs_kern_mount+0xbd/0x180
     [<ffffffff814070b5>] do_new_mount+0x145/0x2c0
     [<ffffffff814085d6>] do_mount+0x356/0x3c0
     [<ffffffff8140873d>] SyS_mount+0xfd/0x140
     [<ffffffff854eb600>] tracesys+0xdd/0xe2
We still want rebind_subsystems() to take added/removed masks, so let's fix it by marking whether a cgroupfs_root has finished binding or not. Also, document what's going on around ->subsys_mask initialization so that similar mistakes aren't repeated.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Li Zefan <lizefan@huawei.com>
-
- 26 Jun 2013, 2 commits
-
-
Committed by Daniel Vetter
Injects EDEADLK conditions at pseudo-random intervals, with exponential backoff up to UINT_MAX (to ensure that every lock operation still completes in a reasonable time). This way we can test the wound slowpath even for ww mutex users where contention is never expected and where the ww deadlock avoidance algorithm is needed only for correctness against malicious userspace. An example would be protecting kernel modesetting properties, which, thanks to single-threaded X, aren't really expected to contend, ever. I've looked into using the CONFIG_FAULT_INJECTION infrastructure, but decided against it for two reasons:
- EDEADLK handling is mandatory for ww mutex users and should never affect the outcome of a syscall. This is in contrast to -ENOMEM injection. So fine configurability isn't required.
- The fault injection framework only allows setting a simple probability for failure. The probability that a ww mutex acquire stage with N locks never completes (due to too many injected EDEADLK backoffs) is then zero, but the expected number of ww_mutex_lock operations for the completely uncontended case would be O(exp(N)). The per-acquire-ctx exponential backoff solution chosen here only results in O(log N) overhead due to injection, and so O(log N * N) lock operations. This way we can fail with high probability (and so get good test coverage even for fancy backoff and lock acquisition paths) without running into pathological cases.
Note that EDEADLK will only ever be injected when we managed to acquire the lock. This prevents any behaviour changes for users that rely on the EALREADY semantics.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
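Here is a sketch of per-acquire-context injection with exponential backoff. The struct and function names are hypothetical, and the growth factor of 2 is an assumption: the commit only says the backoff is exponential and capped at UINT_MAX. The function is meant to be called only after a lock was actually acquired, which matches the note about EALREADY above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-acquire-context injection state. */
struct inject_ctx {
	unsigned int interval;
	unsigned int countdown;
};

static void inject_init(struct inject_ctx *ctx)
{
	ctx->interval = 1;
	ctx->countdown = 1;
}

/* Returns true when a fake -EDEADLK should be reported for a lock that
 * was just acquired. The interval doubles each time, saturating at
 * UINT_MAX, so injections become exponentially rarer. */
static bool inject_deadlock(struct inject_ctx *ctx)
{
	if (ctx->countdown-- != 0)
		return false;
	ctx->interval = ctx->interval > UINT_MAX / 2 ? UINT_MAX
						     : ctx->interval * 2;
	ctx->countdown = ctx->interval;
	return true;
}
```

With this schedule, injections land on calls 2, 5, 10 and 19, so the gaps grow geometrically and the overhead stays logarithmic in the number of lock operations.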
-
Committed by Maarten Lankhorst
Wound/wait mutexes are used when multiple locks of a similar type can be acquired in an arbitrary order. The deadlock handling used here is called wait/wound in the RDBMS literature: the older task waits until it can acquire the contended lock. The younger task needs to back off and drop all the locks it is currently holding, i.e. the younger task is wounded. For full documentation, please read Documentation/ww-mutex-design.txt.
References: https://lwn.net/Articles/548909/
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
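The contention rule described above comes down to comparing the ages of the acquire contexts. In this sketch, the names are hypothetical and the stamps are assumed to be monotonically increasing tickets, where a lower stamp means an older context. Distinct contexts are assumed never to share a stamp.

```c
#include <assert.h>

enum ww_decision { WW_WAIT, WW_BACK_OFF };

/* Decide what the requesting context does when another context holds
 * the lock: the older one waits, the younger one backs off (is wounded)
 * and must drop every lock it holds before retrying. */
static enum ww_decision on_contention(unsigned long requester_stamp,
				      unsigned long holder_stamp)
{
	return requester_stamp < holder_stamp ? WW_WAIT : WW_BACK_OFF;
}
```

Because all contexts order themselves by the same stamps, some context always makes progress, whatever order the locks are requested in.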
-