- 29 11月, 2012 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 13 10月, 2012 1 次提交
-
-
由 David Howells 提交于
Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NDave Jones <davej@redhat.com>
-
- 09 10月, 2012 1 次提交
-
-
由 Davidlohr Bueso 提交于
The deprecated /proc/<pid>/oom_adj is scheduled for removal this month. Signed-off-by: NDavidlohr Bueso <dave@gnu.org> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 10月, 2012 1 次提交
-
-
由 Alex Kelly 提交于
Create a new header file, fs/coredump.h, which contains functions only used by the new coredump.c. It also moves do_coredump to the include/linux/coredump.h header file, for consistency. Signed-off-by: NAlex Kelly <alex.page.kelly@gmail.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Acked-by: NKees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 10月, 2012 1 次提交
-
-
由 Alex Kelly 提交于
This prepares for making core dump functionality optional. The variable "suid_dumpable" and associated functions are left in fs/exec.c because they're used elsewhere, such as in ptrace. Signed-off-by: NAlex Kelly <alex.page.kelly@gmail.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Acked-by: NKees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 01 10月, 2012 1 次提交
-
-
由 Al Viro 提交于
Let architectures select GENERIC_KERNEL_THREAD and have their copy_thread() treat NULL regs as "it came from kernel_thread(), sp argument contains the function new thread will be calling and stack_size - the argument for that function". Switching the architectures begins shortly... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 9月, 2012 1 次提交
-
-
由 Frederic Weisbecker 提交于
Clear the syscalls hook of a task when it's scheduled out so that if the task migrates, it doesn't run the syscall slow path on a CPU that might not need it. Also set the syscalls hook on the next task if needed. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
- 25 9月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
We currently use a per socket order-0 page cache for tcp_sendmsg() operations. This page is used to build fragments for skbs. Its done to increase probability of coalescing small write() into single segments in skbs still in write queue (not yet sent) But it wastes a lot of memory for applications handling many mostly idle sockets, since each socket holds one page in sk->sk_sndmsg_page Its also quite inefficient to build TSO 64KB packets, because we need about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit page allocator more than wanted. This patch adds a per task frag allocator and uses bigger pages, if available. An automatic fallback is done in case of memory pressure. (up to 32768 bytes per frag, thats order-3 pages on x86) This increases TCP stream performance by 20% on loopback device, but also benefits on other network devices, since 8x less frags are mapped on transmit and unmapped on tx completion. Alexander Duyck mentioned a probable performance win on systems with IOMMU enabled. Its possible some SG enabled hardware cant cope with bigger fragments, but their ndo_start_xmit() should already handle this, splitting a fragment in sub fragments, since some arches have PAGE_SIZE=65536 Successfully tested on various ethernet devices. (ixgbe, igb, bnx2x, tg3, mellanox mlx4) Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: NVijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 9月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
Always store audit loginuids in type kuid_t. Print loginuids by converting them into uids in the appropriate user namespace, and then printing the resulting uid. Modify audit_get_loginuid to return a kuid_t. Modify audit_set_loginuid to take a kuid_t. Modify /proc/<pid>/loginuid on read to convert the loginuid into the user namespace of the opener of the file. Modify /proc/<pid>/loginud on write to convert the loginuid rom the user namespace of the opener of the file. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: Paul Moore <paul@paul-moore.com> ? Cc: David Miller <davem@davemloft.net> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 17 9月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 970e1789. Nikolay Ulyanitsky reported thatthe 3.6-rc5 kernel has a 15-20% performance drop on PostgreSQL 9.2 on his machine (running "pgbench"). Borislav Petkov was able to reproduce this, and bisected it to this commit 970e1789 ("sched: Improve scalability via 'CPU buddies' ...") apparently because the new single-idle-buddy model simply doesn't find idle CPU's to reschedule on aggressively enough. Mike Galbraith suspects that it is likely due to the user-mode spinlocks in PostgreSQL not reacting well to preemption, but we don't really know the details - I'll just revert the commit for now. There are hopefully other approaches to improve scheduler scalability without it causing these kinds of downsides. Reported-by: NNikolay Ulyanitsky <lystor@gmail.com> Bisected-by: NBorislav Petkov <bp@alien8.de> Acked-by: NMike Galbraith <efault@gmx.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 9月, 2012 1 次提交
-
-
由 Oleg Nesterov 提交于
Add the new MMF_RECALC_UPROBES flag, it means that MMF_HAS_UPROBES can be false positive after remove_breakpoint() or uprobe_munmap(). It is also set by uprobe_dup_mmap(), this is not optimal but simple. We could add the new hook, uprobe_dup_vma(), to set MMF_HAS_UPROBES only if the new mm actually has uprobes, but I don't think this makes sense. The next patch will use this flag to clear MMF_HAS_UPROBES. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
-
- 13 9月, 2012 2 次提交
-
-
由 Alex Shi 提交于
There is no load_balancer to be selected now. It just sets the state of the nohz tick to stop. So rename the function, pass the 'cpu' as a parameter and then remove the useless call from tick_nohz_restart_sched_tick(). [ s/set_nohz_tick_stopped/nohz_balance_enter_idle/g s/clear_nohz_tick_stopped/nohz_balance_exit_idle/g ] Signed-off-by: NAlex Shi <alex.shi@intel.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venki@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1347261059-24747-1-git-send-email-alex.shi@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Now that the last architecture to use this has stopped doing so (ARM, thanks Catalin!) we can remove this complexity from the scheduler core. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: http://lkml.kernel.org/n/tip-g9p2a1w81xxbrze25v9zpzbf@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 8月, 2012 1 次提交
-
-
由 Oleg Nesterov 提交于
Add the new MMF_HAS_UPROBES flag. It is set by install_breakpoint() and it is copied by dup_mmap(), uprobe_pre_sstep_notifier() checks it to avoid the slow path if the task was never probed. Perhaps it makes sense to check it in valid_vma(is_register => false) as well. This needs the new dup_mmap()->uprobe_dup_mmap() hook. We can't use uprobe_reset_state() or put MMF_HAS_UPROBES into MMF_INIT_MASK, we need oldmm->mmap_sem to avoid the race with uprobe_register() or mmap() from another thread. Currently we never clear this bit, it can be false-positive after uprobe_unregister() or uprobe_munmap() or if dup_mmap() hits the probed VM_DONTCOPY vma. But this is fine correctness-wise and has no effect unless the task hits the non-uprobe breakpoint. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
-
- 14 8月, 2012 1 次提交
-
-
由 Alex Shi 提交于
Since power saving code was removed from sched now, the implement code is out of service in this function, and even pollute other logical. like, 'want_sd' never has chance to be set '0', that remove the effect of SD_WAKE_AFFINE here. So, clean up the obsolete code, includes SD_PREFER_LOCAL. Signed-off-by: NAlex Shi <alex.shi@intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/5028F431.6000306@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 09 8月, 2012 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Revert commit 45226e94 (NMI watchdog: fix for lockup detector breakage on resume) which breaks resume from system suspend on my SH7372 Mackerel board (by causing a NULL pointer dereference to happen) and is generally wrong, because it abuses the CPU hotplug functionality in a shamelessly blatant way. The original issue should be addressed through appropriate syscore resume callback instead. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 01 8月, 2012 2 次提交
-
-
由 Mel Gorman 提交于
This is needed to allow network softirq packet processing to make use of PF_MEMALLOC. Currently softirq context cannot use PF_MEMALLOC due to it not being associated with a task, and therefore not having task flags to fiddle with - thus the gfp to alloc flag mapping ignores the task flags when in interrupts (hard or soft) context. Allowing softirqs to make use of PF_MEMALLOC therefore requires some trickery. This patch borrows the task flags from whatever process happens to be preempted by the softirq. It then modifies the gfp to alloc flags mapping to not exclude task flags in softirq context, and modify the softirq code to save, clear and restore the PF_MEMALLOC flag. The save and clear, ensures the preempted task's PF_MEMALLOC flag doesn't leak into the softirq. The restore ensures a softirq's PF_MEMALLOC flag cannot leak back into the preempted process. This should be safe due to the following reasons Softirqs can run on multiple CPUs sure but the same task should not be executing the same softirq code. Neither should the softirq handler be preempted by any other softirq handler so the flags should not leak to an unrelated softirq. Softirqs re-enable hardware interrupts in __do_softirq() so can be preempted by hardware interrupts so PF_MEMALLOC is inherited by the hard IRQ. However, this is similar to a process in reclaim being preempted by a hardirq. While PF_MEMALLOC is set, gfp_to_alloc_flags() distinguishes between hard and soft irqs and avoids giving a hardirq the ALLOC_NO_WATERMARKS flag. If the softirq is deferred to ksoftirq then its flags may be used instead of a normal tasks but as the softirq cannot be preempted, the PF_MEMALLOC flag does not leak to other code by accident. [davem@davemloft.net: Document why PF_MEMALLOC is safe] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: David Miller <davem@davemloft.net> Cc: Neil Brown <neilb@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
Sanity: CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM [mhocko@suse.cz: fix missed bits] Cc: Glauber Costa <glommer@parallels.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 7月, 2012 2 次提交
-
-
由 Sameer Nanda 提交于
On the suspend/resume path the boot CPU does not go though an offline->online transition. This breaks the NMI detector post-resume since it depends on PMU state that is lost when the system gets suspended. Fix this by forcing a CPU offline->online transition for the lockup detector on the boot CPU during resume. To provide more context, we enable NMI watchdog on Chrome OS. We have seen several reports of systems freezing up completely which indicated that the NMI watchdog was not firing for some reason. Debugging further, we found a simple way of repro'ing system freezes -- issuing the command 'tasket 1 sh -c "echo nmilockup > /proc/breakme"' after the system has been suspended/resumed one or more times. With this patch in place, the system freeze result in panics, as expected. These panics provide a nice stack trace for us to debug the actual issue causing the freeze. [akpm@linux-foundation.org: fiddle with code comment] [akpm@linux-foundation.org: make lockup_detector_bootcpu_resume() conditional on CONFIG_SUSPEND] [akpm@linux-foundation.org: fix section errors] Signed-off-by: NSameer Nanda <snanda@chromium.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Don Zickus <dzickus@redhat.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
When suid_dumpable=2, detect unsafe core_pattern settings and warn when they are seen. Signed-off-by: NKees Cook <keescook@chromium.org> Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 7月, 2012 2 次提交
-
-
由 Peter Zijlstra 提交于
Stefan reported a crash on a kernel before a3e5d109 ("sched: Don't call task_group() too many times in set_task_rq()"), he found the reason to be that the multiple task_group() invocations in set_task_rq() returned different values. Looking at all that I found a lack of serialization and plain wrong comments. The below tries to fix it using an extra pointer which is updated under the appropriate scheduler locks. Its not pretty, but I can't really see another way given how all the cgroup stuff works. Reported-and-tested-by: NStefan Bader <stefan.bader@canonical.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mike Galbraith 提交于
Traversing an entire package is not only expensive, it also leads to tasks bouncing all over a partially idle and possible quite large package. Fix that up by assigning a 'buddy' CPU to try to motivate. Each buddy may try to motivate that one other CPU, if it's busy, tough, it may then try its SMT sibling, but that's all this optimization is allowed to cost. Sibling cache buddies are cross-wired to prevent bouncing. 4 socket 40 core + SMT Westmere box, single 30 sec tbench runs, higher is better: clients 1 2 4 8 16 32 64 128 .......................................................................... pre 30 41 118 645 3769 6214 12233 14312 post 299 603 1211 2418 4697 6847 11606 14557 A nice increase in performance. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1339471112.7352.32.camel@marge.simpson.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 7月, 2012 3 次提交
-
-
由 Al Viro 提交于
recursion in __scm_destroy() will be cut by delaying final fput() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
task_work and rcu_head are identical now; merge them (calling the result struct callback_head, rcu_head #define'd to it), kill separate allocation in security/keys since we can just use cred->rcu now. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
layout based on Oleg's suggestion; single-linked list, task->task_works points to the last element, forward pointer from said last element points to head. I'd still prefer much more regular scheme with two pointers in task_work, but... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 06 7月, 2012 1 次提交
-
-
由 Peter Zijlstra 提交于
Thanks to Charles Wang for spotting the defects in the current code: - If we go idle during the sample window -- after sampling, we get a negative bias because we can negate our own sample. - If we wake up during the sample window we get a positive bias because we push the sample to a known active period. So rewrite the entire nohz load-avg muck once again, now adding copious documentation to the code. Reported-and-tested-by: NDoug Smythies <dsmythies@telus.net> Reported-and-tested-by: NCharles Wang <muming.wq@gmail.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins [ minor edits ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 7月, 2012 1 次提交
-
-
由 Paul E. McKenney 提交于
This reverts commit 616c310e. (Move PREEMPT_RCU preemption to switch_to() invocation). Testing by Sasha Levin <levinsasha928@gmail.com> showed that this can result in deadlock due to invoking the scheduler when one of the runqueue locks is held. Because this commit was simply a performance optimization, revert it. Reported-by: NSasha Levin <levinsasha928@gmail.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: NSasha Levin <levinsasha928@gmail.com>
-
- 08 6月, 2012 1 次提交
-
-
由 Konstantin Khlebnikov 提交于
A fix for commit b32dfe37 ("c/r: prctl: add ability to set new mm_struct::exe_file"). After removing mm->num_exe_file_vmas kernel keeps mm->exe_file until final mmput(), it never becomes NULL while task is alive. We can check for other mapped files in mm instead of checking mm->num_exe_file_vmas, and mark mm with flag MMF_EXE_FILE_CHANGED in order to forbid second changing of mm->exe_file. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 6月, 2012 2 次提交
-
-
由 Oleg Nesterov 提交于
Kill the no longer needed uprobes_srcu/uprobe_srcu_id code. It doesn't really work anyway. synchronize_srcu() can only synchronize with the code "inside" the srcu_read_lock/srcu_read_unlock section, while uprobe_pre_sstep_notifier() does srcu_read_lock() _after_ we already hit the breakpoint. I guess this probably works "in practice". synchronize_srcu() is slow and it implies synchronize_sched(), and the probed task enters the non- preemptible section at the start of exception handler. Still this is not right at least in theory, and task->uprobe_srcu_id blows task_struct. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120529193008.GG8057@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Weird topologies can lead to asymmetric domain setups. This needs further consideration since these setups are typically non-minimal too. For now, make it work by adding an extra mask selecting which CPUs are allowed to iterate up. The topology that triggered it is the one from David Rientjes: 10 20 20 30 20 10 20 20 20 20 10 20 30 20 20 10 resulting in boxes that wouldn't even boot. Reported-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-3p86l9cuaqnxz7uxsojmz5rm@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 02 6月, 2012 3 次提交
-
-
由 Al Viro 提交于
Only 3 out of 63 do not. Renamed the current variant to __set_current_blocked(), added set_current_blocked() that will exclude unblockable signals, switched open-coded instances to it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
replace boilerplate "should we use ->saved_sigmask or ->blocked?" with calls of obvious inlined helper... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
first fruits of ..._restore_sigmask() helpers: now we can take boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK and restore the blocked mask from ->saved_mask" into a common helper. Open-coded instances switched... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 5月, 2012 2 次提交
-
-
由 Peter Zijlstra 提交于
Since nr_cpus_allowed is used outside of sched/rt.c and wants to be used outside of there more, move it to a more natural site. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-kr61f02y9brwzkh6x53pdptm@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Follow up on commit 556061b0 ("sched/nohz: Fix rq->cpu_load[] calculations") since while that fixed the busy case it regressed the mostly idle case. Add a callback from the nohz exit to also age the rq->cpu_load[] array. This closes the hole where either there was no nohz load balance pass during the nohz, or there was a 'significant' amount of idle time between the last nohz balance and the nohz exit. So we'll update unconditionally from the tick to not insert any accidental 0 load periods while busy, and we try and catch up from nohz idle balance and nohz exit. Both these are still prone to missing a jiffy, but that has always been the case. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: pjt@google.com Cc: Venkatesh Pallipadi <venki@google.com> Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 5月, 2012 3 次提交
-
-
由 Oleg Nesterov 提交于
Kill the no longer used task_struct->replacement_session_keyring, update copy_creds() and exit_creds(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NDavid Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Smith <dsmith@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Oleg Nesterov 提交于
exit_irq_thread() and task->irq_thread are needed to handle the unexpected (and unlikely) exit of irq-thread. We can use task_work instead and make this all private to kernel/irq/manage.c, cleanup plus micro-optimization. 1. rename exit_irq_thread() to irq_thread_dtor(), make it static, and move it up before irq_thread(). 2. change irq_thread() to do task_work_add(irq_thread_dtor) at the start and task_work_cancel() before return. tracehook_notify_resume() can never play with kthreads, only do_exit()->exit_task_work() can call the callback and this is what we want. 3. remove task_struct->irq_thread and the special hook in do_exit(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Smith <dsmith@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Oleg Nesterov 提交于
Provide a simple mechanism that allows running code in the (nonatomic) context of the arbitrary task. The caller does task_work_add(task, task_work) and this task executes task_work->func() either from do_notify_resume() or from do_exit(). The callback can rely on PF_EXITING to detect the latter case. "struct task_work" can be embedded in another struct, still it has "void *data" to handle the most common/simple case. This allows us to kill the ->replacement_session_keyring hack, and potentially this can have more users. Performance-wise, this adds 2 "unlikely(!hlist_empty())" checks into tracehook_notify_resume() and do_exit(). But at the same time we can remove the "replacement_session_keyring != NULL" checks from arch/*/signal.c and exit_creds(). Note: task_work_add/task_work_run abuses ->pi_lock. This is only because this lock is already used by lookup_pi_state() to synchronize with do_exit() setting PF_EXITING. Fortunately the scope of this lock in task_work.c is really tiny, and the code is unlikely anyway. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NDavid Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Smith <dsmith@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 5月, 2012 1 次提交
-
-
由 Peter Zijlstra 提交于
It's been broken forever (i.e. it's not scheduling in a power aware fashion), as reported by Suresh and others sending patches, and nobody cares enough to fix it properly ... so remove it to make space free for something better. There's various problems with the code as it stands today, first and foremost the user interface which is bound to topology levels and has multiple values per level. This results in a state explosion which the administrator or distro needs to master and almost nobody does. Furthermore large configuration state spaces aren't good, it means the thing doesn't just work right because it's either under so many impossibe to meet constraints, or even if there's an achievable state workloads have to be aware of it precisely and can never meet it for dynamic workloads. So pushing this kind of decision to user-space was a bad idea even with a single knob - it's exponentially worse with knobs on every node of the topology. There is a proposal to replace the user interface with a single 3 state knob: sched_balance_policy := { performance, power, auto } where 'auto' would be the preferred default which looks at things like Battery/AC mode and possible cpufreq state or whatever the hw exposes to show us power use expectations - but there's been no progress on it in the past many months. Aside from that, the actual implementation of the various knobs is known to be broken. There have been sporadic attempts at fixing things but these always stop short of reaching a mergable state. Therefore this wholesale removal with the hopes of spurring people who care to come forward once again and work on a coherent replacement. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 5月, 2012 1 次提交
-
-
由 Peter Zijlstra 提交于
Patches c22402a2 ("sched/fair: Let minimally loaded cpu balance the group") and 0ce90475 ("sched/fair: Add some serialization to the sched_domain load-balance walk") are horribly broken so revert them. The problem is that while it sounds good to have the minimally loaded cpu do the pulling of more load, the way we walk the domains there is absolutely no guarantee this cpu will actually get to the domain. In fact its very likely it wont. Therefore the higher up the tree we get, the less likely it is we'll balance at all. The first of mask always walks up, while sucky in that it accumulates load on the first cpu and needs extra passes to spread it out at least guarantees a cpu gets up that far and load-balancing happens at all. Since its now always the first and idle cpus should always be able to balance so they get a task as fast as possible we can also do away with the added serialization. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-rpuhs5s56aiv1aw7khv9zkw6@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-