- 10 March 2012, 4 commits
-
-
By Alexander Gordeev

Currently the IRQTF_DIED flag is set when an IRQ thread handler calls do_exit(). But the per-process PF_EXITING flag also gets set when a thread exits. This fix eliminates the duplicate by using the PF_EXITING flag.

Also, there is a race condition in exit_irq_thread(): a thread's bit may be cleared in desc->threads_oneshot (and the IRQ line unmasked), but before the IRQTF_DIED flag is set, a new interrupt can come in and set the just-cleared bit again, this time forever. This fix throws the IRQTF_DIED flag away, eliminating the race as a result.

[ tglx: Test THREAD_EXITING first as suggested by Oleg ]

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120309135958.GD2114@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
By Alexander Gordeev

Since commit 63706172, kthread_stop() is not afraid of dead kernel threads, so there is no need to check whether a thread is alive before stopping it. In any case, those checks were racy.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120309135939.GC2114@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
By Alexander Gordeev

When a new thread handler is created, an irqaction is passed to it as data. Not only is that irqaction stored in task_struct by the handler for later use, but a structure associated with the kernel thread also keeps this value as long as the thread exists. This fix kicks the irqaction out of task_struct. Yes, it introduces a new bit field, but it not only eliminates the duplicate, it also shrinks task_struct.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120309135925.GB2114@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
By Alexander Gordeev

We do not want a bitwise AND between boolean operands.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20120309135912.GA2114@dhcp-26-207.brq.redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
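For illustration, a minimal standalone sketch (not the patched kernel code) of why a bitwise AND is the wrong operator here: when the "boolean" operands are really non-normalized integers such as bit masks, '&' can yield false even though both operands are logically true, and it never short-circuits.

    #include <stdbool.h>

    static int has_a = 0x2;    /* bit 1 set: logically true */
    static int has_b = 0x4;    /* bit 2 set: logically true */

    /* Wrong: 0x2 & 0x4 == 0, so this returns false. */
    bool wrong(void)  { return has_a & has_b; }

    /* Right: '&&' tests truth values and short-circuits. */
    bool right(void)  { return has_a && has_b; }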
-
- 06 March 2012, 2 commits
-
-
By Heiko Carstens

The two invoke_softirq() variants are identical except for a single line. So move the #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED inside one of the functions and get rid of the other one.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
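A sketch of the consolidated shape the commit describes (the force_irqthreads branch is simplified here; treat the surrounding details as assumptions):

    static inline void invoke_softirq(void)
    {
            if (!force_irqthreads) {
    /* The only line that differed between the two old variants: */
    #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
                    __do_softirq();
    #else
                    do_softirq();
    #endif
            } else {
                    wakeup_softirqd();
            }
    }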
-
By Russell King

In 2008, commit 0c5d1eb7 ("genirq: record trigger type") modified the way set_irq_type() handles the 'no trigger' condition. However, this has an adverse effect on PCMCIA support on Intel StrongARM and probably PXA platforms.

PCMCIA has several status signals on the socket which can trigger interrupts; some of these status signals depend on the card's mode (whether it is configured in memory or IO mode). For example, cards have a 'Ready/IRQ' signal: in memory mode, this provides an indication to PCMCIA that the card has finished its power-up initialization; in IO mode, it provides the device interrupt signal. Other status signals switch between on-board battery status and loudspeaker output.

In classical PCMCIA implementations, where you have a specific socket controller, the controller provides a method to mask interrupts from the socket and, importantly, to ignore any state transitions on the pins which correspond with interrupts once masked. This masking prevents unwanted events caused by the removal and application of socket power from being forwarded.

However, on platforms where there is no socket controller, the PCMCIA status and interrupt signals are routed to standard edge-triggered GPIOs. These GPIOs can be configured to interrupt on rising edge, falling edge, or never. This is where the problems start.

Edge-triggered interrupts are required to record events while disabled via the usual methods of {free,request,disable,enable}_irq() to prevent problems with dropped interrupts (e.g., the 8390 driver uses disable_irq() to defer the delivery of interrupts). As a result, these interfaces cannot be used to implement the desired behaviour.

The side effect of this is that if the 'Ready/IRQ' GPIO is disabled via disable_irq() on suspend and enabled via enable_irq() after resume, we will record the state transitions caused by powering events as valid interrupts and forward them to the card driver, which may attempt to access a card which is not powered up. This delays resume while drivers spin in their interrupt handlers, and produces complaints from drivers before they realize what has happened. Moreover, in the case of the 'Ready/IRQ' signal, this is requested and freed by the card driver itself; the PCMCIA core has no idea whether the interrupt is requested and, therefore, whether a call to disable_irq() would be valid. (We tried this around the 2.4.17 / 2.5.1 kernel era, and ended up throwing it out because of this problem.)

Therefore, it was decided back in around 2002 to disable the edge triggering instead, resulting in all state transitions on the GPIO being ignored. That is what we actually need the hardware to do.

The commit above changes this behaviour; it explicitly prevents the 'no trigger' state from being selected. The reason that request_irq() does not accept the 'no trigger' state is for compatibility with existing drivers which do not provide their desired triggering configuration. The set_irq_type() function is 'new' and not used by non-trigger-aware drivers.

Therefore, revert this change, and restore the previously working platforms to their former state.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: linux@arm.linux.org.uk
Cc: Ingo Molnar <mingo@elte.hu>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
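In driver terms, the revert restores the ability to do the following on a GPIO-backed socket-status interrupt (a hedged sketch; the irq number and call site are assumed):

    /* Ignore all state transitions on the pin: with IRQ_TYPE_NONE
     * accepted again, edge detection is switched off entirely, which
     * is the "masking" PCMCIA needs across socket power changes. */
    set_irq_type(irq, IRQ_TYPE_NONE);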
-
- 14 February 2012, 1 commit
-
-
By Dave Young

Allow the bint module parameter type to accept null (empty) values, the same as the bool type does.

Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
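A hedged example of a bint parameter (the module and parameter names are hypothetical); with this change, loading the module with an empty value such as "enable=" is accepted, just as it is for bool:

    #include <linux/module.h>
    #include <linux/moduleparam.h>

    /* bint is "bool or int": it accepts Y/N/1/0 like bool as well as
     * plain integers. */
    static int enable = 1;
    module_param(enable, bint, 0444);
    MODULE_PARM_DESC(enable, "enable the feature (bool or int)");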
-
- 10 February 2012, 1 commit
-
-
By Dan Carpenter

"subbuf_size" and "n_subbufs" come from the user and need to be capped to prevent an integer overflow.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 07 February 2012, 2 commits
-
-
By Stephane Eranian

The following patch fixes a bug introduced by commit e050e3f0 ("perf: Fix broken interrupt rate throttling"). That patch caused the following warning to pop up, depending on the sampling frequency adjustments:

  ------------[ cut here ]------------
  WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()

It was caused by the following call sequence:

  perf_adjust_freq_unthr_context.part() {
      stop()
      if (delta > 0) {
          perf_adjust_period() {
              if (period > 8*...) {
                  stop()
                  ...
                  start()
              }
          }
      }
      start()
  }

This caused a double start and a double stop, thus triggering the assert in x86_pmu_start(). The patch fixes the problem by avoiding the double calls: we pass a new argument to perf_adjust_period() to indicate whether or not the event is already stopped. We cannot simply remove the start/stop from that function, because it is also called from __perf_event_overflow(), where the event needs to be reloaded via a back-to-back stop/start call.

The patch reintroduces the assertion in x86_pmu_start() which was removed by commit 84f2b9b2 ("perf: Remove deprecated WARN_ON_ONCE()").

In this second version, we have added calls to disable/enable the PMU during unthrottling or frequency adjustment, based on a bug report of spurious NMI interrupts from Eric Dumazet.

Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: markus@trippelsdorf.de
Cc: paulus@samba.org
Link: http://lkml.kernel.org/r/20120207133956.GA4932@quad
[ Minor edits to the changelog and to the code ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
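A sketch of the approach described above (the parameter name is hypothetical): perf_adjust_period() learns whether its caller has already stopped the event, so the stop/start pair is not doubled.

    static void perf_adjust_period(struct perf_event *event, u64 nsec,
                                   u64 count, bool need_stop)
    {
            /* Callers that already stopped the event pass need_stop=false;
             * __perf_event_overflow() still gets its stop/start reload. */
            if (need_stop)
                    event->pmu->stop(event, PERF_EF_UPDATE);

            /* ... recompute event->hw.sample_period from nsec/count ... */

            if (need_stop)
                    event->pmu->start(event, PERF_EF_RELOAD);
    }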
-
By Tejun Heo

put_io_context() performed a complex trylock dance to avoid deferring ioc release to a workqueue. It was also broken on UP, because the trylock was always assumed to succeed, which resulted in an unbalanced preemption count. While there are ways to fix the UP breakage, even the most pathological microbenchmark (forced ioc allocation and a tight fork/exit loop) fails to show any appreciable performance benefit from the optimization. Strip it out. If there turn out to be workloads which are affected by this change, the simpler optimization from the discussion thread can be applied later.

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <1328514611.21268.66.camel@sli10-conroe>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 February 2012, 1 commit
-
-
By Srivatsa S. Bhat

If freezing of kernel threads fails, we are expected to automatically thaw tasks in the error recovery path. However, at times we encounter situations in which we would like the automatic error recovery path to thaw only the kernel threads, because we want to be able to do some more cleanup before we thaw userspace. Something like:

  error = freeze_kernel_threads();
  if (error) {
          /* Do some cleanup */

          /* Only then thaw userspace tasks */
          thaw_processes();
  }

An example of such a situation is freezing/thawing filesystems during suspend/hibernation. There, if freezing of kernel threads fails, we would like to thaw the frozen filesystems before thawing the userspace tasks.

So, modify freeze_kernel_threads() to thaw only kernel threads in case of freezing failure, and change suspend_freeze_processes() accordingly. (At the same time, let us also get rid of the rather cryptic usage of the conditional operator (?:) in that function.)

[rjw: In fact, this patch fixes a regression introduced during the 3.3 merge window, because without it thaw_processes() may be called before swsusp_free() in some situations, and that may lead to massive memory allocation failures.]

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
-
- 04 February 2012, 1 commit
-
-
By Jiang Liu

In pre_handler_kretprobe(), the allocated kretprobe_instance object is leaked if the entry_handler callback returns non-zero. This may eventually exhaust all the preallocated kretprobe_instance objects. The issue can be reproduced by changing samples/kprobes/kretprobe_example.c to probe "mutex_unlock". The fix is straightforward: just put the allocated kretprobe_instance object back onto the free_instances list.

[akpm@linux-foundation.org: use raw_spin_lock/unlock]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
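A sketch of the fix as described, based on the kretprobe structures of that era (field names assumed): when entry_handler declines the probe, recycle the instance instead of dropping it on the floor.

    /* In pre_handler_kretprobe(), after ri has been taken off
     * rp->free_instances: */
    if (rp->entry_handler && rp->entry_handler(ri, regs)) {
            /* entry_handler rejected this hit: return ri to the
             * free list instead of leaking it. */
            raw_spin_lock_irqsave(&rp->lock, flags);
            hlist_add_head(&ri->hlist, &rp->free_instances);
            raw_spin_unlock_irqrestore(&rp->lock, flags);
            return 0;
    }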
-
- 03 February 2012, 1 commit
-
-
By Christopher Yeoh

This fixes the race in process_vm_core found by Oleg (see http://article.gmane.org/gmane.linux.kernel/1235667/ for details). It has been updated since the last posting, because the creation of the new mm_access() function did almost exactly the same thing as parts of the previous version of this patch. In order to use mm_access() even when /proc isn't enabled, we move it to kernel/fork.c, where other related process mm access functions already live.

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 02 February 2012, 1 commit
-
-
By Srivatsa S. Bhat

In the SNAPSHOT_CREATE_IMAGE ioctl, if the call to hibernation_snapshot() fails, the frozen tasks are not thawed. And in the case of success, if we happen to exit due to a successful freezer test, all tasks (including those of userspace) are thawed, whereas we should actually have thawed only the kernel threads at that point. Fix both these issues.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
-
- 30 January 2012, 1 commit
-
-
By Rafael J. Wysocki

Commit 2aede851 ("PM / Hibernate: Freeze kernel threads after preallocating memory") introduced a mechanism by which kernel threads were frozen after the preallocation of hibernate image memory, to avoid problems with frozen kernel threads not responding to memory-freeing requests. However, it overlooked the s2disk code path, in which the SNAPSHOT_CREATE_IMAGE ioctl is run directly after SNAPSHOT_FREE. This caused freeze_workqueues_begin() to BUG(), because it saw that workqueues had already been frozen.

Although in principle this issue might be addressed by removing the relevant BUG_ON() from freeze_workqueues_begin(), that would reintroduce, into that particular code path, the very problem that commit 2aede851 attempted to avoid. For this reason, fix the issue at hand by introducing thaw_kernel_threads() and making the SNAPSHOT_FREE ioctl execute it.

Special thanks to Srivatsa S. Bhat for a detailed analysis of the problem.

Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: stable@kernel.org
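A hedged sketch of where the new call lands in the snapshot ioctl handler (surrounding details elided and assumed):

    /* in snapshot_ioctl(), kernel/power/user.c */
    case SNAPSHOT_FREE:
            swsusp_free();
            /* ... reset the snapshot handle ... */

            /* Undo SNAPSHOT_CREATE_IMAGE's freezing of kernel threads,
             * so a following SNAPSHOT_CREATE_IMAGE can freeze the
             * workqueues afresh without tripping the BUG_ON(). */
            thaw_kernel_threads();
            break;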
-
- 27 January 2012, 7 commits
-
-
By Chanho Min

This issue happens under the following conditions:

 1. preemption is off
 2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
 3. RT scheduling class
 4. SMP system

The sequence is as follows:

 1. Suppose the current task is A; schedule() starts.
 2. Task A is enqueued as a pushable task at the entry of schedule():

      __schedule
        prev = rq->curr;
        ...
        put_prev_task
          put_prev_task_rt
            enqueue_pushable_task

 3. Task B is picked as the next task:

      next = pick_next_task(rq);

 4. rq->curr is set to task B and context_switch() is started:

      rq->curr = next;

 5. At the entry of context_switch(), this cpu's rq->lock is released:

      context_switch
        prepare_task_switch
          prepare_lock_switch
            raw_spin_unlock_irq(&rq->lock);

 6. Shortly after rq->lock is released, an interrupt occurs and IRQ context starts.
 7. try_to_wake_up(), called from the ISR, acquires rq->lock:

      try_to_wake_up
        ttwu_remote
          rq = __task_rq_lock(p)
          ttwu_do_wakeup(rq, p, wake_flags);
            task_woken_rt

 8. push_rt_task() picks task A, which was enqueued before:

      task_woken_rt
        push_rt_tasks(rq)
          next_task = pick_next_pushable_task(rq)

 9. At find_lock_lowest_rq(), if double_lock_balance() returns 0, lowest_rq can be a remote rq. (If preemption is on, double_lock_balance() always returns 1 and this doesn't happen.)

      push_rt_task
        find_lock_lowest_rq
          if (double_lock_balance(rq, lowest_rq))..

10. find_lock_lowest_rq() returns the available rq, and task A is migrated to the remote cpu/rq:

      push_rt_task
        ...
        deactivate_task(rq, next_task, 0);
        set_task_cpu(next_task, lowest_rq->cpu);
        activate_task(lowest_rq, next_task, 0);

But task A is still in IRQ context on this cpu, so task A is scheduled by two cpus at the same time until it returns from the IRQ, and task A's stack is corrupted.

To fix it, don't migrate an RT task if it's still running.

Signed-off-by: Chanho Min <chanho.min@lge.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
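Conceptually the fix is a guard of this shape in the push path (a hedged sketch; the exact placement within push_rt_task()/find_lock_lowest_rq() is simplified):

    /* Never migrate a task that is still running on this cpu (it may
     * be "current" underneath an interrupt): two cpus would end up
     * executing on the same stack. */
    if (task_running(rq, next_task))
            return 0;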
-
By Stephane Eranian

This patch fixes the sampling interrupt throttling mechanism, which was broken in v3.2: events were not being unthrottled. The unthrottling mechanism required that events be checked at each timer tick.

This patch solves that problem and also separates:

  - unthrottling
  - multiplexing
  - frequency-mode period adjustments

Not all of them need to be executed at each timer tick. This third version of the patch is based on my original patch + PeterZ's proposal (https://lkml.org/lkml/2012/1/7/87).

At each timer tick, for each context:

  - if the current CPU has throttled events, we unthrottle events
  - if the context has frequency-based events, we adjust sampling periods
  - if we have reached the jiffies interval, we multiplex (rotate)

We decoupled rotation (multiplexing) from frequency-mode sampling period adjustments: they should not necessarily happen at the same rate. Multiplexing is subject to jiffies_interval (currently 1, but it could be higher once the tunable is exposed via sysfs). We have grouped frequency-mode adjustment and unthrottling into the same routine to minimize code duplication. When throttled while in frequency mode, we scan the events only once.

We have also fixed the threshold enforcement code in __perf_event_overflow(). There was a bug whereby it would allow more than the authorized rate, because an increment of hwc->interrupts was not executed at the right place.

The patch was tested with a low sampling limit (2000), fixed periods, frequency mode, and an overcommitted PMU. On a 2.1GHz AMD CPU:

  $ cat /proc/sys/kernel/perf_event_max_sample_rate
  2000

We set a rate of 3000 samples/sec (2.1GHz/3000 = 700000):

  $ perf record -e cycles,cycles -c 700000 noploop 10
  $ perf report -D | tail -21

  Aggregated stats:
         TOTAL events:      80086
          MMAP events:         88
          COMM events:          2
          EXIT events:          4
      THROTTLE events:      19996
    UNTHROTTLE events:      19996
        SAMPLE events:      40000
  cycles stats:
         TOTAL events:      40006
          MMAP events:          5
          COMM events:          1
          EXIT events:          4
      THROTTLE events:       9998
    UNTHROTTLE events:       9998
        SAMPLE events:      20000
  cycles stats:
         TOTAL events:      39996
      THROTTLE events:       9998
    UNTHROTTLE events:       9998
        SAMPLE events:      20000

For 10s, the cap is 2x2000x10 = 40000 samples. We get exactly that: 20000 samples/event.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org> # v3.2+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120126160319.GA5655@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
By Yasunori Goto

try_to_wake_up() has a problem which may, in a race with an SMI or in the guest environment of a virtual machine, change a task's status from TASK_DEAD to TASK_RUNNING. As a result, the exited task is scheduled again and a panic occurs. Here is the sequence of how it occurs:

    CPU A                                  CPU B
    ---------------------------------      -----------------------------
    TASK A calls exit()....

    do_exit()
      exit_mm()
        down_read(mm->mmap_sem);
        rwsem_down_failed_common()
          set TASK_UNINTERRUPTIBLE
          set waiter.task <= task A
          list_add to sem->wait_list
          :
          raw_spin_unlock_irq()
          (I/O interruption occurred)
                                           __rwsem_do_wake(mmap_sem)
                                             list_del(&waiter->list);
                                             waiter->task = NULL
                                             wake_up_process(task A)
                                               try_to_wake_up()
                                                 (task is still
                                                  TASK_UNINTERRUPTIBLE,
                                                  p->on_rq is still 1)
                                                 ttwu_do_wakeup()  (*A)
                                                   :
      (I/O interruption handler finished)
      if (!waiter.task)
          schedule() is not called,
          because waiter.task is NULL
                                                 tsk->state = TASK_RUNNING
                                                   :
                                                 check_preempt_curr();
      :
      task->state = TASK_DEAD  (*B)
                                                 <--- set TASK_RUNNING (*C)
    schedule()
    (exit task is running again)
    BUG_ON() is called!

The execution time between (*A) and (*B) is usually very short, because the interruption is disabled, and setting TASK_RUNNING at (*C) must be executed before setting TASK_DEAD. HOWEVER, if an SMI interrupts between (*A) and (*B), (*C) is able to execute AFTER TASK_DEAD is set. The exited task is then scheduled again, and BUG_ON() is called. If the system runs as a guest of a virtual machine, the time between (*A) and (*B) may also be long, due to scheduling by the hypervisor, and the same phenomenon can occur.

With this patch, do_exit() waits for the release of task->pi_lock, which is used in try_to_wake_up(). This guarantees that the task becomes TASK_DEAD after the wakeup.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120117174031.3118.E1E9C6FF@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
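The fix as described boils down to the following in do_exit(), just before the task marks itself dead (a sketch using the locking primitives of that kernel era):

    /* Wait until a concurrent try_to_wake_up() has released our
     * pi_lock, so that its delayed "tsk->state = TASK_RUNNING" store
     * cannot land after the TASK_DEAD store below. */
    smp_mb();
    raw_spin_unlock_wait(&tsk->pi_lock);

    /* causes the final put_task_struct in finish_task_switch() */
    tsk->state = TASK_DEAD;
    schedule();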
-
By Prarit Bhargava

rsyslog will display KERN_EMERG messages on a connected terminal. However, these messages are useless/undecipherable for a general user. For example, after a softlockup we get:

  Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
  kernel:Stack:

  Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
  kernel:Call Trace:

  Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
  kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89

This happens because the printk levels for these messages are incorrect. Only an informational message should be displayed on a terminal.

I modified the printk levels for various messages in the kernel and tested the output by using the drivers/misc/lkdtm.c kernel module (i.e., softlockups, panics, hard lockups, etc.) and confirmed that the console output was still the same and that the output to the terminals was correct. For example, in the case of a softlockup we now see the much more informative:

  Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ...
  BUG: soft lockup - CPU4 stuck for 60s!

instead of the confusing messages above.

AFAICT, the messages no longer have to be KERN_EMERG. In the most important case of a panic we set console_verbose(); as for the other, less severe cases, the correct data is output to the console and /var/log/messages.

Successfully tested by me using the drivers/misc/lkdtm.c module.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: dzickus@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
By Suresh Siddha

With the recent nohz scheduler changes, the rq's nohz flag 'NOHZ_TICK_STOPPED' and its associated state don't get cleared immediately after the cpu exits idle; they get cleared as part of the next tick seen on that cpu. For CPU offline support, we need to clear this state manually. Fix it by registering a cpu notifier which clears the nohz idle load-balance state for this rq explicitly during the CPU_DYING notification.

There won't be any nohz updates for that cpu after the CPU_DYING notification, but let's be extra paranoid and skip updating the nohz state in select_nohz_load_balancer() if the cpu is no longer in the active state.

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-and-tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1327026538.16150.40.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
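A hedged sketch of the notifier the commit describes (the helper name is an assumption based on the commit's wording):

    static int sched_ilb_notifier(struct notifier_block *nfb,
                                  unsigned long action, void *hcpu)
    {
            switch (action & ~CPU_TASKS_FROZEN) {
            case CPU_DYING:
                    /* No further tick will clear this cpu's nohz
                     * state, so do it explicitly here. */
                    clear_nohz_tick_stopped(smp_processor_id());
                    return NOTIFY_OK;
            default:
                    return NOTIFY_DONE;
            }
    }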
-
By Christian Borntraeger

Commit 029632fb ("sched: Make separate sched*.c translation units") removed the include of asm/mutex.h from sched.c. This breaks the combination of:

  CONFIG_MUTEX_SPIN_ON_OWNER=yes
  CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX=yes

as on s390 without mutex debugging:

  CC      kernel/sched/core.o
  kernel/sched/core.c: In function 'mutex_spin_on_owner':
  kernel/sched/core.c:3287: error: implicit declaration of function 'arch_mutex_cpu_relax'

Let's re-add the include to kernel/sched/core.c.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1326268696-30904-1-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
By Peter Zijlstra

KOSAKI Motohiro noticed the following race:

 > CPU0                          CPU1
 > --------------------------------------------------------
 >                               deactivate_task()
 >                               task->state = TASK_UNINTERRUPTIBLE;
 > activate_task()
 >    rq->nr_uninterruptible--;
 >
 >                               schedule()
 >                                 deactivate_task()
 >                                   rq->nr_uninterruptible++;
 >

Kosaki-san's scenario is possible when CPU0 runs __sched_setscheduler() against CPU1's current @task. __sched_setscheduler() does a dequeue/enqueue in order to move the task to its new queue (position), to reflect the newly provided scheduling parameters. However, it should be completely invariant with respect to nr_uninterruptible accounting: sched_setscheduler() doesn't affect readiness to run, merely the policy on when to run.

So convert the inappropriate activate/deactivate_task usage to enqueue/dequeue_task, which avoids the nr_uninterruptible accounting.

Also convert the two other sites that still use activate/deactivate_task: __migrate_task() and normalize_task(). These sites aren't really a problem, since __migrate_task() will only be called on a non-running task (and is therefore immune to the described problem) and normalize_task() isn't ever used on regular systems.

Also remove the comments from activate/deactivate_task, since they're misleading at best.

Reported-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1327486224.2614.45.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
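The conversion in __sched_setscheduler() then takes this shape (a sketch; the surrounding logic is elided):

    /* Move the task without touching nr_uninterruptible: a policy
     * change does not alter readiness to run. */
    if (on_rq)
            dequeue_task(rq, p, 0);

    /* ... apply the new policy/priority here ... */

    if (on_rq)
            enqueue_task(rq, p, 0);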
-
- 24 January 2012, 3 commits
-
-
By Randy Dunlap

Fix new kernel-doc notation warnings:

  Warning(include/linux/sched.h:2094): No description found for parameter 'p'
  Warning(include/linux/sched.h:2094): Excess function parameter 'tsk' description in 'is_idle_task'
  Warning(kernel/sched/cpupri.c:139): No description found for parameter 'newpri'
  Warning(kernel/sched/cpupri.c:139): Excess function parameter 'pri' description in 'cpupri_set'
  Warning(kernel/sched/cpupri.c:208): Excess function parameter 'bootmem' description in 'cpupri_init'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
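The fix for this class of warning is simply keeping the kernel-doc parameter names in sync with the prototype; for example, reconstructed for the is_idle_task case (the function body is the trivial form from that era):

    /**
     * is_idle_task - is the specified task an idle task?
     * @p: the task in question.
     */
    static inline bool is_idle_task(struct task_struct *p)
    {
            return p->pid == 0;
    }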
-
By Randy Dunlap

Fix new kernel-doc warnings in auditsc.c:

  Warning(kernel/auditsc.c:1875): No description found for parameter 'success'
  Warning(kernel/auditsc.c:1875): No description found for parameter 'return_code'
  Warning(kernel/auditsc.c:1875): Excess function parameter 'pt_regs' description in '__audit_syscall_exit'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Ananth N Mavinakayanahalli

Commit ef53d9c5 ("kprobes: improve kretprobe scalability with hashed locking") introduced a bug where we can potentially leak kretprobe_instances, since we initialize an hlist head after having used it. Initialize the hlist head before using it.

Reported-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srinivasa D S <srinivasa@in.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 January 2012, 1 commit
-
-
By Glauber Costa

There is a case in __sk_mem_schedule() where an allocation is beyond the maximum, yet we are allowed to proceed. It happens under the following condition:

  sk->sk_wmem_queued + size >= sk->sk_sndbuf

The network code won't revert the allocation in this case, meaning that at some point later it will try to do so. Since this is never communicated to the underlying res_counter code, there is an imbalance in the res_counter uncharge operation.

I see two ways of fixing this:

 1) storing the information about those allocations somewhere in memcg, and then deducting from that first, before we start draining the res_counter, or
 2) providing a slightly different allocation function for the res_counter, one that matches the original behavior of the network code more closely.

I decided to go with #2 here, believing it to be more elegant, since #1 would require us to do basically the same thing, only in a more obscure way.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Laurent Chavey <chavey@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
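Option #2 materializes as a "nofail" charge variant; a sketch of its interface, with the exact signature treated as an assumption based on the commit's description:

    /* Charge 'val' unconditionally. The counter may exceed its limit;
     * if so, *limit_fail_at reports where, but the charge still
     * sticks, so the later uncharge from the network code stays
     * balanced with what was actually allocated. */
    int res_counter_charge_nofail(struct res_counter *counter,
                                  unsigned long val,
                                  struct res_counter **limit_fail_at);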
-
- 21 January 2012, 2 commits
-
-
By Namhyung Kim

perf_event_time() will call perf_cgroup_event_time() if @event is a cgroup event. Just do it directly and avoid the extra check.

Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/1327021966-27688-2-git-send-email-namhyung.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
By Namhyung Kim

When alloc_callchain_buffers() fails, it frees all of the entries before returning. In addition, calling release_callchain_buffers() in that case would cause a NULL pointer dereference, since callchain_cpu_entries is not set.

Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/1327021966-27688-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 20 January 2012, 1 commit
-
-
By Namhyung Kim

struct bm_block is allocated by chain_alloc(), so it had better be counted in LINKED_PAGE_DATA_SIZE.

Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
-
- 18 January 2012, 11 commits
-
-
By Kees Cook

audit_log_d_path() injects an additional space before the prefix, which serves no purpose and doesn't mix well with other audit_log*() functions that do not sneak extra characters into the log.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Xi Wang

In the loop, a size_t "len" is used to hold the return value of audit_log_single_execve_arg(), which returns -1 on error. In that case the error handling (len <= 0) will be bypassed, since "len" is unsigned, and the loop continues with (p += len) being wrapped. Change the type of "len" to signed int to fix the error handling.

  size_t len;
  ...
  for (...) {
          len = audit_log_single_execve_arg(...);
          if (len <= 0)
                  break;
          p += len;
  }

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Peter Moody

This allows audit to specify rules in which we compare two fields of a process, such as: is the running process's uid != its euid?

Signed-off-by: Peter Moody <pmoody@google.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Peter Moody

This completes the matrix of interfield comparisons between the uid/gid information of the current task and the uid/gid information of inodes; i.e., one can now audit based on differences between the euid of the process and the uid of fs objects.

Signed-off-by: Peter Moody <pmoody@google.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Eric Paris

Allow audit rules to compare the gid of the running task to the gid of the inode in question.

Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Eric Paris

Rather than coding the same loop over and over, implement a helper function which uses some pointer magic to make it generic enough to be used in numerous places as we implement more audit interfield comparisons.

Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Eric Paris

We wish to be able to audit when a uid=500 task accesses a file which is uid=0, or vice versa. This patch introduces a new audit filter type, AUDIT_FIELD_COMPARE, which takes an 'enum' indicating which fields should be compared. At this point we only define task->uid vs inode->uid, but other comparisons can be added.

Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Eric Paris

Just a code cleanup, really. We don't need to make a function call just for it to return on error. This also makes the VFS function even easier to follow and removes a conditional on a hot path.

Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Eric Paris

At the moment we allow tasks to set their loginuid if they have CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they log in, and for it to be impossible to ever reset. We had to make it mutable even after it was once set (with the CAP), because on update an admin might have to restart sshd; sshd would then get the admin's loginuid, and the next user who logged in via ssh would not be able to set his loginuid.

Systemd has changed how userspace works and has allowed us to make the kernel work the way it should. With systemd, users (even admins) are not supposed to restart services directly: the system will restart the service for them. Thus, since systemd runs with loginuid==-1, sshd will get -1 and will be allowed to set a new loginuid without special permissions. If an admin on such a system were to manually start an sshd, he is inserting himself into the system chain of trust, and thus, logically, it's his loginuid that should be used!

Since we have old systems, I make this a Kconfig option.

Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Eric Paris

The function always deals with current. Don't expose an option pretending one can use it for something else; you can't.

Signed-off-by: Eric Paris <eparis@redhat.com>
-
By Eric Paris

Much like the ability to filter audit on the uid of an inode collected, we should be able to filter on the gid of the inode.

Signed-off-by: Eric Paris <eparis@redhat.com>
-