- 22 4月, 2015 18 次提交
-
-
由 Thomas Gleixner 提交于
The evaluation of the next timer in the nohz code is based on jiffies while all the tick internals are nano seconds based. We have also to convert hrtimer nanoseconds to jiffies in the !highres case. That's just wrong and introduces interesting corner cases. Turn it around and convert the next timer wheel timer expiry and the rcu event to clock monotonic and base all calculations on nanoseconds. That identifies the case where no timer is pending clearly with an absolute expiry value of KTIME_MAX. Makes the code more readable and gets rid of the jiffies magic in the nohz code. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
Get rid of one indentation level. Preparatory patch for a major rework. No functional change. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203502.101563235@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
We already got rid of the hrtimer reprogramming loops and hoops as hrtimer now enforces an interrupt if the enqueued time is in the past. Do the same for the nohz non highres mode. That gets rid of the need to raise the softirq which only serves the purpose of getting the machine out of the inner idle loop. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203502.023464878@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
hrtimer_start() enforces a timer interrupt if the timer is already expired. Get rid of the checks and the forward loop. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203501.943658239@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
hrtimer softirq is a leftover from the initial implementation and serves only the purpose to handle the enqueueing of already expired timers in the high resolution timer mode. We discussed whether we change the return value and force all start sites to handle that the timer is already expired, but that would be a Herculean task and I'm not sure whether its a good idea to enforce that handling on everyone. A simpler solution is to enforce a timer interrupt instead of raising and scheduling a softirq. Just use the existing infrastructure to do so and remove all the softirq leftovers. The HRTIMER softirq enum is now unused, but kept around because trace parsers rely on the existing numbering. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203501.840834708@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
__remove_hrtimer() needs to evaluate the expiry time to figure out whether the timer which is removed is eventually the first expiring timer on the cpu. Keep a pointer to it, which is lazily updated, so we can avoid the evaluation dance and retrieve the information from there. Generates slightly better code. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203501.752838019@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
Use the return value instead of reevaluating the information. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203501.658152945@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
The active_bases field is guaranteed to be in sync with the timerqueue of the corresponding clock base. So we can use it for iterating over the clock bases. This allows to break out early if no more active clock bases are available and avoids touching the cache lines of inactive clock bases. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203501.322887675@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
No point in wasting 12 byte storage space. Generates better code as well. Text size reduction: x8664 -64, i386 -16, ARM -132, ARM64 -0, power64 -48 Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203501.227955358@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
On every tick/hrtimer interrupt we update the offset variables of the clock bases. That's silly because these offsets change very seldom. Add a sequence counter to the time keeping code which keeps track of the offset updates (clock_was_set()). Have a sequence cache in the hrtimer cpu bases to evaluate whether the offsets must be updated or not. This allows us later to avoid pointless cacheline pollution. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20150414203501.132820245@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
The softirq time field in the clock bases is an optimization from the early days of hrtimers. It provides a coarse "jiffies" like time mostly for self rearming timers. But that comes with a price: - Larger code size - Extra storage space - Duplicated functions with really small differences The benefit of this is optimization is marginal for contemporary systems. Consolidate everything on the high resolution timer implementation. This makes further optimizations possible. Text size reduction: x8664 -95, i386 -356, ARM -148, ARM64 -40, power64 -16 Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203501.039977424@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
No point in having usigned long for /proc/timer_list statistics. Make them unsigned int. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203500.959773467@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
The resolution is directly accessible now. So its simpler just to fill in the values of the timespec and be done with it. Text size reduction (combined with "hrtimer: Get rid of the resolution field in hrtimer_clock_base"): x8664 -61, i386 -221, ARM -60, power64 -48 Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203500.879888080@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
The field has no value because all clock bases have the same resolution. The resolution only changes when we switch to high resolution timer mode. We can evaluate that from a single static variable as well. In the !HIGHRES case its simply a constant. Export the variable, so we can simplify the usage sites. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203500.645454122@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Viresh Kumar 提交于
'active_bases' indicates which clock-base have active timer. The intention of this bit field was to avoid evaluating inactive bases. It was introduced with the introduction of the BOOTTIME and TAI clock bases, but it was never brought into full use. We want to use it now, but in __remove_hrtimer() the update happens after the calling hrtimer_force_reprogram() which has to evaluate all clock bases for the next expiring timer. So in case the last timer of a clock base got removed we still see the active bit and therefor evaluate the clock base for no value. There are further optimizations possible when active_bases is updated in the right place. Move the update before the call to hrtimer_force_reprogram() [ tglx: Massaged changelog ] Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/20150414203500.533438642@linutronix.de Link: http://lkml.kernel.org/r/c7c8ebcd9ed88bb09d76059c745a1fafb48314e7.1428039899.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
Document the calling context conditions. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150413210035.178751779@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
commit 61edec81 "timekeeping: Simplify timekeeping_clocktai()" implemented timekeeping_clocktai() as an inline function, but left the old extern prototype in the header file. Remove it. Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Joe Perches 提交于
This macro can be converted to a static function to reduce object size. (x86-64 defconfig) $ size kernel/time/timer_list.o* text data bss dec hex filename 6583 8 0 6591 19bf kernel/time/timer_list.o.old 4647 8 0 4655 122f kernel/time/timer_list.o.new Signed-off-by: NJoe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1429295958.2850.104.camel@perches.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 20 4月, 2015 1 次提交
-
-
由 Linus Torvalds 提交于
Commit 8053871d ("smp: Fix smp_call_function_single_async() locking") fixed the locking for the asynchronous smp-call case, but in the process of moving the lock handling around, one of the error cases ended up not unlocking the call data at all. This went unnoticed on x86, because this is a "caller is buggy" case, where the caller is trying to call a non-existent CPU. But apparently ARM does that (at least under qemu-arm). Bindly doing cross-cpu calls to random CPU's that aren't even online seems a bit fishy, but the error handling was clearly not correct. Simply add the missing "csd_unlock()" to the error path. Reported-and-tested-by: NGuenter Roeck <linux@roeck-us.net> Analyzed-by: NRabin Vincent <rabin@rab.in> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 4月, 2015 17 次提交
-
-
由 Davidlohr Bueso 提交于
sync_buffer() needs the mmap_sem for two distinct operations, both only occurring upon user context switch handling: 1) Dealing with the exe_file. 2) Adding the dcookie data as we need to lookup the vma that backs it. This is done via add_sample() and add_data(). This patch isolates 1), for it will no longer need the mmap_sem for serialization. However, for now, make of the more standard get_mm_exe_file(), requiring only holding the mmap_sem to read the value, and relying on reference counting to make sure that the exe file won't dissappear underneath us while doing the get dcookie. As a consequence, for 2) we move the mmap_sem locking into where we really need it, in lookup_dcookie(). The benefits are twofold: reduce mmap_sem hold times, and cleaner code. [akpm@linux-foundation.org: export get_mm_exe_file for arch/x86/oprofile/oprofile.ko] Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Cc: Robert Richter <rric@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
gcov profiling if enabled with other heavy compile-time instrumentation like KASan could trigger following softlockups: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] Modules linked in: irq event stamp: 22823276 hardirqs last enabled at (22823275): [<ffffffff86e8d10d>] mutex_lock_nested+0x7d9/0x930 hardirqs last disabled at (22823276): [<ffffffff86e9521d>] apic_timer_interrupt+0x6d/0x80 softirqs last enabled at (22823172): [<ffffffff811ed969>] __do_softirq+0x4db/0x729 softirqs last disabled at (22823167): [<ffffffff811edfcf>] irq_exit+0x7d/0x15b CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.19.0-05245-gbb33326-dirty #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 task: ffff88006cba8000 ti: ffff88006cbb0000 task.ti: ffff88006cbb0000 RIP: kasan_mem_to_shadow+0x1e/0x1f Call Trace: strcmp+0x28/0x70 get_node_by_name+0x66/0x99 gcov_event+0x4f/0x69e gcov_enable_events+0x54/0x7b gcov_fs_init+0xf8/0x134 do_one_initcall+0x1b2/0x288 kernel_init_freeable+0x467/0x580 kernel_init+0x15/0x18b ret_from_fork+0x7c/0xb0 Kernel panic - not syncing: softlockup: hung tasks Fix this by sticking cond_resched() in gcov_enable_events(). Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com> Reported-by: NFengguang Wu <fengguang.wu@intel.com> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heinrich Schuchardt 提交于
When converting unsigned long to int overflows may occur. These currently are not detected when writing to the sysctl file system. E.g. on a system where int has 32 bits and long has 64 bits echo 0x800001234 > /proc/sys/kernel/threads-max has the same effect as echo 0x1234 > /proc/sys/kernel/threads-max The patch adds the missing check in do_proc_dointvec_conv. With the patch an overflow will result in an error EINVAL when writing to the the sysctl file system. Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Oleg cleverly suggested using xchg() to set the new mm->exe_file instead of calling set_mm_exe_file() which requires some form of serialization -- mmap_sem in this case. For archs that do not have atomic rmw instructions we still fallback to a spinlock alternative, so this should always be safe. As such, we only need the mmap_sem for looking up the backing vm_file, which can be done sharing the lock. Naturally, this means we need to manually deal with both the new and old file reference counting, and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can probably be deleted in the future anyway. Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Suggested-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
This patch removes mm->mmap_sem from mm->exe_file read side. Also it kills dup_mm_exe_file() and moves exe_file duplication into dup_mmap() where both mmap_sems are locked. [akpm@linux-foundation.org: fix comment typo] Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heinrich Schuchardt 提交于
Users can change the maximum number of threads by writing to /proc/sys/kernel/threads-max. With the patch the value entered is checked against the same limits that apply when fork_init is called. Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heinrich Schuchardt 提交于
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the THREAD_SIZE. E.g. architecture hexagon may have page size 1M and thread size 4096. This would lead to a division by zero in the calculation of max_threads. With 32-bit calculation there is no solution which delivers valid results for all possible combinations of the parameters. The code is only called once. Hence a 64-bit calculation can be used as solution. [akpm@linux-foundation.org: use clamp_t(), per Oleg] Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heinrich Schuchardt 提交于
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the THREAD_SIZE. E.g. architecture hexagon may have page size 1M and thread size 4096. This would lead to a division by zero in the calculation of max_threads. With this patch the buggy code is moved to a separate function set_max_threads. The error is not fixed. After fixing the problem in a separate patch the new function can be reused to adjust max_threads after adding or removing memory. Argument mempages of function fork_init() is removed as totalram_pages is an exported symbol. The creation of separate patches for refactoring to a new function and for fixing the logic was suggested by Ingo Molnar. Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jean Delvare 提交于
The comment explaining what value max_threads is set to is outdated. The maximum memory consumption ratio for thread structures was 1/2 until February 2002, then it was briefly changed to 1/16 before being set to 1/8 which we still use today. The comment was never updated to reflect that change, it's about time. Signed-off-by: NJean Delvare <jdelvare@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
copy_process will report any failure in alloc_pid as ENOMEM currently which is misleading because the pid allocation might fail not only when the memory is short but also when the pid space is consumed already. The current man page even mentions this case: : EAGAIN : : A system-imposed limit on the number of threads was encountered. : There are a number of limits that may trigger this error: the : RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which : limits the number of processes and threads for a real user ID, was : reached; the kernel's system-wide limit on the number of processes : and threads, /proc/sys/kernel/threads-max, was reached (see : proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max, : was reached (see proc(5)). so the current behavior is also incorrect wrt. documentation. POSIX man page also suggest returing EAGAIN when the process count limit is reached. This patch simply propagates error code from alloc_pid and makes sure we return -EAGAIN due to reservation failure. This will make behavior of fork closer to both our documentation and POSIX. alloc_pid might alsoo fail when the reaper in the pid namespace is dead (the namespace basically disallows all new processes) and there is no good error code which would match documented ones. We have traditionally returned ENOMEM for this case which is misleading as well but as per Eric W. Biederman this behavior is documented in man pid_namespaces(7) : If the "init" process of a PID namespace terminates, the kernel : terminates all of the processes in the namespace via a SIGKILL signal. : This behavior reflects the fact that the "init" process is essential for : the correct operation of a PID namespace. In this case, a subsequent : fork(2) into this PID namespace will fail with the error ENOMEM; it is : not possible to create a new processes in a PID namespace whose "init" : process has terminated. and introducing a new error code would be too risky so let's stick to ENOMEM for this case. Signed-off-by: NMichal Hocko <mhocko@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Davydov 提交于
Sending SI_TKILL from rt_[tg]sigqueueinfo was deprecated, so now we issue a warning on the first attempt of doing it. We use WARN_ON_ONCE, which is not informative and, what is worse, taints the kernel, making the trinity syscall fuzzer complain false-positively from time to time. It does not look like we need this warning at all, because the behaviour changed quite a long time ago (2.6.39), and if an application relies on the old API, it gets EPERM anyway and can issue a warning by itself. So let us zap the warning in kernel. Signed-off-by: NVladimir Davydov <vdavydov@parallels.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Richard Weinberger <richard@nod.at> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
ptrace_detach() re-checks ->ptrace under tasklist lock and calls release_task() if __ptrace_detach() returns true. This was needed because the __TASK_TRACED tracee could be killed/untraced, and it could even pass exit_notify() before we take tasklist_lock. But this is no longer possible after 9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL". We can turn these checks into WARN_ON() and remove release_task(). While at it, document the setting of child->exit_code. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Pavel Labath <labath@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
ptrace_resume() is called when the tracee is still __TASK_TRACED. We set tracee->exit_code and then wake_up_state() changes tracee->state. If the tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T) wrongly looks like another report from tracee. This confuses debugger, and since wait_task_stopped() clears ->exit_code the tracee can miss a signal. Test-case: #include <stdio.h> #include <unistd.h> #include <sys/wait.h> #include <sys/ptrace.h> #include <pthread.h> #include <assert.h> int pid; void *waiter(void *arg) { int stat; for (;;) { assert(pid == wait(&stat)); assert(WIFSTOPPED(stat)); if (WSTOPSIG(stat) == SIGHUP) continue; assert(WSTOPSIG(stat) == SIGCONT); printf("ERR! extra/wrong report:%x\n", stat); } } int main(void) { pthread_t thread; pid = fork(); if (!pid) { assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); for (;;) kill(getpid(), SIGHUP); } assert(pthread_create(&thread, NULL, waiter, NULL) == 0); for (;;) ptrace(PTRACE_CONT, pid, 0, SIGCONT); return 0; } Note for stable: the bug is very old, but without 9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix should use lock_task_sighand(child). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Reported-by: NPavel Labath <labath@google.com> Tested-by: NPavel Labath <labath@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
The current smp_function_call code suffers a number of problems, most notably smp_call_function_single_async() is broken. The problem is that flush_smp_call_function_queue() does csd_unlock() _after_ calling csd->func(). This means that a caller cannot properly synchronize the csd usage as it has to. Change the code to release the csd before calling ->func() for the async case, and put a WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK) in smp_call_function_single_async() to warn us of improper serialization, because any waiting there can results in deadlocks when called with IRQs disabled. Rename the (currently) unused WAIT flag to SYNCHRONOUS and (re)use it such that we know what to do in flush_smp_call_function_queue(). Rework csd_{,un}lock() to use smp_load_acquire() / smp_store_release() to avoid some full barriers while more clearly providing lock semantics. Finally move the csd maintenance out of generic_exec_single() into its callers for clearer code. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> [ Added changelog. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Rafael David Tinoco <inaddy@ubuntu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CA+55aFz492bzLFhdbKN-Hygjcreup7CjMEYk3nTSfRWjppz-OA@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
During sysrq's show-held-locks command it is possible that hlock_class() returns NULL for a given lock. The result is then (after the warning): |BUG: unable to handle kernel NULL pointer dereference at 0000001c |IP: [<c1088145>] get_usage_chars+0x5/0x100 |Call Trace: | [<c1088263>] print_lock_name+0x23/0x60 | [<c1576b57>] print_lock+0x5d/0x7e | [<c1088314>] lockdep_print_held_locks+0x74/0xe0 | [<c1088652>] debug_show_all_locks+0x132/0x1b0 | [<c1315c48>] sysrq_handle_showlocks+0x8/0x10 This *might* happen because the thread on the other CPU drops the lock after we are looking ->lockdep_depth and ->held_locks points no longer to a lock that is held. The fix here is to simply ignore it and continue. Reported-by: NAndreas Messerschmid <andreas@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexei Starovoitov 提交于
1. first bug is a silly mistake. It broke tracing examples and prevented simple bpf programs from loading. In the following code: if (insn->imm == 0 && BPF_SIZE(insn->code) == BPF_W) { } else if (...) { // this part should have been executed when // insn->code == BPF_W and insn->imm != 0 } Obviously it's not doing that. So simple instructions like: r2 = *(u64 *)(r1 + 8) will be rejected. Note the comments in the code around these branches were and still valid and indicate the true intent. Replace it with: if (BPF_SIZE(insn->code) != BPF_W) continue; if (insn->imm == 0) { } else if (...) { // now this code will be executed when // insn->code == BPF_W and insn->imm != 0 } 2. second bug is more subtle. If malicious code is using the same dest register as source register, the checks designed to prevent the same instruction to be used with different pointer types will fail to trigger, since we were assigning src_reg_type when it was already overwritten by check_mem_access(). The fix is trivial. Just move line: src_reg_type = regs[insn->src_reg].type; before check_mem_access(). Add new 'access skb fields bad4' test to check this case. Fixes: 9bac3d6d ("bpf: allow extended BPF programs access skb fields") Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
Due to missing bounds check the DAG pass of the BPF verifier can corrupt the memory which can cause random crashes during program loading: [8.449451] BUG: unable to handle kernel paging request at ffffffffffffffff [8.451293] IP: [<ffffffff811de33d>] kmem_cache_alloc_trace+0x8d/0x2f0 [8.452329] Oops: 0000 [#1] SMP [8.452329] Call Trace: [8.452329] [<ffffffff8116cc82>] bpf_check+0x852/0x2000 [8.452329] [<ffffffff8116b7e4>] bpf_prog_load+0x1e4/0x310 [8.452329] [<ffffffff811b190f>] ? might_fault+0x5f/0xb0 [8.452329] [<ffffffff8116c206>] SyS_bpf+0x806/0xa30 Fixes: f1bca824 ("bpf: add search pruning optimization to verifier") Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2015 4 次提交
-
-
由 Joe Perches 提交于
The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Miscellanea: o Remove unused return value from trace_lookup_stack Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joel Stanley 提交于
The kernel has orderly_poweroff which allows the kernel to initiate a graceful shutdown of userspace, by running /sbin/poweroff. This adds orderly_reboot that will cause userspace to shut itself down by calling /sbin/reboot. This will be used for shutdown initiated by a system controller on platforms that do not use ACPI. orderly_reboot() should be used when the system wants to allow userspace to gracefully shut itself down. For cases where the system may imminently catch on fire, the existing emergency_restart() provides an immediate reboot without involving userspace. Signed-off-by: NJoel Stanley <joel@jms.id.au> Cc: Fabian Frederick <fabf@skynet.be> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aaron Tomlin 提交于
In check_hung_uninterruptible_tasks() avoid the use of deprecated while_each_thread(). The "max_count" logic will prevent a livelock - see commit 0c740d0a ("introduce for_each_thread() to replace the buggy while_each_thread()"). Having said this let's use for_each_process_thread(). Signed-off-by: NAaron Tomlin <atomlin@redhat.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Wysochanski <dwysocha@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-