- 28 5月, 2015 9 次提交
-
-
由 Luis R. Rodriguez 提交于
We're directly checking and modifying sig_enforce when needed instead of using the generic helpers. This prevents us from generalizing this helper so that others can use it. Use indirect helpers to allow us to generalize this code a bit and to make it a bit more clear what this is doing. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: cocci@systeme.lip6.fr Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Luis R. Rodriguez 提交于
Most code already uses consts for the struct kernel_param_ops, sweep the kernel for the last offending stragglers. Other than include/linux/moduleparam.h and kernel/params.c all other changes were generated with the following Coccinelle SmPL patch. Merge conflicts between trees can be handled with Coccinelle. In the future git could get Coccinelle merge support to deal with patch --> fail --> grammar --> Coccinelle --> new patch conflicts automatically for us on patches where the grammar is available and the patch is of high confidence. Consider this a feature request. Test compiled on x86_64 against: * allnoconfig * allmodconfig * allyesconfig @ const_found @ identifier ops; @@ const struct kernel_param_ops ops = { }; @ const_not_found depends on !const_found @ identifier ops; @@ -struct kernel_param_ops ops = { +const struct kernel_param_ops ops = { }; Generated-by: Coccinelle SmPL Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Junio C Hamano <gitster@pobox.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: cocci@systeme.lip6.fr Cc: linux-kernel@vger.kernel.org Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Peter Zijlstra 提交于
__module_address() does an initial bound check before doing the {list/tree} iteration to find the actual module. The bound variables are nowhere near the mod_tree cacheline, in fact they're nowhere near one another. module_addr_min lives in .data while module_addr_max lives in .bss (smarty pants GCC thinks the explicit 0 assignment is a mistake). Rectify this by moving the two variables into a structure together with the latch_tree_root to guarantee they all share the same cacheline and avoid hitting two extra cachelines for the lookup. While reworking the bounds code, move the bound update from allocation to insertion time, this avoids updating the bounds for a few error paths. Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Peter Zijlstra 提交于
Use the generic __module_address() addr to struct module lookup instead of open coding it once more. Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Peter Zijlstra 提交于
Andrew worried about the overhead on small systems; only use the fancy code when either perf or tracing is enabled. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Requested-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Peter Zijlstra 提交于
Currently __module_address() is using a linear search through all modules in order to find the module corresponding to the provided address. With a lot of modules this can take a lot of time. One of the users of this is kernel_text_address() which is employed in many stack unwinders; which in turn are used by perf-callchain and ftrace (possibly from NMI context). So by optimizing __module_address() we optimize many stack unwinders which are used by both perf and tracing in performance sensitive code. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Peter Zijlstra 提交于
Because with latches there is a strict data dependency on the seq load we can avoid the rmb in favour of a read_barrier_depends. Suggested-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Peter Zijlstra 提交于
Improve the documentation of the latch technique as used in the current timekeeping code, such that it can be readily employed elsewhere. Borrow from the comments in timekeeping and replace those with a reference to this more generic comment. Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Peter Zijlstra 提交于
Currently the RCU usage in module is an inconsistent mess of RCU and RCU-sched, this is broken for CONFIG_PREEMPT where synchronize_rcu() does not imply synchronize_sched(). Most usage sites use preempt_{dis,en}able() which is RCU-sched, but (most of) the modification sites use synchronize_rcu(). With the exception of the module bug list, which actually uses RCU. Convert everything over to RCU-sched. Furthermore add lockdep asserts to all sites, because it's not at all clear to me the required locking is observed, esp. on exported functions. Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 27 5月, 2015 2 次提交
-
-
由 Peter Zijlstra 提交于
As per the module core lockdep annotations in the coming patch: [ 18.034047] ---[ end trace 9294429076a9c673 ]--- [ 18.047760] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013 [ 18.059228] ffffffff817d8676 ffff880036683c38 ffffffff8157e98b 0000000000000001 [ 18.067541] 0000000000000000 ffff880036683c78 ffffffff8105fbc7 ffff880036683c68 [ 18.075851] ffffffffa0046b08 0000000000000000 ffffffffa0046d00 ffffffffa0046cc8 [ 18.084173] Call Trace: [ 18.086906] [<ffffffff8157e98b>] dump_stack+0x4f/0x7b [ 18.092649] [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0 [ 18.099361] [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20 [ 18.105880] [<ffffffff810ee502>] __module_address+0x1d2/0x1e0 [ 18.112400] [<ffffffff81161153>] jump_label_module_notify+0x143/0x1e0 [ 18.119710] [<ffffffff810814bf>] notifier_call_chain+0x4f/0x70 [ 18.126326] [<ffffffff8108160e>] __blocking_notifier_call_chain+0x5e/0x90 [ 18.134009] [<ffffffff81081656>] blocking_notifier_call_chain+0x16/0x20 [ 18.141490] [<ffffffff810f0f00>] load_module+0x1b50/0x2660 [ 18.147720] [<ffffffff810f1ade>] SyS_init_module+0xce/0x100 [ 18.154045] [<ffffffff81587429>] system_call_fastpath+0x12/0x17 [ 18.160748] ---[ end trace 9294429076a9c674 ]--- Jump labels is not doing it right; fix this. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jason Baron <jbaron@akamai.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Peter Zijlstra 提交于
Due to the new lockdep checks in the coming patch, we go: [ 9.759380] ------------[ cut here ]------------ [ 9.759389] WARNING: CPU: 31 PID: 597 at ../kernel/module.c:216 each_symbol_section+0x121/0x130() [ 9.759391] Modules linked in: [ 9.759393] CPU: 31 PID: 597 Comm: modprobe Not tainted 4.0.0-rc1+ #65 [ 9.759393] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013 [ 9.759396] ffffffff817d8676 ffff880424567ca8 ffffffff8157e98b 0000000000000001 [ 9.759398] 0000000000000000 ffff880424567ce8 ffffffff8105fbc7 ffff880424567cd8 [ 9.759400] 0000000000000000 ffffffff810ec160 ffff880424567d40 0000000000000000 [ 9.759400] Call Trace: [ 9.759407] [<ffffffff8157e98b>] dump_stack+0x4f/0x7b [ 9.759410] [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0 [ 9.759412] [<ffffffff810ec160>] ? section_objs+0x60/0x60 [ 9.759414] [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20 [ 9.759415] [<ffffffff810ed9c1>] each_symbol_section+0x121/0x130 [ 9.759417] [<ffffffff810eda01>] find_symbol+0x31/0x70 [ 9.759420] [<ffffffff810ef5bf>] load_module+0x20f/0x2660 [ 9.759422] [<ffffffff8104ef10>] ? __do_page_fault+0x190/0x4e0 [ 9.759426] [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13 [ 9.759427] [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13 [ 9.759433] [<ffffffff810ae73d>] ? trace_hardirqs_on_caller+0x11d/0x1e0 [ 9.759437] [<ffffffff812fcc0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 9.759439] [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13 [ 9.759441] [<ffffffff810f1ade>] SyS_init_module+0xce/0x100 [ 9.759443] [<ffffffff81587429>] system_call_fastpath+0x12/0x17 [ 9.759445] ---[ end trace 9294429076a9c644 ]--- As per the comment this site should be fine, but lets wrap it in preempt_disable() anyhow to placate lockdep. Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 20 5月, 2015 1 次提交
-
-
由 Michal Hocko 提交于
Commit ab992dc3 ("watchdog: Fix merge 'conflict'") has introduced an obvious deadlock because of a typo. watchdog_proc_mutex should be unlocked on exit. Thanks to Miroslav Benes who was staring at the code with me and noticed this. Signed-off-by: NMichal Hocko <mhocko@suse.cz> Duh-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 5月, 2015 2 次提交
-
-
由 Shaohua Li 提交于
block plug callback could sleep, so we introduce a parameter 'from_schedule' and corresponding drivers can use it to destinguish a schedule plug flush or a plug finish. Unfortunately io_schedule_out still uses blk_flush_plug(). This causes below output (Note, I added a might_sleep() in raid1_unplug to make it trigger faster, but the whole thing doesn't matter if I add might_sleep). In raid1/10, this can cause deadlock. This patch makes io_schedule_out always uses blk_schedule_flush_plug. This should only impact drivers (as far as I know, raid 1/10) which are sensitive to the 'from_schedule' parameter. [ 370.817949] ------------[ cut here ]------------ [ 370.817960] WARNING: CPU: 7 PID: 145 at ../kernel/sched/core.c:7306 __might_sleep+0x7f/0x90() [ 370.817969] do not call blocking ops when !TASK_RUNNING; state=2 set at [<ffffffff81092fcf>] prepare_to_wait+0x2f/0x90 [ 370.817971] Modules linked in: raid1 [ 370.817976] CPU: 7 PID: 145 Comm: kworker/u16:9 Tainted: G W 4.0.0+ #361 [ 370.817977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014 [ 370.817983] Workqueue: writeback bdi_writeback_workfn (flush-9:1) [ 370.817985] ffffffff81cd83be ffff8800ba8cb298 ffffffff819dd7af 0000000000000001 [ 370.817988] ffff8800ba8cb2e8 ffff8800ba8cb2d8 ffffffff81051afc ffff8800ba8cb2c8 [ 370.817990] ffffffffa00061a8 000000000000041e 0000000000000000 ffff8800ba8cba28 [ 370.817993] Call Trace: [ 370.817999] [<ffffffff819dd7af>] dump_stack+0x4f/0x7b [ 370.818002] [<ffffffff81051afc>] warn_slowpath_common+0x8c/0xd0 [ 370.818004] [<ffffffff81051b86>] warn_slowpath_fmt+0x46/0x50 [ 370.818006] [<ffffffff81092fcf>] ? prepare_to_wait+0x2f/0x90 [ 370.818008] [<ffffffff81092fcf>] ? prepare_to_wait+0x2f/0x90 [ 370.818010] [<ffffffff810776ef>] __might_sleep+0x7f/0x90 [ 370.818014] [<ffffffffa0000c03>] raid1_unplug+0xd3/0x170 [raid1] [ 370.818024] [<ffffffff81421d9a>] blk_flush_plug_list+0x8a/0x1e0 [ 370.818028] [<ffffffff819e3550>] ? bit_wait+0x50/0x50 [ 370.818031] [<ffffffff819e21b0>] io_schedule_timeout+0x130/0x140 [ 370.818033] [<ffffffff819e3586>] bit_wait_io+0x36/0x50 [ 370.818034] [<ffffffff819e31b5>] __wait_on_bit+0x65/0x90 [ 370.818041] [<ffffffff8125b67c>] ? ext4_read_block_bitmap_nowait+0xbc/0x630 [ 370.818043] [<ffffffff819e3550>] ? bit_wait+0x50/0x50 [ 370.818045] [<ffffffff819e3302>] out_of_line_wait_on_bit+0x72/0x80 [ 370.818047] [<ffffffff810935e0>] ? autoremove_wake_function+0x40/0x40 [ 370.818050] [<ffffffff811de744>] __wait_on_buffer+0x44/0x50 [ 370.818053] [<ffffffff8125ae80>] ext4_wait_block_bitmap+0xe0/0xf0 [ 370.818058] [<ffffffff812975d6>] ext4_mb_init_cache+0x206/0x790 [ 370.818062] [<ffffffff8114bc6c>] ? lru_cache_add+0x1c/0x50 [ 370.818064] [<ffffffff81297c7e>] ext4_mb_init_group+0x11e/0x200 [ 370.818066] [<ffffffff81298231>] ext4_mb_load_buddy+0x341/0x360 [ 370.818068] [<ffffffff8129a1a3>] ext4_mb_find_by_goal+0x93/0x2f0 [ 370.818070] [<ffffffff81295b54>] ? ext4_mb_normalize_request+0x1e4/0x5b0 [ 370.818072] [<ffffffff8129ab67>] ext4_mb_regular_allocator+0x67/0x460 [ 370.818074] [<ffffffff81295b54>] ? ext4_mb_normalize_request+0x1e4/0x5b0 [ 370.818076] [<ffffffff8129ca4b>] ext4_mb_new_blocks+0x4cb/0x620 [ 370.818079] [<ffffffff81290956>] ext4_ext_map_blocks+0x4c6/0x14d0 [ 370.818081] [<ffffffff812a4d4e>] ? ext4_es_lookup_extent+0x4e/0x290 [ 370.818085] [<ffffffff8126399d>] ext4_map_blocks+0x14d/0x4f0 [ 370.818088] [<ffffffff81266fbd>] ext4_writepages+0x76d/0xe50 [ 370.818094] [<ffffffff81149691>] do_writepages+0x21/0x50 [ 370.818097] [<ffffffff811d5c00>] __writeback_single_inode+0x60/0x490 [ 370.818099] [<ffffffff811d630a>] writeback_sb_inodes+0x2da/0x590 [ 370.818103] [<ffffffff811abf4b>] ? trylock_super+0x1b/0x50 [ 370.818105] [<ffffffff811abf4b>] ? trylock_super+0x1b/0x50 [ 370.818107] [<ffffffff811d665f>] __writeback_inodes_wb+0x9f/0xd0 [ 370.818109] [<ffffffff811d69db>] wb_writeback+0x34b/0x3c0 [ 370.818111] [<ffffffff811d70df>] bdi_writeback_workfn+0x23f/0x550 [ 370.818116] [<ffffffff8106bbd8>] process_one_work+0x1c8/0x570 [ 370.818117] [<ffffffff8106bb5b>] ? process_one_work+0x14b/0x570 [ 370.818119] [<ffffffff8106c09b>] worker_thread+0x11b/0x470 [ 370.818121] [<ffffffff8106bf80>] ? process_one_work+0x570/0x570 [ 370.818124] [<ffffffff81071868>] kthread+0xf8/0x110 [ 370.818126] [<ffffffff81071770>] ? kthread_create_on_node+0x210/0x210 [ 370.818129] [<ffffffff819e9322>] ret_from_fork+0x42/0x70 [ 370.818131] [<ffffffff81071770>] ? kthread_create_on_node+0x210/0x210 [ 370.818132] ---[ end trace 7b4deb71e68b6605 ]--- V2: don't change ->in_iowait Cc: NeilBrown <neilb@suse.de> Signed-off-by: NShaohua Li <shli@fb.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Peter Zijlstra 提交于
Two watchdog changes that came through different trees had a non conflicting conflict, that is, one changed the semantics of a variable but no actual code conflict happened. So the merge appeared fine, but the resulting code did not behave as expected. Commit 195daf66 ("watchdog: enable the new user interface of the watchdog mechanism") changes the semantics of watchdog_user_enabled, which thereafter is only used by the functions introduced by b3738d29 ("watchdog: Add watchdog enable/disable all functions"). There further appears to be a distinct lack of serialization between setting and using watchdog_enabled, so perhaps we should wrap the {en,dis}able_all() things in watchdog_proc_mutex. This patch fixes a s2r failure reported by Michal; which I cannot readily explain. But this does make the code internally consistent again. Reported-and-tested-by: NMichal Hocko <mhocko@suse.cz> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 5月, 2015 1 次提交
-
-
由 John Stultz 提交于
It was noted that the 32bit implementation of ktime_divns() was doing unsigned division and didn't properly handle negative values. And when a ktime helper was changed to utilize ktime_divns, it caused a regression on some IR blasters. See the following bugzilla for details: https://bugzilla.redhat.com/show_bug.cgi?id=1200353 This patch fixes the problem in ktime_divns by checking and preserving the sign bit, and then reapplying it if appropriate after the division, it also changes the return type to a s64 to make it more obvious this is expected. Nicolas also pointed out that negative dividers would cause infinite loops on 32bit systems, negative dividers is unlikely for users of this function, but out of caution this patch adds checks for negative dividers for both 32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure no such use cases creep in. [ tglx: Hand an u64 to do_div() to avoid the compiler warning ] Fixes: 166afb64 'ktime: Sanitize ktime_to_us/ms conversion' Reported-and-tested-by: NTrevor Cordes <trevor@tecnopolis.ca> Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Acked-by: NNicolas Pitre <nicolas.pitre@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 08 5月, 2015 3 次提交
-
-
由 Peter Zijlstra 提交于
While fuzzing Sasha tripped over another ctx->mutex recursion lockdep splat. Annotate this. Reported-by: NSasha Levin <sasha.levin@oracle.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Omar Sandoval 提交于
Commit 3c18d447 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that caused a regression in setting a CPU inactive during suspend. I ran into this when a program was failing pthread_setaffinity_np() with EINVAL after a suspend+wake up. A simple reproducer: $ ./a.out sched_setaffinity: Success $ systemctl suspend $ ./a.out sched_setaffinity: Invalid argument ... where ./a.out is: #define _GNU_SOURCE #include <errno.h> #include <sched.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> int main(void) { long num_cores; cpu_set_t cpu_set; int ret; num_cores = sysconf(_SC_NPROCESSORS_ONLN); CPU_ZERO(&cpu_set); CPU_SET(num_cores - 1, &cpu_set); errno = 0; ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set); perror("sched_setaffinity"); return ret ? EXIT_FAILURE : EXIT_SUCCESS; } The mistake is that suspend is handled in the action == CPU_DOWN_PREPARE_FROZEN case of the switch statement in cpuset_cpu_inactive(). However, the commit in question masked out CPU_TASKS_FROZEN from the action, making this case dead. The fix is straightforward. Signed-off-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 3c18d447 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()") Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Ronny reported that the following scenario is not handled correctly: T1 (prio = 10) lock(rtmutex); T2 (prio = 20) lock(rtmutex) boost T1 T1 (prio = 20) sys_set_scheduler(prio = 30) T1 prio = 30 .... sys_set_scheduler(prio = 10) T1 prio = 30 The last step is wrong as T1 should now be back at prio 20. Commit c365c292 ("sched: Consider pi boosting in setscheduler()") only handles the case where a boosted tasks tries to lower its priority. Fix it by taking the new effective priority into account for the decision whether a change of the priority is required. Reported-by: NRonny Meeus <ronny.meeus@gmail.com> Tested-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Fixes: c365c292 ("sched: Consider pi boosting in setscheduler()") Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanosSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 5月, 2015 1 次提交
-
-
由 Alex Bennée 提交于
The only caller to this function (__print_array) was getting it wrong by passing the array length instead of buffer length. As the element size was already being passed for other reasons it seems reasonable to push the calculation of buffer length into the function. Link: http://lkml.kernel.org/r/1430320727-14582-1-git-send-email-alex.bennee@linaro.orgSigned-off-by: NAlex Bennée <alex.bennee@linaro.org> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 01 5月, 2015 1 次提交
-
-
由 David Howells 提交于
Change default key details to be more obviously unspecified. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <james.l.morris@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 4月, 2015 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Commit 335f4919 (sched/idle: Use explicit broadcast oneshot control function) replaced clockevents_notify() invocations in cpuidle_idle_call() with direct calls to tick_broadcast_enter() and tick_broadcast_exit(), but it overlooked the fact that interrupts were already enabled before calling the latter which led to functional breakage on systems using idle states with the CPUIDLE_FLAG_TIMER_STOP flag set. Fix that by moving the invocations of tick_broadcast_enter() and tick_broadcast_exit() down into cpuidle_enter_state() where interrupts are still disabled when tick_broadcast_exit() is called. Also ensure that interrupts will be disabled before running tick_broadcast_exit() even if they have been enabled by the idle state's ->enter callback. Trigger a WARN_ON_ONCE() in that case, as we generally don't want that to happen for states with CPUIDLE_FLAG_TIMER_STOP set. Fixes: 335f4919 (sched/idle: Use explicit broadcast oneshot control function) Reported-and-tested-by: NLinus Walleij <linus.walleij@linaro.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Reported-and-tested-by: NSudeep Holla <sudeep.holla@arm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 28 4月, 2015 1 次提交
-
-
由 Alexei Starovoitov 提交于
ALU64_DIV instruction should be dividing 64-bit by 64-bit, whereas do_div() does 64-bit by 32-bit divide. x64 and arm64 JITs correctly implement 64 by 64 unsigned divide. llvm BPF backend emits code assuming that ALU64_DIV does 64 by 64. Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets") Reported-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 4月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
This reverts commits 0a4e6be9 and 80f7fdb1. The task migration notifier was originally introduced in order to support the pvclock vsyscall with non-synchronized TSC, but KVM only supports it with synchronized TSC. Hence, on KVM the race condition is only needed due to a bad implementation on the host side, and even then it's so rare that it's mostly theoretical. As far as KVM is concerned it's possible to fix the host, avoiding the additional complexity in the vDSO and the (re)introduction of the task migration notifier. Xen, on the other hand, hasn't yet implemented vsyscall support at all, so we do not care about its plans for non-synchronized TSC. Reported-by: NPeter Zijlstra <peterz@infradead.org> Suggested-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 4月, 2015 2 次提交
-
-
由 Viresh Kumar 提交于
A clockevent device is marked DETACHED when it is replaced by another clockevent device. The device is shutdown properly for drivers that implement legacy ->set_mode() callback, as we call ->set_mode() for CLOCK_EVT_MODE_UNUSED as well. But for the new per-state callback interface, we skip shutting down the device, as we thought its an internal state change. That wasn't correct. The effect is that the device is left programmed in oneshot or periodic mode. Fall-back to 'case CLOCK_EVT_STATE_SHUTDOWN', to shutdown the device. Fixes: bd624d75 "clockevents: Introduce mode specific callbacks" Reported-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/eef0a91c51b74d4e52c8e5a95eca27b5a0563f07.1428650683.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Roger Quadros 提交于
Without this system suspend is broken on systems that have drivers calling enable/disable_irq_wake() for interrupts based off the dummy irq hook. (e.g. drivers/gpio/gpio-pcf857x.c) Signed-off-by: NRoger Quadros <rogerq@ti.com> Cc: <cw00.choi@samsung.com> Cc: <balbi@ti.com> Cc: <tony@atomide.com> Cc: Gregory Clement <gregory.clement@free-electrons.com> Link: http://lkml.kernel.org/r/552E1DD3.4040106@ti.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 23 4月, 2015 1 次提交
-
-
由 Martin Schwidefsky 提交于
Introduce KEXEC_CONTROL_MEMORY_GFP to allow the architecture code to override the gfp flags of the allocation for the kexec control page. The loop in kimage_alloc_normal_control_pages allocates pages with GFP_KERNEL until a page is found that happens to have an address smaller than the KEXEC_CONTROL_MEMORY_LIMIT. On systems with a large memory size but a small KEXEC_CONTROL_MEMORY_LIMIT the loop will keep allocating memory until the oom killer steps in. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 20 4月, 2015 1 次提交
-
-
由 Linus Torvalds 提交于
Commit 8053871d ("smp: Fix smp_call_function_single_async() locking") fixed the locking for the asynchronous smp-call case, but in the process of moving the lock handling around, one of the error cases ended up not unlocking the call data at all. This went unnoticed on x86, because this is a "caller is buggy" case, where the caller is trying to call a non-existent CPU. But apparently ARM does that (at least under qemu-arm). Bindly doing cross-cpu calls to random CPU's that aren't even online seems a bit fishy, but the error handling was clearly not correct. Simply add the missing "csd_unlock()" to the error path. Reported-and-tested-by: NGuenter Roeck <linux@roeck-us.net> Analyzed-by: NRabin Vincent <rabin@rab.in> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 4月, 2015 13 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
The code that replaces the enum names with the enum values in the tracepoints' format files could possible miss the end of string nul character. This was caused by processing things like backslashes, quotes and other tokens. After processing the tokens, a check for the nul character needed to be done before continuing the loop, because the loop incremented the pointer before doing the check, which could bypass the nul character. Link: http://lkml.kernel.org/r/552E661D.5060502@oracle.com Reported-by: Sasha Levin <sasha.levin@oracle.com> # via KASan Tested-by: NAndrey Ryabinin <a.ryabinin@samsung.com> Fixes: 0c564a53 "tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values" Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Davidlohr Bueso 提交于
sync_buffer() needs the mmap_sem for two distinct operations, both only occurring upon user context switch handling: 1) Dealing with the exe_file. 2) Adding the dcookie data as we need to lookup the vma that backs it. This is done via add_sample() and add_data(). This patch isolates 1), for it will no longer need the mmap_sem for serialization. However, for now, make of the more standard get_mm_exe_file(), requiring only holding the mmap_sem to read the value, and relying on reference counting to make sure that the exe file won't dissappear underneath us while doing the get dcookie. As a consequence, for 2) we move the mmap_sem locking into where we really need it, in lookup_dcookie(). The benefits are twofold: reduce mmap_sem hold times, and cleaner code. [akpm@linux-foundation.org: export get_mm_exe_file for arch/x86/oprofile/oprofile.ko] Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Cc: Robert Richter <rric@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
gcov profiling if enabled with other heavy compile-time instrumentation like KASan could trigger following softlockups: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] Modules linked in: irq event stamp: 22823276 hardirqs last enabled at (22823275): [<ffffffff86e8d10d>] mutex_lock_nested+0x7d9/0x930 hardirqs last disabled at (22823276): [<ffffffff86e9521d>] apic_timer_interrupt+0x6d/0x80 softirqs last enabled at (22823172): [<ffffffff811ed969>] __do_softirq+0x4db/0x729 softirqs last disabled at (22823167): [<ffffffff811edfcf>] irq_exit+0x7d/0x15b CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.19.0-05245-gbb33326-dirty #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 task: ffff88006cba8000 ti: ffff88006cbb0000 task.ti: ffff88006cbb0000 RIP: kasan_mem_to_shadow+0x1e/0x1f Call Trace: strcmp+0x28/0x70 get_node_by_name+0x66/0x99 gcov_event+0x4f/0x69e gcov_enable_events+0x54/0x7b gcov_fs_init+0xf8/0x134 do_one_initcall+0x1b2/0x288 kernel_init_freeable+0x467/0x580 kernel_init+0x15/0x18b ret_from_fork+0x7c/0xb0 Kernel panic - not syncing: softlockup: hung tasks Fix this by sticking cond_resched() in gcov_enable_events(). Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com> Reported-by: NFengguang Wu <fengguang.wu@intel.com> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heinrich Schuchardt 提交于
When converting unsigned long to int overflows may occur. These currently are not detected when writing to the sysctl file system. E.g. on a system where int has 32 bits and long has 64 bits echo 0x800001234 > /proc/sys/kernel/threads-max has the same effect as echo 0x1234 > /proc/sys/kernel/threads-max The patch adds the missing check in do_proc_dointvec_conv. With the patch an overflow will result in an error EINVAL when writing to the the sysctl file system. Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Oleg cleverly suggested using xchg() to set the new mm->exe_file instead of calling set_mm_exe_file() which requires some form of serialization -- mmap_sem in this case. For archs that do not have atomic rmw instructions we still fallback to a spinlock alternative, so this should always be safe. As such, we only need the mmap_sem for looking up the backing vm_file, which can be done sharing the lock. Naturally, this means we need to manually deal with both the new and old file reference counting, and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can probably be deleted in the future anyway. Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Suggested-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
This patch removes mm->mmap_sem from mm->exe_file read side. Also it kills dup_mm_exe_file() and moves exe_file duplication into dup_mmap() where both mmap_sems are locked. [akpm@linux-foundation.org: fix comment typo] Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heinrich Schuchardt 提交于
Users can change the maximum number of threads by writing to /proc/sys/kernel/threads-max. With the patch the value entered is checked against the same limits that apply when fork_init is called. Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heinrich Schuchardt 提交于
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the THREAD_SIZE. E.g. architecture hexagon may have page size 1M and thread size 4096. This would lead to a division by zero in the calculation of max_threads. With 32-bit calculation there is no solution which delivers valid results for all possible combinations of the parameters. The code is only called once. Hence a 64-bit calculation can be used as solution. [akpm@linux-foundation.org: use clamp_t(), per Oleg] Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heinrich Schuchardt 提交于
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the THREAD_SIZE. E.g. architecture hexagon may have page size 1M and thread size 4096. This would lead to a division by zero in the calculation of max_threads. With this patch the buggy code is moved to a separate function set_max_threads. The error is not fixed. After fixing the problem in a separate patch the new function can be reused to adjust max_threads after adding or removing memory. Argument mempages of function fork_init() is removed as totalram_pages is an exported symbol. The creation of separate patches for refactoring to a new function and for fixing the logic was suggested by Ingo Molnar. Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jean Delvare 提交于
The comment explaining what value max_threads is set to is outdated. The maximum memory consumption ratio for thread structures was 1/2 until February 2002, then it was briefly changed to 1/16 before being set to 1/8 which we still use today. The comment was never updated to reflect that change, it's about time. Signed-off-by: NJean Delvare <jdelvare@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
copy_process will report any failure in alloc_pid as ENOMEM currently which is misleading because the pid allocation might fail not only when the memory is short but also when the pid space is consumed already. The current man page even mentions this case: : EAGAIN : : A system-imposed limit on the number of threads was encountered. : There are a number of limits that may trigger this error: the : RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which : limits the number of processes and threads for a real user ID, was : reached; the kernel's system-wide limit on the number of processes : and threads, /proc/sys/kernel/threads-max, was reached (see : proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max, : was reached (see proc(5)). so the current behavior is also incorrect wrt. documentation. POSIX man page also suggest returing EAGAIN when the process count limit is reached. This patch simply propagates error code from alloc_pid and makes sure we return -EAGAIN due to reservation failure. This will make behavior of fork closer to both our documentation and POSIX. alloc_pid might alsoo fail when the reaper in the pid namespace is dead (the namespace basically disallows all new processes) and there is no good error code which would match documented ones. We have traditionally returned ENOMEM for this case which is misleading as well but as per Eric W. Biederman this behavior is documented in man pid_namespaces(7) : If the "init" process of a PID namespace terminates, the kernel : terminates all of the processes in the namespace via a SIGKILL signal. : This behavior reflects the fact that the "init" process is essential for : the correct operation of a PID namespace. In this case, a subsequent : fork(2) into this PID namespace will fail with the error ENOMEM; it is : not possible to create a new processes in a PID namespace whose "init" : process has terminated. and introducing a new error code would be too risky so let's stick to ENOMEM for this case. Signed-off-by: NMichal Hocko <mhocko@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Davydov 提交于
Sending SI_TKILL from rt_[tg]sigqueueinfo was deprecated, so now we issue a warning on the first attempt of doing it. We use WARN_ON_ONCE, which is not informative and, what is worse, taints the kernel, making the trinity syscall fuzzer complain false-positively from time to time. It does not look like we need this warning at all, because the behaviour changed quite a long time ago (2.6.39), and if an application relies on the old API, it gets EPERM anyway and can issue a warning by itself. So let us zap the warning in kernel. Signed-off-by: NVladimir Davydov <vdavydov@parallels.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Richard Weinberger <richard@nod.at> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
ptrace_detach() re-checks ->ptrace under tasklist lock and calls release_task() if __ptrace_detach() returns true. This was needed because the __TASK_TRACED tracee could be killed/untraced, and it could even pass exit_notify() before we take tasklist_lock. But this is no longer possible after 9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL". We can turn these checks into WARN_ON() and remove release_task(). While at it, document the setting of child->exit_code. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Pavel Labath <labath@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-