- 29 November 2013, 2 commits
-
-
Committed by Thomas Gleixner
If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ. If CONFIG_NO_HZ=y and the nohz functionality is disabled via the command line option "nohz=off" or not enabled due to missing hardware support, then tick_nohz_get_sleep_length() returns 0. That happens because ts->sleep_length is never set in that case. Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive. Reported-by: Michal Hocko <mhocko@suse.cz> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Helge Deller
The init_kernel_text() and core_kernel_text() functions should not include the labels _einittext and _etext when checking if an address is inside the .text or .init sections. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
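The fix amounts to treating _etext and _einittext as exclusive upper bounds. A rough sketch of the corrected checks, modeled on kernel/extable.c (treat the exact bodies as illustrative):

```c
#include <linux/kernel.h>
#include <asm/sections.h>

/*
 * Sketch: _etext and _einittext label the first byte *after* their
 * sections, so the upper-bound comparison must be exclusive.
 */
static int init_kernel_text(unsigned long addr)
{
	return addr >= (unsigned long)_sinittext &&
	       addr <  (unsigned long)_einittext;	/* was: <= _einittext */
}

int core_kernel_text(unsigned long addr)
{
	if (addr >= (unsigned long)_stext &&
	    addr <  (unsigned long)_etext)		/* was: <= _etext */
		return 1;

	if (system_state == SYSTEM_BOOTING && init_kernel_text(addr))
		return 1;

	return 0;
}
```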
-
- 28 November 2013, 2 commits
-
-
Committed by Tejun Heo
If a cgroup file implements either read_map() or read_seq_string(), such a file is served using seq_file by overriding file->f_op to cgroup_seqfile_operations, which also overrides the release method to single_release() from cgroup_file_release(). Because cgroup_file_open() didn't use to acquire any resources, this used to be fine, but since f7d58818 ("cgroup: pin cgroup_subsys_state when opening a cgroupfs file"), cgroup_file_open() pins the css (cgroup_subsys_state) which is put by cgroup_file_release(). The patch forgot to update the release path for seq_files and each open/release cycle leaks a css reference. Fix it by updating cgroup_file_release() to also handle seq_files and using it for the seq_file release path too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v3.12
-
Committed by Peter Zijlstra
Juri hit the below lockdep report:
[ 4.303391] ======================================================
[ 4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[ 4.303394] 3.12.0-dl-peterz+ #144 Not tainted
[ 4.303395] ------------------------------------------------------
[ 4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 4.303399] (&p->mems_allowed_seq){+.+...}, at: [<ffffffff8114e63c>] new_slab+0x6c/0x290
[ 4.303417]
[ 4.303417] and this task is already holding:
[ 4.303418] (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff812d2dfb>] blk_execute_rq_nowait+0x5b/0x100
[ 4.303431] which would create a new lock dependency:
[ 4.303432] (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
[ 4.303436]
[ 4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[ 4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 {
[ 4.303922] HARDIRQ-ON-W at:
[ 4.303923] [<ffffffff8108ab9a>] __lock_acquire+0x65a/0x1ff0
[ 4.303926] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
[ 4.303929] [<ffffffff81063dd6>] kthreadd+0x86/0x180
[ 4.303931] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
[ 4.303933] SOFTIRQ-ON-W at:
[ 4.303933] [<ffffffff8108abcc>] __lock_acquire+0x68c/0x1ff0
[ 4.303935] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
[ 4.303940] [<ffffffff81063dd6>] kthreadd+0x86/0x180
[ 4.303955] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
[ 4.303959] INITIAL USE at:
[ 4.303960] [<ffffffff8108a884>] __lock_acquire+0x344/0x1ff0
[ 4.303963] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
[ 4.303966] [<ffffffff81063dd6>] kthreadd+0x86/0x180
[ 4.303969] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
[ 4.303972] }
Which reports that we take mems_allowed_seq with interrupts enabled. A little digging found that this can only be from cpuset_change_task_nodemask(). This is an actual deadlock because an interrupt doing an allocation will hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin forever waiting for the write side to complete. Cc: John Stultz <john.stultz@linaro.org> Cc: Mel Gorman <mgorman@suse.de> Reported-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Juri Lelli <juri.lelli@gmail.com> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
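A minimal sketch of what making this write side IRQ-safe implies (the helper name here is hypothetical; the point is only that the seqcount write must run with interrupts disabled so an allocation from interrupt context cannot spin on a half-finished writer):

```c
#include <linux/sched.h>
#include <linux/seqlock.h>
#include <linux/nodemask.h>
#include <linux/irqflags.h>

/*
 * Sketch (hypothetical helper): publish a new nodemask to a task with
 * the seqcount write side running irq-disabled, so a reader in
 * interrupt context on this CPU can never be left spinning in
 * read_seqcount_begin().
 */
static void task_publish_mems_allowed(struct task_struct *tsk,
				      const nodemask_t *newmems)
{
	unsigned long flags;

	local_irq_save(flags);
	write_seqcount_begin(&tsk->mems_allowed_seq);
	tsk->mems_allowed = *newmems;
	write_seqcount_end(&tsk->mems_allowed_seq);
	local_irq_restore(flags);
}
```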
-
- 27 November 2013, 1 commit
-
-
Committed by Thomas Gleixner
Tony reported that aa0d5326 ("ia64: Use preempt_schedule_irq") broke PREEMPT=n builds on ia64. Ok, wrapped my brain around it. I tripped over the magic asm foo which has a single need_resched check and schedule point for both sys call return and interrupt return. So you need preempt_schedule_irq() for kernel preemption from interrupt return, while on a normal syscall preemption a schedule would be sufficient. But using preempt_schedule_irq() is not harmful here in any way. It just sets the preempt_active bit also in cases where it would not be required. Even on PREEMPT=n kernels adding the preempt_active bit is completely harmless. So instead of having an extra function, moving the existing one out of the ifdef PREEMPT looks like the sanest thing to do. It would also allow getting rid of various other sti/schedule/cli asm magic in other archs. Reported-and-Tested-by: Tony Luck <tony.luck@gmail.com> Fixes: aa0d5326 ("ia64: Use preempt_schedule_irq") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [slightly edited Changelog] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311211230030.30673@ionos.tec.linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 26 November 2013, 2 commits
-
-
Committed by Steven Rostedt (Red Hat)
Commit 8c4f3c3f "ftrace: Check module functions being traced on reload" fixed module loading and unloading with respect to function tracing, but it missed the function graph tracer. If you perform the following:
# cd /sys/kernel/debug/tracing
# echo function_graph > current_tracer
# modprobe nfsd
# echo nop > current_tracer
You'll get the following oops message:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
Call Trace:
[<ffffffff814fe193>] dump_stack+0x4f/0x7c
[<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
[<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
[<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
[<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
[<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
[<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
[<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
[<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
[<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
[<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
[<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
[<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
[<ffffffff81122aed>] vfs_write+0xab/0xf6
[<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
[<ffffffff81122dbd>] SyS_write+0x59/0x82
[<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
[<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
---[ end trace 940358030751eafb ]---
The above mentioned commit didn't go far enough. Well, it covered the function tracer by adding checks in __register_ftrace_function(). The problem is that the function graph tracer circumvents that (for a slight efficiency gain when function graph trace is running with a function tracer. The gain was not worth this). The problem came with ftrace_startup(), which should always be called after __register_ftrace_function() if you want this bug to be completely fixed. Anyway, this solution moves __register_ftrace_function() inside of ftrace_startup() and removes the need to call them both. Reported-by: Dave Wysochanski <dwysocha@redhat.com> Fixes: ed926f9b ("ftrace: Use counters to enable functions to trace") Cc: stable@vger.kernel.org # 3.0+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Laxman Dewangan
When the system enters suspend, it disables all interrupts in suspend_device_irqs(), including the interrupts marked EARLY_RESUME. On the resume side things are different. The EARLY_RESUME interrupts are reenabled in sys_core_ops->resume and the non EARLY_RESUME interrupts are reenabled in the normal system resume path. When suspend_noirq() failed or suspend is aborted for any other reason, we might omit the resume side call to sys_core_ops->resume() and therefore the interrupts marked EARLY_RESUME are not reenabled and stay disabled forever. To solve this, enable all irqs unconditionally in irq_resume() regardless of whether interrupts marked EARLY_RESUME have already been enabled or not. This might try to reenable already enabled interrupts in the non failure case, but the only affected platform is XEN and it has been confirmed that it does not cause any side effects. [ tglx: Massaged changelog. ] Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Acked-by-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Pavel Machek <pavel@ucw.cz> Cc: <ian.campbell@citrix.com> Cc: <rjw@rjwysocki.net> Cc: <len.brown@intel.com> Cc: <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1385388587-16442-1-git-send-email-ldewangan@nvidia.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 23 November 2013, 6 commits
-
-
Committed by Li Bin
When a work item starts execution, the high bits of the work's data contain the pool ID. It can represent a maximum of WORK_OFFQ_POOL_NONE. The pool ID is set to WORK_OFFQ_POOL_NONE when the work is being initialized, indicating that no pool is associated, and get_work_pool() uses it to check the associated pool. So if worker_pool_assign_id() assigns an ID greater than or equal to WORK_OFFQ_POOL_NONE to a pool, it triggers leakage, and it may break the non-reentrance guarantee. This patch fixes the issue by making worker_pool_assign_id() call idr_alloc() with the @end param set to WORK_OFFQ_POOL_NONE. Furthermore, in the current implementation, the BUILD_BUG_ON() in init_workqueues() makes no sense. The number of worker pools needed cannot be determined at compile time, because the number of backing pools for UNBOUND workqueues is dynamic based on the assigned custom attributes. So remove it. tj: Minor comment and indentation updates. Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
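A sketch of the bounded ID allocation described above, modeled on kernel/workqueue.c of that era (struct worker_pool, worker_pool_idr and wq_pool_mutex are internal to that file; treat the details as illustrative):

```c
#include <linux/idr.h>
#include <linux/workqueue.h>

/*
 * Sketch: cap pool IDs below WORK_OFFQ_POOL_NONE so a valid pool ID
 * can never collide with the "no pool associated" marker stored in
 * the work data bits.
 */
static int worker_pool_assign_id(struct worker_pool *pool)
{
	int ret;

	lockdep_assert_held(&wq_pool_mutex);

	ret = idr_alloc(&worker_pool_idr, pool, 0, WORK_OFFQ_POOL_NONE,
			GFP_KERNEL);
	if (ret >= 0) {
		pool->id = ret;
		return 0;
	}
	return ret;
}
```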
-
Committed by Li Bin
It seems the "dying" should be "draining" here. Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Committed by Tejun Heo
An ordered workqueue implements execution ordering by using a single pool_workqueue with max_active == 1. On a given pool_workqueue, work items are processed in FIFO order and limiting max_active to 1 enforces that the queued work items are processed one by one. Unfortunately, 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues") accidentally broke this guarantee by applying NUMA affinity to ordered workqueues too. On NUMA setups, an ordered workqueue would end up with separate pool_workqueues for different nodes. Each pool_workqueue still limits max_active to 1 but multiple work items may be executed concurrently and out of order depending on which node they are queued to. Fix it by using dedicated ordered_wq_attrs[] when creating ordered workqueues. The new attrs match the unbound ones except that no_numa is always set, thus forcing all NUMA nodes to share the default pool_workqueue. While at it, add a sanity check in the workqueue creation path which verifies that an ordered workqueue has only the default pool_workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Libin <huawei.libin@huawei.com> Cc: stable@vger.kernel.org Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
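A sketch of the dedicated ordered attrs (symbols such as NR_STD_WORKER_POOLS, HIGHPRI_NICE_LEVEL and alloc_workqueue_attrs() come from the workqueue internals of that era; the init site shown here is illustrative, not the actual patch):

```c
/* Sketch, modeled on kernel/workqueue.c internals; error handling trimmed. */
static struct workqueue_attrs *ordered_wq_attrs[NR_STD_WORKER_POOLS];

static void __init init_ordered_wq_attrs(void)
{
	int std_nice[NR_STD_WORKER_POOLS] = { 0, HIGHPRI_NICE_LEVEL };
	int i;

	for (i = 0; i < NR_STD_WORKER_POOLS; i++) {
		struct workqueue_attrs *attrs = alloc_workqueue_attrs(GFP_KERNEL);

		BUG_ON(!attrs);
		attrs->nice = std_nice[i];
		attrs->no_numa = true;	/* all NUMA nodes share the default pwq */
		ordered_wq_attrs[i] = attrs;
	}
}
```

With no_numa set, apply_workqueue_attrs() keeps a single default pool_workqueue for the whole queue, which is what restores the one-by-one execution guarantee.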
-
Committed by Oleg Nesterov
Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed() in create_worker(). Otherwise userland can change ->cpus_allowed in between. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
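Roughly, the reordering looks like this (bind_worker_task() is a hypothetical stand-in for the relevant part of create_worker(), and set_cpus_allowed_ptr() stands in for the set_cpus_allowed() call mentioned above):

```c
#include <linux/sched.h>
#include <linux/cpumask.h>

/*
 * Sketch: mark the worker task unmovable by userspace *before* binding
 * it, closing the window in which sched_setaffinity() could change
 * ->cpus_allowed between the two steps.
 */
static void bind_worker_task(struct task_struct *task,
			     const struct cpumask *mask)
{
	task->flags |= PF_NO_SETAFFINITY;	/* previously set after the bind */
	set_cpus_allowed_ptr(task, mask);
}
```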
-
Committed by Tejun Heo
Since be445626 ("cgroup: remove synchronize_rcu() from cgroup_diput()"), the cgroup destruction path makes use of workqueue. css freeing is performed from a work item from that point on and a later commit, ea15f8cc ("cgroup: split cgroup destruction into two steps"), moves css offlining to workqueue too. As cgroup destruction isn't depended upon for memory reclaim, the destruction work items were put on the system_wq; unfortunately, some controllers may block in the destruction path for considerable duration while holding cgroup_mutex. As a large part of the destruction path is synchronized through cgroup_mutex, when combined with a high rate of cgroup removals, this has the potential to fill up system_wq's max_active of 256. Also, it turns out that memcg's css destruction path ends up queueing and waiting for work items on system_wq through work_on_cpu(). If such an operation happens while system_wq is fully occupied by cgroup destruction work items, work_on_cpu() can't make forward progress because system_wq is full and other destruction work items on system_wq can't make forward progress because the work item waiting for work_on_cpu() is holding cgroup_mutex, leading to deadlock. This can be fixed by queueing destruction work items on a separate workqueue. This patch creates a dedicated workqueue - cgroup_destroy_wq - for this purpose. As these work items shouldn't have inter-dependencies and are mostly serialized by cgroup_mutex anyway, giving them a high concurrency level doesn't buy anything and the workqueue's @max_active is set to 1 so that destruction work items are executed one by one on each CPU. Hugh Dickins: Because cgroup_init() is run before init_workqueues(), cgroup_destroy_wq can't be allocated from cgroup_init(). Do it from a separate core_initcall(). In the future, we probably want to reorder so that workqueue init happens before cgroup_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Hugh Dickins <hughd@google.com> Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com> Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils Cc: stable@vger.kernel.org # v3.9+
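A sketch of the dedicated queue, mirroring the description above (the workqueue name and flags are illustrative):

```c
#include <linux/workqueue.h>
#include <linux/init.h>
#include <linux/bug.h>

/*
 * Sketch: a dedicated, low-concurrency workqueue for css offline/free
 * work so cgroup destruction cannot saturate system_wq.
 */
static struct workqueue_struct *cgroup_destroy_wq;

static int __init cgroup_wq_init(void)
{
	/*
	 * max_active of 1: destruction work items are serialized by
	 * cgroup_mutex anyway, so higher concurrency buys nothing.
	 */
	cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
	BUG_ON(!cgroup_destroy_wq);
	return 0;
}
/* allocated from an initcall because cgroup_init() runs before init_workqueues() */
core_initcall(cgroup_wq_init);
```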
-
Committed by Martin Schwidefsky
Since commit 1e75fa8b (time: Condense timekeeper.xtime into xtime_sec - merged in v3.6), there has been a problem with the error accounting in the timekeeping code, such that when truncating to nanoseconds, we round up to the next nsec, but the balancing adjustment to the ntp_error value was dropped. This causes 1ns per tick drift forward of the clock. In 3.7, this logic was isolated to only GENERIC_TIME_VSYSCALL_OLD architectures (s390, ia64, powerpc). The fix is simply to balance the accounting and to subtract the added nanosecond from ntp_error. This allows the internal long-term clock steering to keep the clock accurate. While this fix removes the regression added in 1e75fa8b, the ideal solution is to move away from GENERIC_TIME_VSYSCALL_OLD and use the new VSYSCALL method, which avoids entirely the nanosecond granular rounding, and the resulting short-term clock adjustment oscillation needed to keep long term accurate time. [ jstultz: Many thanks to Martin for his efforts identifying this subtle bug, and providing the fix. ] Originally-from: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable <stable@vger.kernel.org> #v3.6+ Link: http://lkml.kernel.org/r/1385149491-20307-1-git-send-email-john.stultz@linaro.org Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 20 November 2013, 5 commits
-
-
Committed by Kirill A. Shutemov
<linux/spinlock.h> has heavy dependencies on other header files. It triggers circular dependencies in generated headers on IA64, at least:
CC kernel/bounds.s
In file included from /home/space/kas/git/public/linux/arch/ia64/include/asm/thread_info.h:9:0, from include/linux/thread_info.h:54, from include/asm-generic/preempt.h:4, from arch/ia64/include/generated/asm/preempt.h:1, from include/linux/preempt.h:18, from include/linux/spinlock.h:50, from kernel/bounds.c:14:
/home/space/kas/git/public/linux/arch/ia64/include/asm/asm-offsets.h:1:35: fatal error: generated/asm-offsets.h: No such file or directory
compilation terminated.
Let's replace <linux/spinlock.h> with <linux/spinlock_types.h>; it's enough to find out the size of spinlock_t. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-and-Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Committed by Johannes Berg
As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro; this is a little safer. The openvswitch code has some indirection, assigning ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
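The macro shape this describes, roughly (the name and exact signature of the inner helper are assumptions, not necessarily what landed in the tree):

```c
#include <linux/kernel.h>	/* ARRAY_SIZE() */
#include <net/genetlink.h>

/*
 * Sketch: the wrapper derives n_ops from the array itself, so callers
 * can no longer pass a mismatched count.  The inner helper keeps the
 * old three-argument form.
 */
int __genl_register_family_with_ops(struct genl_family *family,
				    const struct genl_ops *ops, size_t n_ops);

#define genl_register_family_with_ops(family, ops)			\
	__genl_register_family_with_ops((family), (ops), ARRAY_SIZE(ops))
```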
-
Committed by Shigeru Yoshida
Fix a trivial typo in rq_attach_root(). Signed-off-by: Shigeru Yoshida <shigeru.yoshida@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131117.121236.1990617639803941055.shigeru.yoshida@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra
Commit 37dc6b50 ("sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus") forgot to clear 'sd_busy' under some conditions, leading to a possible NULL deref in set_cpu_sd_state_idle(). Reported-by: Anton Blanchard <anton@samba.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131118113701.GF3866@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Srikar Dronamraju
After commit 863bffc8 ("sched/fair: Fix group power_orig computation"), we can dereference rq->sd before it is set. Fix this by falling back to power_of() in this case and add a comment explaining things. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> [ Added comment and tweaked patch. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: mikey@neuling.org Link: http://lkml.kernel.org/r/20131113151718.GN21461@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 19 November 2013, 6 commits
-
-
Committed by Vince Weaver
The 64-bit attr.config value for perf trace events was being copied into an "int" before doing a comparison, meaning the top 32 bits were being truncated. As far as I can tell this didn't cause any errors, but it did mean it was possible to create valid aliases for all the tracepoint ids, which I don't think was intended. (For example, 0xffffffff00000018 and 0x18 both enable the same tracepoint.) Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1311151236100.11932@vincent-weaver-1.um.maine.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
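The nature of the truncation, sketched with an illustrative helper (the real comparison lives in the perf tracepoint event init path):

```c
#include <linux/types.h>

/*
 * Sketch: comparing a u64 config against a tracepoint id.  Storing the
 * config in an int first throws away the top 32 bits, so values such
 * as 0xffffffff00000018 and 0x18 become indistinguishable.
 */
static bool config_matches(u64 config, u64 tp_id)
{
	/* buggy form:  int id = config;  return id == tp_id; */
	return config == tp_id;		/* keep the full 64 bits */
}
```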
-
Committed by Peter Zijlstra
Currently we only allocate a single cpu hashtable for per-cpu swevents; do away with this optimization for it is fragile in the face of things like perf_pmu_migrate_context(). The easiest thing is to make sure all CPUs are consistent wrt state. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130913111447.GN31370@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra
Vince's perf-trinity fuzzer found yet another 'interesting' problem. When we sample the irq_work_exit tracepoint with period==1 (or PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an infinite event generation loop:
,-> <IPI>
|   irq_work_exit() ->
|   trace_irq_work_exit() ->
|   ...
|   __perf_event_overflow() -> (due to fasync)
|   irq_work_queue() -> (irq_work_list must be empty)
'--------- arch_irq_work_raise()
Similar things can happen due to regular poll() wakeups if we exceed the ring-buffer wakeup watermark, or have an event_limit. To avoid this, dis-allow sampling this particular tracepoint. In order to achieve this, create a special perf_perm function pointer for each event and call this (when set) on trying to create a tracepoint perf event. [ roasted: use expr... to allow for ',' in your expression ] Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Jones <davej@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Andrew Morton
Taken straight from a tglx email ;) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Joe Perches
Use the helper function instead of __GFP_ZERO. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Thomas Gleixner
RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable merely tells that NOHZ has been enabled in the config and not been disabled on the command line. It does not tell anything about nohz being active. That's what all this should check for. Matthew reported that the idle accounting on his old P1 machine showed bogus values when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource, which explains why the "fine" grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enabled. Reported-and-tested-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: john.stultz@linaro.org Cc: mwhitehe@redhat.com Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311132052240.30673@ionos.tec.linutronix.de
-
- 16 November 2013, 1 commit
-
-
Committed by Al Viro
Rename simple_delete_dentry() to always_delete_dentry() and export it. Export simple_dentry_operations, while we are at it, and get rid of their duplicates. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 15 November 2013, 10 commits
-
-
Committed by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Hellwig
We've switched over every architecture that supports SMP to it, so remove the now-useless config variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Hellwig
This commit was incomplete in that code to remove items from the per-cpu lists was missing and never acquired a user in the 5 years it has been in the tree. We're going to implement what it seems to try to achieve in a simpler way, and this code is in the way of doing so. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Peter Zijlstra
Use kernel/bounds.c to convert the build-time spinlock_t size check into a preprocessor symbol and apply that to properly separate the page::ptl situation. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
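A sketch of the kernel/bounds.c trick (bounds.c already emits build-time constants with DEFINE(); the SPINLOCK_SIZE name follows the description above, and the surrounding file content is trimmed):

```c
/* kernel/bounds.c style: emit sizeof(spinlock_t) into generated/bounds.h
 * so preprocessor-level code can reason about it at build time.
 */
#include <linux/kbuild.h>
#include <linux/spinlock_types.h>

void foo(void)
{
	/* becomes "#define SPINLOCK_SIZE <n>" in include/generated/bounds.h */
	DEFINE(SPINLOCK_SIZE, sizeof(spinlock_t));
}
```

The generated constant can then be compared against BITS_PER_LONG/8 with a plain #if in mm_types.h to decide whether the page table lock fits inside struct page or has to be allocated separately.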
-
Committed by Kirill A. Shutemov
The basic idea is the same as with the PTE level: the lock is embedded into the struct page of the table's page. We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't take mm->page_table_lock anymore. Let's reuse page->lru of the table's page for that. pgtable_pmd_page_ctor() returns true if initialization is successful and false otherwise. The current implementation never fails, but the assumption that the constructor can fail will help to port it to -rt, where spinlock_t is rather huge and cannot be embedded into struct page -- dynamic allocation is required. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Kirill A. Shutemov
With the split page table lock for the PMD level we can't hold mm->page_table_lock while updating nr_ptes. Let's convert it to atomic_long_t to avoid races. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
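The conversion pattern, sketched (account_pte_page() is a hypothetical call site; the field type change is the point):

```c
#include <linux/atomic.h>
#include <linux/mm_types.h>

/*
 * Sketch: with mm->nr_ptes as an atomic_long_t (as of this change),
 * page-table accounting no longer needs mm->page_table_lock.
 */
static void account_pte_page(struct mm_struct *mm, bool alloced)
{
	if (alloced)
		atomic_long_inc(&mm->nr_ptes);	/* was: mm->nr_ptes++ under the lock */
	else
		atomic_long_dec(&mm->nr_ptes);
}
```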
-
Committed by Rafael J. Wysocki
I have received a report about the BUG_ON() in free_basic_memory_bitmaps() triggering mysteriously during an aborted s2disk hibernation attempt. The only way I can explain that is that /dev/snapshot was first opened for writing (resume mode), then closed and then opened again for reading and closed again without freezing tasks. In that case the first invocation of snapshot_open() would set the free_bitmaps flag in snapshot_state, which is a static variable. That flag wouldn't be cleared later and the second invocation of snapshot_open() would just leave it like that, so the subsequent snapshot_release() would see data->frozen set and free_basic_memory_bitmaps() would be called unnecessarily. To prevent that from happening, clear data->free_bitmaps in snapshot_open() when the file is being opened for reading (hibernate mode). In addition to that, replace the BUG_ON() in free_basic_memory_bitmaps() with a WARN_ON(), as the kernel can continue just fine if the condition checked by that macro occurs. Fixes: aab17289 (PM / hibernate: Fix user space driven resume regression) Reported-by: Oliver Lorenz <olli@olorenz.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
-
Committed by Johannes Berg
Now that genl_ops are no longer modified in place when registering, they can be made const. This patch was done mostly with spatch:
@@
identifier ops;
@@
+const struct genl_ops ops[] = {
...
};
(except the struct thing in net/openvswitch/datapath.c) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Johannes Berg
This simplifies the code since there's no longer a need to have error handling in the registration. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 November 2013, 5 commits
-
-
Committed by Thomas Gleixner
On a 68k platform a couple of interrupts are demultiplexed and "polled" from a top level interrupt. Unfortunately there is no way to determine which of the sub interrupts raised the top level interrupt, so all of the demultiplexed interrupt handlers need to be invoked. Given a high enough frequency this can trigger the spurious interrupt detection mechanism, if one of the demultiplexed interrupts returns IRQ_NONE continuously. But this is a false positive, as the polling causes this behaviour and not buggy hardware/software. Introduce IRQ_POLLED, which can be set at interrupt chip setup time via irq_set_status_flags(). The flag excludes the interrupt from the spurious detector and from all core polling activities. Reported-and-tested-by: Michael Schmitz <schmitzmic@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-m68k@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311061149250.23353@ionos.tec.linutronix.de
-
Committed by Fengguang Wu
There are new Sparse warnings:
>> kernel/locking/lockdep.c:1235:15: sparse: symbol '__lockdep_count_forward_deps' was not declared. Should it be static?
>> kernel/locking/lockdep.c:1261:15: sparse: symbol '__lockdep_count_backward_deps' was not declared. Should it be static?
Please consider folding the attached diff :-) Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/527d1787.ThzXGoUspZWehFDl\%fengguang.wu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Michal Nazarewicz
sa->runnable_avg_sum is of type u32, but after shifting it by NICE_0_SHIFT bits it is promoted to u64. This of course makes no sense, since the result will never be more than 32 bits long. Casting sa->runnable_avg_sum to u64 before it is shifted fixes this problem. Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1384112521-25177-1-git-send-email-mpn@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
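The fix, sketched with an illustrative helper:

```c
#include <linux/types.h>

/*
 * Sketch: shifting a u32 happens in 32-bit arithmetic, so any bits
 * pushed past bit 31 are lost before the result is widened to u64.
 * Casting first makes the shift 64-bit.
 */
static u64 scale_runnable_sum(u32 runnable_avg_sum, unsigned int shift)
{
	/* buggy form: return runnable_avg_sum << shift; */
	return (u64)runnable_avg_sum << shift;
}
```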
-
Committed by Peter Zijlstra
Large multi-threaded apps like to hit this using do_sys_times() and then queue up on the rq->lock. Avoid when possible. Larry reported a ~20% performance increase with his test case. Reported-by: Larry Woodman <lwoodman@redhat.com> Suggested-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131111172925.GG26898@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra
Because we're completely unserialized against hotplug, it's well possible to try and generate numa stats for an offlined node. Bail out early (and avoid a /0) in this case. The resulting stats are all 0, which should result in an undesirable balance target -- not to mention that actually trying to migrate to an offline CPU will fail. Reported-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/n/tip-orja0qylcvyhxfsuebcyL5sI@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
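A sketch of the early bail-out (numa_stats_node_usable() is a hypothetical helper; cpumask_of_node() is the usual way to ask which CPUs a node owns):

```c
#include <linux/cpumask.h>
#include <linux/topology.h>

/*
 * Sketch: a node with no CPUs attached cannot be a balance target, and
 * dividing the accumulated load/capacity by its CPU count would be a
 * divide-by-zero, so skip it up front.
 */
static bool numa_stats_node_usable(int nid)
{
	return cpumask_weight(cpumask_of_node(nid)) != 0;
}
```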
-