- 13 12月, 2013 3 次提交
-
-
由 David Howells 提交于
Fix the gathering of certificates from both the source tree and the build tree to correctly calculate the pathnames of all the certificates. The problem was that if the default generated cert, signing_key.x509, didn't exist then it would not have a path attached and if it did, it would have a path attached. This means that the contents of kernel/.x509.list would change between the first compilation in a directory and the second. After the second it would remain stable because the signing_key.x509 file exists. The consequence was that the kernel would get relinked unconditionally on the second recompilation. The second recompilation would also show something like this: X.509 certificate list changed CERTS kernel/x509_certificate_list - Including cert /home/torvalds/v2.6/linux/signing_key.x509 AS kernel/system_certificates.o LD kernel/built-in.o which is why the relink would happen. Unfortunately, it isn't a simple matter of just sticking a path on the front of the filename of the certificate in the build directory as make can't then work out how to build it. So the path has to be prepended to the name for sorting and duplicate elimination and then removed for the make rule if it is in the build tree. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 Linus Torvalds 提交于
When debugging the read-only hugepage case, I was confused by the fact that get_futex_key() did an access_ok() only for the non-shared futex case, since the user address checking really isn't in any way specific to the private key handling. Now, it turns out that the shared key handling does effectively do the equivalent checks inside get_user_pages_fast() (it doesn't actually check the address range on x86, but does check the page protections for being a user page). So it wasn't actually a bug, but the fact that we treat the address differently for private and shared futexes threw me for a loop. Just move the check up, so that it gets done for both cases. Also, use the 'rw' parameter for the type, even if it doesn't actually matter any more (it's a historical artifact of the old racy i386 "page faults from kernel space don't check write protections"). Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
The hugepage code had the exact same bug that regular pages had in commit 7485d0d3 ("futexes: Remove rw parameter from get_futex_key()"). The regular page case was fixed by commit 9ea71503 ("futex: Fix regression with read only mappings"), but the transparent hugepage case (added in a5b338f2: "thp: update futex compound knowledge") case remained broken. Found by Dave Jones and his trinity tool. Reported-and-tested-by: NDave Jones <davej@fedoraproject.org> Cc: stable@kernel.org # v2.6.38+ Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 12月, 2013 2 次提交
-
-
由 Hendrik Brueckner 提交于
Apart from data-type specific alignment constraints, there are also architecture-specific alignment requirements. For example, on s390 symbols must be on even addresses implying a 2-byte alignment. If the system_certificate_list_end symbol is on an odd address and if this address is loaded, the least-significant bit is ignored. As a result, the load_system_certificate_list() fails to load the certificates because of a wrong certificate length calculation. To be safe, align system_certificate_list on an 8-byte boundary. Also improve the length calculation of the system_certificate_list content. Introduce a system_certificate_list_size (8-byte aligned because of unsigned long) variable that stores the length. Let the linker calculate this size by introducing a start and end label for the certificate content. Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 Rusty Russell 提交于
$ git status # On branch pending-rebases # Untracked files: # (use "git add <file>..." to include in what will be committed) # # kernel/x509_certificate_list nothing added to commit but untracked files present (use "git add" to track) $ Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
- 06 12月, 2013 1 次提交
-
-
由 Steven Rostedt 提交于
It has been reported that boot up with FTRACE_SELFTEST enabled can take a very long time. There can be stalls of over a minute. This was tracked down to the synchronize_sched() called when a system call event is disabled. As the self tests enable and disable thousands of events, this makes the synchronize_sched() get called thousands of times. The synchornize_sched() was added with d562aff9 "tracing: Add support for SOFT_DISABLE to syscall events" which caused this regression (added in 3.13-rc1). The synchronize_sched() is to protect against the events being accessed when a tracer instance is being deleted. When an instance is being deleted all the events associated to it are unregistered. The synchronize_sched() makes sure that no more users are running when it finishes. Instead of calling synchronize_sched() for all syscall events, we only need to call it once, after the events are unregistered and before the instance is deleted. The event_mutex is held during this action to prevent new users from enabling events. Link: http://lkml.kernel.org/r/20131203124120.427b9661@gandalf.local.homeReported-by: NPetr Mladek <pmladek@suse.cz> Acked-by: NTom Zanussi <tom.zanussi@linux.intel.com> Acked-by: NPetr Mladek <pmladek@suse.cz> Tested-by: NPetr Mladek <pmladek@suse.cz> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 29 11月, 2013 2 次提交
-
-
由 Thomas Gleixner 提交于
If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ. If CONFIG_NO_HZ=y and the nohz functionality is disabled via the command line option "nohz=off" or not enabled due to missing hardware support, then tick_nohz_get_sleep_length() returns 0. That happens because ts->sleep_length is never set in that case. Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive. Reported-by: NMichal Hocko <mhocko@suse.cz> Reported-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Helge Deller 提交于
The init_kernel_text() and core_kernel_text() functions should not include the labels _einittext and _etext when checking if an address is inside the .text or .init sections. Signed-off-by: NHelge Deller <deller@gmx.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 11月, 2013 2 次提交
-
-
由 Tejun Heo 提交于
If a cgroup file implements either read_map() or read_seq_string(), such file is served using seq_file by overriding file->f_op to cgroup_seqfile_operations, which also overrides the release method to single_release() from cgroup_file_release(). Because cgroup_file_open() didn't use to acquire any resources, this used to be fine, but since f7d58818 ("cgroup: pin cgroup_subsys_state when opening a cgroupfs file"), cgroup_file_open() pins the css (cgroup_subsys_state) which is put by cgroup_file_release(). The patch forgot to update the release path for seq_files and each open/release cycle leaks a css reference. Fix it by updating cgroup_file_release() to also handle seq_files and using it for seq_file release path too. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v3.12
-
由 Peter Zijlstra 提交于
Juri hit the below lockdep report: [ 4.303391] ====================================================== [ 4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ] [ 4.303394] 3.12.0-dl-peterz+ #144 Not tainted [ 4.303395] ------------------------------------------------------ [ 4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 4.303399] (&p->mems_allowed_seq){+.+...}, at: [<ffffffff8114e63c>] new_slab+0x6c/0x290 [ 4.303417] [ 4.303417] and this task is already holding: [ 4.303418] (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff812d2dfb>] blk_execute_rq_nowait+0x5b/0x100 [ 4.303431] which would create a new lock dependency: [ 4.303432] (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...} [ 4.303436] [ 4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: [ 4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 { [ 4.303922] HARDIRQ-ON-W at: [ 4.303923] [<ffffffff8108ab9a>] __lock_acquire+0x65a/0x1ff0 [ 4.303926] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140 [ 4.303929] [<ffffffff81063dd6>] kthreadd+0x86/0x180 [ 4.303931] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0 [ 4.303933] SOFTIRQ-ON-W at: [ 4.303933] [<ffffffff8108abcc>] __lock_acquire+0x68c/0x1ff0 [ 4.303935] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140 [ 4.303940] [<ffffffff81063dd6>] kthreadd+0x86/0x180 [ 4.303955] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0 [ 4.303959] INITIAL USE at: [ 4.303960] [<ffffffff8108a884>] __lock_acquire+0x344/0x1ff0 [ 4.303963] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140 [ 4.303966] [<ffffffff81063dd6>] kthreadd+0x86/0x180 [ 4.303969] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0 [ 4.303972] } Which reports that we take mems_allowed_seq with interrupts enabled. A little digging found that this can only be from cpuset_change_task_nodemask(). This is an actual deadlock because an interrupt doing an allocation will hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin forever waiting for the write side to complete. Cc: John Stultz <john.stultz@linaro.org> Cc: Mel Gorman <mgorman@suse.de> Reported-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Tested-by: NJuri Lelli <juri.lelli@gmail.com> Acked-by: NLi Zefan <lizefan@huawei.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
-
- 27 11月, 2013 1 次提交
-
-
由 Thomas Gleixner 提交于
Tony reported that aa0d5326 ("ia64: Use preempt_schedule_irq") broke PREEMPT=n builds on ia64. Ok, wrapped my brain around it. I tripped over the magic asm foo which has a single need_resched check and schedule point for both sys call return and interrupt return. So you need the schedule_preempt_irq() for kernel preemption from interrupt return while on a normal syscall preemption a schedule would be sufficient. But using schedule_preempt_irq() is not harmful here in any way. It just sets the preempt_active bit also in cases where it would not be required. Even on preempt=n kernels adding the preempt_active bit is completely harmless. So instead of having an extra function, moving the existing one out of the ifdef PREEMPT looks like the sanest thing to do. It would also allow getting rid of various other sti/schedule/cli asm magic in other archs. Reported-and-Tested-by: NTony Luck <tony.luck@gmail.com> Fixes: aa0d5326 ("ia64: Use preempt_schedule_irq") Signed-off-by: NThomas Gleixner <tglx@linutronix.de> [slightly edited Changelog] Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311211230030.30673@ionos.tec.linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 26 11月, 2013 2 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
Commit 8c4f3c3f "ftrace: Check module functions being traced on reload" fixed module loading and unloading with respect to function tracing, but it missed the function graph tracer. If you perform the following # cd /sys/kernel/debug/tracing # echo function_graph > current_tracer # modprobe nfsd # echo nop > current_tracer You'll get the following oops message: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9() Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000 0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668 ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000 Call Trace: [<ffffffff814fe193>] dump_stack+0x4f/0x7c [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9 [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9 [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364 [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78 [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10 [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e [<ffffffff81122aed>] vfs_write+0xab/0xf6 [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85 [<ffffffff81122dbd>] SyS_write+0x59/0x82 [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85 [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b ---[ end trace 940358030751eafb ]--- The above mentioned commit didn't go far enough. Well, it covered the function tracer by adding checks in __register_ftrace_function(). The problem is that the function graph tracer circumvents that (for a slight efficiency gain when function graph trace is running with a function tracer. The gain was not worth this). The problem came with ftrace_startup() which should always be called after __register_ftrace_function(), if you want this bug to be completely fixed. Anyway, this solution moves __register_ftrace_function() inside of ftrace_startup() and removes the need to call them both. Reported-by: NDave Wysochanski <dwysocha@redhat.com> Fixes: ed926f9b ("ftrace: Use counters to enable functions to trace") Cc: stable@vger.kernel.org # 3.0+ Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Laxman Dewangan 提交于
When the system enters suspend, it disables all interrupts in suspend_device_irqs(), including the interrupts marked EARLY_RESUME. On the resume side things are different. The EARLY_RESUME interrupts are reenabled in sys_core_ops->resume and the non EARLY_RESUME interrupts are reenabled in the normal system resume path. When suspend_noirq() failed or suspend is aborted for any other reason, we might omit the resume side call to sys_core_ops->resume() and therefor the interrupts marked EARLY_RESUME are not reenabled and stay disabled forever. To solve this, enable all irqs unconditionally in irq_resume() regardless whether interrupts marked EARLY_RESUMEhave been already enabled or not. This might try to reenable already enabled interrupts in the non failure case, but the only affected platform is XEN and it has been confirmed that it does not cause any side effects. [ tglx: Massaged changelog. ] Signed-off-by: NLaxman Dewangan <ldewangan@nvidia.com> Acked-by-and-tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NHeiko Stuebner <heiko@sntech.de> Reviewed-by: NPavel Machek <pavel@ucw.cz> Cc: <ian.campbell@citrix.com> Cc: <rjw@rjwysocki.net> Cc: <len.brown@intel.com> Cc: <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1385388587-16442-1-git-send-email-ldewangan@nvidia.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 23 11月, 2013 6 次提交
-
-
由 Li Bin 提交于
When one work starts execution, the high bits of work's data contain pool ID. It can represent a maximum of WORK_OFFQ_POOL_NONE. Pool ID is assigned WORK_OFFQ_POOL_NONE when the work being initialized indicating that no pool is associated and get_work_pool() uses it to check the associated pool. So if worker_pool_assign_id() assigns a ID greater than or equal WORK_OFFQ_POOL_NONE to a pool, it triggers leakage, and it may break the non-reentrance guarantee. This patch fix this issue by modifying the worker_pool_assign_id() function calling idr_alloc() by setting @end param WORK_OFFQ_POOL_NONE. Furthermore, in the current implementation, the BUILD_BUG_ON() in init_workqueues makes no sense. The number of worker pools needed cannot be determined at compile time, because the number of backing pools for UNBOUND workqueues is dynamic based on the assigned custom attributes. So remove it. tj: Minor comment and indentation updates. Signed-off-by: NLi Bin <huawei.libin@huawei.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Li Bin 提交于
It seems the "dying" should be "draining" here. Signed-off-by: NLi Bin <huawei.libin@huawei.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
An ordered workqueue implements execution ordering by using single pool_workqueue with max_active == 1. On a given pool_workqueue, work items are processed in FIFO order and limiting max_active to 1 enforces the queued work items to be processed one by one. Unfortunately, 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues") accidentally broke this guarantee by applying NUMA affinity to ordered workqueues too. On NUMA setups, an ordered workqueue would end up with separate pool_workqueues for different nodes. Each pool_workqueue still limits max_active to 1 but multiple work items may be executed concurrently and out of order depending on which node they are queued to. Fix it by using dedicated ordered_wq_attrs[] when creating ordered workqueues. The new attrs match the unbound ones except that no_numa is always set thus forcing all NUMA nodes to share the default pool_workqueue. While at it, add sanity check in workqueue creation path which verifies that an ordered workqueues has only the default pool_workqueue. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NLibin <huawei.libin@huawei.com> Cc: stable@vger.kernel.org Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
-
由 Oleg Nesterov 提交于
Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed() in create_worker(). Otherwise userland can change ->cpus_allowed in between. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Since be445626 ("cgroup: remove synchronize_rcu() from cgroup_diput()"), cgroup destruction path makes use of workqueue. css freeing is performed from a work item from that point on and a later commit, ea15f8cc ("cgroup: split cgroup destruction into two steps"), moves css offlining to workqueue too. As cgroup destruction isn't depended upon for memory reclaim, the destruction work items were put on the system_wq; unfortunately, some controller may block in the destruction path for considerable duration while holding cgroup_mutex. As large part of destruction path is synchronized through cgroup_mutex, when combined with high rate of cgroup removals, this has potential to fill up system_wq's max_active of 256. Also, it turns out that memcg's css destruction path ends up queueing and waiting for work items on system_wq through work_on_cpu(). If such operation happens while system_wq is fully occupied by cgroup destruction work items, work_on_cpu() can't make forward progress because system_wq is full and other destruction work items on system_wq can't make forward progress because the work item waiting for work_on_cpu() is holding cgroup_mutex, leading to deadlock. This can be fixed by queueing destruction work items on a separate workqueue. This patch creates a dedicated workqueue - cgroup_destroy_wq - for this purpose. As these work items shouldn't have inter-dependencies and mostly serialized by cgroup_mutex anyway, giving high concurrency level doesn't buy anything and the workqueue's @max_active is set to 1 so that destruction work items are executed one by one on each CPU. Hugh Dickins: Because cgroup_init() is run before init_workqueues(), cgroup_destroy_wq can't be allocated from cgroup_init(). Do it from a separate core_initcall(). In the future, we probably want to reorder so that workqueue init happens before cgroup_init(). Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NHugh Dickins <hughd@google.com> Reported-by: NShawn Bohrer <shawn.bohrer@gmail.com> Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils Cc: stable@vger.kernel.org # v3.9+
-
由 Martin Schwidefsky 提交于
Since commit 1e75fa8b (time: Condense timekeeper.xtime into xtime_sec - merged in v3.6), there has been an problem with the error accounting in the timekeeping code, such that when truncating to nanoseconds, we round up to the next nsec, but the balancing adjustment to the ntp_error value was dropped. This causes 1ns per tick drift forward of the clock. In 3.7, this logic was isolated to only GENERIC_TIME_VSYSCALL_OLD architectures (s390, ia64, powerpc). The fix is simply to balance the accounting and to subtract the added nanosecond from ntp_error. This allows the internal long-term clock steering to keep the clock accurate. While this fix removes the regression added in 1e75fa8b, the ideal solution is to move away from GENERIC_TIME_VSYSCALL_OLD and use the new VSYSCALL method, which avoids entirely the nanosecond granular rounding, and the resulting short-term clock adjustment oscillation needed to keep long term accurate time. [ jstultz: Many thanks to Martin for his efforts identifying this subtle bug, and providing the fix. ] Originally-from: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable <stable@vger.kernel.org> #v3.6+ Link: http://lkml.kernel.org/r/1385149491-20307-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NJohn Stultz <john.stultz@linaro.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 20 11月, 2013 5 次提交
-
-
由 Kirill A. Shutemov 提交于
<linux/spinlock.h> has heavy dependencies on other header files. It triggers circular dependencies in generated headers on IA64, at least: CC kernel/bounds.s In file included from /home/space/kas/git/public/linux/arch/ia64/include/asm/thread_info.h:9:0, from include/linux/thread_info.h:54, from include/asm-generic/preempt.h:4, from arch/ia64/include/generated/asm/preempt.h:1, from include/linux/preempt.h:18, from include/linux/spinlock.h:50, from kernel/bounds.c:14: /home/space/kas/git/public/linux/arch/ia64/include/asm/asm-offsets.h:1:35: fatal error: generated/asm-offsets.h: No such file or directory compilation terminated. Let's replace <linux/spinlock.h> with <linux/spinlock_types.h>, it's enough to find out size of spinlock_t. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-and-Tested-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Johannes Berg 提交于
As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shigeru Yoshida 提交于
Fix a trivial typo in rq_attach_root(). Signed-off-by: NShigeru Yoshida <shigeru.yoshida@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131117.121236.1990617639803941055.shigeru.yoshida@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Commit 37dc6b50 ("sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus") forgot to clear 'sd_busy' under some conditions leading to a possible NULL deref in set_cpu_sd_state_idle(). Reported-by: NAnton Blanchard <anton@samba.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131118113701.GF3866@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
After commit 863bffc8 ("sched/fair: Fix group power_orig computation"), we can dereference rq->sd before it is set. Fix this by falling back to power_of() in this case and add a comment explaining things. Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> [ Added comment and tweaked patch. ] Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: mikey@neuling.org Link: http://lkml.kernel.org/r/20131113151718.GN21461@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 19 11月, 2013 6 次提交
-
-
由 Vince Weaver 提交于
The 64-bit attr.config value for perf trace events was being copied into an "int" before doing a comparison, meaning the top 32 bits were being truncated. As far as I can tell this didn't cause any errors, but it did mean it was possible to create valid aliases for all the tracepoint ids which I don't think was intended. (For example, 0xffffffff00000018 and 0x18 both enable the same tracepoint). Signed-off-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1311151236100.11932@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Currently we only allocate a single cpu hashtable for per-cpu swevents; do away with this optimization for it is fragile in the face of things like perf_pmu_migrate_context(). The easiest thing is to make sure all CPUs are consistent wrt state. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130913111447.GN31370@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Vince's perf-trinity fuzzer found yet another 'interesting' problem. When we sample the irq_work_exit tracepoint with period==1 (or PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an infinite event generation loop: ,-> <IPI> | irq_work_exit() -> | trace_irq_work_exit() -> | ... | __perf_event_overflow() -> (due to fasync) | irq_work_queue() -> (irq_work_list must be empty) '--------- arch_irq_work_raise() Similar things can happen due to regular poll() wakeups if we exceed the ring-buffer wakeup watermark, or have an event_limit. To avoid this, dis-allow sampling this particular tracepoint. In order to achieve this, create a special perf_perm function pointer for each event and call this (when set) on trying to create a tracepoint perf event. [ roasted: use expr... to allow for ',' in your expression ] Reported-by: NVince Weaver <vincent.weaver@maine.edu> Tested-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Jones <davej@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andrew Morton 提交于
Taken straight from a tglx email ;) Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Joe Perches 提交于
Use the helper function instead of __GFP_ZERO. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the "fine" grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. Reported-and-tested-by: NMatthew Whitehead <tedheadster@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: john.stultz@linaro.org Cc: mwhitehe@redhat.com Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311132052240.30673@ionos.tec.linutronix.de
-
- 16 11月, 2013 1 次提交
-
-
由 Al Viro 提交于
Rename simple_delete_dentry() to always_delete_dentry() and export it. Export simple_dentry_operations, while we are at it, and get rid of their duplicates Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 11月, 2013 9 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
We've switched over every architecture that supports SMP to it, so remove the new useless config variable. Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
This commit was incomplete in that code to remove items from the per-cpu lists was missing and never acquired a user in the 5 years it has been in the tree. We're going to implement what it seems to try to archive in a simpler way, and this code is in the way of doing so. Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Use kernel/bounds.c to convert build-time spinlock_t size check into a preprocessor symbol and apply that to properly separate the page::ptl situation. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
The basic idea is the same as with PTE level: the lock is embedded into struct page of table's page. We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't take mm->page_table_lock anymore. Let's reuse page->lru of table's page for that. pgtable_pmd_page_ctor() returns true, if initialization is successful and false otherwise. Current implementation never fails, but assumption that constructor can fail will help to port it to -rt where spinlock_t is rather huge and cannot be embedded into struct page -- dynamic allocation is required. Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NAlex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
With split page table lock for PMD level we can't hold mm->page_table_lock while updating nr_ptes. Let's convert it to atomic_long_t to avoid races. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NAlex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rafael J. Wysocki 提交于
I have received a report about the BUG_ON() in free_basic_memory_bitmaps() triggering mysteriously during an aborted s2disk hibernation attempt. The only way I can explain that is that /dev/snapshot was first opened for writing (resume mode), then closed and then opened again for reading and closed again without freezing tasks. In that case the first invocation of snapshot_open() would set the free_bitmaps flag in snapshot_state, which is a static variable. That flag wouldn't be cleared later and the second invocation of snapshot_open() would just leave it like that, so the subsequent snapshot_release() would see data->frozen set and free_basic_memory_bitmaps() would be called unnecessarily. To prevent that from happening clear data->free_bitmaps in snapshot_open() when the file is being opened for reading (hibernate mode). In addition to that, replace the BUG_ON() in free_basic_memory_bitmaps() with a WARN_ON() as the kernel can continue just fine if the condition checked by that macro occurs. Fixes: aab17289 (PM / hibernate: Fix user space driven resume regression) Reported-by: NOliver Lorenz <olli@olorenz.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
-
由 Johannes Berg 提交于
Now that genl_ops are no longer modified in place when registering, they can be made const. This patch was done mostly with spatch: @@ identifier ops; @@ +const struct genl_ops ops[] = { ... }; (except the struct thing in net/openvswitch/datapath.c) Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-