- 16 Dec, 2016 1 commit
-
-
Committed by Geert Uytterhoeven
If CONFIG_PRINTK=n:

    kernel/printk/printk.c:1893: warning: ‘cont’ defined but not used

Note that there are actually two different struct cont definitions and objects: the first one is used if CONFIG_PRINTK=y, the second one became unused by removing console_cont_flush().

Fixes: 5c2992ee ("printk: remove console flushing special cases for partial buffered lines")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Petr Mladek <pmladek@suse.com>
[ I do the occasional "allnoconfig" builds, but apparently not often enough - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Dec, 2016 28 commits
-
-
Committed by Linus Torvalds
It actively hurts proper merging, and makes for a lot of special cases. There was a good(ish) reason for doing it originally, but it's getting too painful to maintain. And most of the original reasons for it are long gone.

So instead of having special code to flush partial lines to the console (as opposed to the record buffers), do _all_ the console writing from the record buffer, and be done with it.

If an oops happens (or some other synchronous event), we will flush the partial lines due to the oops printing activity, so this does not affect that.

It does mean that if you have a completely hung machine, a partial preceding line may not have been printed out.

That was some of the original reason for this complexity, in fact, back when we used to test for the historical i386 "halt" instruction problem by doing

    pr_info("Checking 'hlt' instruction... ");
    if (!boot_cpu_data.hlt_works_ok) {
        pr_cont("disabled\n");
        return;
    }
    halt();
    halt();
    halt();
    halt();
    pr_cont("OK\n");

and that model no longer works (if the 'hlt' instruction kills the machine, the partial line won't have been flushed, so you won't even see it).

Of course, that was also back in the days when people actually had textual console output rather than a graphical splash-screen at bootup. How times change..

Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Petr Mladek <pmladek@suse.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Linus Torvalds
The record logging code looks at the previous record flags in various ways, and they are all wrong.

You can't use the previous record flags to determine anything about the next record, because they may simply not be related. In particular, the reason the previous record was a continuation record may well be exactly _because_ the new record was printed by a different process, which is why the previous record was flushed.

So all those games are simply wrong, and make the code hard to understand (because the code fundamentally does not make sense).

So remove it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Lorenzo Stoakes
Patch series "mm: unexport __get_user_pages_unlocked()".

This patch series continues the cleanup of get_user_pages*() functions, taking advantage of the fact that we can now pass gup_flags as we please.

It firstly adds an additional 'locked' parameter to get_user_pages_remote() to allow its callers to utilise VM_FAULT_RETRY functionality. This is necessary as the invocation of __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of this and no other existing higher-level function would allow it to do so.

Secondly, existing callers of __get_user_pages_unlocked() are replaced with the appropriate higher-level replacement: get_user_pages_unlocked() if the current task and memory descriptor are referenced, or get_user_pages_remote() if other task/memory descriptors are referenced (having acquired mmap_sem).

This patch (of 2):

Add an int *locked parameter to get_user_pages_remote() to allow VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().

Taking into account the previous adjustments to get_user_pages*() functions allowing for the passing of gup_flags, we are now in a position where __get_user_pages_unlocked() need only be exported for its ability to allow VM_FAULT_RETRY behaviour. This adjustment allows us to subsequently unexport __get_user_pages_unlocked() and leaves room for future flexibility in the use of get_user_pages_remote().

[sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Babu Moger
Separate hardlockup code from watchdog.c and move it to watchdog_hld.c. It is mostly straightforward. Remove everything inside CONFIG_HARDLOCKUP_DETECTOR. This code will go to file watchdog_hld.c. Also update the makefile accordingly.

Link: http://lkml.kernel.org/r/1478034826-43888-3-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Josh Hunt <johunt@akamai.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Babu Moger
Patch series "Clean up watchdog handlers", v2.

This is an attempt to clean up watchdog handlers. Right now, kernel/watchdog.c implements both softlockup and hardlockup detectors. Softlockup code is generic. Hardlockup code is arch specific. Some architectures don't use hardlockup detectors. They use their own watchdog detectors. To make both of these combinations work, we have numerous #ifdefs in kernel/watchdog.c.

We are trying here to make these handlers independent of each other. Also provide an interface for architectures to implement their own handlers. watchdog_nmi_enable and watchdog_nmi_disable will be defined as weak such that architectures can override their definitions.

Thanks to Don Zickus for his suggestions. Here are our previous discussions:

    http://www.spinics.net/lists/sparclinux/msg16543.html
    http://www.spinics.net/lists/sparclinux/msg16441.html

This patch (of 3):

Move shared macros and definitions to nmi.h so that watchdog.c, the new file watchdog_hld.c, or any other architecture-specific handler can use those definitions.

Link: http://lkml.kernel.org/r/1478034826-43888-2-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Josh Hunt <johunt@akamai.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Nicolas Pitre
The OpenRISC compiler (so far) fails to optimize away a large portion of code containing a reference to posix_timer_event in alarmtimer.c when CONFIG_POSIX_TIMERS is unset. Let's give it a direct clue to let the build succeed.

This fixes

    [linux-next:master 6682/7183] alarmtimer.c:undefined reference to `posix_timer_event'

reported by kbuild test robot.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Petr Mladek
kdb_trap_printk allows passing normal printk() messages to kdb via vkdb_printf(). For example, it is used to get a backtrace using the classic show_stack(), see kdb_show_stack().

vkdb_printf() tries to avoid a potential infinite loop by disabling the trap. But this approach is racy, for example:

    CPU1                                    CPU2

    vkdb_printf()
      // assume that kdb_trap_printk == 0
      saved_trap_printk = kdb_trap_printk;
      kdb_trap_printk = 0;

                                            kdb_show_stack()
                                              kdb_trap_printk++;

    Problem1: Now, a nested printk() on CPU0 calls vkdb_printf()
              even when it should have been disabled. It will not
              cause a deadlock but...

      // using the outdated saved value: 0
      kdb_trap_printk = saved_trap_printk;

                                              kdb_trap_printk--;

    Problem2: Now, kdb_trap_printk == -1 and will stay like this.
              It means that all messages will get passed to kdb
              from now on.

This patch removes the racy saved_trap_printk handling. Instead, the recursion is prevented by a check for the locked CPU.

The solution is still kind of racy. An unrelated printk(), from another process, might get trapped by vkdb_printf(). And the wanted printk() might not get trapped because kdb_printf_cpu is assigned. But this problem existed even with the original code.

A proper solution would be to get_cpu() before setting kdb_trap_printk and trap messages only from this CPU. I am not sure if it is worth the effort, though.

In fact, the race is very theoretical. When kdb is running any of the commands that use kdb_trap_printk there is a single active CPU and the other CPUs should be in a holding pen inside kgdb_cpu_enter().

The only time this is violated is when there is a timeout waiting for the other CPUs to report to the holding pen.

Finally, note that the situation is a bit schizophrenic. vkdb_printf() explicitly allows recursion but only from KDB code that calls kdb_printf() directly. On the other hand, the generic printk() recursion is not allowed because it might cause an infinite loop. This is why we could not hide the decision inside vkdb_printf() easily.

Link: http://lkml.kernel.org/r/1480412276-16690-4-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Petr Mladek
kdb_printf_lock does not prevent other CPUs from entering the critical section because it is ignored when KDB_STATE_PRINTF_LOCK is set.

The problematic situation might look like:

    CPU0                                    CPU1

    vkdb_printf()
      if (!KDB_STATE(PRINTF_LOCK))
        KDB_STATE_SET(PRINTF_LOCK);
        spin_lock_irqsave(&kdb_printf_lock, flags);

                                            vkdb_printf()
                                              if (!KDB_STATE(PRINTF_LOCK))

    BANG: The PRINTF_LOCK state is set and CPU1 is entering the
          critical section without spinning on the lock.

The problem is that the code tries to implement locking using two state variables that are not handled atomically. We do need custom locking because we want to allow re-entering the critical section on the very same CPU.

Let's use the solution from Peter Zijlstra that was proposed for a similar scenario, see https://lkml.kernel.org/r/20161018171513.734367391@infradead.org

This patch uses the same trick with cmpxchg(). The only difference is that we want to handle only recursion from the same context and therefore we disable interrupts.

In addition, KDB_STATE_PRINTF_LOCK is removed. In fact, we are not able to set it in a non-racy way.

Link: http://lkml.kernel.org/r/1480412276-16690-3-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
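The cmpxchg() trick described in this commit can be illustrated with a minimal user-space sketch using C11 atomics. This is not the kernel code: `printf_cpu`, `printf_trylock()` and `printf_unlock()` are hypothetical names, CPU ids are passed in by hand, and interrupt disabling is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical owner variable: -1 means "unowned", otherwise the owning CPU id. */
static atomic_int printf_cpu = -1;

/*
 * Try to enter the critical section on behalf of this_cpu.
 * A single compare-and-swap both tests and claims ownership, so there is
 * no window between "check the state" and "take the lock".
 * Returns true on success; *nested is set when the same CPU re-enters.
 */
static bool printf_trylock(int this_cpu, bool *nested)
{
    int expected = -1;

    if (atomic_compare_exchange_strong(&printf_cpu, &expected, this_cpu)) {
        *nested = false;        /* we are the first owner */
        return true;
    }
    if (expected == this_cpu) {
        *nested = true;         /* recursion on the owning CPU is allowed */
        return true;
    }
    return false;               /* owned by another CPU: caller retries */
}

/* Only the outermost holder releases ownership. */
static void printf_unlock(bool nested)
{
    if (!nested)
        atomic_store(&printf_cpu, -1);
}
```

A real caller would spin on `printf_trylock()` until it succeeds; the point of the sketch is only that ownership and recursion are decided by one atomic operation on one variable.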
-
Committed by Petr Mladek
The kdb_event state variable is only set but never checked in the kernel code.

http://www.spinics.net/lists/kdb/msg01733.html suggests that this variable affected WARN_CONSOLE_UNLOCKED() in the original implementation. But this check never went upstream.

The semantics are unclear and racy. The value is updated after the kdb_printf_lock is acquired and after it is released. It should be symmetric at minimum. The value should be manipulated either inside or outside the locked area.

Fortunately, it seems that the original function is gone and we could simply remove the state variable.

Link: http://lkml.kernel.org/r/1480412276-16690-2-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Douglas Anderson
We've got a delay loop waiting for secondary CPUs. That loop uses loops_per_jiffy. However, loops_per_jiffy doesn't actually mean how many tight loops make up a jiffy on all architectures. It is quite common to see things like this in the boot log:

    Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=24000)

In my case I was seeing lots of cases where other CPUs timed out entering the debugger only to print their stack crawls shortly after the kdb> prompt was written.

Elsewhere in kgdb we already use udelay(), so that should be safe enough to use to implement our timeout. We'll delay 1 ms for 1000 times, which should give us a full second of delay (just like the old code wanted) but allow us to notice that we're done every 1 ms.

[akpm@linux-foundation.org: simplifications, per Daniel]
Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Brian Norris <briannorris@chromium.org>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
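The "1 ms, 1000 times" timeout described above can be sketched in user-space C as follows. Everything here is hypothetical scaffolding: `fake_udelay()` merely accumulates the requested microseconds in place of the kernel's udelay(), and the `done` callback stands in for "all secondary CPUs have checked in".

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for udelay(): records how long we would have waited. */
static unsigned long delayed_us;
static void fake_udelay(unsigned long us) { delayed_us += us; }

/*
 * Poll the condition once per millisecond for at most one second.
 * The total timeout no longer depends on a calibrated busy-loop count,
 * and completion is noticed within about 1 ms.
 */
static bool wait_up_to_one_second(bool (*done)(void))
{
    for (int i = 0; i < 1000; i++) {
        if (done())
            return true;
        fake_udelay(1000);      /* 1000 us = 1 ms per iteration */
    }
    return done();              /* one final check after the last delay */
}

/* Example conditions used to exercise the loop. */
static int polls;
static bool done_after_three(void) { return ++polls > 3; }
static bool never_done(void) { return false; }
```

With `done_after_three` the loop stops after three delays (3 ms of simulated waiting); with `never_done` it gives up after the full simulated second.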
-
Committed by Kefeng Wang
It is fragile to rely on definitions acquired via transitive dependencies, as shown below:

    atomic_*         (<linux/atomic.h>)
    ENOMEM/EN*       (<linux/errno.h>)
    EXPORT_SYMBOL    (<linux/export.h>)
    device_initcall  (<linux/init.h>)
    preempt_*        (<linux/preempt.h>)

Include them to prevent possible issues.

Link: http://lkml.kernel.org/r/1481163221-40170-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Dan Carpenter
Smatch complains that we started using the array offset before we checked that it was valid.

Fixes: 017c59c0 ('relay: Use per CPU constructs for the relay channel buffer pointers')
Link: http://lkml.kernel.org/r/20161013084947.GC16198@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Tetsuo Handa
Do not break lines while printk()ing values.

    kernel: warning: process `tomoyo_file_tes' used the deprecated sysctl system call with
    kernel: 3.
    kernel: 5.
    kernel: 56.
    kernel:

Link: http://lkml.kernel.org/r/1480814833-4976-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by zhong jiang
A soft lockup occurs when I run trinity against the kexec_load syscall. The corresponding stack information is as follows.

    BUG: soft lockup - CPU#6 stuck for 22s! [trinity-c6:13859]
    Kernel panic - not syncing: softlockup: hung tasks
    CPU: 6 PID: 13859 Comm: trinity-c6 Tainted: G O L ----V------- 3.10.0-327.28.3.35.zhongjiang.x86_64 #1
    Hardware name: Huawei Technologies Co., Ltd. Tecal BH622 V2/BC01SRSA0, BIOS RMIBV386 06/30/2014
    Call Trace:
     <IRQ>
      dump_stack+0x19/0x1b
      panic+0xd8/0x214
      watchdog_timer_fn+0x1cc/0x1e0
      __hrtimer_run_queues+0xd2/0x260
      hrtimer_interrupt+0xb0/0x1e0
      ? call_softirq+0x1c/0x30
      local_apic_timer_interrupt+0x37/0x60
      smp_apic_timer_interrupt+0x3f/0x60
      apic_timer_interrupt+0x6d/0x80
     <EOI>
      ? kimage_alloc_control_pages+0x80/0x270
      ? kmem_cache_alloc_trace+0x1ce/0x1f0
      ? do_kimage_alloc_init+0x1f/0x90
      kimage_alloc_init+0x12a/0x180
      SyS_kexec_load+0x20a/0x260
      system_call_fastpath+0x16/0x1b

The first allocation of control pages may take too much time because crash_res.end can be set to a higher value. We need to add cond_resched() to avoid the issue.

The patch has been tested and the above issue no longer appears.

Link: http://lkml.kernel.org/r/1481164674-42775-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Xunlei Pang <xpang@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Baoquan He
Currently on x86_64, the symbol address of phys_base is exported to vmcoreinfo. Dave Anderson complained that this is really useless for his Crash implementation. The user-space utilities Crash and Makedumpfile, which are the main consumers of the exported vmcore information, need the value of phys_base to convert the virtual address of an exported kernel symbol to a physical address. This matters especially for init_level4_pgt: if we want to walk the page table to look up the PA corresponding to a VA, we first need to calculate

    page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base;

Now in Crash and Makedumpfile, we have to analyze the vmcore ELF program header to get the value of phys_base. As Dave said, it would be preferable if it were readily available in vmcoreinfo rather than depending upon the PT_LOAD semantics.

Hence this patch exports the value of phys_base instead of its virtual address.

People also complained that exporting KERNEL_IMAGE_SIZE is x86_64 only and should be moved into the arch-dependent function arch_crash_save_vmcoreinfo. Do the move in this patch.

Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eugene Surovegin <surovegin@google.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
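The address conversion quoted in this commit is plain arithmetic, which a small user-space sketch can make concrete. The helper name `kernel_sym_to_phys()` is hypothetical, and the symbol address and phys_base value in the usage note are made-up illustrations; only the formula and the x86_64 value of __START_KERNEL_map come from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Base of the x86_64 kernel text mapping (__START_KERNEL_map). */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Convert the virtual address of a kernel-text symbol to a physical
 * address, given the phys_base value read from vmcoreinfo:
 *     phys = vaddr - __START_KERNEL_map + phys_base
 */
static uint64_t kernel_sym_to_phys(uint64_t sym_vaddr, uint64_t phys_base)
{
    return sym_vaddr - START_KERNEL_MAP + phys_base;
}
```

For instance, an illustrative symbol at 0xffffffff81e0a000 with phys_base 0x1000000 maps to physical address 0x2e0a000. Exporting the value of phys_base lets a tool do this directly instead of deriving phys_base from the PT_LOAD program headers first.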
-
Committed by Alexey Dobriyan
I was amused to find the "unsafe core_pattern" warning while having these lines in /etc/sysctl.conf:

    fs.suid_dumpable=2
    kernel.core_pattern=/core/core-%e-%p-%E
    kernel.core_uses_pid=0

Turns out the kernel is formally right. The default core_pattern is just "core", which doesn't qualify as a secure path while setting suid.dumpable.

Hint admins about the solution, clarify sysctl names, delete unnecessary '\' characters (string literals are concatenated regardless) and reformat for easier grepping.

Link: http://lkml.kernel.org/r/20161029152124.GA1258@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Waiman Long
When running a certain database workload on a high-end system with many CPUs, it was found that spinlock contention in the sigprocmask syscalls became a significant portion of the overall CPU cycles as shown below.

    9.30%  9.30%  905387  dataserver  /proc/kcore 0x7fff8163f4d2
    [k] _raw_spin_lock_irq
                |
                ---_raw_spin_lock_irq
                   |
                   |--99.34%-- __set_current_blocked
                   |          sigprocmask
                   |          sys_rt_sigprocmask
                   |          system_call_fastpath
                   |          |
                   |          |--50.63%-- __swapcontext
                   |          |          |
                   |          |          |--99.91%-- upsleepgeneric
                   |          |
                   |          |--49.36%-- __setcontext
                   |          |          ktskRun

Looking further into the swapcontext function in glibc, it was found that the function always calls sigprocmask() without checking if there are changes in the signal mask.

A check was added to the __set_current_blocked() function to avoid taking the sighand->siglock spinlock if there is no change in the signal mask. This will prevent unneeded spinlock contention when many threads are trying to call sigprocmask().

With this patch applied, the spinlock contention in sigprocmask() was gone.

Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
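The fast path added by this commit can be illustrated with a simplified user-space model. This is a sketch under stated assumptions, not the kernel implementation: `struct task`, `sigmask_t` and `set_blocked()` are hypothetical, the mask is reduced to a single 64-bit word, and a counter stands in for taking sighand->siglock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sigmask_t;             /* simplified signal mask */

struct task {
    sigmask_t blocked;                  /* currently blocked signals */
    unsigned long lock_acquisitions;    /* models how often siglock is taken */
};

/*
 * Install a new blocked mask. When the mask is unchanged, return early
 * without touching the (modelled) lock, so callers that repeatedly set
 * the same mask never contend on it.
 * Returns true when the mask was actually updated.
 */
static bool set_blocked(struct task *t, sigmask_t newset)
{
    if (t->blocked == newset)
        return false;                   /* fast path: nothing to do */

    t->lock_acquisitions++;             /* slow path: would lock here */
    t->blocked = newset;
    return true;
}
```

In this model, a thread that calls `set_blocked()` a thousand times with the same mask takes the lock at most once, which is the behaviour the commit aims for with swapcontext()-heavy workloads.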
-
Committed by Konstantin Khlebnikov
The NMI handler doesn't call set_irq_regs(); it's set only by normal IRQ. Thus get_irq_regs() returns NULL or a stale registers snapshot with IP/SP pointing to the code interrupted by the IRQ which was interrupted by the NMI. NULL isn't a problem: in this case the watchdog calls dump_stack() and prints the full stack trace including the NMI. But if we're stuck in an IRQ handler then the NMI watchdog will print the stack trace without the IRQ part at all.

This patch uses the registers snapshot passed into the NMI handler as arguments: these registers point exactly to the instruction interrupted by the NMI.

Fixes: 55537871 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup")
Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: <stable@vger.kernel.org> [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Richard Guy Briggs
Resetting audit_sock appears to be racy.

audit_sock was being copied and dereferenced without using a refcount on the source sock.

Bump the refcount on the underlying sock when we store a reference in audit_sock and release it when we reset audit_sock. audit_sock modification needs the audit_cmd_mutex.

See: https://lkml.org/lkml/2016/11/26/232

Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang <xiyou.wangcong@gmail.com> for ideas on how to fix it.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
[PM: fixed the comment block text formatting for auditd_reset()]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-
Committed by Paul Moore
Sleeping on a command record/message in audit_log_start() could slow something, e.g. auditd, from doing something important, e.g. clean shutdown, which could present problems on a heavily loaded system. This patch allows tasks to bypass any queue restrictions if they are logging a command record/message.

Signed-off-by: Paul Moore <paul@paul-moore.com>
-
Committed by Paul Moore
When auditd stops cleanly it sets 'auditd_pid' to 0 with an AUDIT_SET message; in this case we should reset our backlog queues via the auditd_reset() function. This patch also adds an 'auditd_pid' check to the top of kauditd_send_unicast_skb() so we can fail quicker.

Signed-off-by: Paul Moore <paul@paul-moore.com>
-
Committed by Paul Moore
This patch was suggested by Richard Briggs back in 2015, see the link to the mail archive below. Unfortunately, that patch is no longer even remotely valid due to other changes to the code.

 * https://www.redhat.com/archives/linux-audit/2015-October/msg00075.html

Suggested-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-
Committed by Paul Moore
The backlog queue handling in audit_log_start() is a little odd with some questionable design decisions; this patch attempts to rectify this with the following changes:

 * Never make auditd wait, ignore any backlog limits as we need auditd awake so it can drain the backlog queue.

 * When we hit a backlog limit and start dropping records, don't wake all the tasks sleeping on the backlog, that's silly. Instead, let kauditd_thread() take care of waking everyone once it has had a chance to drain the backlog queue.

 * Don't keep a global backlog timeout countdown, make it per-task. A per-task timer means we won't have all the sleeping tasks waking at the same time and hammering on an already stressed backlog queue.

Signed-off-by: Paul Moore <paul@paul-moore.com>
-
Committed by Paul Moore
The audit record backlog queue has always been a bit of a mess, and moving the multicast send into kauditd_thread() from audit_log_end() only makes things worse. This patch attempts to fix the backlog queue with a better design that should hold up better under load and have less of a performance impact at syscall invocation time.

While it looks like there is a lot going on in this patch, the main change is the move from a single backlog queue to three queues:

 * A queue for holding records generated from audit_log_end() that haven't been consumed by kauditd_thread() (audit_queue).

 * A queue for holding records that have been sent via multicast but had a temporary failure when sending via unicast and need a resend (audit_retry_queue).

 * A queue for holding records that haven't been sent via unicast because no one is listening (audit_hold_queue).

Special care is taken in this patch to ensure that the proper record ordering is preserved, e.g. we send everything in the hold queue first, then the retry queue, and finally the main queue.

Signed-off-by: Paul Moore <paul@paul-moore.com>
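The ordering rule stated at the end of this commit message (hold queue first, then retry queue, then main queue) can be sketched with a toy model. This is only an illustration: the real code queues sk_buff records, whereas `struct queue`, `queue_push()`, `queue_pop()` and `drain_in_order()` below are hypothetical helpers operating on plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 8

/* Tiny fixed-size FIFO of record ids (toy stand-in for a record queue). */
struct queue {
    int items[QCAP];
    size_t head, tail;
};

static int queue_push(struct queue *q, int v)
{
    if (q->tail == QCAP)
        return -1;              /* full */
    q->items[q->tail++] = v;
    return 0;
}

static int queue_pop(struct queue *q, int *v)
{
    if (q->head == q->tail)
        return -1;              /* empty */
    *v = q->items[q->head++];
    return 0;
}

/*
 * Drain the three queues in the order that preserves record ordering:
 * oldest undelivered records (hold) first, then records awaiting a
 * resend (retry), and finally freshly generated records (main).
 * Returns the number of records written to out[].
 */
static size_t drain_in_order(struct queue *hold, struct queue *retry,
                             struct queue *mainq, int *out, size_t cap)
{
    struct queue *order[] = { hold, retry, mainq };
    size_t n = 0;

    for (size_t i = 0; i < 3; i++) {
        int v;

        while (n < cap && queue_pop(order[i], &v) == 0)
            out[n++] = v;
    }
    return n;
}
```

If records 1 and 2 sit in the hold queue, 3 and 4 in the retry queue, and 5 and 6 in the main queue, draining yields 1 through 6 in sequence regardless of the order in which the queues were filled.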
-
Committed by Paul Moore
The audit queue names can be shortened and the record sending helpers associated with the kauditd task could be named better; do these small cleanups now to make life easier once we start reworking the queues and kauditd code.

Signed-off-by: Paul Moore <paul@paul-moore.com>
-
Committed by Paul Moore
Sending audit netlink multicast messages is bad for all the same reasons that sending audit netlink unicast messages is bad, so this patch reworks things so that we don't do the multicast send in audit_log_end(); we do it from the dedicated kauditd_thread thread just as we do for unicast messages.

See the GitHub issues below for more information/history:

 * https://github.com/linux-audit/audit-kernel/issues/23
 * https://github.com/linux-audit/audit-kernel/issues/22

Signed-off-by: Paul Moore <paul@paul-moore.com>
-
Committed by Paul Moore
Make sure everything is initialized before we start the kauditd_thread and don't emit the "initialized" record until everything is finished. We also panic with a descriptive message if we can't start the kauditd_thread.

Signed-off-by: Paul Moore <paul@paul-moore.com>
-
Committed by Richard Guy Briggs
Richard made this change some time ago but Eric backed it out because the rest of the supporting code wasn't ready. In order to move the netlink multicast send to kauditd_thread we need to ensure the kauditd_thread is always running, so restore commit 6ff5e459 ("audit: move kaudit thread start from auditd registration to kaudit init").

Signed-off-by: Richard Guy Briggs <rbriggs@redhat.com>
[PM: brought forward and merged based on Richard's old patch]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-
- 14 Dec, 2016 1 commit
-
-
Committed by Paul Bolle
The build system stopped generating ikconfig.h in v2.6.8. Remove an entry for it in dontdiff.

There's also a reference to it in a small comment. Remove that comment too, as it is of little help in any case.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
- 13 Dec, 2016 7 commits
-
-
Committed by Petr Mladek
Commit 4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines") allows defining more than one message header for a single message. The motivation is that continuous lines might get mixed. Therefore it makes sense to define the right log level for every piece of a cont line.

This patch introduces printk_skip_headers() that will skip all headers and uses it in the kdb code instead of printk_skip_level().

This approach helps to fix other printk_skip_level() users independently.

Link: http://lkml.kernel.org/r/1478695291-12169-3-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
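The difference between skipping one header and skipping all of them can be shown with a user-space sketch. It assumes the printk header encoding of an SOH byte (`\001`) followed by one level character ('0' to '7', 'c' for KERN_CONT, 'd' for KERN_DEFAULT); `get_level()`, `skip_one_header()` and `skip_all_headers()` are hypothetical names modelled on, but not copied from, the kernel helpers.

```c
#include <assert.h>
#include <string.h>

#define SOH '\001'      /* start-of-header byte used by printk headers */

/* Return the level character if s starts with a header, else 0. */
static char get_level(const char *s)
{
    if (s[0] == SOH && s[1]) {
        char c = s[1];

        if ((c >= '0' && c <= '7') || c == 'c' || c == 'd')
            return c;
    }
    return 0;
}

/* Skip at most one header: leaves any further header in the text. */
static const char *skip_one_header(const char *s)
{
    return get_level(s) ? s + 2 : s;
}

/* Skip every leading header, e.g. a KERN_CONT followed by a log level. */
static const char *skip_all_headers(const char *s)
{
    while (get_level(s))
        s += 2;
    return s;
}
```

Given a message that starts with a continuation header followed by a level header, `skip_one_header()` still returns text beginning with a raw header, while `skip_all_headers()` returns the bare message.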
-
Committed by Petr Mladek
Commit 4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines") added back the KERN_CONT message header. As a result it might appear in the middle of the line when the parts are squashed via the temporary NMI buffer.

A reasonable solution seems to be to split the text in the NMI temporary buffer not only by newlines but also by the message headers.

Another solution would be to filter out KERN_CONT when writing to the temporary buffer. But this would complicate the lockless handling. Also it would not solve problems with a missing newline that was there even before the KERN_CONT stuff.

This patch moves the temporary buffer handling into a separate function. I played with it and it seems that using the char pointers makes the code easier to read.

Also it prints the final newline as a continuous line.

Finally, it moves handling of the s->len overflow into the paranoid check, and allows recovering from the disaster.

Link: http://lkml.kernel.org/r/1478695291-12169-2-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Petr Mladek
vsnprintf() adds the trailing '\0' but does not count it in the number of printed characters. The result is that there is one byte less space for the real characters in the buffer.

The broken check for the free space might cause us to repeatedly try to print 1 character into the buffer, never reach the full buffer, and never count the messages as missed.

Also, vsnprintf() returns the number of characters that would be printed if the buffer was big enough. As a result, s->len might be bigger than the size of the buffer[*]. And the printk() function might return a bigger len than it really printed.

Both problems are fixed by using vscnprintf() instead.

Note that I thought about increasing the number of missed messages even when the message was shrunken. But it made the code even more complicated. I think that it is not worth it. Shrunken messages are usually easy to recognize. And it should be a corner case.

[*] The overflown s->len value is crazy and unexpected. I "made a mistake" and reported this situation as an internal error when I fixed the handling of PR_CONT headers in some other patch.

Link: http://lkml.kernel.org/r/20161208174912.GA17042@linux.suse
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Takashi Iwai <tiwai@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
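The two return-value conventions can be compared in a small userspace sketch. The kernel's vscnprintf() is not available in userspace, so sketch_vscnprintf() below is a hypothetical stand-in built on the standard vsnprintf(); would_write() and did_write() are illustrative wrappers, not kernel functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Stand-in for vscnprintf(): return the number of characters really
 * stored in buf, not counting the trailing '\0'.
 */
static int sketch_vscnprintf(char *buf, size_t size, const char *fmt,
			     va_list args)
{
	int n;

	assert(fmt != NULL);
	if (size == 0)
		return 0;
	n = vsnprintf(buf, size, fmt, args);
	if (n < 0)
		return 0;		/* treat an output error as nothing written */
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}

/* vsnprintf() convention: length the full output would have had. */
static int would_write(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	return n;
}

/* vscnprintf() convention: length that actually landed in buf. */
static int did_write(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = sketch_vscnprintf(buf, size, fmt, args);
	va_end(args);
	return n;
}
```

For a 10-character string and an 8-byte buffer, would_write() reports 10 while did_write() reports 7, so only the second value is safe to add to a running length such as s->len.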
-
Committed by Tetsuo Handa
Since sysctl_hung_task_warnings == -1 is allowed (infinite warnings), commit 48a6d64e ("hung_task: allow hung_task_panic when hung_task_warnings is 0") should decrement it only when it is not -1.

This prevents the kernel from ceasing warnings after the first 4294967295 ;)

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: John Siddle <jsiddle@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
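The counting rule described above can be sketched as follows. This is an illustration under assumptions, not the kernel code: consume_warning() is a hypothetical helper, and the counter is passed by pointer instead of being the global sysctl variable.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decide whether a hung-task warning may be printed and update the
 * remaining budget. Returns 1 when the warning should be printed.
 */
static int consume_warning(int *warnings)
{
	assert(warnings != NULL);
	if (*warnings == 0)
		return 0;	/* budget exhausted: stay silent */
	if (*warnings > 0)
		(*warnings)--;	/* finite budget: count down */
	return 1;		/* negative (-1): unlimited, never decremented */
}
```

Because the -1 case never touches the counter, it cannot wrap around to 0 after a long run of warnings.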
-
Committed by Andrey Ryabinin
vfree() is going to use a sleeping lock. The thread stack is freed in atomic context, therefore we must use vfree_atomic() here.

Link: http://lkml.kernel.org/r/1479474236-4139-6-git-send-email-hch@lst.de
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Jisheng Zhang <jszhang@marvell.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Stanislav Kinsburskiy
This limitation was introduced in order to remove "another way for malicious code to obscure a compromised program and masquerade as a benign process" by allowing "security-concious program can use this prctl once during its early initialization to ensure the prctl cannot later be abused for this purpose":

http://marc.info/?l=linux-kernel&m=133160684517468&w=2

This explanation doesn't look sufficient. The only thing the "exe" link indicates is the file that was used for execve, which is basically nothing and is not reliable immediately after the process has returned from the execve system call.

Moreover, to use this feature, all the mappings to the previous exe file have to be unmapped and all the new exe file permissions must be satisfied.

This means that changing the exe link is very similar to calling execve on the binary.

The need to remove this limitation comes from migration of an NFS mount point, which is not accessible during restore and is replaced by another file system. Because of this, the exe link has to be changed twice.

[akpm@linux-foundation.org: fix up comment]
Link: http://lkml.kernel.org/r/20160927153755.9337.69650.stgit@localhost.localdomain
Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Nicolas Iooss
When commit fbae2d44 ("kthread: add kthread_create_worker*()") introduced some kthread_create_...() functions that take printf-like parameters, it added __printf attributes to some of them (e.g. kthread_create_worker()). Nevertheless, some new functions were forgotten (they have been detected thanks to the -Wmissing-format-attribute warning flag).

Add the missing __printf attributes to the newly-introduced functions in order to detect formatting issues at build time with the -Wformat flag.

Link: http://lkml.kernel.org/r/20161126193543.22672-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 11 Dec, 2016 2 commits
-
-
Committed by Vincent Guittot
find_idlest_group() only compares the runnable_load_avg when looking for the least loaded group. But in fork-intensive use cases like hackbench, where tasks block quickly after the fork, this can lead to selecting the same CPU instead of other CPUs which have similar runnable load but a lower load_avg.

When the runnable_load_avg of 2 CPUs are close, we now take into account the amount of blocked load as a 2nd selection factor. There are now 3 zones for the runnable_load of the rq:

 - [0 .. (runnable_load - imbalance)]:
   Select the new rq, which has significantly less runnable_load

 - [(runnable_load - imbalance) .. (runnable_load + imbalance)]:
   The runnable loads are close, so we use load_avg to choose between the 2 rq

 - [(runnable_load + imbalance) .. ULONG_MAX]:
   Keep the current rq, which has significantly less runnable_load

The scale factor that is currently used for comparing runnable_load doesn't work well with small values. As an example, the use of a scaling factor fails as soon as this_runnable_load == 0, because we always select the local rq even if min_runnable_load is only 1, which doesn't really make sense because they are just the same. So instead of a scaling factor, we use an absolute margin for runnable_load to detect CPUs with similar runnable_load, and we keep using a scaling factor for blocked load.

For use cases like hackbench, this enables the scheduler to select different CPUs during the fork sequence and to spread tasks across the system.

Tests have been done on a Hikey board (ARM based octo cores) for several kernels. The result below gives min, max, avg and stdev values of 18 runs with each configuration.

The patches depend on the "no missing update_rq_clock()" work.

hackbench -P -g 1

         ea86cb4b   7dc603c9    v4.8        v4.8+patches
  min    0.049      0.050       0.051       0.048
  avg    0.057      0.057(0%)   0.057(0%)   0.055(+5%)
  max    0.066      0.068       0.070       0.063
  stdev  +/-9%      +/-9%       +/-8%       +/-9%

More performance numbers here:

  https://lkml.kernel.org/r/20161203214707.GI20785@codeblueprint.co.uk

Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: kernellwp@gmail.com
Cc: umgwanakikbuti@gmail.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1481216215-24651-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
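The three zones can be sketched as a small classifier. This is only an illustration of the absolute-margin idea and not the code in find_idlest_group(): the enum, the function name classify() and its parameters are hypothetical, and the zone boundaries are treated as part of the middle zone.

```c
/* Outcome of comparing a candidate rq against the current one. */
enum pick {
	PICK_NEW,		/* candidate has significantly less runnable load */
	PICK_BY_LOAD_AVG,	/* runnable loads are close: compare load_avg */
	PICK_CURRENT		/* current rq has significantly less runnable load */
};

/*
 * Classify the candidate's runnable load against the current rq's
 * runnable load using an absolute margin (imbalance). The comparisons
 * are written as differences so that no subtraction can underflow.
 */
static enum pick classify(unsigned long candidate_runnable,
			  unsigned long current_runnable,
			  unsigned long imbalance)
{
	if (current_runnable > candidate_runnable &&
	    current_runnable - candidate_runnable > imbalance)
		return PICK_NEW;
	if (candidate_runnable > current_runnable &&
	    candidate_runnable - current_runnable > imbalance)
		return PICK_CURRENT;
	return PICK_BY_LOAD_AVG;
}
```

With an absolute margin, runnable loads of 0 and 1 fall into the middle zone and are treated as equivalent, which is the case the scaling factor handled badly.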
-
Committed by Vincent Guittot
During fork, the utilization of a task is initialized once the rq has been selected, because the current utilization level of the rq is used to set the utilization of the forked task. As the task's utilization is still 0 at this step of the fork sequence, it doesn't make sense to look for some spare capacity that can fit the task's utilization. Furthermore, I can see perf regressions for the test:

   hackbench -P -g 1

because the least loaded policy is always bypassed and tasks are not spread during fork.

With this patch and the fix below, we are back to the same performance as for v4.8. The fix below is only a temporary one used for the test until a smarter solution is found, because we can't simply remove the test, which is useful for other benchmarks.

| @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
|
|	avg_cost = this_sd->avg_scan_cost;
|
| -	/*
| -	 * Due to large variance we need a large fuzz factor; hackbench in
| -	 * particularly is sensitive here.
| -	 */
| -	if ((avg_idle / 512) < avg_cost)
| -		return -1;
| -
|	time = local_clock();
|
|	for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {

Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: kernellwp@gmail.com
Cc: umgwanakikbuti@gmail.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1481216215-24651-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 09 Dec, 2016 1 commit
-
-
Committed by Thomas Gleixner
The resume code must deal with a clocksource delta which is potentially big enough to overflow the 64bit mult.

Replace the open coded handling with the proper function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Liav Rehana <liavr@mellanox.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.921674404@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
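The overflow problem can be illustrated in userspace. The commit message does not name the function it switches to, so the helper below is a hypothetical sketch of one common technique: split the 64-bit delta into 32-bit halves so that each partial product fits in 64 bits. It assumes shift <= 32 and that the final result itself fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute (delta * mult) >> shift without overflowing a 64-bit
 * intermediate product. Hypothetical helper; requires shift <= 32.
 */
static uint64_t mul_shift_sketch(uint64_t delta, uint32_t mult,
				 unsigned int shift)
{
	uint32_t hi = (uint32_t)(delta >> 32);
	uint32_t lo = (uint32_t)delta;
	uint64_t ret;

	assert(shift <= 32);
	/* Low half: a 32x32 product always fits in 64 bits. */
	ret = ((uint64_t)lo * mult) >> shift;
	/* High half: the product is worth 2^32 more, so shift left by (32 - shift). */
	if (hi)
		ret += ((uint64_t)hi * mult) << (32 - shift);
	return ret;
}
```

For delta = 2^40, mult = 2^30 and shift = 10, the naive 64-bit product wraps around to 0, while the split version yields 2^60.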
-