- 26 12月, 2016 1 次提交
-
-
由 Nicholas Piggin 提交于
Add a new page flag, PageWaiters, to indicate the page waitqueue has tasks waiting. This can be tested rather than testing waitqueue_active which requires another cacheline load. This bit is always set when the page has tasks on page_waitqueue(page), and is set and cleared under the waitqueue lock. It may be set when there are no tasks on the waitqueue, which will cause a harmless extra wakeup check that will clears the bit. The generic bit-waitqueue infrastructure is no longer used for pages. Instead, waitqueues are used directly with a custom key type. The generic code was not flexible enough to have PageWaiters manipulation under the waitqueue lock (which simplifies concurrency). This improves the performance of page lock intensive microbenchmarks by 2-3%. Putting two bits in the same word opens the opportunity to remove the memory barrier between clearing the lock bit and testing the waiters bit, after some work on the arch primitives (e.g., ensuring memory operand widths match and cover both bits). Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 12月, 2016 1 次提交
-
-
由 Linus Torvalds 提交于
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 12月, 2016 1 次提交
-
-
由 Andy Lutomirski 提交于
CGROUP_BPF depended on SOCK_CGROUP_DATA which can't be manually enabled, making it rather challenging to turn CGROUP_BPF on. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 12月, 2016 1 次提交
-
-
由 Jungseung Lee 提交于
For several devices, the rootwait time is sensitive because it directly affects booting time. The polling interval of rootwait is currently 100ms. To save unnessesary waiting time, reduce the polling interval to 5 ms. [akpm@linux-foundation.org: remove used-once #define] Link: http://lkml.kernel.org/r/20161207060743.1728-1-js07.lee@samsung.comSigned-off-by: NJungseung Lee <js07.lee@samsung.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 12月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
AMD CPUs affected by the E400 erratum suffer from the issue that the local APIC timer stops when the CPU goes into C1E. Unfortunately there is no way to detect the affected CPUs on early boot. It's only possible to determine the range of possibly affected CPUs from the family/model range. The actual decision whether to enter C1E and thus cause the bug is done by the firmware and we need to detect that case late, after ACPI has been initialized. The current solution is to check in the idle routine whether the CPU is affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is affected and the system is switched into forced broadcast mode. This is ineffective and on non-affected CPUs every entry to idle does the extra RDMSR. After doing some research it turns out that the bits are visible on the boot CPU right after the ACPI subsystem is initialized in the early boot process. So instead of polling for the bits in the idle loop, add a detection function after acpi_subsystem_init() and check for the MSR bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it will stop in C1E state as well. The switch to broadcast mode cannot be done at this point because the boot CPU still uses HPET as a clockevent device and the local APIC timer is not yet calibrated and installed. The switch to broadcast mode on the affected CPUs needs to be done when the local APIC timer is actually set up. This allows to cleanup the amd_e400_idle() function in the next step. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 30 11月, 2016 1 次提交
-
-
由 Linus Torvalds 提交于
This enables CONFIG_MODVERSIONS again, but allows for missing symbol CRC information in order to work around the issue that newer binutils versions seem to occasionally drop the CRC on the floor. binutils 2.26 seems to work fine, while binutils 2.27 seems to break MODVERSIONS of symbols that have been defined in assembler files. [ We've had random missing CRC's before - it may be an old problem that just is now reliably triggered with the weak asm symbols and a new version of binutils ] Some day I really do want to remove MODVERSIONS entirely. Sadly, today does not appear to be that day: Debian people apparently do want the option to enable MODVERSIONS to make it easier to have external modules across kernel versions, and this seems to be a fairly minimal fix for the annoying problem. Cc: Ben Hutchings <ben@decadent.org.uk> Acked-by: NMichal Marek <mmarek@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 11月, 2016 1 次提交
-
-
由 Arnd Bergmann 提交于
The newly added 'rodata_enabled' global variable is protected by the wrong #ifdef, leading to a link error when CONFIG_DEBUG_SET_MODULE_RONX is turned on: kernel/module.o: In function `disable_ro_nx': module.c:(.text.unlikely.disable_ro_nx+0x88): undefined reference to `rodata_enabled' kernel/module.o: In function `module_disable_ro': module.c:(.text.module_disable_ro+0x8c): undefined reference to `rodata_enabled' kernel/module.o: In function `module_enable_ro': module.c:(.text.module_enable_ro+0xb0): undefined reference to `rodata_enabled' CONFIG_SET_MODULE_RONX does not exist, so use the correct one instead. Fixes: 39290b38 ("module: extend 'rodata=off' boot cmdline parameter to module mappings") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NJessica Yu <jeyu@redhat.com>
-
- 28 11月, 2016 1 次提交
-
-
由 AKASHI Takahiro 提交于
The current "rodata=off" parameter disables read-only kernel mappings under CONFIG_DEBUG_RODATA: commit d2aa1aca ("mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings") This patch is a logical extension to module mappings ie. read-only mappings at module loading can be disabled even if CONFIG_DEBUG_SET_MODULE_RONX (mainly for debug use). Please note, however, that it only affects RO/RW permissions, keeping NX set. This is the first step to make CONFIG_DEBUG_SET_MODULE_RONX mandatory (always-on) in the future as CONFIG_DEBUG_RODATA on x86 and arm64. Suggested-by: Nand Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Link: http://lkml.kernel.org/r/20161114061505.15238-1-takahiro.akashi@linaro.orgSigned-off-by: NJessica Yu <jeyu@redhat.com>
-
- 26 11月, 2016 2 次提交
-
-
由 Linus Torvalds 提交于
CONFIG_MODVERSIONS has been broken for pretty much the whole 4.9 series, and quite frankly, nobody has cared very deeply. We absolutely know how to fix it, and it's not _complicated_, but it's not exactly pretty either. This oneliner fixes it without the ugliness, and allows for further future cleanups. "We've secretly replaced their regular MODVERSIONS with nothing at all, let's see if they notice" Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Mack 提交于
This patch adds two sets of eBPF program pointers to struct cgroup. One for such that are directly pinned to a cgroup, and one for such that are effective for it. To illustrate the logic behind that, assume the following example cgroup hierarchy. A - B - C \ D - E If only B has a program attached, it will be effective for B, C, D and E. If D then attaches a program itself, that will be effective for both D and E, and the program in B will only affect B and C. Only one program of a given type is effective for a cgroup. Attaching and detaching programs will be done through the bpf(2) syscall. For now, ingress and egress inet socket filtering are the only supported use-cases. Signed-off-by: NDaniel Mack <daniel@zonque.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 11月, 2016 1 次提交
-
-
由 Nicolas Schichan 提交于
Otherwise each individual rotator char would be printed in a new line: (...) [ 0.642350] - [ 0.644374] | [ 0.646367] - (...) Signed-off-by: NNicolas Schichan <nicolas.schichan@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 11月, 2016 1 次提交
-
-
由 Nicolas Pitre 提交于
Some embedded systems have no use for them. This removes about 25KB from the kernel binary size when configured out. Corresponding syscalls are routed to a stub logging the attempt to use those syscalls which should be enough of a clue if they were disabled without proper consideration. They are: timer_create, timer_gettime: timer_getoverrun, timer_settime, timer_delete, clock_adjtime, setitimer, getitimer, alarm. The clock_settime, clock_gettime, clock_getres and clock_nanosleep syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME, CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast majority of use cases with very little code. Signed-off-by: NNicolas Pitre <nico@linaro.org> Acked-by: NRichard Cochran <richardcochran@gmail.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NJohn Stultz <john.stultz@linaro.org> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: linux-kbuild@vger.kernel.org Cc: netdev@vger.kernel.org Cc: Michal Marek <mmarek@suse.com> Cc: Edward Cree <ecree@solarflare.com> Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 24 10月, 2016 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
The previous patch renamed several files that are cross-referenced along the Kernel documentation. Adjust the links to point to the right places. Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>
-
- 12 10月, 2016 1 次提交
-
-
由 Peter Zijlstra 提交于
Relay avoids calling wake_up_interruptible() for doing the wakeup of readers/consumers, waiting for the generation of new data, from the context of a process which produced the data. This is apparently done to prevent the possibility of a deadlock in case Scheduler itself is is generating data for the relay, after acquiring rq->lock. The following patch used a timer (to be scheduled at next jiffy), for delegating the wakeup to another context. commit 7c9cb383 Author: Tom Zanussi <zanussi@comcast.net> Date: Wed May 9 02:34:01 2007 -0700 relay: use plain timer instead of delayed work relay doesn't need to use schedule_delayed_work() for waking readers when a simple timer will do. Scheduling a plain timer, at next jiffies boundary, to do the wakeup causes a significant wakeup latency for the Userspace client, which makes relay less suitable for the high-frequency low-payload use cases where the data gets generated at a very high rate, like multiple sub buffers getting filled within a milli second. Moreover the timer is re-scheduled on every newly produced sub buffer so the timer keeps getting pushed out if sub buffers are filled in a very quick succession (less than a jiffy gap between filling of 2 sub buffers). As a result relay runs out of sub buffers to store the new data. By using irq_work it is ensured that wakeup of userspace client, blocked in the poll call, is done at earliest (through self IPI or next timer tick) enabling it to always consume the data in time. Also this makes relay consistent with printk & ring buffers (trace), as they too use irq_work for deferred wake up of readers. [arnd@arndb.de: select CONFIG_IRQ_WORK] Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.comSigned-off-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NAkash Goel <akash.goel@intel.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 10月, 2016 1 次提交
-
-
由 Emese Revfy 提交于
This adds a new gcc plugin named "latent_entropy". It is designed to extract as much possible uncertainty from a running system at boot time as possible, hoping to capitalize on any possible variation in CPU operation (due to runtime data differences, hardware differences, SMP ordering, thermal timing variation, cache behavior, etc). At the very least, this plugin is a much more comprehensive example for how to manipulate kernel code using the gcc plugin internals. The need for very-early boot entropy tends to be very architecture or system design specific, so this plugin is more suited for those sorts of special cases. The existing kernel RNG already attempts to extract entropy from reliable runtime variation, but this plugin takes the idea to a logical extreme by permuting a global variable based on any variation in code execution (e.g. a different value (and permutation function) is used to permute the global based on loop count, case statement, if/then/else branching, etc). To do this, the plugin starts by inserting a local variable in every marked function. The plugin then adds logic so that the value of this variable is modified by randomly chosen operations (add, xor and rol) and random values (gcc generates separate static values for each location at compile time and also injects the stack pointer at runtime). The resulting value depends on the control flow path (e.g., loops and branches taken). Before the function returns, the plugin mixes this local variable into the latent_entropy global variable. The value of this global variable is added to the kernel entropy pool in do_one_initcall() and _do_fork(), though it does not credit any bytes of entropy to the pool; the contents of the global are just used to mix the pool. Additionally, the plugin can pre-initialize arrays with build-time random contents, so that two different kernel builds running on identical hardware will not have the same starting values. Signed-off-by: NEmese Revfy <re.emese@gmail.com> [kees: expanded commit message and code comments] Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 21 9月, 2016 1 次提交
-
-
由 Helge Deller 提交于
PARISC was the only architecture which selected the BROKEN_RODATA config option. Drop it and remove the special handling from init.h as well. Signed-off-by: NHelge Deller <deller@gmx.de>
-
- 18 9月, 2016 1 次提交
-
-
由 Tejun Heo 提交于
Workqueue is currently initialized in an early init call; however, there are cases where early boot code has to be split and reordered to come after workqueue initialization or the same code path which makes use of workqueues is used both before workqueue initailization and after. The latter cases have to gate workqueue usages with keventd_up() tests, which is nasty and easy to get wrong. Workqueue usages have become widespread and it'd be a lot more convenient if it can be used very early from boot. This patch splits workqueue initialization into two steps. workqueue_init_early() which sets up the basic data structures so that workqueues can be created and work items queued, and workqueue_init() which actually brings up workqueues online and starts executing queued work items. The former step can be done very early during boot once memory allocation, cpumasks and idr are initialized. The latter right after kthreads become available. This allows work item queueing and canceling from very early boot which is what most of these use cases want. * As systemd_wq being initialized doesn't indicate that workqueue is fully online anymore, update keventd_up() to test wq_online instead. The follow-up patches will get rid of all its usages and the function itself. * Flushing doesn't make sense before workqueue is fully initialized. The flush functions trigger WARN and return immediately before fully online. * Work items are never in-flight before fully online. Canceling can always succeed by skipping the flush step. * Some code paths can no longer assume to be called with irq enabled as irq is disabled during early boot. Use irqsave/restore operations instead. v2: Watchdog init, which requires timer to be running, moved from workqueue_init_early() to workqueue_init(). Signed-off-by: NTejun Heo <tj@kernel.org> Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
-
- 16 9月, 2016 1 次提交
-
-
由 Andy Lutomirski 提交于
There are a few places in the kernel that access stack memory belonging to a different task. Before we can start freeing task stacks before the task_struct is freed, we need a way for those code paths to pin the stack. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/17a434f50ad3d77000104f21666575e10a9c1fbd.1474003868.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 9月, 2016 1 次提交
-
-
由 Andy Lutomirski 提交于
If an arch opts in by setting CONFIG_THREAD_INFO_IN_TASK_STRUCT, then thread_info is defined as a single 'u32 flags' and is the first entry of task_struct. thread_info::task is removed (it serves no purpose if thread_info is embedded in task_struct), and thread_info::cpu gets its own slot in task_struct. This is heavily based on a patch written by Linus. Originally-from: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a0898196f0476195ca02713691a5037a14f2aac5.1473801993.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 9月, 2016 1 次提交
-
-
由 Nicholas Piggin 提交于
Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to select to build with -ffunction-sections, -fdata-sections, and link with --gc-sections. It requires some work (documented) to ensure all unreferenced entrypoints are live, and requires toolchain and build verification, so it is made a per-arch option for now. On a random powerpc64le build, this yelds a significant size saving, it boots and runs fine, but there is a lot I haven't tested as yet, so these savings may be reduced if there are bugs in the link. text data bss dec filename 11169741 1180744 1923176 14273661 vmlinux 10445269 1004127 1919707 13369103 vmlinux.dce ~700K text, ~170K data, 6% removed from kernel image size. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichal Marek <mmarek@suse.com>
-
- 03 8月, 2016 6 次提交
-
-
由 Valdis Kletnieks 提交于
It doesn't trim just symbols that are totally unused in-tree - it trims the symbols unused by any in-tree modules actually built. If you've done a 'make localmodconfig' and only build a hundred or so modules, it's pretty likely that your out-of-tree module will come up lacking something... Hopefully this will save the next guy from a Homer Simpson "D'oh!" moment. Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.eduSigned-off-by: NValdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Michal Marek <mmarek@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Doing patches with allmodconfig kernel compiled and committing stuff into local tree have unfortunate consequence: kernel version changes (as it should) leading to recompiling and relinking of several files even if they weren't touched (or interesting at all). This and "git-whatever" figuring out current version slow down compilation for no good reason. But lets face it, "allmodconfig" kernels don't care about kernel version, they are simply compile check guinea pigs. Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into allmodconfig .config. Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.bySigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Prarit Bhargava 提交于
sprint_symbol_no_offset() returns the string "function_name [module_name]" where [module_name] is not printed for built in kernel functions. This means that the blacklisting code will fail when comparing module function names with the extended string. This patch adds the functionality to block a module's module_init() function by finding the space in the string and truncating the comparison to that length. Link: http://lkml.kernel.org/r/1466124387-20446-1-git-send-email-prarit@redhat.comSigned-off-by: NPrarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yang Shi <yang.shi@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Kees Cook <keescook@chromium.org> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Fabian Frederick 提交于
There was only one use of __initdata_refok and __exit_refok __init_refok was used 46 times against 82 for __ref. Those definitions are obsolete since commit 312b1485 ("Introduce new section reference annotations tags: __ref, __refdata, __refconst") This patch removes the following compatibility definitions and replaces them treewide. /* compatibility defines */ #define __init_refok __ref #define __initdata_refok __refdata #define __exit_refok __ref I can also provide separate patches if necessary. (One patch per tree and check in 1 month or 2 to remove old definitions) [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.beSigned-off-by: NFabian Frederick <fabf@skynet.be> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Richard Weinberger 提交于
UML is a bit special since it does not have iomem nor dma. That means a lot of drivers will not build if they miss a dependency on HAS_IOMEM. s390 used to have the same issues but since it gained PCI support UML is the only stranger. We are tired of patching dozens of new drivers after every merge window just to un-break allmod/yesconfig UML builds. One could argue that a decent driver has to know on what it depends and therefore a missing HAS_IOMEM dependency is a clear driver bug. But the dependency not obvious and not everyone does UML builds with COMPILE_TEST enabled when developing a device driver. A possible solution to make these builds succeed on UML would be providing stub functions for ioremap() and friends which fail upon runtime. Another one is simply disabling COMPILE_TEST for UML. Since it is the least hassle and does not force use to fake iomem support let's do the latter. Link: http://lkml.kernel.org/r/1466152995-28367-1-git-send-email-richard@nod.atSigned-off-by: NRichard Weinberger <richard@nod.at> Acked-by: NArnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 seokhoon.yoon 提交于
cgroup's document path is changed to "cgroup-v1". update it. Link: http://lkml.kernel.org/r/1470148443-6509-1-git-send-email-iamyooon@gmail.comSigned-off-by: Nseokhoon.yoon <iamyooon@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 7月, 2016 3 次提交
-
-
由 Thomas Garnier 提交于
Implements freelist randomization for the SLUB allocator. It was previous implemented for the SLAB allocator. Both use the same configuration option (CONFIG_SLAB_FREELIST_RANDOM). The list is randomized during initialization of a new set of pages. The order on different freelist sizes is pre-computed at boot for performance. Each kmem_cache has its own randomized freelist. This security feature reduces the predictability of the kernel SLUB allocator against heap overflows rendering attacks much less stable. For example these attacks exploit the predictability of the heap: - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU) - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95) Performance results: slab_test impact is between 3% to 4% on average for 100000 attempts without smp. It is a very focused testing, kernbench show the overall impact on the system is way lower. Before: Single thread testing ===================== 1. Kmalloc: Repeatedly allocate then free test 100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles 100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles 100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles 100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles 100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles 100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles 100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles 100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles 100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles 100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles 2. Kmalloc: alloc/free test 100000 times kmalloc(8)/kfree -> 70 cycles 100000 times kmalloc(16)/kfree -> 70 cycles 100000 times kmalloc(32)/kfree -> 70 cycles 100000 times kmalloc(64)/kfree -> 70 cycles 100000 times kmalloc(128)/kfree -> 70 cycles 100000 times kmalloc(256)/kfree -> 69 cycles 100000 times kmalloc(512)/kfree -> 70 cycles 100000 times kmalloc(1024)/kfree -> 73 cycles 100000 times kmalloc(2048)/kfree -> 72 cycles 100000 times kmalloc(4096)/kfree -> 71 cycles After: Single thread testing ===================== 1. Kmalloc: Repeatedly allocate then free test 100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles 100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles 100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles 100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles 100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles 100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles 100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles 100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles 100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles 100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles 2. Kmalloc: alloc/free test 100000 times kmalloc(8)/kfree -> 66 cycles 100000 times kmalloc(16)/kfree -> 66 cycles 100000 times kmalloc(32)/kfree -> 66 cycles 100000 times kmalloc(64)/kfree -> 66 cycles 100000 times kmalloc(128)/kfree -> 65 cycles 100000 times kmalloc(256)/kfree -> 67 cycles 100000 times kmalloc(512)/kfree -> 67 cycles 100000 times kmalloc(1024)/kfree -> 64 cycles 100000 times kmalloc(2048)/kfree -> 67 cycles 100000 times kmalloc(4096)/kfree -> 67 cycles Kernbench, before: Average Optimal load -j 12 Run (std deviation): Elapsed Time 101.873 (1.16069) User Time 1045.22 (1.60447) System Time 88.969 (0.559195) Percent CPU 1112.9 (13.8279) Context Switches 189140 (2282.15) Sleeps 99008.6 (768.091) After: Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.47 (0.562732) User Time 1045.3 (1.34263) System Time 88.311 (0.342554) Percent CPU 1105.8 (6.49444) Context Switches 189081 (2355.78) Sleeps 99231.5 (800.358) Link: http://lkml.kernel.org/r/1464295031-26375-3-git-send-email-thgarnie@google.comSigned-off-by: NThomas Garnier <thgarnie@google.com> Reviewed-by: NKees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the SLUB allocator to catch any copies that may span objects. Includes a redzone handling fix discovered by Michael Ellerman. Based on code from PaX and grsecurity. Signed-off-by: NKees Cook <keescook@chromium.org> Tested-by: NMichael Ellerman <mpe@ellerman.id.au> Reviwed-by: NLaura Abbott <labbott@redhat.com>
-
由 Kees Cook 提交于
Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the SLAB allocator to catch any copies that may span objects. Based on code from PaX and grsecurity. Signed-off-by: NKees Cook <keescook@chromium.org> Tested-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
-
- 14 7月, 2016 1 次提交
-
-
由 Rik van Riel 提交于
The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not appear to currently work right. On CPUs without nohz_full=, only tick based irq time sampling is done, which breaks down when dealing with a nohz_idle CPU. On firewalls and similar systems, no ticks may happen on a CPU for a while, and the irq time spent may never get accounted properly. This can cause issues with capacity planning and power saving, which use the CPU statistics as inputs in decision making. Remove the VTIME_GEN vtime irq time code, and replace it with the IRQ_TIME_ACCOUNTING code, when selected as a config option by the user. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 7月, 2016 1 次提交
-
-
由 Randy Dunlap 提交于
The "expert" menu was broken (split) such that all entries in it after KALLSYMS were displayed in the "General setup" area instead of in the "Expert users" area. Fix this by adding one kconfig dependency. Yes, the Expert users menu is fragile. Problems like this have happened several times in the past. I will attempt to isolate the Expert users menu if there is interest in that. Fixes: 4d5d5664 ("x86: kallsyms: disable absolute percpu symbols on !SMP") Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: stable@vger.kernel.org # 4.6 Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 6月, 2016 2 次提交
-
-
由 Rasmus Villemoes 提交于
When I replaced kasprintf("%pf") with a direct call to sprint_symbol_no_offset I must have broken the initcall blacklisting feature on the arches where dereference_function_descriptor() is non-trivial. Fixes: c8cdd2be (init/main.c: simplify initcall_blacklisted()) Link: http://lkml.kernel.org/r/1466027283-4065-1-git-send-email-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Yang Shi <yang.shi@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
We've had the thread info allocated together with the thread stack for most architectures for a long time (since the thread_info was split off from the task struct), but that is about to change. But the patches that move the thread info to be off-stack (and a part of the task struct instead) made it clear how confused the allocator and freeing functions are. Because the common case was that we share an allocation with the thread stack and the thread_info, the two pointers were identical. That identity then meant that we would have things like ti = alloc_thread_info_node(tsk, node); ... tsk->stack = ti; which certainly _worked_ (since stack and thread_info have the same value), but is rather confusing: why are we assigning a thread_info to the stack? And if we move the thread_info away, the "confusing" code just gets to be entirely bogus. So remove all this confusion, and make it clear that we are doing the stack allocation by renaming and clarifying the function names to be about the stack. The fact that the thread_info then shares the allocation is an implementation detail, and not really about the allocation itself. This is a pure renaming and type fix: we pass in the same pointer, it's just that we clarify what the pointer means. The ia64 code that actually only has one single allocation (for all of task_struct, thread_info and kernel thread stack) now looks a bit odd, but since "tsk->stack" is actually not even used there, that oddity doesn't matter. It would be a separate thing to clean that up, I intentionally left the ia64 changes as a pure brute-force renaming and type change. Acked-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 6月, 2016 1 次提交
-
-
由 Geert Uytterhoeven 提交于
[jkosina@suse.cz: folded another fix on top on the same line as spotted by Randy Dunlap] Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 16 6月, 2016 1 次提交
-
-
由 Paul E. McKenney 提交于
Usermode Linux currently does not implement arch_irqs_disabled_flags(), which results in a build failure in TASKS_RCU. Therefore, this commit disables the TASKS_RCU Kconfig option in usermode Linux builds. The usermode Linux maintainers expect to merge arch_irqs_disabled_flags() into 4.8, at which point this commit may be reverted. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Acked-by: NRichard Weinberger <richard@nod.at>
-
- 28 5月, 2016 1 次提交
-
-
由 Yang Shi 提交于
page_ext_init() checks suitable pages with pfn_to_nid(), but pfn_to_nid() depends on memmap which will not be setup fully until page_alloc_init_late() is done. Use early_pfn_to_nid() instead of pfn_to_nid() so that page extension could be still used early even though CONFIG_ DEFERRED_STRUCT_PAGE_INIT is enabled and catch early page allocation call sites. Suggested by Joonsoo Kim [1], this fix basically undoes the change introduced by commit b8f1a75d ("mm: call page_ext_init() after all struct pages are initialized") and fixes the same problem with a better approach. [1] http://lkml.kernel.org/r/CAAmzW4OUmyPwQjvd7QUfc6W1Aic__TyAuH80MLRZNMxKy0-wPQ@mail.gmail.com Link: http://lkml.kernel.org/r/1464198689-23458-1-git-send-email-yang.shi@linaro.orgSigned-off-by: NYang Shi <yang.shi@linaro.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 5月, 2016 4 次提交
-
-
由 Rasmus Villemoes 提交于
Using kasprintf to get the function name makes us look up the name twice, along with all the vsnprintf overhead of parsing the format string etc. It also means there is an allocation failure case to deal with. Since symbol_string in vsprintf.c would anyway allocate an array of size KSYM_SYMBOL_LEN on the stack, that might as well be done up here. Moreover, since this is a debug feature and the blacklisted_initcalls list is usually empty, we might as well test that and thus avoid looking up the symbol name even once in the common case. Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NPrarit Bhargava <prarit@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Petr Mladek 提交于
Testing has shown that the backtrace sometimes does not fit into the 4kB temporary buffer that is used in NMI context. The warnings are gone when I double the temporary buffer size. This patch doubles the buffer size and makes it configurable. Note that this problem existed even in the x86-specific implementation that was added by the commit a9edc880 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). Nobody noticed it because it did not print any warnings. Signed-off-by: NPetr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Petr Mladek 提交于
printk() takes some locks and could not be used a safe way in NMI context. The chance of a deadlock is real especially when printing stacks from all CPUs. This particular problem has been addressed on x86 by the commit a9edc880 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). The patchset brings two big advantages. First, it makes the NMI backtraces safe on all architectures for free. Second, it makes all NMI messages almost safe on all architectures (the temporary buffer is limited. We still should keep the number of messages in NMI context at minimum). Note that there already are several messages printed in NMI context: WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE handlers. These are not easy to avoid. This patch reuses most of the code and makes it generic. It is useful for all messages and architectures that support NMI. The alternative printk_func is set when entering and is reseted when leaving NMI context. It queues IRQ work to copy the messages into the main ring buffer in a safe context. __printk_nmi_flush() copies all available messages and reset the buffer. Then we could use a simple cmpxchg operations to get synchronized with writers. There is also used a spinlock to get synchronized with other flushers. We do not longer use seq_buf because it depends on external lock. It would be hard to make all supported operations safe for a lockless use. It would be confusing and error prone to make only some operations safe. The code is put into separate printk/nmi.c as suggested by Steven Rostedt. It needs a per-CPU buffer and is compiled only on architectures that call nmi_enter(). This is achieved by the new HAVE_NMI Kconfig flag. The are MN10300 and Xtensa architectures. We need to clean up NMI handling there first. Let's do it separately. The patch is heavily based on the draft from Peter Zijlstra, see https://lkml.org/lkml/2015/6/10/327 [arnd@arndb.de: printk-nmi: use %zu format string for size_t] [akpm@linux-foundation.org: min_t->min - all types are size_t here] Signed-off-by: NPetr Mladek <pmladek@suse.com> Suggested-by: NPeter Zijlstra <peterz@infradead.org> Suggested-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> [arm part] Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yang Shi 提交于
When DEFERRED_STRUCT_PAGE_INIT is enabled, just a subset of memmap at boot are initialized, then the rest are initialized in parallel by starting one-off "pgdatinitX" kernel thread for each node X. If page_ext_init is called before it, some pages will not have valid extension, this may lead the below kernel oops when booting up kernel: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0 PGD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 11 PID: 106 Comm: pgdatinit1 Not tainted 4.6.0-rc5-next-20160427 #26 Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009 task: ffff88017c080040 ti: ffff88017c084000 task.ti: ffff88017c084000 RIP: 0010:[<ffffffff8118d982>] [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0 RSP: 0000:ffff88017c087c48 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000980 RSI: 0000000000000080 RDI: 0000000000660401 RBP: ffff88017c087cd0 R08: 0000000000000401 R09: 0000000000000009 R10: ffff88017c080040 R11: 000000000000000a R12: 0000000000000400 R13: ffffea0019810000 R14: ffffea0019810040 R15: ffff88066cfe6080 FS: 0000000000000000(0000) GS:ffff88066cd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000002406000 CR4: 00000000000006e0 Call Trace: free_hot_cold_page+0x192/0x1d0 __free_pages+0x5c/0x90 __free_pages_boot_core+0x11a/0x14e deferred_free_range+0x50/0x62 deferred_init_memmap+0x220/0x3c3 kthread+0xf8/0x110 ret_from_fork+0x22/0x40 Code: 49 89 d4 48 c1 e0 06 49 01 c5 e9 de fe ff ff 4c 89 f7 44 89 4d b8 4c 89 45 c0 44 89 5d c8 48 89 4d d0 e8 62 c7 07 00 48 8b 4d d0 <48> 8b 00 44 8b 5d c8 4c 8b 45 c0 44 8b 4d b8 a8 02 0f 84 05 ff RIP [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0 RSP <ffff88017c087c48> CR2: 0000000000000000 Move page_ext_init() after page_alloc_init_late() to make sure page extension is setup for all pages. Link: http://lkml.kernel.org/r/1463696006-31360-1-git-send-email-yang.shi@linaro.orgSigned-off-by: NYang Shi <yang.shi@linaro.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-