1. 26 Nov, 2016 1 commit
    • cgroup: add support for eBPF programs · 30070984
      Daniel Mack committed
      This patch adds two sets of eBPF program pointers to struct cgroup:
      one for programs that are directly pinned to a cgroup, and one for
      programs that are effective for it.
      
      To illustrate the logic behind that, assume the following example
      cgroup hierarchy.
      
        A - B - C
              \ D - E
      
      If only B has a program attached, it will be effective for B, C, D
      and E. If D then attaches a program itself, that will be effective for
      both D and E, and the program in B will only affect B and C. Only one
      program of a given type is effective for a cgroup.
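
      The resolution rule above can be sketched in a few lines of C. This is
      only an illustration: struct cg, its fields and effective_prog() are
      hypothetical names, a single program type is modelled, and the lookup
      walks up the hierarchy on demand, whereas the patch stores the
      effective pointers in struct cgroup itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cgroup (one program type). */
struct cg {
	struct cg *parent;	/* NULL for the root */
	int pinned;		/* id of the directly attached program, 0 = none */
};

/*
 * Return the program that is effective for @c: its own pinned program if
 * it has one, otherwise the one pinned to the nearest ancestor (0 if none).
 */
int effective_prog(const struct cg *c)
{
	assert(c != NULL);
	for (; c; c = c->parent)
		if (c->pinned)
			return c->pinned;
	return 0;
}
```

      With the hierarchy from the example, pinning program 1 to B makes it
      effective for B, C, D and E; pinning program 2 to D afterwards makes
      that one effective for D and E, while B and C keep program 1.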
      
      Attaching and detaching programs will be done through the bpf(2)
      syscall. For now, ingress and egress inet socket filtering are the
      only supported use-cases.
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Oct, 2016 1 commit
    • relay: Use irq_work instead of plain timer for deferred wakeup · 26b5679e
      Peter Zijlstra committed
      Relay avoids calling wake_up_interruptible() to wake up readers/consumers
      that are waiting for new data from the context of the process that
      produced the data.  This is apparently done to prevent a possible
      deadlock in case the scheduler itself is generating data for the relay
      after acquiring rq->lock.
      
      The following patch used a timer (to be scheduled at next jiffy), for
      delegating the wakeup to another context.
      	commit 7c9cb383
      	Author: Tom Zanussi <zanussi@comcast.net>
      	Date:   Wed May 9 02:34:01 2007 -0700
      
      	relay: use plain timer instead of delayed work
      
      	relay doesn't need to use schedule_delayed_work() for waking readers
      	when a simple timer will do.
      
      Scheduling a plain timer at the next jiffies boundary to do the wakeup
      causes significant wakeup latency for the userspace client, which makes
      relay less suitable for high-frequency, low-payload use cases where
      data is generated at a very high rate, such as multiple sub buffers
      being filled within a millisecond.  Moreover, the timer is re-scheduled
      on every newly produced sub buffer, so it keeps getting pushed out if
      sub buffers are filled in very quick succession (less than a jiffy
      between the filling of two sub buffers).  As a result, relay runs out of
      sub buffers to store the new data.
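
      The starvation described above can be reproduced with a toy model. The
      sketch below is not relay code: the function name and the microsecond
      bookkeeping are made up for illustration, and a jiffy of 1000 us
      (HZ=1000) is an assumption in the usage note. Each sub buffer fill
      re-arms a one-shot timer one jiffy into the future, which is roughly
      what re-arming a pending timer with mod_timer() does.

```c
#include <assert.h>

/*
 * Toy model: a sub buffer is filled every @gap_us microseconds and each
 * fill re-arms a one-shot wakeup timer to expire @jiffy_us later.
 * Returns how many wakeups actually fire while @fills sub buffers are
 * being produced.
 */
unsigned int wakeups_with_rearmed_timer(unsigned int fills,
					unsigned int gap_us,
					unsigned int jiffy_us)
{
	unsigned long now = 0, expires = 0;
	unsigned int fired = 0;
	int armed = 0;

	assert(gap_us > 0 && jiffy_us > 0);
	for (unsigned int i = 0; i < fills; i++) {
		now += gap_us;
		if (armed && expires <= now) {	/* timer ran before this fill */
			fired++;
			armed = 0;
		}
		expires = now + jiffy_us;	/* every fill pushes the expiry out */
		armed = 1;
	}
	return fired;
}
```

      With a 1000 us jiffy, 100 fills that are 250 us apart produce no wakeup
      at all in this model, while 100 fills that are 2000 us apart produce
      99.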
      
      Using irq_work ensures that the userspace client, blocked in the poll
      call, is woken up as early as possible (through a self IPI or the next
      timer tick), enabling it to always consume the data in time.  This also
      makes relay consistent with printk and the trace ring buffers, which
      also use irq_work for the deferred wakeup of readers.
      
      [arnd@arndb.de: select CONFIG_IRQ_WORK]
      Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 11 Oct, 2016 1 commit
    • gcc-plugins: Add latent_entropy plugin · 38addce8
      Emese Revfy committed
      This adds a new gcc plugin named "latent_entropy". It is designed to
      extract as much uncertainty as possible from a running system at boot
      time, hoping to capitalize on any possible variation in CPU operation
      (due to runtime data differences, hardware differences, SMP ordering,
      thermal timing variation, cache behavior, etc).
      
      At the very least, this plugin is a much more comprehensive example for
      how to manipulate kernel code using the gcc plugin internals.
      
      The need for very-early boot entropy tends to be very architecture or
      system design specific, so this plugin is more suited for those sorts
      of special cases. The existing kernel RNG already attempts to extract
      entropy from reliable runtime variation, but this plugin takes the idea to
      a logical extreme by permuting a global variable based on any variation
      in code execution (e.g. a different value (and permutation function)
      is used to permute the global based on loop count, case statement,
      if/then/else branching, etc).
      
      To do this, the plugin starts by inserting a local variable in every
      marked function. The plugin then adds logic so that the value of this
      variable is modified by randomly chosen operations (add, xor and rol) and
      random values (gcc generates separate static values for each location at
      compile time and also injects the stack pointer at runtime). The resulting
      value depends on the control flow path (e.g., loops and branches taken).
      
      Before the function returns, the plugin mixes this local variable into
      the latent_entropy global variable. The value of this global variable
      is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
      though it does not credit any bytes of entropy to the pool; the contents
      of the global are just used to mix the pool.
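
      To make the transformation more concrete, here is a hand-written
      sketch of what an instrumented function might look like. It is an
      illustration only: count_set_bits() is a made-up function, the
      constants are arbitrary placeholders for the per-location values gcc
      would choose at compile time, the final mixing step is assumed to be
      an xor, and the stack pointer injection is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's latent_entropy global variable. */
uint64_t latent_entropy;

static uint64_t rol64(uint64_t v, unsigned int s)
{
	assert(s < 64);
	return s ? (v << s) | (v >> (64 - s)) : v;
}

/* A small function with a loop and a branch, instrumented by hand. */
int count_set_bits(unsigned int x)
{
	uint64_t local_entropy = 0x9e3779b97f4a7c15ULL;	/* placeholder constant */
	int n = 0;

	while (x) {
		local_entropy = rol64(local_entropy, 7);	/* once per iteration */
		if (x & 1) {
			local_entropy ^= 0xc2b2ae3d27d4eb4fULL;	/* branch taken */
			n++;
		} else {
			local_entropy += 0x165667b19e3779f9ULL;	/* branch not taken */
		}
		x >>= 1;
	}
	latent_entropy ^= local_entropy;	/* mix into the global on return */
	return n;
}
```

      Two calls that take different paths through the loop leave different
      values in the global, which is the property the plugin relies on.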
      
      Additionally, the plugin can pre-initialize arrays with build-time
      random contents, so that two different kernel builds running on identical
      hardware will not have the same starting values.
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message and code comments]
      Signed-off-by: Kees Cook <keescook@chromium.org>
  4. 21 Sep, 2016 1 commit
  5. 16 Sep, 2016 1 commit
  6. 15 Sep, 2016 1 commit
  7. 09 Sep, 2016 1 commit
    • kbuild: allow archs to select link dead code/data elimination · b67067f1
      Nicholas Piggin committed
      Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
      select to build with -ffunction-sections, -fdata-sections, and link
      with --gc-sections. It requires some work (documented) to ensure all
      unreferenced entrypoints are live, and requires toolchain and build
      verification, so it is made a per-arch option for now.
      
      On a random powerpc64le build, this yields a significant size saving.
      The kernel boots and runs fine, but there is a lot I haven't tested yet,
      so these savings may be reduced if there are bugs in the link.
      
          text      data       bss       dec   filename
      11169741   1180744   1923176  14273661   vmlinux
      10445269   1004127   1919707  13369103   vmlinux.dce
      
      ~700K text, ~170K data, 6% removed from kernel image size.
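
      The summary line can be rechecked from the two rows of the table. The
      helper names below are made up for this check; the numbers are the
      ones quoted in the commit message.

```c
#include <assert.h>

struct image_size {
	unsigned long text, data, bss;
};

/* Total image size, i.e. the "dec" column of the table. */
unsigned long image_total(struct image_size s)
{
	return s.text + s.data + s.bss;
}

/* Share of the original total that was removed, in tenths of a percent. */
unsigned long saved_permille(struct image_size before, struct image_size after)
{
	unsigned long b = image_total(before), a = image_total(after);

	assert(b > 0 && a <= b);
	return (b - a) * 1000 / b;
}
```

      For the figures above this gives 724472 bytes of text and 176617 bytes
      of data removed, and 63 permille (about 6.3%) of the total image size.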
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michal Marek <mmarek@suse.com>
  8. 03 Aug, 2016 6 commits
  9. 27 Jul, 2016 3 commits
    • mm: SLUB freelist randomization · 210e7a43
      Thomas Garnier committed
      Implements freelist randomization for the SLUB allocator.  It was
      previously implemented for the SLAB allocator.  Both use the same
      configuration option (CONFIG_SLAB_FREELIST_RANDOM).
      
      The list is randomized during initialization of a new set of pages.  The
      order on different freelist sizes is pre-computed at boot for
      performance.  Each kmem_cache has its own randomized freelist.
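
      The pre-computed order can be pictured as a random permutation of the
      object indexes of one slab. The sketch below builds such a
      permutation with a Fisher-Yates shuffle; the function name is made
      up, the choice of shuffle is an assumption of this sketch, and rand()
      merely stands in for the kernel's random source.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Fill seq[0..count-1] with a random permutation of 0..count-1.
 * Each value is the index of an object inside a slab.
 */
void freelist_random_seq(unsigned int *seq, unsigned int count)
{
	assert(seq != NULL || count == 0);
	for (unsigned int i = 0; i < count; i++)
		seq[i] = i;
	for (unsigned int i = count; i > 1; i--) {
		unsigned int j = (unsigned int)rand() % i;	/* 0 <= j < i */
		unsigned int tmp = seq[i - 1];

		seq[i - 1] = seq[j];
		seq[j] = tmp;
	}
}
```

      A cache would compute one such sequence when it is created and reuse
      it whenever a new set of pages is initialized.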
      
      This security feature reduces the predictability of the kernel SLUB
      allocator against heap overflows, rendering attacks much less stable.
      
      For example these attacks exploit the predictability of the heap:
       - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
       - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)
      
      Performance results:
      
      The slab_test impact is between 3% and 4% on average for 100000 attempts
      without SMP.  This is a very focused test; kernbench shows that the
      overall impact on the system is much lower.
      
      Before:
      
        Single thread testing
        =====================
        1. Kmalloc: Repeatedly allocate then free test
        100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
        100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
        100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
        100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
        100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
        100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
        100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
        100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
        100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
        100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
        2. Kmalloc: alloc/free test
        100000 times kmalloc(8)/kfree -> 70 cycles
        100000 times kmalloc(16)/kfree -> 70 cycles
        100000 times kmalloc(32)/kfree -> 70 cycles
        100000 times kmalloc(64)/kfree -> 70 cycles
        100000 times kmalloc(128)/kfree -> 70 cycles
        100000 times kmalloc(256)/kfree -> 69 cycles
        100000 times kmalloc(512)/kfree -> 70 cycles
        100000 times kmalloc(1024)/kfree -> 73 cycles
        100000 times kmalloc(2048)/kfree -> 72 cycles
        100000 times kmalloc(4096)/kfree -> 71 cycles
      
      After:
      
        Single thread testing
        =====================
        1. Kmalloc: Repeatedly allocate then free test
        100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
        100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
        100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
        100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
        100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
        100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
        100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
        100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
        100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
        100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
        2. Kmalloc: alloc/free test
        100000 times kmalloc(8)/kfree -> 66 cycles
        100000 times kmalloc(16)/kfree -> 66 cycles
        100000 times kmalloc(32)/kfree -> 66 cycles
        100000 times kmalloc(64)/kfree -> 66 cycles
        100000 times kmalloc(128)/kfree -> 65 cycles
        100000 times kmalloc(256)/kfree -> 67 cycles
        100000 times kmalloc(512)/kfree -> 67 cycles
        100000 times kmalloc(1024)/kfree -> 64 cycles
        100000 times kmalloc(2048)/kfree -> 67 cycles
        100000 times kmalloc(4096)/kfree -> 67 cycles
      
      Kernbench, before:
      
        Average Optimal load -j 12 Run (std deviation):
        Elapsed Time 101.873 (1.16069)
        User Time 1045.22 (1.60447)
        System Time 88.969 (0.559195)
        Percent CPU 1112.9 (13.8279)
        Context Switches 189140 (2282.15)
        Sleeps 99008.6 (768.091)
      
      After:
      
        Average Optimal load -j 12 Run (std deviation):
        Elapsed Time 102.47 (0.562732)
        User Time 1045.3 (1.34263)
        System Time 88.311 (0.342554)
        Percent CPU 1105.8 (6.49444)
        Context Switches 189081 (2355.78)
        Sleeps 99231.5 (800.358)
      
      Link: http://lkml.kernel.org/r/1464295031-26375-3-git-send-email-thgarnie@google.com
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: SLUB hardened usercopy support · ed18adc1
      Kees Cook committed
      Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
      SLUB allocator to catch any copies that may span objects. Includes a
      redzone handling fix discovered by Michael Ellerman.
      
      Based on code from PaX and grsecurity.
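
      The core of such an object size check can be sketched as follows. This
      is a simplified model, not the SLUB code: the function name is made
      up, objects are assumed to be packed back to back from offset 0 of
      the slab, and red zones and other per-object metadata are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when the @n bytes starting at byte @offset into a slab of
 * @object_size byte objects stay inside a single object.
 */
bool copy_within_one_object(size_t object_size, size_t offset, size_t n)
{
	assert(object_size > 0);
	size_t in_object = offset % object_size;	/* offset from object start */

	return n <= object_size - in_object;
}
```

      For 64-byte objects, a copy of 54 bytes starting at offset 10 is
      accepted, while a copy of 55 bytes from the same offset would spill
      into the next object and is rejected.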
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Laura Abbott <labbott@redhat.com>
    • mm: SLAB hardened usercopy support · 04385fc5
      Kees Cook committed
      Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
      SLAB allocator to catch any copies that may span objects.
      
      Based on code from PaX and grsecurity.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
  10. 14 Jul, 2016 1 commit
  11. 07 Jul, 2016 1 commit
    • init/Kconfig: keep Expert users menu together · 076501ff
      Randy Dunlap committed
      The "expert" menu was broken (split) such that all entries in it after
      KALLSYMS were displayed in the "General setup" area instead of in the
      "Expert users" area.  Fix this by adding one kconfig dependency.
      
      Yes, the Expert users menu is fragile.  Problems like this have happened
      several times in the past.  I will attempt to isolate the Expert users
      menu if there is interest in that.
      
      Fixes: 4d5d5664 ("x86: kallsyms: disable absolute percpu symbols on !SMP")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: stable@vger.kernel.org  # 4.6
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 25 Jun, 2016 2 commits
    • init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64 · 0fd5ed8d
      Rasmus Villemoes committed
      When I replaced kasprintf("%pf") with a direct call to
      sprint_symbol_no_offset I must have broken the initcall blacklisting
      feature on the arches where dereference_function_descriptor() is
      non-trivial.
      
      Fixes: c8cdd2be (init/main.c: simplify initcall_blacklisted())
      Link: http://lkml.kernel.org/r/1466027283-4065-1-git-send-email-linux@rasmusvillemoes.dk
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Clarify naming of thread info/stack allocators · b235beea
      Linus Torvalds committed
      We've had the thread info allocated together with the thread stack for
      most architectures for a long time (since the thread_info was split off
      from the task struct), but that is about to change.
      
      But the patches that move the thread info to be off-stack (and a part of
      the task struct instead) made it clear how confused the allocator and
      freeing functions are.
      
      Because the common case was that we share an allocation with the thread
      stack and the thread_info, the two pointers were identical.  That
      identity then meant that we would have things like
      
      	ti = alloc_thread_info_node(tsk, node);
      	...
      	tsk->stack = ti;
      
      which certainly _worked_ (since stack and thread_info have the same
      value), but is rather confusing: why are we assigning a thread_info to
      the stack? And if we move the thread_info away, the "confusing" code
      just gets to be entirely bogus.
      
      So remove all this confusion, and make it clear that we are doing the
      stack allocation by renaming and clarifying the function names to be
      about the stack.  The fact that the thread_info then shares the
      allocation is an implementation detail, and not really about the
      allocation itself.
      
      This is a pure renaming and type fix: we pass in the same pointer, it's
      just that we clarify what the pointer means.
      
      The ia64 code that actually only has one single allocation (for all of
      task_struct, thread_info and kernel thread stack) now looks a bit odd,
      but since "tsk->stack" is actually not even used there, that oddity
      doesn't matter.  It would be a separate thing to clean that up, I
      intentionally left the ia64 changes as a pure brute-force renaming and
      type change.
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 21 Jun, 2016 1 commit
  14. 16 Jun, 2016 1 commit
  15. 28 May, 2016 1 commit
  16. 21 May, 2016 4 commits
    • init/main.c: simplify initcall_blacklisted() · c8cdd2be
      Rasmus Villemoes committed
      Using kasprintf to get the function name makes us look up the name
      twice, along with all the vsnprintf overhead of parsing the format
      string etc.  It also means there is an allocation failure case to deal
      with.  Since symbol_string in vsprintf.c would anyway allocate an array
      of size KSYM_SYMBOL_LEN on the stack, that might as well be done up
      here.
      
      Moreover, since this is a debug feature and the blacklisted_initcalls
      list is usually empty, we might as well test that and thus avoid looking
      up the symbol name even once in the common case.
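
      The shape of the simplified check can be sketched in userspace C. All
      names here are hypothetical: lookup_name() and struct symbol stand in
      for the kernel's symbol resolution, NAME_LEN stands in for
      KSYM_SYMBOL_LEN, and init_a()/init_b() are sample initcalls that exist
      only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 128			/* stand-in for KSYM_SYMBOL_LEN */

typedef int (*initcall_fn)(void);

struct symbol {				/* hypothetical symbol table entry */
	initcall_fn fn;
	const char *name;
};

/* Two sample initcalls for the usage example. */
int init_a(void) { return 0; }
int init_b(void) { return 0; }

/* Hypothetical lookup: copy the name of @fn into @buf, or "" if unknown. */
static void lookup_name(const struct symbol *tab, size_t n, initcall_fn fn,
			char *buf)
{
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++)
		if (tab[i].fn == fn)
			snprintf(buf, NAME_LEN, "%s", tab[i].name);
}

bool is_blacklisted(const char *const *blacklist, size_t nblack,
		    const struct symbol *tab, size_t nsym, initcall_fn fn)
{
	char name[NAME_LEN];		/* stack buffer: no allocation to fail */

	assert(fn != NULL);
	if (nblack == 0)		/* common case: skip the lookup entirely */
		return false;
	lookup_name(tab, nsym, fn, name);
	for (size_t i = 0; i < nblack; i++)
		if (strcmp(blacklist[i], name) == 0)
			return true;
	return false;
}
```

      The name is resolved once into a stack buffer, and not at all when the
      blacklist is empty.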
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • printk/nmi: increase the size of NMI buffer and make it configurable · 427934b8
      Petr Mladek committed
      Testing has shown that the backtrace sometimes does not fit into the 4kB
      temporary buffer that is used in NMI context.  The warnings are gone
      when I double the temporary buffer size.
      
      This patch doubles the buffer size and makes it configurable.
      
      Note that this problem existed even in the x86-specific implementation
      that was added by the commit a9edc880 ("x86/nmi: Perform a safe NMI
      stack trace on all CPUs").  Nobody noticed it because it did not print
      any warnings.
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jiri Kosina <jkosina@suse.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • printk/nmi: generic solution for safe printk in NMI · 42a0bb3f
      Petr Mladek committed
      printk() takes some locks and cannot be used safely in NMI context.
      
      The chance of a deadlock is real especially when printing stacks from
      all CPUs.  This particular problem has been addressed on x86 by the
      commit a9edc880 ("x86/nmi: Perform a safe NMI stack trace on all
      CPUs").
      
      The patchset brings two big advantages.  First, it makes the NMI
      backtraces safe on all architectures for free.  Second, it makes all NMI
      messages almost safe on all architectures.  The temporary buffer is
      limited, so we should still keep the number of messages in NMI context
      to a minimum.
      
      Note that there already are several messages printed in NMI context:
      WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
      handlers.  These are not easy to avoid.
      
      This patch reuses most of the code and makes it generic.  It is useful
      for all messages and architectures that support NMI.
      
      The alternative printk_func is set when entering and is reset when
      leaving NMI context.  It queues IRQ work to copy the messages into the
      main ring buffer in a safe context.
      
      __printk_nmi_flush() copies all available messages and resets the buffer.
      This lets us use a simple cmpxchg operation to synchronize with
      writers.  A spinlock is also used to synchronize with other flushers.
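
      The cmpxchg handshake between a writer and the flusher can be sketched
      with C11 atomics. This is a userspace illustration under several
      assumptions: struct nmi_buf and both function names are made up, the
      buffer is a fixed 64 bytes, atomic_compare_exchange_strong() stands in
      for the kernel's cmpxchg, and the spinlock that serializes flushers
      is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64

struct nmi_buf {
	atomic_size_t len;		/* number of valid bytes in data[] */
	char data[BUF_SIZE];
};

/* Writer: copy the message behind the current end, then publish the length. */
size_t nmi_buf_write(struct nmi_buf *b, const char *msg, size_t n)
{
	size_t len, add;

	assert(b != NULL && msg != NULL);
	do {
		len = atomic_load(&b->len);
		if (len >= BUF_SIZE)
			return 0;			/* full: message is lost */
		add = n < BUF_SIZE - len ? n : BUF_SIZE - len;
		memcpy(b->data + len, msg, add);
		/* Retry if the flusher reset the buffer in the meantime. */
	} while (!atomic_compare_exchange_strong(&b->len, &len, len + add));
	return add;
}

/* Flusher: copy out what is there; reset only if no writer added more. */
size_t nmi_buf_flush(struct nmi_buf *b, char *out, size_t out_size)
{
	size_t len, done = 0;

	assert(b != NULL && out != NULL);
	do {
		len = atomic_load(&b->len);
		for (; done < len && done < out_size; done++)
			out[done] = b->data[done];	/* only the new part */
	} while (!atomic_compare_exchange_strong(&b->len, &len, (size_t)0));
	return done;
}
```

      If a writer appends data while the flusher is copying, the flusher's
      compare-and-exchange fails and it loops to pick up the new bytes
      before resetting the length to zero.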
      
      We no longer use seq_buf because it depends on an external lock.  It
      would be hard to make all supported operations safe for lockless use.
      It would be confusing and error prone to make only some operations safe.
      
      The code is put into separate printk/nmi.c as suggested by Steven
      Rostedt.  It needs a per-CPU buffer and is compiled only on
      architectures that call nmi_enter().  This is achieved by the new
      HAVE_NMI Kconfig flag.
      
      The exceptions are the MN10300 and Xtensa architectures.  We need to
      clean up NMI handling there first.  Let's do it separately.
      
      The patch is heavily based on the draft from Peter Zijlstra, see
      
        https://lkml.org/lkml/2015/6/10/327
      
      [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
      [akpm@linux-foundation.org: min_t->min - all types are size_t here]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>	[arm part]
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jiri Kosina <jkosina@suse.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: call page_ext_init() after all struct pages are initialized · b8f1a75d
      Yang Shi committed
      When DEFERRED_STRUCT_PAGE_INIT is enabled, only a subset of the memmap
      is initialized at boot; the rest is initialized in parallel by starting
      a one-off "pgdatinitX" kernel thread for each node X.
      
      If page_ext_init() is called before that, some pages will not have a
      valid extension, which may lead to the kernel oops below when booting:
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0
        PGD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in:
        CPU: 11 PID: 106 Comm: pgdatinit1 Not tainted 4.6.0-rc5-next-20160427 #26
        Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
        task: ffff88017c080040 ti: ffff88017c084000 task.ti: ffff88017c084000
        RIP: 0010:[<ffffffff8118d982>]  [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0
        RSP: 0000:ffff88017c087c48  EFLAGS: 00010046
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
        RDX: 0000000000000980 RSI: 0000000000000080 RDI: 0000000000660401
        RBP: ffff88017c087cd0 R08: 0000000000000401 R09: 0000000000000009
        R10: ffff88017c080040 R11: 000000000000000a R12: 0000000000000400
        R13: ffffea0019810000 R14: ffffea0019810040 R15: ffff88066cfe6080
        FS:  0000000000000000(0000) GS:ffff88066cd40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000002406000 CR4: 00000000000006e0
        Call Trace:
          free_hot_cold_page+0x192/0x1d0
          __free_pages+0x5c/0x90
          __free_pages_boot_core+0x11a/0x14e
          deferred_free_range+0x50/0x62
          deferred_init_memmap+0x220/0x3c3
          kthread+0xf8/0x110
          ret_from_fork+0x22/0x40
        Code: 49 89 d4 48 c1 e0 06 49 01 c5 e9 de fe ff ff 4c 89 f7 44 89 4d b8 4c 89 45 c0 44 89 5d c8 48 89 4d d0 e8 62 c7 07 00 48 8b 4d d0 <48> 8b 00 44 8b 5d c8 4c 8b 45 c0 44 8b 4d b8 a8 02 0f 84 05 ff
        RIP  [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0
         RSP <ffff88017c087c48>
        CR2: 0000000000000000
      
      Move page_ext_init() after page_alloc_init_late() to make sure the page
      extension is set up for all pages.
      
      Link: http://lkml.kernel.org/r/1463696006-31360-1-git-send-email-yang.shi@linaro.org
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 20 May, 2016 1 commit
    • mm: SLAB freelist randomization · c7ce4f60
      Thomas Garnier committed
      Provides an optional config (CONFIG_SLAB_FREELIST_RANDOM) to randomize
      the SLAB freelist.  The list is randomized during initialization of a
      new set of pages.  The order on different freelist sizes is pre-computed
      at boot for performance.  Each kmem_cache has its own randomized
      freelist.  Before pre-computed lists are available freelists are
      generated dynamically.  This security feature reduces the predictability
      of the kernel SLAB allocator against heap overflows rendering attacks
      much less stable.
      
      For example this attack against SLUB (also applicable against SLAB)
      would be affected:
      
        https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
      
      Also, since v4.6 the freelist was moved to the end of the SLAB.  This
      means a controllable heap is opened to new attacks not yet publicly
      discussed.  A kernel heap overflow can be transformed into multiple
      use-after-free conditions.  This feature makes this type of attack
      harder too.
      
      To generate entropy, we use get_random_bytes_arch because 0 bits of
      entropy are available in the boot stage.  In the worst case this
      function will fall back to the get_random_bytes sub API.  We also
      generate a random shift value to shift the pre-computed freelist for
      each new set of pages.
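
      The effect of the per-page shift can be shown with a short sketch. The
      function name and parameters are made up for illustration; seq is
      assumed to hold a pre-computed permutation of the object indexes and
      shift is the random value drawn for the new set of pages.

```c
#include <assert.h>

/*
 * Write the allocation order for one new slab into out[0..count-1]: the
 * pre-computed sequence, started at a random position and wrapped around.
 */
void page_freelist_order(const unsigned int *seq, unsigned int count,
			 unsigned int shift, unsigned int *out)
{
	assert(count > 0);
	for (unsigned int n = 0; n < count; n++)
		out[n] = seq[(shift + n) % count];
}
```

      With the pre-computed sequence 2, 0, 3, 1 and a shift of 3, the new
      slab hands out its objects in the order 1, 2, 0, 3. The expensive
      shuffle is done once, and each new slab only pays for one random
      number.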
      
      The config option name is not specific to the SLAB as this approach will
      be extended to other allocators like SLUB.
      
      Performance results highlighted no major changes:
      
      Hackbench (running 90 10 times):
      
        Before average: 0.0698
        After average: 0.0663 (-5.01%)
      
      slab_test, 1 run on boot.  A difference is only seen on the 2048 size
      test, which is the worst-case scenario covered by freelist
      randomization.  New slab pages are constantly being created on the
      10000 allocations.  Variance should be mainly due to getting new pages
      every few allocations.
      
      Before:
      
        Single thread testing
        =====================
        1. Kmalloc: Repeatedly allocate then free test
        10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
        10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
        10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
        10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
        10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
        10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
        10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
        10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
        10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
        10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
        10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
        10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
        2. Kmalloc: alloc/free test
        10000 times kmalloc(8)/kfree -> 121 cycles
        10000 times kmalloc(16)/kfree -> 121 cycles
        10000 times kmalloc(32)/kfree -> 121 cycles
        10000 times kmalloc(64)/kfree -> 121 cycles
        10000 times kmalloc(128)/kfree -> 121 cycles
        10000 times kmalloc(256)/kfree -> 119 cycles
        10000 times kmalloc(512)/kfree -> 119 cycles
        10000 times kmalloc(1024)/kfree -> 119 cycles
        10000 times kmalloc(2048)/kfree -> 119 cycles
        10000 times kmalloc(4096)/kfree -> 121 cycles
        10000 times kmalloc(8192)/kfree -> 119 cycles
        10000 times kmalloc(16384)/kfree -> 119 cycles
      
      After:
      
        Single thread testing
        =====================
        1. Kmalloc: Repeatedly allocate then free test
        10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
        10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
        10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
        10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
        10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
        10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
        10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
        10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
        10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
        10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
        10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
        10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
        2. Kmalloc: alloc/free test
        10000 times kmalloc(8)/kfree -> 121 cycles
        10000 times kmalloc(16)/kfree -> 121 cycles
        10000 times kmalloc(32)/kfree -> 123 cycles
        10000 times kmalloc(64)/kfree -> 142 cycles
        10000 times kmalloc(128)/kfree -> 121 cycles
        10000 times kmalloc(256)/kfree -> 119 cycles
        10000 times kmalloc(512)/kfree -> 119 cycles
        10000 times kmalloc(1024)/kfree -> 119 cycles
        10000 times kmalloc(2048)/kfree -> 119 cycles
        10000 times kmalloc(4096)/kfree -> 119 cycles
        10000 times kmalloc(8192)/kfree -> 119 cycles
        10000 times kmalloc(16384)/kfree -> 119 cycles
      
      [akpm@linux-foundation.org: propagate gfp_t into cache_random_seq_create()]
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 10 5月, 2016 1 次提交
    • A
      Kbuild: change CC_OPTIMIZE_FOR_SIZE definition · 877417e6
      Arnd Bergmann authored
      CC_OPTIMIZE_FOR_SIZE disables the often useful -Wmaybe-uninitialized warning,
      because that causes a ridiculous amount of false positives when combined
      with -Os.
      
      This means a lot of warnings don't show up in testing by the developers
      that should see them with an 'allmodconfig' kernel that has
      CC_OPTIMIZE_FOR_SIZE enabled, but only later in randconfig builds
      that don't.
      
      This changes the Kconfig logic around CC_OPTIMIZE_FOR_SIZE to make
      it a 'choice' statement defaulting to CC_OPTIMIZE_FOR_PERFORMANCE
      that gets added for this purpose. The allmodconfig and allyesconfig
      kernels now default to -O2 with the maybe-uninitialized warning enabled.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Michal Marek <mmarek@suse.com>
      877417e6
  19. 02 Apr, 2016 1 commit
    • A
      Make CONFIG_FHANDLE default y · f76be617
      Andi Kleen authored
      Newer Fedora and OpenSUSE didn't boot with my standard configuration.
      It took me some time to figure out why, in fact I had to write a script
      to try different config options systematically.
      
      The problem is that something (systemd) in dracut depends on
      CONFIG_FHANDLE, which adds open by file handle syscalls.
      
      While it is set in defconfigs it is very easy to miss when updating
      older configs because it is not default y.
      
      Make it default y and also depend on EXPERT, as dracut use is likely
      widespread.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Richard Weinberger <richard.weinberger@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f76be617
  20. 30 Mar, 2016 1 commit
  21. 16 Mar, 2016 3 commits
  22. 05 Mar, 2016 1 commit
  23. 04 Mar, 2016 1 commit
    • D
      akcipher: Move the RSA DER encoding check to the crypto layer · d43de6c7
      David Howells authored
      Move the RSA EMSA-PKCS1-v1_5 encoding from the asymmetric-key public_key
      subtype to the rsa crypto module's pkcs1pad template.  This means that the
      public_key subtype no longer has any dependencies on public key type.
      
      To make this work, the following changes have been made:
      
       (1) The rsa pkcs1pad template is now used for RSA keys.  This strips off the
           padding and returns just the message hash.
      
       (2) In a previous patch, the pkcs1pad template gained an optional second
           parameter that, if given, specifies the hash used.  We now give this,
           and pkcs1pad checks the encoded message E(M) for the EMSA-PKCS1-v1_5
           encoding and verifies that the correct digest OID is present.
      
       (3) The crypto driver in crypto/asymmetric_keys/rsa.c is now reduced to
           something that doesn't care about what the encryption actually does
           and has been merged into public_key.c.
      
       (4) CONFIG_PUBLIC_KEY_ALGO_RSA is gone.  Module signing must set
           CONFIG_CRYPTO_RSA=y instead.
      
      Thoughts:
      
       (*) Should the encoding style (eg. raw, EMSA-PKCS1-v1_5) also be passed to
           the padding template?  Should there be multiple padding templates
           registered that share most of the code?
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      d43de6c7
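The EMSA-PKCS1-v1_5 check described in points (1) and (2) of the commit above can be illustrated in userspace C. This is a minimal sketch, not the kernel's pkcs1pad code: the function name `demo_emsa_pkcs1_v15_sha256` is hypothetical, the hash is assumed to be SHA-256, and the encoded message is assumed to still carry its leading 0x00 byte. The layout checked is `0x00 0x01 PS 0x00 DigestInfo-prefix hash`, where PS is at least eight 0xff bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DER DigestInfo prefix for SHA-256; it embeds the digest OID. */
static const uint8_t sha256_prefix[] = {
	0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86, 0x48, 0x01,
	0x65, 0x03, 0x04, 0x02, 0x01, 0x05, 0x00, 0x04, 0x20,
};
#define DEMO_SHA256_LEN 32

/*
 * Hypothetical helper: validate an EMSA-PKCS1-v1_5 encoded message and
 * return a pointer to the raw hash inside it, or NULL if the padding or
 * the digest OID prefix does not match.
 */
const uint8_t *demo_emsa_pkcs1_v15_sha256(const uint8_t *em, size_t em_len)
{
	const size_t t_len = sizeof(sha256_prefix) + DEMO_SHA256_LEN;
	size_t pos = 2;

	assert(em != NULL);

	/* Minimum size: 0x00 0x01, eight 0xff bytes, 0x00, then T. */
	if (em_len < 2 + 8 + 1 + t_len)
		return NULL;
	if (em[0] != 0x00 || em[1] != 0x01)
		return NULL;

	/* Skip the 0xff padding string PS. */
	while (pos < em_len && em[pos] == 0xff)
		pos++;
	if (pos - 2 < 8)
		return NULL;

	/* PS must be terminated by a single 0x00 separator. */
	if (pos >= em_len || em[pos] != 0x00)
		return NULL;
	pos++;

	/* What remains must be exactly DigestInfo prefix plus hash. */
	if (em_len - pos != t_len)
		return NULL;
	if (memcmp(em + pos, sha256_prefix, sizeof(sha256_prefix)) != 0)
		return NULL;

	return em + pos + sizeof(sha256_prefix);
}
```

A caller would compare the returned 32 bytes against the hash it computed over the signed data. Supporting other digests would mean selecting a different prefix table entry, which is roughly what the optional hash parameter mentioned in point (2) makes possible.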
  24. 02 Mar, 2016 2 commits
    • T
      cpu/hotplug: Unpark smpboot threads from the state machine · 931ef163
      Thomas Gleixner authored
      Handle the smpboot threads in the state machine.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.295777684@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      931ef163
    • T
      cpu/hotplug: Convert to a state machine for the control processor · cff7d378
      Thomas Gleixner authored
      Move the split out steps into a callback array and let the cpu_up/down
      code iterate through the array functions. For now most of the
      callbacks are asymmetric to resemble the current hotplug maze.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182340.671816690@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      cff7d378
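The callback-array idea from the commit above can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel implementation: the state names, `struct demo_step`, `demo_cpu_up()` and `demo_cpu_down()` are all hypothetical, and error handling is reduced to stopping at the first failing callback. One step deliberately has no teardown callback to mirror the asymmetry the commit message mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; the kernel's real state enumeration differs. */
enum demo_state {
	DEMO_OFFLINE,
	DEMO_PREPARE,
	DEMO_BRINGUP,
	DEMO_ONLINE,
	DEMO_NR_STATES
};

/* One entry per state: what to run when entering and when leaving it. */
struct demo_step {
	const char *name;
	int (*startup)(unsigned int cpu);   /* NULL if nothing to do */
	int (*teardown)(unsigned int cpu);  /* NULL if nothing to do */
};

int demo_calls;  /* counts callbacks that ran, for illustration only */

static int demo_ok(unsigned int cpu)
{
	(void)cpu;
	demo_calls++;
	return 0;
}

static const struct demo_step demo_steps[DEMO_NR_STATES] = {
	[DEMO_OFFLINE] = { "offline", NULL,    NULL    },
	[DEMO_PREPARE] = { "prepare", demo_ok, demo_ok },
	[DEMO_BRINGUP] = { "bringup", demo_ok, NULL    },  /* asymmetric */
	[DEMO_ONLINE]  = { "online",  demo_ok, demo_ok },
};

/* Walk upwards from *state to target, running each startup callback. */
int demo_cpu_up(unsigned int cpu, enum demo_state *state,
		enum demo_state target)
{
	assert(state != NULL && target < DEMO_NR_STATES);

	while (*state < target) {
		const struct demo_step *step = &demo_steps[*state + 1];
		int ret = step->startup ? step->startup(cpu) : 0;

		if (ret)
			return ret;  /* stop at the first failure */
		*state = (enum demo_state)(*state + 1);
	}
	return 0;
}

/* Walk downwards from *state to target, running each teardown callback. */
int demo_cpu_down(unsigned int cpu, enum demo_state *state,
		  enum demo_state target)
{
	assert(state != NULL && target < DEMO_NR_STATES);

	while (*state > target) {
		const struct demo_step *step = &demo_steps[*state];
		int ret = step->teardown ? step->teardown(cpu) : 0;

		if (ret)
			return ret;
		*state = (enum demo_state)(*state - 1);
	}
	return 0;
}
```

Keeping the steps in one table means bring-up and tear-down share a single ordering, and adding a step is a table edit rather than a change to the control flow.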
  25. 22 Feb, 2016 1 commit
    • K
      mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings · d2aa1aca
      Kees Cook authored
      It may be useful to debug writes to the readonly sections of memory,
      so provide a cmdline "rodata=off" to allow for this. This can be
      expanded in the future to support "log" and "write" modes, but that
      will need to be architecture-specific.
      
      This also makes KDB software breakpoints more usable, as read-only
      mappings can now be disabled on any kernel.
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Brown <david.brown@linaro.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1455748879-21872-3-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d2aa1aca
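How a "rodata=off" switch might be picked out of a boot command line can be sketched in userspace C. This is an illustration only and does not reproduce the kernel's parameter handling: `demo_parse_bool()` and `demo_rodata_enabled()` are hypothetical names, and the accepted spellings other than "off" are assumptions. Read-only mappings stay enabled unless the parameter explicitly turns them off.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: parse a boolean-style value.
 * Returns 0 and stores the result, or -1 if the value is not recognised.
 */
int demo_parse_bool(const char *s, bool *res)
{
	assert(s != NULL && res != NULL);

	if (!strcmp(s, "on") || !strcmp(s, "1") || !strcmp(s, "y")) {
		*res = true;
		return 0;
	}
	if (!strcmp(s, "off") || !strcmp(s, "0") || !strcmp(s, "n")) {
		*res = false;
		return 0;
	}
	return -1;
}

/*
 * Scan a space-separated command line for "rodata=<value>".
 * Unrecognised or missing values leave the default (enabled) in place.
 */
bool demo_rodata_enabled(const char *cmdline)
{
	static const char key[] = "rodata=";
	const size_t key_len = sizeof(key) - 1;
	bool enabled = true;
	const char *p = cmdline;

	assert(cmdline != NULL);

	while (*p) {
		size_t len = strcspn(p, " ");  /* length of this token */

		if (len > key_len && !strncmp(p, key, key_len)) {
			char val[8];
			size_t vlen = len - key_len;

			if (vlen < sizeof(val)) {
				memcpy(val, p + key_len, vlen);
				val[vlen] = '\0';
				demo_parse_bool(val, &enabled);
			}
		}
		p += len;
		p += strspn(p, " ");  /* skip separators */
	}
	return enabled;
}
```

For example, `demo_rodata_enabled("root=/dev/sda1 rodata=off quiet")` yields false, while a command line without the parameter yields true.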
  26. 09 Feb, 2016 1 commit