1. 24 4月, 2017 2 次提交
  2. 19 4月, 2017 1 次提交
  3. 04 4月, 2017 1 次提交
    • S
      ftrace: Have init/main.c call ftrace directly to free init memory · b80f0f6c
      Steven Rostedt (VMware) 提交于
      Relying on free_reserved_area() to call ftrace to free init memory proved to
      not be sufficient. The issue is that on x86, when debug_pagealloc is
      enabled, the init memory is not freed, but simply set as not present. Since
      ftrace was uninformed of this, starting function tracing still tries to
      update pages that are not present according to the page tables, causing
      ftrace to bug, as well as killing the kernel itself.
      
      Instead of relying on free_reserved_area(), have init/main.c call ftrace
      directly just before it frees the init memory. Then it needs to use
      __init_begin and __init_end to know where the init memory location is.
      Looking at all archs (and testing what I can), it appears that this should
      work for each of them.
      Reported-by: Nkernel test robot <xiaolong.ye@intel.com>
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      b80f0f6c
  4. 01 4月, 2017 1 次提交
    • M
      mm: move mm_percpu_wq initialization earlier · 597b7305
      Michal Hocko 提交于
      Yang Li has reported that drain_all_pages triggers a WARN_ON which means
      that this function is called earlier than the mm_percpu_wq is
      initialized on arm64 with CMA configured:
      
        WARNING: CPU: 2 PID: 1 at mm/page_alloc.c:2423 drain_all_pages+0x244/0x25c
        Modules linked in:
        CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc1-next-20170310-00027-g64dfbc5 #127
        Hardware name: Freescale Layerscape 2088A RDB Board (DT)
        task: ffffffc07c4a6d00 task.stack: ffffffc07c4a8000
        PC is at drain_all_pages+0x244/0x25c
        LR is at start_isolate_page_range+0x14c/0x1f0
        [...]
         drain_all_pages+0x244/0x25c
         start_isolate_page_range+0x14c/0x1f0
         alloc_contig_range+0xec/0x354
         cma_alloc+0x100/0x1fc
         dma_alloc_from_contiguous+0x3c/0x44
         atomic_pool_init+0x7c/0x208
         arm64_dma_init+0x44/0x4c
         do_one_initcall+0x38/0x128
         kernel_init_freeable+0x1a0/0x240
         kernel_init+0x10/0xfc
         ret_from_fork+0x10/0x20
      
      Fix this by moving the whole setup_vmstat which is an initcall right now
      to init_mm_internals which will be called right after the WQ subsystem
      is initialized.
      
      Link: http://lkml.kernel.org/r/20170315164021.28532-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NYang Li <pku.leo@gmail.com>
      Tested-by: NYang Li <pku.leo@gmail.com>
      Tested-by: NXiaolong Ye <xiaolong.ye@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      597b7305
  5. 25 3月, 2017 2 次提交
  6. 02 3月, 2017 5 次提交
  7. 28 2月, 2017 2 次提交
  8. 14 2月, 2017 1 次提交
    • M
      Reimplement IDR and IDA using the radix tree · 0a835c4f
      Matthew Wilcox 提交于
      The IDR is very similar to the radix tree.  It has some functionality that
      the radix tree did not have (alloc next free, cyclic allocation, a
      callback-based for_each, destroy tree), which is readily implementable on
      top of the radix tree.  A few small changes were needed in order to use a
      tag to represent nodes with free space below them.  More extensive
      changes were needed to support storing NULL as a valid entry in an IDR.
      Plain radix trees still interpret NULL as a not-present entry.
      
      The IDA is reimplemented as a client of the newly enhanced radix tree.  As
      in the current implementation, it uses a bitmap at the last level of the
      tree.
      Signed-off-by: NMatthew Wilcox <willy@infradead.org>
      Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      Tested-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      0a835c4f
  9. 10 2月, 2017 1 次提交
    • P
      core: migrate exception table users off module.h and onto extable.h · 8a293be0
      Paul Gortmaker 提交于
      These files were including module.h for exception table related
      functions.  We've now separated that content out into its own file
      "extable.h" so now move over to that and where possible, avoid all
      the extra header content in module.h that we don't really need to
      compile these non-modular files.
      
      Note:
         init/main.c still needs module.h for __init_or_module
         kernel/extable.c still needs module.h for is_module_text_address
      
      ...and so we don't get the benefit of removing module.h from the cpp
      feed for these two files, unlike the almost universal 1:1 exchange
      of module.h for extable.h we were able to do in the arch dirs.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NJessica Yu <jeyu@redhat.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      8a293be0
  10. 08 2月, 2017 2 次提交
  11. 01 2月, 2017 1 次提交
  12. 28 1月, 2017 1 次提交
    • J
      random: use chacha20 for get_random_int/long · f5b98461
      Jason A. Donenfeld 提交于
      Now that our crng uses chacha20, we can rely on its speedy
      characteristics for replacing MD5, while simultaneously achieving a
      higher security guarantee. Before the idea was to use these functions if
      you wanted random integers that aren't stupidly insecure but aren't
      necessarily secure either, a vague gray zone, that hopefully was "good
      enough" for its users. With chacha20, we can strengthen this claim,
      since either we're using an rdrand-like instruction, or we're using the
      same crng as /dev/urandom. And it's faster than what was before.
      
      We could have chosen to replace this with a SipHash-derived function,
      which might be slightly faster, but at the cost of having yet another
      RNG construction in the kernel. By moving to chacha20, we have a single
      RNG to analyze and verify, and we also already get good performance
      improvements on all platforms.
      
      Implementation-wise, rather than use a generic buffer for both
      get_random_int/long and memcpy based on the size needs, we use a
      specific buffer for 32-bit reads and for 64-bit reads. This way, we're
      guaranteed to always have aligned accesses on all platforms. While
      slightly more verbose in C, the assembly this generates is a lot
      simpler than otherwise.
      
      Finally, on 32-bit platforms where longs and ints are the same size,
      we simply alias get_random_int to get_random_long.
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Suggested-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      f5b98461
  13. 14 1月, 2017 1 次提交
    • P
      sched/clock: Delay switching sched_clock to stable · 9881b024
      Peter Zijlstra 提交于
      Currently we switch to the stable sched_clock if we guess the TSC is
      usable, and then switch back to the unstable path if it turns out TSC
      isn't stable during SMP bringup after all.
      
      Delay switching to the stable path until after SMP bringup is
      complete. This way we'll avoid switching during the time we detect the
      worst of the TSC offences.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9881b024
  14. 26 12月, 2016 1 次提交
    • N
      mm: add PageWaiters indicating tasks are waiting for a page bit · 62906027
      Nicholas Piggin 提交于
      Add a new page flag, PageWaiters, to indicate the page waitqueue has
      tasks waiting. This can be tested rather than testing waitqueue_active
      which requires another cacheline load.
      
      This bit is always set when the page has tasks on page_waitqueue(page),
      and is set and cleared under the waitqueue lock. It may be set when
      there are no tasks on the waitqueue, which will cause a harmless extra
      wakeup check that will clears the bit.
      
      The generic bit-waitqueue infrastructure is no longer used for pages.
      Instead, waitqueues are used directly with a custom key type. The
      generic code was not flexible enough to have PageWaiters manipulation
      under the waitqueue lock (which simplifies concurrency).
      
      This improves the performance of page lock intensive microbenchmarks by
      2-3%.
      
      Putting two bits in the same word opens the opportunity to remove the
      memory barrier between clearing the lock bit and testing the waiters
      bit, after some work on the arch primitives (e.g., ensuring memory
      operand widths match and cover both bits).
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      62906027
  15. 10 12月, 2016 1 次提交
    • T
      x86/amd: Check for the C1E bug post ACPI subsystem init · e7ff3a47
      Thomas Gleixner 提交于
      AMD CPUs affected by the E400 erratum suffer from the issue that the
      local APIC timer stops when the CPU goes into C1E. Unfortunately there
      is no way to detect the affected CPUs on early boot. It's only possible
      to determine the range of possibly affected CPUs from the family/model
      range.
      
      The actual decision whether to enter C1E and thus cause the bug is done
      by the firmware and we need to detect that case late, after ACPI has
      been initialized.
      
      The current solution is to check in the idle routine whether the CPU is
      affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the
      K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is
      affected and the system is switched into forced broadcast mode.
      
      This is ineffective and on non-affected CPUs every entry to idle does
      the extra RDMSR.
      
      After doing some research it turns out that the bits are visible on the
      boot CPU right after the ACPI subsystem is initialized in the early
      boot process. So instead of polling for the bits in the idle loop, add
      a detection function after acpi_subsystem_init() and check for the MSR
      bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and
      the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it
      will stop in C1E state as well.
      
      The switch to broadcast mode cannot be done at this point because the
      boot CPU still uses HPET as a clockevent device and the local APIC timer
      is not yet calibrated and installed. The switch to broadcast mode on the
      affected CPUs needs to be done when the local APIC timer is actually set
      up.
      
      This allows to cleanup the amd_e400_idle() function in the next step.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      e7ff3a47
  16. 29 11月, 2016 1 次提交
    • A
      module: fix DEBUG_SET_MODULE_RONX typo · 4d217a5a
      Arnd Bergmann 提交于
      The newly added 'rodata_enabled' global variable is protected by
      the wrong #ifdef, leading to a link error when CONFIG_DEBUG_SET_MODULE_RONX
      is turned on:
      
      kernel/module.o: In function `disable_ro_nx':
      module.c:(.text.unlikely.disable_ro_nx+0x88): undefined reference to `rodata_enabled'
      kernel/module.o: In function `module_disable_ro':
      module.c:(.text.module_disable_ro+0x8c): undefined reference to `rodata_enabled'
      kernel/module.o: In function `module_enable_ro':
      module.c:(.text.module_enable_ro+0xb0): undefined reference to `rodata_enabled'
      
      CONFIG_SET_MODULE_RONX does not exist, so use the correct one instead.
      
      Fixes: 39290b38 ("module: extend 'rodata=off' boot cmdline parameter to module mappings")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJessica Yu <jeyu@redhat.com>
      4d217a5a
  17. 28 11月, 2016 1 次提交
  18. 24 10月, 2016 1 次提交
  19. 11 10月, 2016 1 次提交
    • E
      gcc-plugins: Add latent_entropy plugin · 38addce8
      Emese Revfy 提交于
      This adds a new gcc plugin named "latent_entropy". It is designed to
      extract as much possible uncertainty from a running system at boot time as
      possible, hoping to capitalize on any possible variation in CPU operation
      (due to runtime data differences, hardware differences, SMP ordering,
      thermal timing variation, cache behavior, etc).
      
      At the very least, this plugin is a much more comprehensive example for
      how to manipulate kernel code using the gcc plugin internals.
      
      The need for very-early boot entropy tends to be very architecture or
      system design specific, so this plugin is more suited for those sorts
      of special cases. The existing kernel RNG already attempts to extract
      entropy from reliable runtime variation, but this plugin takes the idea to
      a logical extreme by permuting a global variable based on any variation
      in code execution (e.g. a different value (and permutation function)
      is used to permute the global based on loop count, case statement,
      if/then/else branching, etc).
      
      To do this, the plugin starts by inserting a local variable in every
      marked function. The plugin then adds logic so that the value of this
      variable is modified by randomly chosen operations (add, xor and rol) and
      random values (gcc generates separate static values for each location at
      compile time and also injects the stack pointer at runtime). The resulting
      value depends on the control flow path (e.g., loops and branches taken).
      
      Before the function returns, the plugin mixes this local variable into
      the latent_entropy global variable. The value of this global variable
      is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
      though it does not credit any bytes of entropy to the pool; the contents
      of the global are just used to mix the pool.
      
      Additionally, the plugin can pre-initialize arrays with build-time
      random contents, so that two different kernel builds running on identical
      hardware will not have the same starting values.
      Signed-off-by: NEmese Revfy <re.emese@gmail.com>
      [kees: expanded commit message and code comments]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      38addce8
  20. 18 9月, 2016 1 次提交
    • T
      workqueue: make workqueue available early during boot · 3347fa09
      Tejun Heo 提交于
      Workqueue is currently initialized in an early init call; however,
      there are cases where early boot code has to be split and reordered to
      come after workqueue initialization or the same code path which makes
      use of workqueues is used both before workqueue initailization and
      after.  The latter cases have to gate workqueue usages with
      keventd_up() tests, which is nasty and easy to get wrong.
      
      Workqueue usages have become widespread and it'd be a lot more
      convenient if it can be used very early from boot.  This patch splits
      workqueue initialization into two steps.  workqueue_init_early() which
      sets up the basic data structures so that workqueues can be created
      and work items queued, and workqueue_init() which actually brings up
      workqueues online and starts executing queued work items.  The former
      step can be done very early during boot once memory allocation,
      cpumasks and idr are initialized.  The latter right after kthreads
      become available.
      
      This allows work item queueing and canceling from very early boot
      which is what most of these use cases want.
      
      * As systemd_wq being initialized doesn't indicate that workqueue is
        fully online anymore, update keventd_up() to test wq_online instead.
        The follow-up patches will get rid of all its usages and the
        function itself.
      
      * Flushing doesn't make sense before workqueue is fully initialized.
        The flush functions trigger WARN and return immediately before fully
        online.
      
      * Work items are never in-flight before fully online.  Canceling can
        always succeed by skipping the flush step.
      
      * Some code paths can no longer assume to be called with irq enabled
        as irq is disabled during early boot.  Use irqsave/restore
        operations instead.
      
      v2: Watchdog init, which requires timer to be running, moved from
          workqueue_init_early() to workqueue_init().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
      3347fa09
  21. 03 8月, 2016 2 次提交
  22. 25 6月, 2016 2 次提交
    • R
      init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64 · 0fd5ed8d
      Rasmus Villemoes 提交于
      When I replaced kasprintf("%pf") with a direct call to
      sprint_symbol_no_offset I must have broken the initcall blacklisting
      feature on the arches where dereference_function_descriptor() is
      non-trivial.
      
      Fixes: c8cdd2be (init/main.c: simplify initcall_blacklisted())
      Link: http://lkml.kernel.org/r/1466027283-4065-1-git-send-email-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0fd5ed8d
    • L
      Clarify naming of thread info/stack allocators · b235beea
      Linus Torvalds 提交于
      We've had the thread info allocated together with the thread stack for
      most architectures for a long time (since the thread_info was split off
      from the task struct), but that is about to change.
      
      But the patches that move the thread info to be off-stack (and a part of
      the task struct instead) made it clear how confused the allocator and
      freeing functions are.
      
      Because the common case was that we share an allocation with the thread
      stack and the thread_info, the two pointers were identical.  That
      identity then meant that we would have things like
      
      	ti = alloc_thread_info_node(tsk, node);
      	...
      	tsk->stack = ti;
      
      which certainly _worked_ (since stack and thread_info have the same
      value), but is rather confusing: why are we assigning a thread_info to
      the stack? And if we move the thread_info away, the "confusing" code
      just gets to be entirely bogus.
      
      So remove all this confusion, and make it clear that we are doing the
      stack allocation by renaming and clarifying the function names to be
      about the stack.  The fact that the thread_info then shares the
      allocation is an implementation detail, and not really about the
      allocation itself.
      
      This is a pure renaming and type fix: we pass in the same pointer, it's
      just that we clarify what the pointer means.
      
      The ia64 code that actually only has one single allocation (for all of
      task_struct, thread_info and kernel thread stack) now looks a bit odd,
      but since "tsk->stack" is actually not even used there, that oddity
      doesn't matter.  It would be a separate thing to clean that up, I
      intentionally left the ia64 changes as a pure brute-force renaming and
      type change.
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b235beea
  23. 28 5月, 2016 1 次提交
  24. 21 5月, 2016 3 次提交
    • R
      init/main.c: simplify initcall_blacklisted() · c8cdd2be
      Rasmus Villemoes 提交于
      Using kasprintf to get the function name makes us look up the name
      twice, along with all the vsnprintf overhead of parsing the format
      string etc.  It also means there is an allocation failure case to deal
      with.  Since symbol_string in vsprintf.c would anyway allocate an array
      of size KSYM_SYMBOL_LEN on the stack, that might as well be done up
      here.
      
      Moreover, since this is a debug feature and the blacklisted_initcalls
      list is usually empty, we might as well test that and thus avoid looking
      up the symbol name even once in the common case.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c8cdd2be
    • P
      printk/nmi: generic solution for safe printk in NMI · 42a0bb3f
      Petr Mladek 提交于
      printk() takes some locks and could not be used a safe way in NMI
      context.
      
      The chance of a deadlock is real especially when printing stacks from
      all CPUs.  This particular problem has been addressed on x86 by the
      commit a9edc880 ("x86/nmi: Perform a safe NMI stack trace on all
      CPUs").
      
      The patchset brings two big advantages.  First, it makes the NMI
      backtraces safe on all architectures for free.  Second, it makes all NMI
      messages almost safe on all architectures (the temporary buffer is
      limited.  We still should keep the number of messages in NMI context at
      minimum).
      
      Note that there already are several messages printed in NMI context:
      WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
      handlers.  These are not easy to avoid.
      
      This patch reuses most of the code and makes it generic.  It is useful
      for all messages and architectures that support NMI.
      
      The alternative printk_func is set when entering and is reseted when
      leaving NMI context.  It queues IRQ work to copy the messages into the
      main ring buffer in a safe context.
      
      __printk_nmi_flush() copies all available messages and reset the buffer.
      Then we could use a simple cmpxchg operations to get synchronized with
      writers.  There is also used a spinlock to get synchronized with other
      flushers.
      
      We do not longer use seq_buf because it depends on external lock.  It
      would be hard to make all supported operations safe for a lockless use.
      It would be confusing and error prone to make only some operations safe.
      
      The code is put into separate printk/nmi.c as suggested by Steven
      Rostedt.  It needs a per-CPU buffer and is compiled only on
      architectures that call nmi_enter().  This is achieved by the new
      HAVE_NMI Kconfig flag.
      
      The are MN10300 and Xtensa architectures.  We need to clean up NMI
      handling there first.  Let's do it separately.
      
      The patch is heavily based on the draft from Peter Zijlstra, see
      
        https://lkml.org/lkml/2015/6/10/327
      
      [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
      [akpm@linux-foundation.org: min_t->min - all types are size_t here]
      Signed-off-by: NPetr Mladek <pmladek@suse.com>
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>	[arm part]
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jiri Kosina <jkosina@suse.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42a0bb3f
    • Y
      mm: call page_ext_init() after all struct pages are initialized · b8f1a75d
      Yang Shi 提交于
      When DEFERRED_STRUCT_PAGE_INIT is enabled, just a subset of memmap at
      boot are initialized, then the rest are initialized in parallel by
      starting one-off "pgdatinitX" kernel thread for each node X.
      
      If page_ext_init is called before it, some pages will not have valid
      extension, this may lead the below kernel oops when booting up kernel:
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0
        PGD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in:
        CPU: 11 PID: 106 Comm: pgdatinit1 Not tainted 4.6.0-rc5-next-20160427 #26
        Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
        task: ffff88017c080040 ti: ffff88017c084000 task.ti: ffff88017c084000
        RIP: 0010:[<ffffffff8118d982>]  [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0
        RSP: 0000:ffff88017c087c48  EFLAGS: 00010046
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
        RDX: 0000000000000980 RSI: 0000000000000080 RDI: 0000000000660401
        RBP: ffff88017c087cd0 R08: 0000000000000401 R09: 0000000000000009
        R10: ffff88017c080040 R11: 000000000000000a R12: 0000000000000400
        R13: ffffea0019810000 R14: ffffea0019810040 R15: ffff88066cfe6080
        FS:  0000000000000000(0000) GS:ffff88066cd40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000002406000 CR4: 00000000000006e0
        Call Trace:
          free_hot_cold_page+0x192/0x1d0
          __free_pages+0x5c/0x90
          __free_pages_boot_core+0x11a/0x14e
          deferred_free_range+0x50/0x62
          deferred_init_memmap+0x220/0x3c3
          kthread+0xf8/0x110
          ret_from_fork+0x22/0x40
        Code: 49 89 d4 48 c1 e0 06 49 01 c5 e9 de fe ff ff 4c 89 f7 44 89 4d b8 4c 89 45 c0 44 89 5d c8 48 89 4d d0 e8 62 c7 07 00 48 8b 4d d0 <48> 8b 00 44 8b 5d c8 4c 8b 45 c0 44 8b 4d b8 a8 02 0f 84 05 ff
        RIP  [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0
         RSP <ffff88017c087c48>
        CR2: 0000000000000000
      
      Move page_ext_init() after page_alloc_init_late() to make sure page extension
      is setup for all pages.
      
      Link: http://lkml.kernel.org/r/1463696006-31360-1-git-send-email-yang.shi@linaro.orgSigned-off-by: NYang Shi <yang.shi@linaro.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8f1a75d
  25. 16 3月, 2016 1 次提交
  26. 02 3月, 2016 2 次提交
    • T
      cpu/hotplug: Unpark smpboot threads from the state machine · 931ef163
      Thomas Gleixner 提交于
      Handle the smpboot threads in the state machine.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.295777684@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      931ef163
    • T
      cpu/hotplug: Convert to a state machine for the control processor · cff7d378
      Thomas Gleixner 提交于
      Move the split out steps into a callback array and let the cpu_up/down
      code iterate through the array functions. For now most of the
      callbacks are asymmetric to resemble the current hotplug maze.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182340.671816690@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      cff7d378
  27. 22 2月, 2016 1 次提交
    • K
      mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings · d2aa1aca
      Kees Cook 提交于
      It may be useful to debug writes to the readonly sections of memory,
      so provide a cmdline "rodata=off" to allow for this. This can be
      expanded in the future to support "log" and "write" modes, but that
      will need to be architecture-specific.
      
      This also makes KDB software breakpoints more usable, as read-only
      mappings can now be disabled on any kernel.
      Suggested-by: NH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Brown <david.brown@linaro.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1455748879-21872-3-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d2aa1aca