1. 25 Jun, 2016 1 commit
    • Clarify naming of thread info/stack allocators · b235beea
      Committed by Linus Torvalds
      We've had the thread info allocated together with the thread stack for
      most architectures for a long time (since the thread_info was split off
      from the task struct), but that is about to change.
      
      But the patches that move the thread info to be off-stack (and a part of
      the task struct instead) made it clear how confused the allocator and
      freeing functions are.
      
      Because the common case was that we share an allocation with the thread
      stack and the thread_info, the two pointers were identical.  That
      identity then meant that we would have things like
      
      	ti = alloc_thread_info_node(tsk, node);
      	...
      	tsk->stack = ti;
      
      which certainly _worked_ (since stack and thread_info have the same
      value), but is rather confusing: why are we assigning a thread_info to
      the stack? And if we move the thread_info away, the "confusing" code
      just gets to be entirely bogus.
      
      So remove all this confusion, and make it clear that we are doing the
      stack allocation by renaming and clarifying the function names to be
      about the stack.  The fact that the thread_info then shares the
      allocation is an implementation detail, and not really about the
      allocation itself.
      
      This is a pure renaming and type fix: we pass in the same pointer, it's
      just that we clarify what the pointer means.
      
      The ia64 code that actually only has one single allocation (for all of
      task_struct, thread_info and kernel thread stack) now looks a bit odd,
      but since "tsk->stack" is actually not even used there, that oddity
      doesn't matter.  Cleaning that up would be a separate change; I
      intentionally left the ia64 changes as a pure brute-force renaming and
      type change.
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
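The naming point in this commit can be illustrated with a small userspace sketch. The names `toy_alloc_thread_stack()`, `toy_free_thread_stack()`, `toy_task_thread_info()` and the `TOY_THREAD_SIZE` value are hypothetical stand-ins, not the kernel's functions; the sketch only assumes, as the message describes, that the thread_info shares the start of the stack allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_THREAD_SIZE 8192          /* illustrative size, not the kernel's */

struct toy_thread_info { int cpu; unsigned long flags; };
struct toy_task { void *stack; };     /* the task only records its stack */

/* Allocate the stack. What else lives in it is not the allocator's concern. */
static void *toy_alloc_thread_stack(void)
{
    return calloc(1, TOY_THREAD_SIZE);
}

static void toy_free_thread_stack(void *stack)
{
    free(stack);
}

/* Implementation detail: thread_info shares the start of the allocation. */
static struct toy_thread_info *toy_task_thread_info(const struct toy_task *tsk)
{
    assert(tsk->stack != NULL);
    return (struct toy_thread_info *)tsk->stack;
}
```

With this split, a caller writes `tsk.stack = toy_alloc_thread_stack();` and never assigns a thread_info pointer to the stack field, which is the confusion the commit removes.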
  2. 28 May, 2016 1 commit
  3. 21 May, 2016 3 commits
    • init/main.c: simplify initcall_blacklisted() · c8cdd2be
      Committed by Rasmus Villemoes
      Using kasprintf to get the function name makes us look up the name
      twice, along with all the vsnprintf overhead of parsing the format
      string etc.  It also means there is an allocation failure case to deal
      with.  Since symbol_string in vsprintf.c would anyway allocate an array
      of size KSYM_SYMBOL_LEN on the stack, that might as well be done up
      here.
      
      Moreover, since this is a debug feature and the blacklisted_initcalls
      list is usually empty, we might as well test that and thus avoid looking
      up the symbol name even once in the common case.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
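A rough userspace sketch of the two ideas in this commit: test for an empty blacklist before doing any symbol lookup, and format the name into a fixed-size stack buffer instead of an allocated string. `toy_lookup_name()`, `TOY_NAME_LEN` (standing in for KSYM_SYMBOL_LEN) and the array-based blacklist are assumptions made for illustration; they are not the code in init/main.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_NAME_LEN 128              /* stand-in for KSYM_SYMBOL_LEN */
#define TOY_MAX_BLACKLIST 8

static const char *toy_blacklist[TOY_MAX_BLACKLIST];
static size_t toy_blacklist_len;
static unsigned toy_lookups;          /* counts lookups, for demonstration only */

/* Hypothetical stand-in for resolving a function address to its name. */
static void toy_lookup_name(const char *sym, char *buf, size_t len)
{
    assert(len > 0);
    toy_lookups++;
    snprintf(buf, len, "%s", sym);
}

static bool toy_initcall_blacklisted(const char *sym)
{
    char name[TOY_NAME_LEN];          /* on the stack: no allocation to fail */
    size_t i;

    if (toy_blacklist_len == 0)       /* common case: skip the lookup entirely */
        return false;

    toy_lookup_name(sym, name, sizeof(name));
    for (i = 0; i < toy_blacklist_len; i++)
        if (strcmp(toy_blacklist[i], name) == 0)
            return true;
    return false;
}
```

The lookup counter exists only to show that the fast path does no name resolution while the list is empty.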
    • printk/nmi: generic solution for safe printk in NMI · 42a0bb3f
      Committed by Petr Mladek
      printk() takes some locks and cannot be used safely in NMI
      context.
      
      The chance of a deadlock is real especially when printing stacks from
      all CPUs.  This particular problem has been addressed on x86 by the
      commit a9edc880 ("x86/nmi: Perform a safe NMI stack trace on all
      CPUs").
      
      The patchset brings two big advantages.  First, it makes the NMI
      backtraces safe on all architectures for free.  Second, it makes all NMI
      messages almost safe on all architectures (the temporary buffer is
      limited, so we should still keep the number of messages in NMI context
      to a minimum).
      
      Note that there already are several messages printed in NMI context:
      WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
      handlers.  These are not easy to avoid.
      
      This patch reuses most of the code and makes it generic.  It is useful
      for all messages and architectures that support NMI.
      
      The alternative printk_func is set when entering and is reset when
      leaving NMI context.  It queues IRQ work to copy the messages into the
      main ring buffer in a safe context.
      
      __printk_nmi_flush() copies all available messages and resets the buffer.
      That way we can use simple cmpxchg operations to synchronize with
      writers.  A spinlock is also used to synchronize with other flushers.
      
      We no longer use seq_buf because it depends on an external lock.  It
      would be hard to make all supported operations safe for lockless use.
      It would be confusing and error prone to make only some operations safe.
      
      The code is put into separate printk/nmi.c as suggested by Steven
      Rostedt.  It needs a per-CPU buffer and is compiled only on
      architectures that call nmi_enter().  This is achieved by the new
      HAVE_NMI Kconfig flag.
      
      The exceptions are the MN10300 and Xtensa architectures.  We need to
      clean up NMI handling there first.  Let's do it separately.
      
      The patch is heavily based on the draft from Peter Zijlstra, see
      
        https://lkml.org/lkml/2015/6/10/327
      
      [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
      [akpm@linux-foundation.org: min_t->min - all types are size_t here]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>	[arm part]
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jiri Kosina <jkosina@suse.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
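The writer/flusher handshake described in this commit can be sketched with C11 atomics. This is a simplified userspace illustration, not the code in printk/nmi.c: `struct toy_nmi_buf`, `toy_nmi_write()`, `toy_nmi_flush()` and the buffer size are hypothetical, the spinlock between flushers is omitted, and the sketch assumes at most one flusher at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TOY_NMI_BUF_SIZE 64           /* small so the limit is easy to hit */

struct toy_nmi_buf {
    atomic_size_t len;                /* number of committed bytes */
    char data[TOY_NMI_BUF_SIZE];
};

/* Writer: copy the message behind the committed data, then publish it by
 * advancing len with a compare-and-swap. If len changed in the meantime,
 * start over. Returns the number of bytes stored (truncated when full). */
static size_t toy_nmi_write(struct toy_nmi_buf *b, const char *msg)
{
    size_t want = strlen(msg);
    size_t len, add;

    do {
        len = atomic_load(&b->len);
        if (len >= TOY_NMI_BUF_SIZE)
            return 0;
        add = TOY_NMI_BUF_SIZE - len;
        if (add > want)
            add = want;
        memcpy(b->data + len, msg, add);
    } while (!atomic_compare_exchange_strong(&b->len, &len, len + add));
    return add;
}

/* Flusher: copy out everything committed so far, then reset len to zero with
 * a compare-and-swap. If a writer added data meanwhile, copy the new part
 * and try again. Returns the number of bytes copied to out. */
static size_t toy_nmi_flush(struct toy_nmi_buf *b, char *out, size_t out_size)
{
    size_t done = 0, len;

    do {
        len = atomic_load(&b->len);
        assert(len <= out_size);
        memcpy(out + done, b->data + done, len - done);
        done = len;
    } while (!atomic_compare_exchange_strong(&b->len, &len, 0));
    return done;
}
```

Because the buffer is fixed in size, messages beyond the limit are truncated or dropped, which matches the commit's advice to keep the number of messages in NMI context small.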
    • mm: call page_ext_init() after all struct pages are initialized · b8f1a75d
      Committed by Yang Shi
      When DEFERRED_STRUCT_PAGE_INIT is enabled, only a subset of the memmap
      is initialized at boot; the rest is initialized in parallel by starting
      a one-off "pgdatinitX" kernel thread for each node X.
      
      If page_ext_init() is called before that, some pages will not have a
      valid extension, which may lead to the kernel oops below when booting:
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0
        PGD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in:
        CPU: 11 PID: 106 Comm: pgdatinit1 Not tainted 4.6.0-rc5-next-20160427 #26
        Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
        task: ffff88017c080040 ti: ffff88017c084000 task.ti: ffff88017c084000
        RIP: 0010:[<ffffffff8118d982>]  [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0
        RSP: 0000:ffff88017c087c48  EFLAGS: 00010046
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
        RDX: 0000000000000980 RSI: 0000000000000080 RDI: 0000000000660401
        RBP: ffff88017c087cd0 R08: 0000000000000401 R09: 0000000000000009
        R10: ffff88017c080040 R11: 000000000000000a R12: 0000000000000400
        R13: ffffea0019810000 R14: ffffea0019810040 R15: ffff88066cfe6080
        FS:  0000000000000000(0000) GS:ffff88066cd40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000002406000 CR4: 00000000000006e0
        Call Trace:
          free_hot_cold_page+0x192/0x1d0
          __free_pages+0x5c/0x90
          __free_pages_boot_core+0x11a/0x14e
          deferred_free_range+0x50/0x62
          deferred_init_memmap+0x220/0x3c3
          kthread+0xf8/0x110
          ret_from_fork+0x22/0x40
        Code: 49 89 d4 48 c1 e0 06 49 01 c5 e9 de fe ff ff 4c 89 f7 44 89 4d b8 4c 89 45 c0 44 89 5d c8 48 89 4d d0 e8 62 c7 07 00 48 8b 4d d0 <48> 8b 00 44 8b 5d c8 4c 8b 45 c0 44 8b 4d b8 a8 02 0f 84 05 ff
        RIP  [<ffffffff8118d982>] free_pcppages_bulk+0x2d2/0x8d0
         RSP <ffff88017c087c48>
        CR2: 0000000000000000
      
      Move page_ext_init() after page_alloc_init_late() to make sure page extension
      is set up for all pages.
      
      Link: http://lkml.kernel.org/r/1463696006-31360-1-git-send-email-yang.shi@linaro.org
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 16 Mar, 2016 1 commit
  5. 02 Mar, 2016 2 commits
    • cpu/hotplug: Unpark smpboot threads from the state machine · 931ef163
      Committed by Thomas Gleixner
      Handle the smpboot threads in the state machine.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.295777684@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • cpu/hotplug: Convert to a state machine for the control processor · cff7d378
      Committed by Thomas Gleixner
      Move the split out steps into a callback array and let the cpu_up/down
      code iterate through the array functions. For now most of the
      callbacks are asymmetric to resemble the current hotplug maze.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182340.671816690@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 22 Feb, 2016 1 commit
    • mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings · d2aa1aca
      Committed by Kees Cook
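The "callback array" idea from this commit can be sketched as follows. The step names, the `toy_` identifiers and the trace buffer are hypothetical and only illustrate the shape of the technique: bring-up walks the array forward, tear-down walks it backward, and a step may lack one of the two callbacks, which is one way the asymmetry mentioned in the message can show up.

```c
#include <assert.h>
#include <stddef.h>

struct toy_hp_step {
    const char *name;
    int (*startup)(unsigned cpu);     /* may be NULL */
    int (*teardown)(unsigned cpu);    /* may be NULL */
};

/* Records the order in which callbacks ran, for demonstration only. */
static char toy_trace[32];
static size_t toy_trace_len;

static int toy_note(char c)
{
    assert(toy_trace_len < sizeof(toy_trace));
    toy_trace[toy_trace_len++] = c;
    return 0;
}

static int prepare_up(unsigned cpu)   { (void)cpu; return toy_note('P'); }
static int prepare_down(unsigned cpu) { (void)cpu; return toy_note('p'); }
static int notify_up(unsigned cpu)    { (void)cpu; return toy_note('N'); }
static int online_up(unsigned cpu)    { (void)cpu; return toy_note('O'); }
static int online_down(unsigned cpu)  { (void)cpu; return toy_note('o'); }

static const struct toy_hp_step toy_steps[] = {
    { "prepare", prepare_up, prepare_down },
    { "notify",  notify_up,  NULL },  /* asymmetric: nothing to undo */
    { "online",  online_up,  online_down },
};
#define TOY_NR_STEPS (sizeof(toy_steps) / sizeof(toy_steps[0]))

/* Bring a CPU up: run startup callbacks in array order, stop on error. */
static int toy_cpu_up(unsigned cpu)
{
    size_t i;

    for (i = 0; i < TOY_NR_STEPS; i++) {
        int ret = toy_steps[i].startup ? toy_steps[i].startup(cpu) : 0;
        if (ret)
            return ret;
    }
    return 0;
}

/* Take a CPU down: run teardown callbacks in reverse array order. */
static int toy_cpu_down(unsigned cpu)
{
    size_t i;

    for (i = TOY_NR_STEPS; i-- > 0; ) {
        int ret = toy_steps[i].teardown ? toy_steps[i].teardown(cpu) : 0;
        if (ret)
            return ret;
    }
    return 0;
}
```

Keeping the steps in one table means the ordering is visible in a single place instead of being spread across call sites.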
      It may be useful to debug writes to the readonly sections of memory,
      so provide a cmdline "rodata=off" to allow for this. This can be
      expanded in the future to support "log" and "write" modes, but that
      will need to be architecture-specific.
      
      This also makes KDB software breakpoints more usable, as read-only
      mappings can now be disabled on any kernel.
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Brown <david.brown@linaro.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1455748879-21872-3-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 09 Feb, 2016 1 commit
  8. 21 Jan, 2016 1 commit
  9. 05 Dec, 2015 1 commit
  10. 11 Sep, 2015 1 commit
    • kmod: use system_unbound_wq instead of khelper · 90f02303
      Committed by Frederic Weisbecker
      We need to launch the usermodehelper kernel threads with the widest
      affinity and this is partly why we use khelper.  This workqueue has
      unbound properties and thus a wide affinity inherited by all its children.
      
      Now khelper also has special properties that we aren't much interested in:
      ordered and singlethread.  There is really no need for ordering, as all
      we do is create kernel threads.  This can be done concurrently.  And
      singlethread is a useless limitation as well.
      
      The workqueue engine already provides generic unbound workqueues that
      don't share these useless properties and handle parallel jobs well.
      
      The only worrisome detail is their affinity to the node of the current
      CPU.  It's fine for creating the usermodehelper kernel threads, but those
      inherit this affinity for longer jobs such as requesting modules.
      
      This patch proposes to use these node affine unbound workqueues assuming
      that a node is sufficient to handle several parallel usermodehelper
      requests.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 07 Aug, 2015 1 commit
    • fs, file table: reinit files_stat.max_files after deferred memory initialisation · 4248b0da
      Committed by Mel Gorman
      Dave Hansen reported the following:
      
      	My laptop has been behaving strangely with 4.2-rc2.  Once I log
      	in to my X session, I start getting all kinds of strange errors
      	from applications and see this in my dmesg:
      
              	VFS: file-max limit 8192 reached
      
      The problem is that the file-max is calculated before memory is fully
      initialised and miscalculates how much memory the kernel is using.  This
      patch recalculates file-max after deferred memory initialisation.  Note
      that using memory hotplug infrastructure would not have avoided this
      problem as the value is not recalculated after memory hot-add.
      
      4.1:             files_stat.max_files = 6582781
      4.2-rc2:         files_stat.max_files = 8192
      4.2-rc2 patched: files_stat.max_files = 6562467
      
      Small differences with the patch applied and 4.1 but not enough to matter.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Alex Ng <alexng@microsoft.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 01 Jul, 2015 1 commit
  13. 11 Jun, 2015 1 commit
    • ACPI / init: Switch over platform to the ACPI mode later · b064a8fa
      Committed by Rafael J. Wysocki
      Commit 73f7d1ca "ACPI / init: Run acpi_early_init() before
      timekeeping_init()" moved the ACPI subsystem initialization,
      including the ACPI mode enabling, to an earlier point in the
      initialization sequence, to allow the timekeeping subsystem
      to use ACPI early.  Unfortunately, that resulted in boot regressions
      on some systems and the early ACPI initialization was moved toward
      its original position in the kernel initialization code by commit
      c4e1acbb "ACPI / init: Invoke early ACPI initialization later".
      
      However, that turns out to be insufficient, as boot is still broken
      on the Tyan S8812 mainboard.
      
      To fix that issue, split the ACPI early initialization code into
      two pieces so the majority of it is still located in acpi_early_init()
      and the part switching over the platform into the ACPI mode goes into
      a new function, acpi_subsystem_init(), executed at the original early
      ACPI initialization spot.
      
      That fixes the Tyan S8812 boot problem, but still allows ACPI
      tables to be loaded earlier which is useful to the EFI code in
      efi_enter_virtual_mode().
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=97141
      Fixes: 73f7d1ca "ACPI / init: Run acpi_early_init() before timekeeping_init()"
      Reported-and-tested-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
      Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
  14. 20 May, 2015 1 commit
    • module: add extra argument for parse_params() callback · ecc86170
      Committed by Luis R. Rodriguez
      This adds an extra argument onto parse_params() to be used
      as a way to make the unused callback a bit more useful and
      generic by allowing the caller to pass on a data structure
      of its choice. An example use case is to allow us to easily
      make module parameters for every module which we will do
      next.
      
      @ parse @
      identifier name, args, params, num, level_min, level_max;
      identifier unknown, param, val, doing;
      type s16;
      @@
       extern char *parse_args(const char *name,
       			 char *args,
       			 const struct kernel_param *params,
       			 unsigned num,
       			 s16 level_min,
       			 s16 level_max,
      +			 void *arg,
       			 int (*unknown)(char *param, char *val,
      					const char *doing
      +					, void *arg
      					));
      
      @ parse_mod @
      identifier name, args, params, num, level_min, level_max;
      identifier unknown, param, val, doing;
      type s16;
      @@
       char *parse_args(const char *name,
       			 char *args,
       			 const struct kernel_param *params,
       			 unsigned num,
       			 s16 level_min,
       			 s16 level_max,
      +			 void *arg,
       			 int (*unknown)(char *param, char *val,
      					const char *doing
      +					, void *arg
      					))
      {
      	...
      }
      
      @ parse_args_found @
      expression R, E1, E2, E3, E4, E5, E6;
      identifier func;
      @@
      
      (
      	R =
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   func);
      |
      	R =
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   &func);
      |
      	R =
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   NULL);
      |
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   func);
      |
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   &func);
      |
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   NULL);
      )
      
      @ parse_args_unused depends on parse_args_found @
      identifier parse_args_found.func;
      @@
      
      int func(char *param, char *val, const char *unused
      +		 , void *arg
      		 )
      {
      	...
      }
      
      @ mod_unused depends on parse_args_found @
      identifier parse_args_found.func;
      expression A1, A2, A3;
      @@
      
      -	func(A1, A2, A3);
      +	func(A1, A2, A3, NULL);
      
      Generated-by: Coccinelle SmPL
      Cc: cocci@systeme.lip6.fr
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Felipe Contreras <felipe.contreras@gmail.com>
      Cc: Ewan Milne <emilne@redhat.com>
      Cc: Jean Delvare <jdelvare@suse.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  15. 17 Apr, 2015 1 commit
    • kernel/fork.c: new function for max_threads · ff691f6e
      Committed by Heinrich Schuchardt
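The effect of the extra `void *arg` can be shown with a toy parser. `toy_parse_args()` below is a hypothetical, much simplified stand-in for the kernel's parse_args() (it only splits space-separated `param=value` words); the one point it illustrates is that the caller's pointer is handed through unchanged to the "unknown parameter" callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_unknown_fn)(char *param, char *val,
                              const char *doing, void *arg);

/* Split args in place into "param" or "param=val" words and pass each one,
 * together with the caller's opaque arg, to the unknown() callback. */
static int toy_parse_args(const char *doing, char *args,
                          void *arg, toy_unknown_fn unknown)
{
    char *p = args;

    assert(args != NULL);
    while (*p) {
        char *param, *val, *end;

        while (*p == ' ')
            p++;
        if (!*p)
            break;
        param = p;
        end = param + strcspn(param, " ");
        p = *end ? end + 1 : end;     /* continue after this word */
        *end = '\0';
        val = strchr(param, '=');
        if (val)
            *val++ = '\0';
        if (unknown) {
            int ret = unknown(param, val, doing, arg);
            if (ret)
                return ret;
        }
    }
    return 0;
}

/* Example callback: the caller-supplied structure collects statistics. */
struct toy_seen { int count; int with_value; };

static int toy_count_unknown(char *param, char *val,
                             const char *doing, void *arg)
{
    struct toy_seen *seen = arg;

    (void)param;
    (void)doing;
    seen->count++;
    if (val)
        seen->with_value++;
    return 0;
}
```

Callers that do not need the extra data pass NULL, which is what the semantic patch above does for all existing call sites.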
      PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
      THREAD_SIZE.
      
      E.g.  architecture hexagon may have page size 1M and thread size 4096.
      This would lead to a division by zero in the calculation of max_threads.
      
      With this patch the buggy code is moved to a separate function
      set_max_threads.  The error is not fixed.
      
      After fixing the problem in a separate patch the new function can be
      reused to adjust max_threads after adding or removing memory.
      
      Argument mempages of function fork_init() is removed as totalram_pages is
      an exported symbol.
      
      The creation of separate patches for refactoring to a new function and for
      fixing the logic was suggested by Ingo Molnar.
      Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 15 Apr, 2015 1 commit
    • lib/ioremap.c: add huge I/O map capability interfaces · 0ddab1d2
      Committed by Toshi Kani
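The arithmetic behind the division by zero can be checked directly. The sketch assumes the divisor has the form `8 * THREAD_SIZE / PAGE_SIZE`, which is what the message's "8 times the THREAD_SIZE" wording implies; `toy_threads_divisor()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Divisor used when scaling the number of memory pages down to a thread
 * limit (assumed form: 8 * THREAD_SIZE / PAGE_SIZE, integer division). */
static unsigned long toy_threads_divisor(unsigned long thread_size,
                                         unsigned long page_size)
{
    assert(page_size != 0);
    return 8 * thread_size / page_size;
}
```

With a 4 KiB page and an 8 KiB thread size the divisor is 65536 / 4096 = 16. With the hexagon example from the message (page size 1M, thread size 4096) it is 32768 / 1048576, which truncates to 0, so dividing the page count by it would fault.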
      Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
      I/O mappings with pud/pmd are enabled on the kernel.
      
      ioremap_huge_init() calls arch_ioremap_pud_supported() and
      arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.
      
      A new kernel option "nohugeiomap" is also added, so that user can disable
      the huge I/O map capabilities when necessary.
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 13 Apr, 2015 1 commit
    • cpu: Defer smpboot kthread unparking until CPU known to scheduler · 00df35f9
      Committed by Paul E. McKenney
      Currently, smpboot_unpark_threads() is invoked before the incoming CPU
      has been added to the scheduler's runqueue structures.  This might
      potentially cause the unparked kthread to run on the wrong CPU, since the
      correct CPU isn't fully set up yet.
      
      That causes a sporadic, hard to debug boot crash triggering on some
      systems, reported by Borislav Petkov, and bisected down to:
      
        2a442c9c ("x86: Use common outgoing-CPU-notification code")
      
      This patch places smpboot_unpark_threads() in a CPU hotplug
      notifier with priority set so that these kthreads are unparked just after
      the CPU has been added to the runqueues.
      Reported-and-tested-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  18. 07 Mar, 2015 1 commit
  19. 05 Mar, 2015 1 commit
  20. 14 Feb, 2015 1 commit
  21. 22 Jan, 2015 1 commit
  22. 15 Dec, 2014 1 commit
  23. 14 Dec, 2014 1 commit
    • mm/page_ext: resurrect struct page extending code for debugging · eefa864b
      Committed by Joonsoo Kim
      When we debug something, we'd like to attach some information to every
      page.  For this purpose, we sometimes modify struct page itself.  But
      this has drawbacks.  First, it requires a recompile.  This makes us
      hesitate to use the powerful debug feature, so the development process
      is slowed down.  Second, sometimes it is impossible to rebuild the
      kernel due to third-party module dependencies.  Third, system behaviour
      would be largely different after a recompile, because it changes the size
      of struct page greatly and this structure is accessed by every part of
      the kernel.  Keeping struct page as it is makes it easier to reproduce
      the erroneous situation.
      
      This feature is intended to overcome the above-mentioned problems.  It
      allocates memory for extended per-page data in a separate place
      rather than in struct page itself.  This memory can be accessed by the
      accessor functions provided by this code.  During the boot process, it
      checks whether allocation of a huge chunk of memory is needed or not.  If
      not, it avoids allocating memory at all.  With this advantage, we can
      include this feature in the kernel by default, avoiding rebuilds and
      the related problems.
      
      Until now, memcg used this technique.  But memcg has now decided to embed
      its variable in struct page itself, and its code to extend struct page
      has been removed.  I'd like to use this code to develop debug features, so
      this patch resurrects it.
      
      To help these things work well, this patch introduces two callbacks for
      clients.  One is the need callback, which is mandatory if the user wants to
      avoid useless memory allocation at boot time.  The other is the optional
      init callback, which is used to do proper initialization after memory is
      allocated.  A detailed explanation of the purpose of these functions is
      in the code comments.  Please refer to them.
      
      Everything else is the same as the previous extension code in memcg.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Jungsoo Son <jungsoo.son@lge.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 11 Dec, 2014 3 commits
    • take the targets of /proc/*/ns/* symlinks to separate fs · e149ed2b
      Committed by Al Viro
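The need/init callback pair can be sketched in userspace as below. All identifiers (`struct toy_ext_ops`, `toy_ext_setup()`, the `toy_want_debug` flag) are hypothetical, and treating a client without a need callback as always needing memory is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_ext_ops {
    bool (*need)(void);               /* does this client want the memory? */
    void (*init)(void);               /* optional: runs after allocation */
};

static void *toy_ext_mem;             /* the extension storage, if allocated */

/* Allocate extension memory only if some client needs it, then let every
 * client with an init callback initialize itself. Returns true if the
 * memory was set up. */
static bool toy_ext_setup(const struct toy_ext_ops *const *ops, size_t n,
                          size_t bytes)
{
    bool needed = false;
    size_t i;

    for (i = 0; i < n; i++)
        if (!ops[i]->need || ops[i]->need())
            needed = true;
    if (!needed)
        return false;                 /* nobody asked: allocate nothing */

    toy_ext_mem = calloc(1, bytes);
    if (!toy_ext_mem)
        return false;
    for (i = 0; i < n; i++)
        if (ops[i]->init)
            ops[i]->init();
    return true;
}

/* Example client: enabled by a flag that stands in for a boot option. */
static bool toy_want_debug;
static int toy_init_calls;

static bool toy_debug_need(void) { return toy_want_debug; }
static void toy_debug_init(void) { toy_init_calls++; }

static const struct toy_ext_ops toy_debug_ops = {
    toy_debug_need, toy_debug_init
};
```

When no client asks for the memory, setup costs nothing beyond the check, which is why the feature can be compiled in by default.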
      New pseudo-filesystem: nsfs.  Targets of /proc/*/ns/* live there now.
      It's not mountable (not even registered, so it's not in /proc/filesystems,
      etc.).  Files on it *are* bindable - we explicitly permit that in do_loopback().
      
      This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
      get_proc_ns() is a macro now (it's simply returning ->i_private; would
      have been an inline, if not for header ordering headache).
      proc_ns_inode() is an ex-parrot.  The interface used in procfs is
      ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).
      
      Dentries and inodes are never hashed; a non-counting reference to dentry
      is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
      if present.  See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
      of that mechanism.
      
      As a result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
      it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
      from ns_get_path().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • init: allow CONFIG_INIT_FALLBACK=n to disable defaults if init= fails · 6ef4536e
      Committed by Andy Lutomirski
      If a user puts init=/whatever on the command line and /whatever can't be
      run, then the kernel will try a few default options before giving up.  If
      init=/whatever came from a bootloader prompt, then this is unexpected but
      probably harmless.  On the other hand, if it comes from a script (e.g.  a
      tool like virtme or perhaps a future kselftest script), then the fallbacks
      are likely to exist, but they'll do the wrong thing.  For example, they
      might unexpectedly invoke systemd.
      
      This adds a config option CONFIG_INIT_FALLBACK.  If unset, then a failure
      to run the specified init= process will be fatal.
      
      The tentative plan is to remove CONFIG_INIT_FALLBACK for 3.20.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Rob Landley <rob@landley.net>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Shuah Khan <shuah.kh@samsung.com>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: embed the memcg pointer directly into struct page · 1306a85a
      Committed by Johannes Weiner
      Memory cgroups used to have 5 per-page pointers.  To allow users to
      disable that amount of overhead during runtime, those pointers were
      allocated in a separate array, with a translation layer between them and
      struct page.
      
      There is now only one page pointer remaining: the memcg pointer, that
      indicates which cgroup the page is associated with when charged.  The
      complexity of runtime allocation and the runtime translation overhead is
      no longer justified to save that *potential* 0.19% of memory.  With
      CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
      after the page->private member and doesn't even increase struct page,
      and then this patch actually saves space.  Remaining users that care can
      still compile their kernels without CONFIG_MEMCG.
      
           text    data     bss     dec     hex     filename
        8828345 1725264  983040 11536649 b00909  vmlinux.old
        8827425 1725264  966656 11519345 afc571  vmlinux.new
      
      [mhocko@suse.cz: update Documentation/cgroups/memory.txt]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1306a85a
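The figures in the memcg commit above can be checked with a little arithmetic. The sketch below assumes an 8-byte pointer and a 4096-byte page, neither of which the commit message states, and the helper names are hypothetical. It computes the per-page overhead of one pointer and the total of the `text`, `data` and `bss` columns from the size table.

```c
#include <stddef.h>

/* One row of the size(1) style table quoted in the commit message. */
struct model_size_row {
	long text;
	long data;
	long bss;
};

/* Sum of the three sections, which is what the "dec" column reports. */
static long model_total(struct model_size_row row)
{
	return row.text + row.data + row.bss;
}

/* Share of each page spent on one per-page pointer, in percent. */
static double model_overhead_percent(size_t pointer_bytes, size_t page_bytes)
{
	return 100.0 * (double)pointer_bytes / (double)page_bytes;
}

/* Values copied from the table: vmlinux.old and vmlinux.new. */
static const struct model_size_row model_old = { 8828345, 1725264, 983040 };
static const struct model_size_row model_new = { 8827425, 1725264, 966656 };
```

Under those assumptions `model_overhead_percent(8, 4096)` is about 0.195, in line with the quoted 0.19%, and the totals differ by 17304 bytes (920 bytes of text plus 16384 bytes of bss).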
  25. 18 Nov, 2014 1 commit
    • D
      integrity: provide a hook to load keys when rootfs is ready · c9cd2ce2
      Dmitry Kasatkin committed
      Keys can only be loaded once the rootfs is mounted. Initcalls
      are not suitable for that. This patch defines a special hook
      to load the x509 public keys onto the IMA keyring, before
      attempting to access any file. The keys are required for
      verifying the file's signature. The hook is called after the
      root filesystem is mounted and before the kernel calls 'init'.
      
      Changes in v3:
      * added more explanation to the patch description (Mimi)
      
      Changes in v2:
      * Hook renamed as 'integrity_load_keys()' to handle both IMA and EVM
        keys by integrity subsystem.
      * Hook patch moved after defining loading functions
      Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      c9cd2ce2
  26. 11 Nov, 2014 1 commit
    • D
      param: fix crash on bad kernel arguments · 3438cf54
      Daniel Thompson committed
      Currently if the user passes an invalid value on the kernel command line
      then the kernel will crash during argument parsing. On most systems this
      is very hard to debug because the console hasn't been initialized yet.
      
      This is a regression due to commit 51e158c1 ("param: hand arguments
      after -- straight to init") which, in response to the systemd debug
      controversy, made it possible to explicitly pass arguments to init. To
      achieve this parse_args() was extended from simply returning an error
      code to returning a pointer. Regrettably, the new init args logic does not
      perform a proper validity check on the pointer, resulting in a crash.
      
      This patch fixes the validity check. Should the check fail then no arguments
      will be passed to init. This is reasonable and matches how the kernel treats
      its own arguments (i.e. no error recovery).
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      3438cf54
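The validity check described in the param fix above can be illustrated with a userspace model of the kernel's error-pointer convention. This is a sketch, not the patch itself: `ERR_PTR()` and `IS_ERR_OR_NULL()` are re-implemented here as stand-ins for the kernel helpers of the same names, while `model_parse_args()` and `model_init_args()` are hypothetical and only mimic a parser that returns either the text after `--`, NULL, or an encoded error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Userspace stand-in: encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Userspace stand-in: true for NULL and for encoded error pointers. */
static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical parser: returns the text after " -- ", NULL when there is
 * none, or an error pointer when a bad value is simulated.
 */
static char *model_parse_args(char *cmdline, int simulate_bad_value)
{
	char *dashes;

	if (simulate_bad_value)
		return ERR_PTR(-EINVAL);

	dashes = strstr(cmdline, " -- ");
	return dashes ? dashes + 4 : NULL;
}

/* Returns the arguments for init, or NULL when none should be passed. */
static const char *model_init_args(char *cmdline, int simulate_bad_value)
{
	char *after_dashes = model_parse_args(cmdline, simulate_bad_value);

	/* Checking only for NULL would let an error pointer through. */
	if (IS_ERR_OR_NULL(after_dashes))
		return NULL;

	return after_dashes;
}
```

With the combined check, a parse error simply results in no arguments being handed to init, which matches the behaviour the commit message describes.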
  27. 14 Oct, 2014 1 commit
  28. 19 Sep, 2014 1 commit
    • A
      init/main.c: Give init_task a canary · d4311ff1
      Aaron Tomlin committed
      Tasks get their end of stack set to STACK_END_MAGIC with the
      aim to catch stack overruns. Currently this feature does not
      apply to init_task. This patch removes this restriction.
      
      Note that a similar patch was posted by Prarit Bhargava
      some time ago but was never merged:
      
        http://marc.info/?l=linux-kernel&m=127144305403241&w=2
      Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dzickus@redhat.com
      Cc: bmr@redhat.com
      Cc: jcastillo@redhat.com
      Cc: jgh@redhat.com
      Cc: minchan@kernel.org
      Cc: tglx@linutronix.de
      Cc: hannes@cmpxchg.org
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Daeseok Youn <daeseok.youn@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d4311ff1
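The canary idea from the init_task commit above can be sketched in userspace. The model below assumes a stack that grows downward, so the "end" of the stack is the lowest word of the buffer. The `STACK_END_MAGIC` value is the one I believe the kernel uses and should be treated as an assumption, and every `model_` name is hypothetical.

```c
#include <stddef.h>

#define STACK_END_MAGIC   0x57AC6E9DUL	/* assumed to match the kernel's value */
#define MODEL_STACK_WORDS 64

/* A toy stack: words[MODEL_STACK_WORDS - 1] is the top, words[0] the end. */
struct model_stack {
	unsigned long words[MODEL_STACK_WORDS];
};

/* Plant the canary in the last word the stack may legitimately reach. */
static void model_set_stack_end_magic(struct model_stack *stack)
{
	stack->words[0] = STACK_END_MAGIC;
}

/* An overrun shows up as a canary that no longer holds the magic value. */
static int model_stack_end_corrupted(const struct model_stack *stack)
{
	return stack->words[0] != STACK_END_MAGIC;
}

/* Consume `count` words from the top downward, clamped to the buffer. */
static void model_use_stack(struct model_stack *stack, size_t count)
{
	size_t i;

	if (count > MODEL_STACK_WORDS)
		count = MODEL_STACK_WORDS;

	for (i = 0; i < count; i++)
		stack->words[MODEL_STACK_WORDS - 1 - i] = 0;
}
```

A task that uses only part of its stack leaves the canary intact, while one that reaches the final word overwrites it and is detected by the check.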
  29. 17 Sep, 2014 1 commit
  30. 14 Sep, 2014 1 commit
    • F
      nohz: Move nohz full init call to tick init · a80e49e2
      Frederic Weisbecker committed
      This way we unbloat main.c a bit and, more importantly, we initialize
      nohz full after init_IRQ(). This dependency will be needed in further
      patches because nohz full needs irq work to raise its own IRQ.
      Information about the support for this ability on ARM64 is obtained in
      init_IRQ(), which initializes the pointer to __smp_call_function.
      
      Since tick_init() is called right after init_IRQ(), this is a good place
      to call tick_nohz_init() and prepare for that dependency.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      a80e49e2
  31. 09 Aug, 2014 1 commit
  32. 05 Jun, 2014 4 commits
    • A
      647f010b
    • O
      kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND · 34a1b723
      Oleg Nesterov committed
      1. Remove CLONE_KERNEL, it has no users and it is dangerous.
      
         The (old) comment says "List of flags we want to share for kernel
         threads" but this is not true, we do not want to share ->sighand by
         default. This flag can only be used if the caller is sure that both
         parent/child will never play with signals (say, allow_signal/etc).
      
      2. Change rest_init() to clone kernel_init() without CLONE_SIGHAND.
      
         In this case CLONE_SIGHAND does not really hurt, and it looks like
         optimization because copy_sighand() can avoid kmem_cache_alloc().
      
         But in fact this only adds a minor pessimization. kernel_init()
         is going to exec the init process, and de_thread() will need to
         unshare ->sighand and do kmem_cache_alloc(sighand_cachep) anyway,
         but it needs to do more work and take tasklist_lock and siglock.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34a1b723
    • P
      init/main.c: add initcall_blacklist kernel parameter · 7b0b73d7
      Prarit Bhargava committed
      When a module is built into the kernel the module_init() function
      becomes an initcall.  Sometimes debugging through dynamic debug can
      help; however, debugging built-in kernel modules is typically done by
      changing the .config, recompiling, and booting the new kernel in an
      effort to determine exactly which module caused a problem.
      
      This patchset can be useful stand-alone or combined with initcall_debug.
      There are cases where some initcalls can hang the machine before the
      console can be flushed, which can make initcall_debug output inaccurate.
      Having the ability to skip initcalls can help further debugging of these
      scenarios.
      
      Usage: initcall_blacklist=<list of comma separated initcalls>
      
      ex) added "initcall_blacklist=sgi_uv_sysfs_init" as a kernel parameter and
      the log contains:
      
      	blacklisting initcall sgi_uv_sysfs_init
      	...
      	...
      	initcall sgi_uv_sysfs_init blacklisted
      
      ex) added "initcall_blacklist=foo_bar,sgi_uv_sysfs_init" as a kernel parameter
      and the log contains:
      
      	blacklisting initcall foo_bar
      	blacklisting initcall sgi_uv_sysfs_init
      	...
      	...
      	initcall sgi_uv_sysfs_init blacklisted
      
      [akpm@linux-foundation.org: tweak printk text]
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Weinberger <richard.weinberger@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Rob Landley <rob@landley.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b0b73d7
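The initcall_blacklist mechanism above boils down to two steps: split a comma separated list at boot, then skip any initcall whose name is on it. The following is a userspace sketch of that idea only. All `model_` names are hypothetical, the function name is passed in explicitly rather than looked up from the function pointer as the kernel would have to do, and the return value `-EPERM` for a skipped call is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_BLACKLIST 16

typedef int (*model_initcall_t)(void);

static const char *model_blacklist[MODEL_MAX_BLACKLIST];
static int model_blacklist_len;
static int model_calls;	/* counts how often the sample initcall ran */

/* Split "a,b,c" in place and remember each name. */
static void model_blacklist_setup(char *str)
{
	char *name;

	model_blacklist_len = 0;
	for (name = strtok(str, ",");
	     name && model_blacklist_len < MODEL_MAX_BLACKLIST;
	     name = strtok(NULL, ",")) {
		printf("blacklisting initcall %s\n", name);
		model_blacklist[model_blacklist_len++] = name;
	}
}

static int model_initcall_blacklisted(const char *name)
{
	int i;

	for (i = 0; i < model_blacklist_len; i++)
		if (strcmp(model_blacklist[i], name) == 0)
			return 1;
	return 0;
}

/* Run one initcall unless its name was blacklisted. */
static int model_do_one_initcall(const char *name, model_initcall_t fn)
{
	if (model_initcall_blacklisted(name)) {
		printf("initcall %s blacklisted\n", name);
		return -EPERM;
	}
	return fn();
}

/* Sample initcall used to observe whether a call actually happened. */
static int model_sample_init(void)
{
	model_calls++;
	return 0;
}
```

Passing a writable copy of `"foo_bar,sgi_uv_sysfs_init"` to `model_blacklist_setup()` and then running `sgi_uv_sysfs_init` through `model_do_one_initcall()` reproduces the shape of the log lines quoted in the commit message.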
    • A
      init/main.c: don't use pr_debug() · d62cf815
      Andrew Morton committed
      Partially revert commit ea676e84 ("init/main.c: convert to
      pr_foo()").
      
      Unbeknownst to me, pr_debug() is different from the other pr_foo()
      levels: pr_debug() is a no-op when DEBUG is not defined.
      
      Happily, init/main.c does have a #define DEBUG so we didn't break
      initcall_debug.  But the functioning of initcall_debug should not be
      dependent upon the presence of that #define DEBUG.
      Reported-by: Russell King <rmk@arm.linux.org.uk>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d62cf815
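The pr_debug() pitfall described above is easy to reproduce with ordinary macros. The sketch below is a simplified userspace model, not the kernel's printk machinery: `model_printk`, `model_pr_debug` and the two report helpers are hypothetical names, and `DEBUG` is deliberately left undefined so the debug variant compiles to nothing.

```c
#include <stdio.h>

/* Make sure DEBUG is not defined for this demonstration. */
#undef DEBUG

static int model_lines_printed;

/* Always prints, regardless of DEBUG. */
#define model_printk(...) \
	(model_lines_printed++, printf(__VA_ARGS__))

/* Prints only when DEBUG is defined, otherwise expands to a no-op. */
#ifdef DEBUG
#define model_pr_debug(...) model_printk(__VA_ARGS__)
#else
#define model_pr_debug(...) do { } while (0)
#endif

/* Depends on DEBUG: silently prints nothing in this build. */
static void model_report_with_pr_debug(const char *name)
{
	(void)name;	/* unused once the macro expands to nothing */
	model_pr_debug("calling %s\n", name);
}

/* Independent of DEBUG: always emits the line. */
static void model_report_with_printk(const char *name)
{
	model_printk("calling %s\n", name);
}
```

This is why output that must appear whenever a boot option such as initcall_debug is given is better emitted through a call that does not depend on a `#define DEBUG` in the same file.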