1. 11 11月, 2014 1 次提交
    • D
      param: fix crash on bad kernel arguments · 3438cf54
      Daniel Thompson 提交于
      Currently if the user passes an invalid value on the kernel command line
      then the kernel will crash during argument parsing. On most systems this
      is very hard to debug because the console hasn't been initialized yet.
      
      This is a regression due to commit 51e158c1 ("param: hand arguments
      after -- straight to init") which, in response to the systemd debug
      controversy, made it possible to explicitly pass arguments to init. To
      achieve this parse_args() was extended from simply returning an error
      code to returning a pointer. Regretably the new init args logic does not
      perform a proper validity check on the pointer resulting in a crash.
      
      This patch fixes the validity check. Should the check fail then no arguments
      will be passed to init. This is reasonable and matches how the kernel treats
      its own arguments (i.e. no error recovery).
      Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      3438cf54
  2. 14 10月, 2014 1 次提交
  3. 19 9月, 2014 1 次提交
    • A
      init/main.c: Give init_task a canary · d4311ff1
      Aaron Tomlin 提交于
      Tasks get their end of stack set to STACK_END_MAGIC with the
      aim to catch stack overruns. Currently this feature does not
      apply to init_task. This patch removes this restriction.
      
      Note that a similar patch was posted by Prarit Bhargava
      some time ago but was never merged:
      
        http://marc.info/?l=linux-kernel&m=127144305403241&w=2Signed-off-by: NAaron Tomlin <atomlin@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dzickus@redhat.com
      Cc: bmr@redhat.com
      Cc: jcastillo@redhat.com
      Cc: jgh@redhat.com
      Cc: minchan@kernel.org
      Cc: tglx@linutronix.de
      Cc: hannes@cmpxchg.org
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Daeseok Youn <daeseok.youn@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d4311ff1
  4. 17 9月, 2014 1 次提交
  5. 14 9月, 2014 1 次提交
    • F
      nohz: Move nohz full init call to tick init · a80e49e2
      Frederic Weisbecker 提交于
      This way we unbloat a bit main.c and more importantly we initialize
      nohz full after init_IRQ(). This dependency will be needed in further
      patches because nohz full needs irq work to raise its own IRQ.
      Information about the support for this ability on ARM64 is obtained on
      init_IRQ() which initialize the pointer to __smp_call_function.
      
      Since tick_init() is called right after init_IRQ(), this is a good place
      to call tick_nohz_init() and prepare for that dependency.
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      a80e49e2
  6. 09 8月, 2014 1 次提交
  7. 05 6月, 2014 6 次提交
  8. 06 5月, 2014 1 次提交
  9. 05 5月, 2014 1 次提交
  10. 01 5月, 2014 1 次提交
    • H
      x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack · 3891a04a
      H. Peter Anvin 提交于
      The IRET instruction, when returning to a 16-bit segment, only
      restores the bottom 16 bits of the user space stack pointer.  This
      causes some 16-bit software to break, but it also leaks kernel state
      to user space.  We have a software workaround for that ("espfix") for
      the 32-bit kernel, but it relies on a nonzero stack segment base which
      is not available in 64-bit mode.
      
      In checkin:
      
          b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      
      we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
      the logic that 16-bit support is crippled on 64-bit kernels anyway (no
      V86 support), but it turns out that people are doing stuff like
      running old Win16 binaries under Wine and expect it to work.
      
      This works around this by creating percpu "ministacks", each of which
      is mapped 2^16 times 64K apart.  When we detect that the return SS is
      on the LDT, we copy the IRET frame to the ministack and use the
      relevant alias to return to userspace.  The ministacks are mapped
      readonly, so if IRET faults we promote #GP to #DF which is an IST
      vector and thus has its own stack; we then do the fixup in the #DF
      handler.
      
      (Making #GP an IST exception would make the msr_safe functions unsafe
      in NMI/MC context, and quite possibly have other effects.)
      
      Special thanks to:
      
      - Andy Lutomirski, for the suggestion of using very small stack slots
        and copy (as opposed to map) the IRET frame there, and for the
        suggestion to mark them readonly and let the fault promote to #DF.
      - Konrad Wilk for paravirt fixup and testing.
      - Borislav Petkov for testing help and useful comments.
      Reported-by: NBrian Gerst <brgerst@gmail.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Lutomriski <amluto@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dirk Hohndel <dirk@hohndel.org>
      Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
      Cc: comex <comexk@gmail.com>
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org> # consider after upstream merge
      3891a04a
  11. 28 4月, 2014 1 次提交
    • R
      param: hand arguments after -- straight to init · 51e158c1
      Rusty Russell 提交于
      The kernel passes any args it doesn't need through to init, except it
      assumes anything containing '.' belongs to the kernel (for a module).
      This change means all users can clearly distinguish which arguments
      are for init.
      
      For example, the kernel uses debug ("dee-bug") to mean log everything to
      the console, where systemd uses the debug from the Scandinavian "day-boog"
      meaning "fail to boot".  If a future versions uses argv[] instead of
      reading /proc/cmdline, this confusion will be avoided.
      
      eg: test 'FOO="this is --foo"' -- 'systemd.debug="true true true"'
      
      Gives:
      argv[0] = '/debug-init'
      argv[1] = 'test'
      argv[2] = 'systemd.debug=true true true'
      envp[0] = 'HOME=/'
      envp[1] = 'TERM=linux'
      envp[2] = 'FOO=this is --foo'
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      51e158c1
  12. 13 3月, 2014 1 次提交
  13. 06 2月, 2014 1 次提交
    • L
      execve: use 'struct filename *' for executable name passing · c4ad8f98
      Linus Torvalds 提交于
      This changes 'do_execve()' to get the executable name as a 'struct
      filename', and to free it when it is done.  This is what the normal
      users want, and it simplifies and streamlines their error handling.
      
      The controlled lifetime of the executable name also fixes a
      use-after-free problem with the trace_sched_process_exec tracepoint: the
      lifetime of the passed-in string for kernel users was not at all
      obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
      the pathname allocation lifetime with the execve() having finished,
      which in turn meant that the trace point that happened after
      mm_release() of the old process VM ended up using already free'd memory.
      
      To solve the kernel string lifetime issue, this simply introduces
      "getname_kernel()" that works like the normal user-space getname()
      function, except with the source coming from kernel memory.
      
      As Oleg points out, this also means that we could drop the tcomm[] array
      from 'struct linux_binprm', since the pathname lifetime now covers
      setup_new_exec().  That would be a separate cleanup.
      Reported-by: NIgor Zhbanov <i.zhbanov@samsung.com>
      Tested-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4ad8f98
  14. 28 1月, 2014 1 次提交
  15. 24 1月, 2014 2 次提交
  16. 22 1月, 2014 2 次提交
    • S
      init/main.c: use memblock apis for early memory allocations · 098b081b
      Santosh Shilimkar 提交于
      Switch to memblock interfaces for early memory allocator instead of
      bootmem allocator.  No functional change in beahvior than what it is in
      current code from bootmem users points of view.
      
      Archs already converted to NO_BOOTMEM now directly use memblock
      interfaces instead of bootmem wrappers build on top of memblock.  And
      the archs which still uses bootmem, these new apis just fall back to
      exiting bootmem APIs.
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      098b081b
    • K
      mm: create a separate slab for page->ptl allocation · b35f1819
      Kirill A. Shutemov 提交于
      If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
      is 72 bytes.  For page->ptl they will be allocated from kmalloc-96 slab,
      so we loose 24 on each.  An average system can easily allocate few tens
      thousands of page->ptl and overhead is significant.
      
      Let's create a separate slab for page->ptl allocation to solve this.
      
      To make sure that it really works this time, some numbers from my test
      machine (just booted, no load):
      
      Before:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        kmalloc-96         31987  32190    128   30    1 : tunables  120   60    8 : slabdata   1073   1073     92
      After:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        page->ptl          27516  28143     72   53    1 : tunables  120   60    8 : slabdata    531    531      9
        kmalloc-96          3853   5280    128   30    1 : tunables  120   60    8 : slabdata    176    176      0
      
      Note that the patch is useful not only for debug case, but also for
      PREEMPT_RT, where spinlock_t is always bloated.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b35f1819
  17. 16 1月, 2014 1 次提交
  18. 21 11月, 2013 1 次提交
  19. 15 11月, 2013 1 次提交
  20. 13 11月, 2013 2 次提交
  21. 31 10月, 2013 1 次提交
    • K
      init: fix in-place parameter modification regression · 08746a65
      Krzysztof Mazur 提交于
      Before commit 026cee00
      ("params: <level>_initcall-like kernel parameters") the __setup
      parameter parsing code could modify parameter in the
      static_command_line buffer and such modifications were kept. After
      that commit such modifications are destroyed during per-initcall level
      parameter parsing because the same static_command_line buffer is used
      and only parameters for appropriate initcall level are parsed.
      
      That change broke at least parsing "ubd" parameter in the ubd driver
      when the COW file is used.
      
      Now the separate buffer is used for per-initcall parameter parsing.
      Signed-off-by: NKrzysztof Mazur <krzysiek@podlesie.net>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      08746a65
  22. 20 10月, 2013 1 次提交
    • H
      static_key: WARN on usage before jump_label_init was called · c4b2c0c5
      Hannes Frederic Sowa 提交于
      Usage of the static key primitives to toggle a branch must not be used
      before jump_label_init() is called from init/main.c. jump_label_init
      reorganizes and wires up the jump_entries so usage before that could
      have unforeseen consequences.
      
      Following primitives are now checked for correct use:
      * static_key_slow_inc
      * static_key_slow_dec
      * static_key_slow_dec_deferred
      * jump_label_rate_limit
      
      The x86 architecture already checks this by testing if the default_nop
      was already replaced with an optimal nop or with a branch instruction. It
      will panic then. Other architectures don't check for this.
      
      Because we need to relax this check for the x86 arch to allow code to
      transition from default_nop to the enabled state and other architectures
      did not check for this at all this patch introduces checking on the
      static_key primitives in a non-arch dependent manner.
      
      All checked functions are considered slow-path so the additional check
      does no harm to performance.
      
      The warnings are best observed with earlyprintk.
      
      Based on a patch from Andi Kleen.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c4b2c0c5
  23. 25 9月, 2013 1 次提交
  24. 23 9月, 2013 1 次提交
  25. 14 8月, 2013 1 次提交
    • F
      context_tracking: Ground setup for static key use · 65f382fd
      Frederic Weisbecker 提交于
      Prepare for using a static key in the context tracking subsystem.
      This will help optimizing the off case on its many users:
      
      * user_enter, user_exit, exception_enter, exception_exit, guest_enter,
        guest_exit, vtime_*()
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      65f382fd
  26. 04 7月, 2013 1 次提交
  27. 13 6月, 2013 1 次提交
  28. 28 5月, 2013 1 次提交
    • S
      perf: Use hrtimers for event multiplexing · 9e630205
      Stephane Eranian 提交于
      The current scheme of using the timer tick was fine for per-thread
      events. However, it was causing bias issues in system-wide mode
      (including for uncore PMUs). Event groups would not get their fair
      share of runtime on the PMU. With tickless kernels, if a core is idle
      there is no timer tick, and thus no event rotation (multiplexing).
      However, there are events (especially uncore events) which do count
      even though cores are asleep.
      
      This patch changes the timer source for multiplexing.  It introduces a
      per-PMU per-cpu hrtimer. The advantage is that even when a core goes
      idle, it will come back to service the hrtimer, thus multiplexing on
      system-wide events works much better.
      
      The per-PMU implementation (suggested by PeterZ) enables adjusting the
      multiplexing interval per PMU. The preferred interval is stashed into
      the struct pmu. If not set, it will be forced to the default interval
      value.
      
      In order to minimize the impact of the hrtimer, it is turned on and
      off on demand. When the PMU on a CPU is overcommited, the hrtimer is
      activated.  It is stopped when the PMU is not overcommitted.
      
      In order for this to work properly, we had to change the order of
      initialization in start_kernel() such that hrtimer_init() is run
      before perf_event_init().
      
      The default interval in milliseconds is set to a timer tick just like
      with the old code. We will provide a sysctl to tune this in another
      patch.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/1364991694-5876-2-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9e630205
  29. 02 5月, 2013 1 次提交
  30. 30 4月, 2013 3 次提交
    • A
      init/main.c: convert to pr_foo() · ea676e84
      Andrew Morton 提交于
      Also enables cleanup of some 80-col trickery.
      
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea676e84
    • R
      init: raise log level · c2409b00
      Richard Weinberger 提交于
      If the kernel was booted with the "quiet" boot option we have currently no
      chance to see why an initrd fails.  Change KERN_WARNING to KERN_ERR to see
      what is going on.
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2409b00
    • S
      init: scream bloody murder if interrupts are enabled too early · f91eb62f
      Steven Rostedt 提交于
      As I was testing a lot of my code recently, and having several
      "successes", I accidentally noticed in the dmesg this little line:
      
        start_kernel(): bug: interrupts were enabled *very* early, fixing it
      
      Sure enough, one of my patches two commits ago enabled interrupts early.
      The sad part here is that I never noticed it, and I ran several tests with
      ktest too, and ktest did not notice this line.
      
      What ktest looks for (and so does many other automated testing scripts) is
      a back trace produced by a WARN_ON() or BUG().  As a back trace was never
      produced, my buggy patch could have slipped into linux-next, or even
      worse, mainline.
      
      Adding a WARN(!irqs_disabled()) makes this bug a little more obvious:
      
        PID hash table entries: 4096 (order: 3, 32768 bytes)
        __ex_table already sorted, skipping sort
        Checking aperture...
        No AGP bridge found
        Calgary: detecting Calgary via BIOS EBDA area
        Calgary: Unable to locate Rio Grande table in EBDA - bailing!
        Memory: 2003252k/2054848k available (4857k kernel code, 460k absent, 51136k reserved, 6210k data, 1096k init)
        ------------[ cut here ]------------
        WARNING: at /home/rostedt/work/git/linux-trace.git/init/main.c:543 start_kernel+0x21e/0x415()
        Hardware name: To Be Filled By O.E.M.
        Interrupts were enabled *very* early, fixing it
        Modules linked in:
        Pid: 0, comm: swapper/0 Not tainted 3.8.0-test+ #286
        Call Trace:
          warn_slowpath_common+0x83/0x9b
          warn_slowpath_fmt+0x46/0x48
          start_kernel+0x21e/0x415
          x86_64_start_reservations+0x10e/0x112
          x86_64_start_kernel+0x102/0x111
        ---[ end trace 007d8b0491b4f5d8 ]---
        Preemptible hierarchical RCU implementation.
         RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
        NR_IRQS:4352 nr_irqs:712 16
        Console: colour VGA+ 80x25
        console [ttyS0] enabled, bootconsole disabled
      
      Do you see it?
      
      The original version of this patch just slapped a WARN_ON() in there and
      kept the printk().  Ard van Breemen suggested using the WARN() interface,
      which makes the code a bit cleaner.
      
      Also, while examining other warnings in init/main.c, I found two other
      locations that deserve a bloody murder scream if their conditions are hit,
      and updated them accordingly.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Ard van Breemen <ard@telegraafnet.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f91eb62f