1. 14 8月, 2013 1 次提交
    • K
      powerpc: Make flush_fp_to_thread() nop when CONFIG_PPC_FPU is disabled · 037f0eed
      Kevin Hao 提交于
      In the current kernel, the function flush_fp_to_thread() is not
      dependent on CONFIG_PPC_FPU. So most invocations of this function
      is not wrapped by CONFIG_PPC_FPU. Even through we don't really
      save the FPRs to the thread struct if CONFIG_PPC_FPU is not enabled,
      but there does have some runtime overhead such as the check for
      tsk->thread.regs and preempt disable and enable. It really make
      no sense to do that. So make it a nop when CONFIG_PPC_FPU is
      disabled. Also remove the wrapped #ifdef CONFIG_PPC_FPU
      when invoking this function.
      Signed-off-by: NKevin Hao <haokexin@gmail.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      037f0eed
  2. 09 8月, 2013 1 次提交
  3. 01 7月, 2013 1 次提交
  4. 15 6月, 2013 1 次提交
    • M
      powerpc: Fix stack overflow crash in resume_kernel when ftracing · 0e37739b
      Michael Ellerman 提交于
      It's possible for us to crash when running with ftrace enabled, eg:
      
        Bad kernel stack pointer bffffd12 at c00000000000a454
        cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
            pc: c00000000000a454: resume_kernel+0x34/0x60
            lr: c00000000000335c: performance_monitor_common+0x15c/0x180
            sp: bffffd12
           msr: 8000000000001032
           dar: bffffd12
         dsisr: 42000000
      
      If we look at current's stack (paca->__current->stack) we see it is
      equal to c0000002ecab0000. Our stack is 16K, and comparing to
      paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
      kernel stack. This leads to us writing over our struct thread_info, and
      in this case we have corrupted thread_info->flags and set
      _TIF_EMULATE_STACK_STORE.
      
      Dumping the stack we see:
      
        3:mon> t c0000002ecab0000
        [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
        [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
        --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
        [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
        [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
        [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
        [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
        [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
        [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
        [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
        --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
        [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
        [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
        [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
        [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
        [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
        [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
        --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
      
        ... and so on
      
      __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
      path. At that point the irq state is not consistent, ie. interrupts are
      hard disabled (by the exception entry), but the paca soft-enabled flag
      may be out of sync.
      
      This leads to the local_irq_restore() in trace_graph_entry() actually
      enabling interrupts, which we do not want. Because we have not yet
      reprogrammed the decrementer we immediately take another decrementer
      exception, and recurse.
      
      The fix is twofold. Firstly make sure we call DISABLE_INTS before
      calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
      the irq state in the paca with the hardware, making it safe again to
      call local_irq_save/restore().
      
      Although that should be sufficient to fix the bug, we also mark the
      runlatch routines as notrace. They are called very early in the
      exception entry and we are asking for trouble tracing them. They are
      also fairly uninteresting and tracing them just adds unnecessary
      overhead.
      
      [ This regression was introduced by fe1952fc
        "powerpc: Rework runlatch code" by myself --BenH
      ]
      
      CC: <stable@vger.kernel.org> [v3.4+]
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      0e37739b
  5. 10 6月, 2013 1 次提交
  6. 14 5月, 2013 2 次提交
    • S
      powerpc/booke64: Fix kernel hangs at kernel_dbg_exc · 6cecf76b
      Scott Wood 提交于
      MSR_DE is not cleared on entry to the kernel, and we don't clear it
      explicitly outside of debug code.  If we have MSR_DE set in
      prime_debug_regs(), and the new thread has events enabled in DBCR0
      (e.g.  ICMP is set in thread->dbsr0, even though it was cleared in the
      real DBCR0 when the thread got scheduled out), we'll end up taking a
      debug exception in the kernel when DBCR0 is loaded.  DSRR0 will not
      point to an exception vector, and the kernel ends up hanging at
      kernel_dbg_exc.  Fix this by always clearing MSR_DE when we load new
      debug state.
      
      Another observed source of kernel_dbg_exc hangs is with the branch
      taken event.  If this event is active, but we take a non-debug trap
      (e.g. a TLB miss or an asynchronous interrupt) before the next branch.
      We end up taking a branch-taken debug exception on the initial branch
      instruction of the exception vector, but because the debug exception is
      DBSR_BT rather than DBSR_IC we branch to kernel_dbg_exc before even
      checking the DSRR0 address.  Fix this by checking for DBSR_BT as well
      as DBSR_IC, which is what 32-bit does and what the comments suggest was
      intended in the 64-bit code as well.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      6cecf76b
    • L
      powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again · af945cf4
      Li Zhong 提交于
      Saw this warning again, and this time from the ret_from_fork path.
      
      It seems we could clear the back chain earlier in copy_thread(), which
      could cover both path, and also fix potential lockdep usage in
      schedule_tail(), or exception occurred before we clear the back chain.
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      af945cf4
  7. 01 5月, 2013 2 次提交
    • T
      dump_stack: unify debug information printed by show_regs() · a43cb95d
      Tejun Heo 提交于
      show_regs() is inherently arch-dependent but it does make sense to print
      generic debug information and some archs already do albeit in slightly
      different forms.  This patch introduces a generic function to print debug
      information from show_regs() so that different archs print out the same
      information and it's much easier to modify what's printed.
      
      show_regs_print_info() prints out the same debug info as dump_stack()
      does plus task and thread_info pointers.
      
      * Archs which didn't print debug info now do.
      
        alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
        metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
        um, xtensa
      
      * Already prints debug info.  Replaced with show_regs_print_info().
        The printed information is superset of what used to be there.
      
        arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86
      
      * s390 is special in that it used to print arch-specific information
        along with generic debug info.  Heiko and Martin think that the
        arch-specific extra isn't worth keeping s390 specfic implementation.
        Converted to use the generic version.
      
      Note that now all archs print the debug info before actual register
      dumps.
      
      An example BUG() dump follows.
      
       kernel BUG at /work/os/work/kernel/workqueue.c:4841!
       invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
       Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
       task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
       RIP: 0010:[<ffffffff8234a07e>]  [<ffffffff8234a07e>] init_workqueues+0x4/0x6
       RSP: 0000:ffff88007c861ec8  EFLAGS: 00010246
       RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
       RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
       RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Stack:
        ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
        0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
        ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
       Call Trace:
        [<ffffffff81000312>] do_one_initcall+0x122/0x170
        [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
        [<ffffffff81c47760>] ? rest_init+0x140/0x140
        [<ffffffff81c4776e>] kernel_init+0xe/0xf0
        [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
        [<ffffffff81c47760>] ? rest_init+0x140/0x140
        ...
      
      v2: Typo fix in x86-32.
      
      v3: CPU number dropped from show_regs_print_info() as
          dump_stack_print_info() has been updated to print it.  s390
          specific implementation dropped as requested by s390 maintainers.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>		[tile bits]
      Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a43cb95d
    • T
      dump_stack: consolidate dump_stack() implementations and unify their behaviors · 196779b9
      Tejun Heo 提交于
      Both dump_stack() and show_stack() are currently implemented by each
      architecture.  show_stack(NULL, NULL) dumps the backtrace for the
      current task as does dump_stack().  On some archs, dump_stack() prints
      extra information - pid, utsname and so on - in addition to the
      backtrace while the two are identical on other archs.
      
      The usages in arch-independent code of the two functions indicate
      show_stack(NULL, NULL) should print out bare backtrace while
      dump_stack() is used for debugging purposes when something went wrong,
      so it does make sense to print additional information on the task which
      triggered dump_stack().
      
      There's no reason to require archs to implement two separate but mostly
      identical functions.  It leads to unnecessary subtle information.
      
      This patch expands the dummy fallback dump_stack() implementation in
      lib/dump_stack.c such that it prints out debug information (taken from
      x86) and invokes show_stack(NULL, NULL) and drops arch-specific
      dump_stack() implementations in all archs except blackfin.  Blackfin's
      dump_stack() does something wonky that I don't understand.
      
      Debug information can be printed separately by calling
      dump_stack_print_info() so that arch-specific dump_stack()
      implementation can still emit the same debug information.  This is used
      in blackfin.
      
      This patch brings the following behavior changes.
      
      * On some archs, an extra level in backtrace for show_stack() could be
        printed.  This is because the top frame was determined in
        dump_stack() on those archs while generic dump_stack() can't do that
        reliably.  It can be compensated by inlining dump_stack() but not
        sure whether that'd be necessary.
      
      * Most archs didn't use to print debug info on dump_stack().  They do
        now.
      
      An example WARN dump follows.
      
       WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
       Hardware name: empty
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
        0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
        ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
        0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
       Call Trace:
        [<ffffffff81c614dc>] dump_stack+0x19/0x1b
        [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0
        [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff8234a071>] init_workqueues+0x35/0x505
        ...
      
      v2: CPU number added to the generic debug info as requested by s390
          folks and dropped the s390 specific dump_stack().  This loses %ksp
          from the debug message which the maintainers think isn't important
          enough to keep the s390-specific dump_stack() implementation.
      
          dump_stack_print_info() is moved to kernel/printk.c from
          lib/dump_stack.c.  Because linkage is per objecct file,
          dump_stack_print_info() living in the same lib file as generic
          dump_stack() means that archs which implement custom dump_stack()
          - at this point, only blackfin - can't use dump_stack_print_info()
          as that will bring in the generic version of dump_stack() too.  v1
          The v1 patch broke build on blackfin due to this issue.  The build
          breakage was reported by Fengguang Wu.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	[s390 bits]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      196779b9
  8. 23 4月, 2013 1 次提交
  9. 10 4月, 2013 1 次提交
  10. 15 2月, 2013 4 次提交
  11. 29 1月, 2013 1 次提交
  12. 16 1月, 2013 1 次提交
  13. 10 1月, 2013 3 次提交
  14. 29 11月, 2012 2 次提交
  15. 22 10月, 2012 5 次提交
  16. 15 10月, 2012 2 次提交
  17. 01 10月, 2012 2 次提交
  18. 10 9月, 2012 1 次提交
  19. 05 9月, 2012 2 次提交
  20. 20 8月, 2012 1 次提交
    • F
      cputime: Consolidate vtime handling on context switch · baa36046
      Frederic Weisbecker 提交于
      The archs that implement virtual cputime accounting all
      flush the cputime of a task when it gets descheduled
      and sometimes set up some ground initialization for the
      next task to account its cputime.
      
      These archs all put their own hooks in their context
      switch callbacks and handle the off-case themselves.
      
      Consolidate this by creating a new account_switch_vtime()
      callback called in generic code right after a context switch
      and that these archs must implement to flush the prev task
      cputime and initialize the next task cputime related state.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      baa36046
  21. 17 5月, 2012 1 次提交
    • S
      fork: move the real prepare_to_copy() users to arch_dup_task_struct() · 55ccf3fe
      Suresh Siddha 提交于
      Historical prepare_to_copy() is mostly a no-op, duplicated for majority of
      the architectures and the rest following the x86 model of flushing the extended
      register state like fpu there.
      
      Remove it and use the arch_dup_task_struct() instead.
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.comAcked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      55ccf3fe
  22. 08 5月, 2012 1 次提交
  23. 30 4月, 2012 1 次提交
    • A
      powerpc: Optimise enable_kernel_altivec · 35000870
      Anton Blanchard 提交于
      Add two optimisations to enable_kernel_altivec:
      
      - enable_kernel_altivec has already determined if we need to
      save the previous task's state but we call giveup_altivec
      in both cases, requiring an extra branch in giveup_altivec. Create
      giveup_altivec_notask which only turns on the VMX bit in the
      MSR.
      
      - We write the VMX MSR bit each time we call enable_kernel_altivec
      even it was already set. Check the bit and branch out if we have
      already set it. The classic case for this is vectored IO
      where we have to copy multiple buffers to or from userspace.
      
      The following testcase was used to confirm this patch improves
      performance:
      
      http://ozlabs.org/~anton/junkcode/copy_to_user.c
      
      Since the current breakpoint for using VMX in copy_tofrom_user is
      4096 bytes, I'm using buffers of 4096 + 1 cacheline (4224) bytes.
      A benchmark of 16 entry readvs (-s 16):
      
      time copy_to_user -l 4224 -s 16 -i 1000000
      
      completes 5.2% faster on a POWER7 PS700.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      35000870
  24. 11 4月, 2012 1 次提交
  25. 29 3月, 2012 1 次提交