1. 05 8月, 2014 1 次提交
  2. 28 7月, 2014 2 次提交
  3. 11 6月, 2014 1 次提交
    • S
      powerpc: Correct DSCR during TM context switch · 96d01610
      Sam bobroff 提交于
      Correct the DSCR SPR becoming temporarily corrupted if a task is
      context switched during a transaction.
      
      The problem occurs while suspending the task and is caused by saving
      the DSCR to thread.dscr after it has already been set to the CPU's
      default value:
      
      __switch_to() calls __switch_to_tm()
      	which calls tm_reclaim_task()
      	which calls tm_reclaim_thread()
      	which calls tm_reclaim()
      		where the DSCR is set to the CPU's default
      __switch_to() calls _switch()
      		where thread.dscr is set to the DSCR
      
      When the task is resumed, it's transaction will be doomed (as usual)
      and the DSCR SPR will be corrupted, although the checkpointed value
      will be correct. Therefore the DSCR will be immediately corrected by
      the transaction aborting, unless it has been suspended. In that case
      the incorrect value can be seen by the task until it resumes the
      transaction.
      
      The fix is to treat the DSCR similarly to the TAR and save it early
      in __switch_to().
      
      A program exposing the problem is added to the kernel self tests as:
      tools/testing/selftests/powerpc/tm/tm-resched-dscr.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      CC: <stable@vger.kernel.org> [v3.10+]
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      96d01610
  4. 20 5月, 2014 2 次提交
  5. 23 4月, 2014 1 次提交
  6. 07 4月, 2014 1 次提交
    • M
      powerpc/tm: Disable IRQ in tm_recheckpoint · e6b8fd02
      Michael Neuling 提交于
      We can't take an IRQ when we're about to do a trechkpt as our GPR state is set
      to user GPR values.
      
      We've hit this when running some IBM Java stress tests in the lab resulting in
      the following dump:
      
        cpu 0x3f: Vector: 700 (Program Check) at [c000000007eb3d40]
            pc: c000000000050074: restore_gprs+0xc0/0x148
            lr: 00000000b52a8184
            sp: ac57d360
           msr: 8000000100201030
          current = 0xc00000002c500000
          paca    = 0xc000000007dbfc00     softe: 0     irq_happened: 0x00
            pid   = 34535, comm = Pooled Thread #
        R00 = 00000000b52a8184   R16 = 00000000b3e48fda
        R01 = 00000000ac57d360   R17 = 00000000ade79bd8
        R02 = 00000000ac586930   R18 = 000000000fac9bcc
        R03 = 00000000ade60000   R19 = 00000000ac57f930
        R04 = 00000000f6624918   R20 = 00000000ade79be8
        R05 = 00000000f663f238   R21 = 00000000ac218a54
        R06 = 0000000000000002   R22 = 000000000f956280
        R07 = 0000000000000008   R23 = 000000000000007e
        R08 = 000000000000000a   R24 = 000000000000000c
        R09 = 00000000b6e69160   R25 = 00000000b424cf00
        R10 = 0000000000000181   R26 = 00000000f66256d4
        R11 = 000000000f365ec0   R27 = 00000000b6fdcdd0
        R12 = 00000000f66400f0   R28 = 0000000000000001
        R13 = 00000000ada71900   R29 = 00000000ade5a300
        R14 = 00000000ac2185a8   R30 = 00000000f663f238
        R15 = 0000000000000004   R31 = 00000000f6624918
        pc  = c000000000050074 restore_gprs+0xc0/0x148
        cfar= c00000000004fe28 dont_restore_vec+0x1c/0x1a4
        lr  = 00000000b52a8184
        msr = 8000000100201030   cr  = 24804888
        ctr = 0000000000000000   xer = 0000000000000000   trap =  700
      
      This moves tm_recheckpoint to a C function and moves the tm_restore_sprs into
      that function.  It then adds IRQ disabling over the trechkpt critical section.
      It also sets the TEXASR FS in the signals code to ensure this is never set now
      that we explictly write the TM sprs in tm_recheckpoint.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      cc: stable@vger.kernel.org
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e6b8fd02
  7. 07 3月, 2014 1 次提交
    • M
      powerpc/tm: Fix crash when forking inside a transaction · 621b5060
      Michael Neuling 提交于
      When we fork/clone we currently don't copy any of the TM state to the new
      thread.  This results in a TM bad thing (program check) when the new process is
      switched in as the kernel does a tmrechkpt with TEXASR FS not set.  Also, since
      R1 is from userspace, we trigger the bad kernel stack pointer detection.  So we
      end up with something like this:
      
         Bad kernel stack pointer 0 at c0000000000404fc
         cpu 0x2: Vector: 700 (Program Check) at [c00000003ffefd40]
             pc: c0000000000404fc: restore_gprs+0xc0/0x148
             lr: 0000000000000000
             sp: 0
            msr: 9000000100201030
           current = 0xc000001dd1417c30
           paca    = 0xc00000000fe00800   softe: 0        irq_happened: 0x01
             pid   = 0, comm = swapper/2
         WARNING: exception is not recoverable, can't continue
      
      The below fixes this by flushing the TM state before we copy the task_struct to
      the clone.  To do this we go through the tmreclaim patch, which removes the
      checkpointed registers from the CPU and transitions the CPU out of TM suspend
      mode.  Hence we need to call tmrechkpt after to restore the checkpointed state
      and the TM mode for the current task.
      
      To make this fail from userspace is simply:
      	tbegin
      	li	r0, 2
      	sc
      	<boom>
      
      Kudos to Adhemerval Zanella Neto for finding this.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      cc: Adhemerval Zanella Neto <azanella@br.ibm.com>
      cc: stable@vger.kernel.org
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      621b5060
  8. 29 1月, 2014 1 次提交
  9. 15 1月, 2014 2 次提交
    • P
      powerpc: Don't corrupt transactional state when using FP/VMX in kernel · d31626f7
      Paul Mackerras 提交于
      Currently, when we have a process using the transactional memory
      facilities on POWER8 (that is, the processor is in transactional
      or suspended state), and the process enters the kernel and the
      kernel then uses the floating-point or vector (VMX/Altivec) facility,
      we end up corrupting the user-visible FP/VMX/VSX state.  This
      happens, for example, if a page fault causes a copy-on-write
      operation, because the copy_page function will use VMX to do the
      copy on POWER8.  The test program below demonstrates the bug.
      
      The bug happens because when FP/VMX state for a transactional process
      is stored in the thread_struct, we store the checkpointed state in
      .fp_state/.vr_state and the transactional (current) state in
      .transact_fp/.transact_vr.  However, when the kernel wants to use
      FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
      which saves the current state in .fp_state/.vr_state.  Furthermore,
      when we return to the user process we return with FP/VMX/VSX
      disabled.  The next time the process uses FP/VMX/VSX, we don't know
      which set of state (the current register values, .fp_state/.vr_state,
      or .transact_fp/.transact_vr) we should be using, since we have no
      way to tell if we are still in the same transaction, and if not,
      whether the previous transaction succeeded or failed.
      
      Thus it is necessary to strictly adhere to the rule that if FP has
      been enabled at any point in a transaction, we must keep FP enabled
      for the user process with the current transactional state in the
      FP registers, until we detect that it is no longer in a transaction.
      Similarly for VMX; once enabled it must stay enabled until the
      process is no longer transactional.
      
      In order to keep this rule, we add a new thread_info flag which we
      test when returning from the kernel to userspace, called TIF_RESTORE_TM.
      This flag indicates that there is FP/VMX/VSX state to be restored
      before entering userspace, and when it is set the .tm_orig_msr field
      in the thread_struct indicates what state needs to be restored.
      The restoration is done by restore_tm_state().  The TIF_RESTORE_TM
      bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
      which are called from enable_kernel_fp/altivec, giveup_vsx, and
      flush_fp/altivec_to_thread instead of giveup_fpu/altivec.
      
      The other thing to be done is to get the transactional FP/VMX/VSX
      state from .fp_state/.vr_state when doing reclaim, if that state
      has been saved there by giveup_fpu/altivec_maybe_transactional.
      Having done this, we set the FP/VMX bit in the thread's MSR after
      reclaim to indicate that that part of the state is now valid
      (having been reclaimed from the processor's checkpointed state).
      
      Finally, in the signal handling code, we move the clearing of the
      transactional state bits in the thread's MSR a bit earlier, before
      calling flush_fp_to_thread(), so that we don't unnecessarily set
      the TIF_RESTORE_TM bit.
      
      This is the test program:
      
      /* Michael Neuling 4/12/2013
       *
       * See if the altivec state is leaked out of an aborted transaction due to
       * kernel vmx copy loops.
       *
       *   gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
       *
       */
      
      /* We don't use all of these, but for reference: */
      
      int main(int argc, char *argv[])
      {
      	long double vecin = 1.3;
      	long double vecout;
      	unsigned long pgsize = getpagesize();
      	int i;
      	int fd;
      	int size = pgsize*16;
      	char tmpfile[] = "/tmp/page_faultXXXXXX";
      	char buf[pgsize];
      	char *a;
      	uint64_t aborted = 0;
      
      	fd = mkstemp(tmpfile);
      	assert(fd >= 0);
      
      	memset(buf, 0, pgsize);
      	for (i = 0; i < size; i += pgsize)
      		assert(write(fd, buf, pgsize) == pgsize);
      
      	unlink(tmpfile);
      
      	a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
      	assert(a != MAP_FAILED);
      
      	asm __volatile__(
      		"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
      		TBEGIN
      		"beq	3f ;"
      		TSUSPEND
      		"xxlxor 40,40,40 ; " // set 40 to 0
      		"std	5, 0(%[map]) ;" // cause kernel vmx copy page
      		TABORT
      		TRESUME
      		TEND
      		"li	%[res], 0 ;"
      		"b	5f ;"
      		"3: ;" // Abort handler
      		"li	%[res], 1 ;"
      		"5: ;"
      		"stxvd2x 40,0,%[vecoutptr] ; "
      		: [res]"=r"(aborted)
      		: [vecinptr]"r"(&vecin),
      		  [vecoutptr]"r"(&vecout),
      		  [map]"r"(a)
      		: "memory", "r0", "r3", "r4", "r5", "r6", "r7");
      
      	if (aborted && (vecin != vecout)){
      		printf("FAILED: vector state leaked on abort %f != %f\n",
      		       (double)vecin, (double)vecout);
      		exit(1);
      	}
      
      	munmap(a, size);
      
      	close(fd);
      
      	printf("PASSED!\n");
      	return 0;
      }
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d31626f7
    • P
      powerpc: Delete non-required instances of include <linux/init.h> · c141611f
      Paul Gortmaker 提交于
      None of these files are actually using any __init type directives
      and hence don't need to include <linux/init.h>.  Most are just a
      left over from __devinit and __cpuinit removal, or simply due to
      code getting copied from one driver to the next.
      
      The one instance where we add an include for init.h covers off
      a case where that file was implicitly getting it from another
      header which itself didn't need it.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c141611f
  10. 08 1月, 2014 1 次提交
    • J
      powerpc: fix exception clearing in e500 SPE float emulation · 640e9225
      Joseph Myers 提交于
      The e500 SPE floating-point emulation code clears existing exceptions
      (__FPU_FPSCR &= ~FP_EX_MASK;) before ORing in the exceptions from the
      emulated operation.  However, these exception bits are the "sticky",
      cumulative exception bits, and should only be cleared by the user
      program setting SPEFSCR, not implicitly by any floating-point
      instruction (whether executed purely by the hardware or emulated).
      The spurious clearing of these bits shows up as missing exceptions in
      glibc testing.
      
      Fixing this, however, is not as simple as just not clearing the bits,
      because while the bits may be from previous floating-point operations
      (in which case they should not be cleared), the processor can also set
      the sticky bits itself before the interrupt for an exception occurs,
      and this can happen in cases when IEEE 754 semantics are that the
      sticky bit should not be set.  Specifically, the "invalid" sticky bit
      is set in various cases with non-finite operands, where IEEE 754
      semantics do not involve raising such an exception, and the
      "underflow" sticky bit is set in cases of exact underflow, whereas
      IEEE 754 semantics are that this flag is set only for inexact
      underflow.  Thus, for correct emulation the kernel needs to know the
      setting of these two sticky bits before the instruction being
      emulated.
      
      When a floating-point operation raises an exception, the kernel can
      note the state of the sticky bits immediately afterwards.  Some
      <fenv.h> functions that affect the state of these bits, such as
      fesetenv and feholdexcept, need to use prctl with PR_GET_FPEXC and
      PR_SET_FPEXC anyway, and so it is natural to record the state of those
      bits during that call into the kernel and so avoid any need for a
      separate call into the kernel to inform it of a change to those bits.
      Thus, the interface I chose to use (in this patch and the glibc port)
      is that one of those prctl calls must be made after any userspace
      change to those sticky bits, other than through a floating-point
      operation that traps into the kernel anyway.  feclearexcept and
      fesetexceptflag duly make those calls, which would not be required
      were it not for this issue.
      
      The previous EGLIBC port, and the uClibc code copied from it, is
      fundamentally broken as regards any use of prctl for floating-point
      exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its
      prctl calls (and did various worse things, such as passing a pointer
      when prctl expected an integer).  If you avoid anything where prctl is
      used, the clearing of sticky bits still means it will never give
      anything approximating correct exception semantics with existing
      kernels.  I don't believe the patch makes things any worse for
      existing code that doesn't try to inform the kernel of changes to
      sticky bits - such code may get incorrect exceptions in some cases,
      but it would have done so anyway in other cases.
      Signed-off-by: NJoseph Myers <joseph@codesourcery.com>
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      640e9225
  11. 11 12月, 2013 1 次提交
    • S
      powerpc/kvm/booke: Fix build break due to stack frame size warning · f5f97210
      Scott Wood 提交于
      Commit ce11e48b ("KVM: PPC: E500: Add
      userspace debug stub support") added "struct thread_struct" to the
      stack of kvmppc_vcpu_run().  thread_struct is 1152 bytes on my build,
      compared to 48 bytes for the recently-introduced "struct debug_reg".
      Use the latter instead.
      
      This fixes the following error:
      
      cc1: warnings being treated as errors
      arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run':
      arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes
      make[2]: *** [arch/powerpc/kvm/booke.o] Error 1
      make[1]: *** [arch/powerpc/kvm] Error 2
      make[1]: *** Waiting for unfinished jobs....
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f5f97210
  12. 21 11月, 2013 4 次提交
  13. 31 10月, 2013 1 次提交
  14. 19 10月, 2013 3 次提交
  15. 17 10月, 2013 3 次提交
  16. 11 10月, 2013 2 次提交
    • P
      powerpc: Provide for giveup_fpu/altivec to save state in alternate location · 18461960
      Paul Mackerras 提交于
      This provides a facility which is intended for use by KVM, where the
      contents of the FP/VSX and VMX (Altivec) registers can be saved away
      to somewhere other than the thread_struct when kernel code wants to
      use floating point or VMX instructions.  This is done by providing a
      pointer in the thread_struct to indicate where the state should be
      saved to.  The giveup_fpu() and giveup_altivec() functions test these
      pointers and save state to the indicated location if they are non-NULL.
      Note that the MSR_FP/VEC bits in task->thread.regs->msr are still used
      to indicate whether the CPU register state is live, even when an
      alternate save location is being used.
      
      This also provides load_fp_state() and load_vr_state() functions, which
      load up FP/VSX and VMX state from memory into the CPU registers, and
      corresponding store_fp_state() and store_vr_state() functions, which
      store FP/VSX and VMX state into memory from the CPU registers.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      18461960
    • P
      powerpc: Put FP/VSX and VR state into structures · de79f7b9
      Paul Mackerras 提交于
      This creates new 'thread_fp_state' and 'thread_vr_state' structures
      to store FP/VSX state (including FPSCR) and Altivec/VSX state
      (including VSCR), and uses them in the thread_struct.  In the
      thread_fp_state, the FPRs and VSRs are represented as u64 rather
      than double, since we rarely perform floating-point computations
      on the values, and this will enable the structures to be used
      in KVM code as well.  Similarly FPSCR is now a u64 rather than
      a structure of two 32-bit values.
      
      This takes the offsets out of the macros such as SAVE_32FPRS,
      REST_32FPRS, etc.  This enables the same macros to be used for normal
      and transactional state, enabling us to delete the transactional
      versions of the macros.   This also removes the unused do_load_up_fpu
      and do_load_up_altivec, which were in fact buggy since they didn't
      create large enough stack frames to account for the fact that
      load_up_fpu and load_up_altivec are not designed to be called from C
      and assume that their caller's stack frame is an interrupt frame.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      de79f7b9
  17. 25 9月, 2013 1 次提交
    • B
      powerpc: Remove ksp_limit on ppc64 · cbc9565e
      Benjamin Herrenschmidt 提交于
      We've been keeping that field in thread_struct for a while, it contains
      the "limit" of the current stack pointer and is meant to be used for
      detecting stack overflows.
      
      It has a few problems however:
      
       - First, it was never actually *used* on 64-bit. Set and updated but
      not actually exploited
      
       - When switching stack to/from irq and softirq stacks, it's update
      is racy unless we hard disable interrupts, which is costly. This
      is fine on 32-bit as we don't soft-disable there but not on 64-bit.
      
      Thus rather than fixing 2 in order to implement 1 in some hypothetical
      future, let's remove the code completely from 64-bit. In order to avoid
      a clutter of ifdef's, we remove the updates from C code completely
      during interrupt stack switching, and instead maintain it from the
      asm helper that is used to do the stack switching in the first place.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cbc9565e
  18. 14 8月, 2013 1 次提交
    • K
      powerpc: Make flush_fp_to_thread() nop when CONFIG_PPC_FPU is disabled · 037f0eed
      Kevin Hao 提交于
      In the current kernel, the function flush_fp_to_thread() is not
      dependent on CONFIG_PPC_FPU. So most invocations of this function
      is not wrapped by CONFIG_PPC_FPU. Even through we don't really
      save the FPRs to the thread struct if CONFIG_PPC_FPU is not enabled,
      but there does have some runtime overhead such as the check for
      tsk->thread.regs and preempt disable and enable. It really make
      no sense to do that. So make it a nop when CONFIG_PPC_FPU is
      disabled. Also remove the wrapped #ifdef CONFIG_PPC_FPU
      when invoking this function.
      Signed-off-by: NKevin Hao <haokexin@gmail.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      037f0eed
  19. 09 8月, 2013 1 次提交
  20. 01 7月, 2013 1 次提交
  21. 15 6月, 2013 1 次提交
    • M
      powerpc: Fix stack overflow crash in resume_kernel when ftracing · 0e37739b
      Michael Ellerman 提交于
      It's possible for us to crash when running with ftrace enabled, eg:
      
        Bad kernel stack pointer bffffd12 at c00000000000a454
        cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
            pc: c00000000000a454: resume_kernel+0x34/0x60
            lr: c00000000000335c: performance_monitor_common+0x15c/0x180
            sp: bffffd12
           msr: 8000000000001032
           dar: bffffd12
         dsisr: 42000000
      
      If we look at current's stack (paca->__current->stack) we see it is
      equal to c0000002ecab0000. Our stack is 16K, and comparing to
      paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
      kernel stack. This leads to us writing over our struct thread_info, and
      in this case we have corrupted thread_info->flags and set
      _TIF_EMULATE_STACK_STORE.
      
      Dumping the stack we see:
      
        3:mon> t c0000002ecab0000
        [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
        [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
        --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
        [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
        [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
        [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
        [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
        [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
        [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
        [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
        --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
        [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
        [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
        [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
        [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
        [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
        [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
        --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
      
        ... and so on
      
      __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
      path. At that point the irq state is not consistent, ie. interrupts are
      hard disabled (by the exception entry), but the paca soft-enabled flag
      may be out of sync.
      
      This leads to the local_irq_restore() in trace_graph_entry() actually
      enabling interrupts, which we do not want. Because we have not yet
      reprogrammed the decrementer we immediately take another decrementer
      exception, and recurse.
      
      The fix is twofold. Firstly make sure we call DISABLE_INTS before
      calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
      the irq state in the paca with the hardware, making it safe again to
      call local_irq_save/restore().
      
      Although that should be sufficient to fix the bug, we also mark the
      runlatch routines as notrace. They are called very early in the
      exception entry and we are asking for trouble tracing them. They are
      also fairly uninteresting and tracing them just adds unnecessary
      overhead.
      
      [ This regression was introduced by fe1952fc
        "powerpc: Rework runlatch code" by myself --BenH
      ]
      
      CC: <stable@vger.kernel.org> [v3.4+]
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      0e37739b
  22. 10 6月, 2013 1 次提交
  23. 14 5月, 2013 2 次提交
    • S
      powerpc/booke64: Fix kernel hangs at kernel_dbg_exc · 6cecf76b
      Scott Wood 提交于
      MSR_DE is not cleared on entry to the kernel, and we don't clear it
      explicitly outside of debug code.  If we have MSR_DE set in
      prime_debug_regs(), and the new thread has events enabled in DBCR0
      (e.g.  ICMP is set in thread->dbsr0, even though it was cleared in the
      real DBCR0 when the thread got scheduled out), we'll end up taking a
      debug exception in the kernel when DBCR0 is loaded.  DSRR0 will not
      point to an exception vector, and the kernel ends up hanging at
      kernel_dbg_exc.  Fix this by always clearing MSR_DE when we load new
      debug state.
      
      Another observed source of kernel_dbg_exc hangs is with the branch
      taken event.  If this event is active, but we take a non-debug trap
      (e.g. a TLB miss or an asynchronous interrupt) before the next branch.
      We end up taking a branch-taken debug exception on the initial branch
      instruction of the exception vector, but because the debug exception is
      DBSR_BT rather than DBSR_IC we branch to kernel_dbg_exc before even
      checking the DSRR0 address.  Fix this by checking for DBSR_BT as well
      as DBSR_IC, which is what 32-bit does and what the comments suggest was
      intended in the 64-bit code as well.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      6cecf76b
    • L
      powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again · af945cf4
      Li Zhong 提交于
      Saw this warning again, and this time from the ret_from_fork path.
      
      It seems we could clear the back chain earlier in copy_thread(), which
      could cover both path, and also fix potential lockdep usage in
      schedule_tail(), or exception occurred before we clear the back chain.
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      af945cf4
  24. 01 5月, 2013 2 次提交
    • T
      dump_stack: unify debug information printed by show_regs() · a43cb95d
      Tejun Heo 提交于
      show_regs() is inherently arch-dependent but it does make sense to print
      generic debug information and some archs already do albeit in slightly
      different forms.  This patch introduces a generic function to print debug
      information from show_regs() so that different archs print out the same
      information and it's much easier to modify what's printed.
      
      show_regs_print_info() prints out the same debug info as dump_stack()
      does plus task and thread_info pointers.
      
      * Archs which didn't print debug info now do.
      
        alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
        metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
        um, xtensa
      
      * Already prints debug info.  Replaced with show_regs_print_info().
        The printed information is superset of what used to be there.
      
        arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86
      
      * s390 is special in that it used to print arch-specific information
        along with generic debug info.  Heiko and Martin think that the
        arch-specific extra isn't worth keeping s390 specfic implementation.
        Converted to use the generic version.
      
      Note that now all archs print the debug info before actual register
      dumps.
      
      An example BUG() dump follows.
      
       kernel BUG at /work/os/work/kernel/workqueue.c:4841!
       invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
       Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
       task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
       RIP: 0010:[<ffffffff8234a07e>]  [<ffffffff8234a07e>] init_workqueues+0x4/0x6
       RSP: 0000:ffff88007c861ec8  EFLAGS: 00010246
       RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
       RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
       RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Stack:
        ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
        0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
        ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
       Call Trace:
        [<ffffffff81000312>] do_one_initcall+0x122/0x170
        [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
        [<ffffffff81c47760>] ? rest_init+0x140/0x140
        [<ffffffff81c4776e>] kernel_init+0xe/0xf0
        [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
        [<ffffffff81c47760>] ? rest_init+0x140/0x140
        ...
      
      v2: Typo fix in x86-32.
      
      v3: CPU number dropped from show_regs_print_info() as
          dump_stack_print_info() has been updated to print it.  s390
          specific implementation dropped as requested by s390 maintainers.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>		[tile bits]
      Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a43cb95d
    • T
      dump_stack: consolidate dump_stack() implementations and unify their behaviors · 196779b9
      Tejun Heo 提交于
      Both dump_stack() and show_stack() are currently implemented by each
      architecture.  show_stack(NULL, NULL) dumps the backtrace for the
      current task as does dump_stack().  On some archs, dump_stack() prints
      extra information - pid, utsname and so on - in addition to the
      backtrace while the two are identical on other archs.
      
      The usages in arch-independent code of the two functions indicate
      show_stack(NULL, NULL) should print out bare backtrace while
      dump_stack() is used for debugging purposes when something went wrong,
      so it does make sense to print additional information on the task which
      triggered dump_stack().
      
      There's no reason to require archs to implement two separate but mostly
      identical functions.  It leads to unnecessary subtle information.
      
      This patch expands the dummy fallback dump_stack() implementation in
      lib/dump_stack.c such that it prints out debug information (taken from
      x86) and invokes show_stack(NULL, NULL) and drops arch-specific
      dump_stack() implementations in all archs except blackfin.  Blackfin's
      dump_stack() does something wonky that I don't understand.
      
      Debug information can be printed separately by calling
      dump_stack_print_info() so that arch-specific dump_stack()
      implementation can still emit the same debug information.  This is used
      in blackfin.
      
      This patch brings the following behavior changes.
      
      * On some archs, an extra level in backtrace for show_stack() could be
        printed.  This is because the top frame was determined in
        dump_stack() on those archs while generic dump_stack() can't do that
        reliably.  It can be compensated by inlining dump_stack() but not
        sure whether that'd be necessary.
      
      * Most archs didn't use to print debug info on dump_stack().  They do
        now.
      
      An example WARN dump follows.
      
       WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
       Hardware name: empty
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
        0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
        ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
        0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
       Call Trace:
        [<ffffffff81c614dc>] dump_stack+0x19/0x1b
        [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0
        [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff8234a071>] init_workqueues+0x35/0x505
        ...
      
      v2: CPU number added to the generic debug info as requested by s390
          folks and dropped the s390 specific dump_stack().  This loses %ksp
          from the debug message which the maintainers think isn't important
          enough to keep the s390-specific dump_stack() implementation.
      
          dump_stack_print_info() is moved to kernel/printk.c from
          lib/dump_stack.c.  Because linkage is per objecct file,
          dump_stack_print_info() living in the same lib file as generic
          dump_stack() means that archs which implement custom dump_stack()
          - at this point, only blackfin - can't use dump_stack_print_info()
          as that will bring in the generic version of dump_stack() too.  v1
          The v1 patch broke build on blackfin due to this issue.  The build
          breakage was reported by Fengguang Wu.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	[s390 bits]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      196779b9
  25. 23 4月, 2013 1 次提交
  26. 10 4月, 2013 1 次提交
  27. 15 2月, 2013 1 次提交