1. 24 Mar 2014, 1 commit
  2. 15 Jan 2014, 2 commits
    • powerpc: Fix transactional FP/VMX/VSX unavailable handlers · 3ac8ff1c
      Committed by Paul Mackerras
      Currently, if a process starts a transaction and then takes an
      exception because the FPU, VMX or VSX unit is unavailable to it,
      we end up corrupting any FP/VMX/VSX state that was valid before
      the interrupt.  For example, if the process starts a transaction
      with the FPU available to it but VMX unavailable, and then does
      a VMX instruction inside the transaction, the FP state gets
      corrupted.
      
      Loading up the desired state generally involves doing a reclaim
      and a recheckpoint.  To avoid corrupting already-valid state, we have
      to be careful not to reload that state from the thread_struct
      between the reclaim and the recheckpoint (since the thread_struct
      values are stale by now), and we have to reload that state from
      the transact_fp/vr arrays after the recheckpoint to get back the
      current transactional values saved there by the reclaim.
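      
      As a hedged C-level sketch of that ordering (the real handlers are asm
      in fpu.S and vector.S; helper signatures here are simplified):
      
      	/* Sketch only; kernel context assumed, signatures simplified. */
      	static void fp_unavailable_tm_sketch(struct thread_struct *t)
      	{
      		/* reclaim: current transactional values -> t->transact_fp/vr */
      		tm_reclaim(t, t->regs->msr, TM_CAUSE_FAC_UNAV);
      		/* do NOT reload FP/VMX from t->fp_state here: it is stale now */
      		/* recheckpoint: checkpointed values -> processor */
      		tm_recheckpoint(t, t->regs->msr);
      		/* bring back the transactional values saved by the reclaim */
      		load_fp_state(&t->transact_fp);
      	}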
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Don't corrupt transactional state when using FP/VMX in kernel · d31626f7
      Committed by Paul Mackerras
      Currently, when we have a process using the transactional memory
      facilities on POWER8 (that is, the processor is in transactional
      or suspended state), and the process enters the kernel and the
      kernel then uses the floating-point or vector (VMX/Altivec) facility,
      we end up corrupting the user-visible FP/VMX/VSX state.  This
      happens, for example, if a page fault causes a copy-on-write
      operation, because the copy_page function will use VMX to do the
      copy on POWER8.  The test program below demonstrates the bug.
      
      The bug happens because when FP/VMX state for a transactional process
      is stored in the thread_struct, we store the checkpointed state in
      .fp_state/.vr_state and the transactional (current) state in
      .transact_fp/.transact_vr.  However, when the kernel wants to use
      FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
      which saves the current state in .fp_state/.vr_state.  Furthermore,
      when we return to the user process we return with FP/VMX/VSX
      disabled.  The next time the process uses FP/VMX/VSX, we don't know
      which set of state (the current register values, .fp_state/.vr_state,
      or .transact_fp/.transact_vr) we should be using, since we have no
      way to tell if we are still in the same transaction, and if not,
      whether the previous transaction succeeded or failed.
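      
      For reference, a hedged sketch of where each flavour of state lives in
      the thread_struct at this point (field names from the kernel;
      surrounding fields omitted):
      
      	struct thread_struct {
      		/* ... */
      		struct thread_fp_state fp_state;    /* checkpointed FP state   */
      		struct thread_fp_state transact_fp; /* transactional (current) */
      		struct thread_vr_state vr_state;    /* checkpointed VMX state  */
      		struct thread_vr_state transact_vr; /* transactional (current) */
      		/* ... */
      	};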
      
      Thus it is necessary to strictly adhere to the rule that if FP has
      been enabled at any point in a transaction, we must keep FP enabled
      for the user process with the current transactional state in the
      FP registers, until we detect that it is no longer in a transaction.
      Similarly for VMX; once enabled it must stay enabled until the
      process is no longer transactional.
      
      In order to keep this rule, we add a new thread_info flag which we
      test when returning from the kernel to userspace, called TIF_RESTORE_TM.
      This flag indicates that there is FP/VMX/VSX state to be restored
      before entering userspace, and when it is set the .tm_orig_msr field
      in the thread_struct indicates what state needs to be restored.
      The restoration is done by restore_tm_state().  The TIF_RESTORE_TM
      bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
      which are called from enable_kernel_fp/altivec, giveup_vsx, and
      flush_fp/altivec_to_thread instead of giveup_fpu/altivec.
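      
      A hedged C-level sketch of the new exit-to-user check (the real test
      lives in the entry_64.S assembly):
      
      	/* on return to userspace, kernel context assumed: */
      	if (test_thread_flag(TIF_RESTORE_TM))
      		restore_tm_state(regs);	/* reload FP/VMX/VSX per .tm_orig_msr */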
      
      The other thing to be done is to get the transactional FP/VMX/VSX
      state from .fp_state/.vr_state when doing reclaim, if that state
      has been saved there by giveup_fpu/altivec_maybe_transactional.
      Having done this, we set the FP/VMX bit in the thread's MSR after
      reclaim to indicate that that part of the state is now valid
      (having been reclaimed from the processor's checkpointed state).
      
      Finally, in the signal handling code, we move the clearing of the
      transactional state bits in the thread's MSR a bit earlier, before
      calling flush_fp_to_thread(), so that we don't unnecessarily set
      the TIF_RESTORE_TM bit.
      
      This is the test program:
      
      /* Michael Neuling 4/12/2013
       *
       * See if the altivec state is leaked out of an aborted transaction due to
       * kernel vmx copy loops.
       *
       *   gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
       *
       */
      
      /* We don't use all of these, but for reference: */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <stdint.h>
      #include <assert.h>
      #include <unistd.h>
      #include <sys/mman.h>
      
      /* TM instructions as raw opcodes (Power ISA 2.07 encodings), so the
       * test builds even with assemblers that lack the TM mnemonics: */
      #define TBEGIN   ".long 0x7C00051D ;"
      #define TEND     ".long 0x7C00055D ;"
      #define TSUSPEND ".long 0x7C0005DD ;"
      #define TRESUME  ".long 0x7C2005DD ;"
      #define TABORT   ".long 0x7C00071D ;"
      
      int main(int argc, char *argv[])
      {
      	long double vecin = 1.3;
      	long double vecout;
      	unsigned long pgsize = getpagesize();
      	int i;
      	int fd;
      	int size = pgsize*16;
      	char tmpfile[] = "/tmp/page_faultXXXXXX";
      	char buf[pgsize];
      	char *a;
      	uint64_t aborted = 0;
      
      	fd = mkstemp(tmpfile);
      	assert(fd >= 0);
      
      	memset(buf, 0, pgsize);
      	for (i = 0; i < size; i += pgsize)
      		assert(write(fd, buf, pgsize) == pgsize);
      
      	unlink(tmpfile);
      
      	a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
      	assert(a != MAP_FAILED);
      
      	asm __volatile__(
      		"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
      		TBEGIN
      		"beq	3f ;"
      		TSUSPEND
      		"xxlxor 40,40,40 ; " // set 40 to 0
      		"std	5, 0(%[map]) ;" // cause kernel vmx copy page
      		TABORT
      		TRESUME
      		TEND
      		"li	%[res], 0 ;"
      		"b	5f ;"
      		"3: ;" // Abort handler
      		"li	%[res], 1 ;"
      		"5: ;"
      		"stxvd2x 40,0,%[vecoutptr] ; "
      		: [res]"=r"(aborted)
      		: [vecinptr]"r"(&vecin),
      		  [vecoutptr]"r"(&vecout),
      		  [map]"r"(a)
      		: "memory", "r0", "r3", "r4", "r5", "r6", "r7");
      
      	if (aborted && (vecin != vecout)){
      		printf("FAILED: vector state leaked on abort %f != %f\n",
      		       (double)vecin, (double)vecout);
      		exit(1);
      	}
      
      	munmap(a, size);
      
      	close(fd);
      
      	printf("PASSED!\n");
      	return 0;
      }
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  3. 05 Dec 2013, 2 commits
    • powerpc/book3s: Introduce an early machine check hook in cpu_spec. · 4c703416
      Committed by Mahesh Salgaonkar
      This patch adds an early machine check function pointer in cputable for
      CPU-specific early machine check handling. The early machine check routine
      will be called in real mode to handle SLB and TLB errors. We cannot reuse
      the existing machine_check hook because it is always invoked in kernel
      virtual mode, and we would already be in trouble if we got SLB or TLB errors.
      This patch just sets up the mechanism to invoke a CPU-specific handler;
      subsequent patches will populate the function pointer.
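      
      A hedged sketch of the new hook alongside the existing one in struct
      cpu_spec (other fields omitted):
      
      	struct cpu_spec {
      		/* ... */
      		/* existing hook, always invoked in kernel virtual mode */
      		int  (*machine_check)(struct pt_regs *regs);
      		/* new hook, invoked in real mode for SLB/TLB errors */
      		long (*machine_check_early)(struct pt_regs *regs);
      		/* ... */
      	};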
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: handle machine check in Linux host. · 1e9b4507
      Committed by Mahesh Salgaonkar
      Move the machine check entry point into Linux. So far we have depended on
      firmware to decode MCE error details and hand over the high-level info to
      the OS.
      
      This patch introduces an early machine check routine that saves the MCE
      information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
      a stack frame on the emergency stack and set r1 accordingly, which leaves us
      prepared to take another exception without losing context. One thing to note:
      if we get another machine check while the ME bit is off, we risk a checkstop,
      so we restrict ourselves to saving only the MCE information and the registers
      held in the PACA_EXMC save area before we turn the ME bit on. We use the
      paca->in_mce counter to differentiate between a first entry and a nested
      machine check entry, which ensures proper use of the emergency stack: we
      increment paca->in_mce every time we enter the early machine check handler
      and decrement it on the way out. On first entry (paca->in_mce == 0), we know
      nobody else is using the MC emergency stack and allocate a stack frame at its
      start. On a nested entry (paca->in_mce > 0), we know r1 already points inside
      the emergency stack and allocate a separate stack frame accordingly. This
      prevents us from clobbering the MCE information during nested machine checks.
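      
      A hedged C-level sketch of the stack selection (the real code is
      powerpc asm in exceptions-64s.S; names are illustrative):
      
      	if (local_paca->in_mce == 0)
      		/* first entry: frame at the start of the MC emergency stack */
      		r1 = local_paca->mc_emergency_sp - INT_FRAME_SIZE;
      	else
      		/* nested entry: r1 already points into the emergency stack */
      		r1 = r1 - INT_FRAME_SIZE;
      	local_paca->in_mce++;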
      
      The early machine check handler changes are placed under a CPU_FTR_HVMODE
      section. This makes sure that the early machine check handler gets executed
      only in a hypervisor kernel.
      
      This is the code flow:
      
      		Machine Check Interrupt
      			|
      			V
      		   0x200 vector				  ME=0, IR=0, DR=0
      			|
      			V
      	+-----------------------------------------------+
      	|machine_check_pSeries_early:			| ME=0, IR=0, DR=0
      	|	Alloc frame on emergency stack		|
      	|	Save srr1, srr0, dar and dsisr on stack |
      	+-----------------------------------------------+
      			|
      		(ME=1, IR=0, DR=0, RFID)
      			|
      			V
      		machine_check_handle_early		  ME=1, IR=0, DR=0
      			|
      			V
      	+-----------------------------------------------+
      	|	machine_check_early (r3=pt_regs)	| ME=1, IR=0, DR=0
      	|	Things to do: (in next patches)		|
      	|		Flush SLB for SLB errors	|
      	|		Flush TLB for TLB errors	|
      	|		Decode and save MCE info	|
      	+-----------------------------------------------+
      			|
      	(Fall through existing exception handler routine.)
      			|
      			V
      		machine_check_pSeries			  ME=1, IR=0, DR=0
      			|
      		(ME=1, IR=1, DR=1, RFID)
      			|
      			V
      		machine_check_common			  ME=1, IR=1, DR=1
      			.
      			.
      			.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  4. 29 Oct 2013, 2 commits
  5. 19 Oct 2013, 1 commit
  6. 17 Oct 2013, 2 commits
  7. 11 Oct 2013, 1 commit
    • powerpc: Put FP/VSX and VR state into structures · de79f7b9
      Committed by Paul Mackerras
      This creates new 'thread_fp_state' and 'thread_vr_state' structures
      to store FP/VSX state (including FPSCR) and Altivec/VSX state
      (including VSCR), and uses them in the thread_struct.  In the
      thread_fp_state, the FPRs and VSRs are represented as u64 rather
      than double, since we rarely perform floating-point computations
      on the values, and this will enable the structures to be used
      in KVM code as well.  Similarly FPSCR is now a u64 rather than
      a structure of two 32-bit values.
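      
      A sketch of the new structures (per this commit; attributes and
      typedefs as in processor.h):
      
      	struct thread_fp_state {
      		u64	fpr[32][TS_FPRWIDTH] __attribute__((aligned(16)));
      		u64	fpscr;		/* floating point status */
      	};
      
      	struct thread_vr_state {
      		vector128	vr[32] __attribute__((aligned(16)));
      		vector128	vscr __attribute__((aligned(16)));
      	};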
      
      This takes the offsets out of macros such as SAVE_32FPRS and REST_32FPRS,
      so the same macros can be used for normal and transactional state, and the
      transactional versions can be deleted.  This also removes the unused
      do_load_up_fpu
      and do_load_up_altivec, which were in fact buggy since they didn't
      create large enough stack frames to account for the fact that
      load_up_fpu and load_up_altivec are not designed to be called from C
      and assume that their caller's stack frame is an interrupt frame.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  8. 27 Aug 2013, 2 commits
    • powerpc: Cleanup handling of the DSCR bit in the FSCR register · bc683a7e
      Committed by Michael Neuling
      As suggested by paulus we can simplify the Data Stream Control Register
      (DSCR) Facility Status and Control Register (FSCR) handling.
      
      Firstly, we simplify the asm by using a rldimi.
      
      Secondly, we now use the FSCR only to control the DSCR facility, rather
      than both the FSCR and HFSCR.  Users will see no functional change from
      this but will get a minor speedup as they will trap into the kernel only
      once (rather than twice) when they first touch the DSCR.  Also, this
      change removes a bunch of ugly FTR_SECTION code.
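      
      For illustration, a hedged example of the rldimi idiom: insert a
      one-bit value into the FSCR's DSCR position in a single instruction
      (operands illustrative, not the exact kernel asm):
      
      	unsigned long fscr = mfspr(SPRN_FSCR);
      	unsigned long bit = 1;	/* 1 = grant the DSCR facility, 0 = revoke */
      	asm volatile("rldimi %0,%1,%2,%3"
      		     : "+r" (fscr)
      		     : "r" (bit), "i" (FSCR_DSCR_LG), "i" (63 - FSCR_DSCR_LG));
      	mtspr(SPRN_FSCR, fscr);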
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Skip emulating & leave interrupts off for kernel program checks · b3f6a459
      Committed by Michael Ellerman
      In the program check handler we handle some causes with interrupts off
      and others with interrupts on.
      
      We need to enable interrupts to handle the emulation cases, because they
      access userspace memory and might sleep.
      
      For faults in the kernel we don't want to do any emulation, and
      emulate_instruction() enforces that. do_mathemu() doesn't but probably
      should.
      
      The other disadvantage of enabling interrupts for kernel faults is that
      we may take another interrupt, and recurse. As seen below:
      
        --- Exception: e40 at c000000000004ee0 performance_monitor_relon_pSeries_1
        [link register   ] c00000000000f858 .arch_local_irq_restore+0x38/0x90
        [c000000fb185dc10] 0000000000000000 (unreliable)
        [c000000fb185dc80] c0000000007d8558 .program_check_exception+0x298/0x2d0
        [c000000fb185dd00] c000000000002f40 emulation_assist_common+0x140/0x180
        --- Exception: e40 at c000000000004ee0 performance_monitor_relon_pSeries_1
        [link register   ] c00000000000f858 .arch_local_irq_restore+0x38/0x90
        [c000000fb185dff0] 00000000008b9190 (unreliable)
        [c000000fb185e060] c0000000007d8558 .program_check_exception+0x298/0x2d0
      
      So avoid both problems by checking if the fault was in the kernel and
      skipping the enable of interrupts and the emulation. Go straight to
      delivering the SIGILL, which for kernel faults calls die() and so on,
      dropping us in the debugger etc.
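      
      A hedged sketch of the resulting flow in program_check_exception()
      (traps.c; simplified):
      
      	if (!user_mode(regs))
      		goto sigill;	/* kernel fault: no emulation, IRQs stay off */
      
      	local_irq_enable();	/* emulation may touch user memory and sleep */
      	if (emulate_instruction(regs) == 0) {
      		regs->nip += 4;
      		return;
      	}
      sigill:
      	_exception(SIGILL, regs, ILL_ILLOPC, regs->nip); /* die()s if in kernel */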
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  9. 14 Aug 2013, 3 commits
  10. 09 Aug 2013, 1 commit
    • powerpc: Fix context switch DSCR on POWER8 · 2517617e
      Committed by Michael Neuling
      POWER8 allows the DSCR to be accessed directly from userspace via a new SPR
      number, 0x3 (rather than 0x11; DSCR SPR number 0x11 is still used on POWER8
      but, like POWER7, is only accessible in HV and OS modes).  Currently, we
      allow this by setting the H/FSCR DSCR bit on boot.
      
      Unfortunately this doesn't work, as the kernel needs to see the DSCR change
      so that it knows to no longer restore the system-wide version of the DSCR on
      context switch (ie. to set thread.dscr_inherit).
      
      This clears the H/FSCR DSCR bit initially.  If a process then accesses the DSCR
      (via SPR 0x3), it'll trap into the kernel where we set thread.dscr_inherit in
      facility_unavailable_exception().
      
      We also change _switch() so that we set or clear the H/FSCR DSCR bit based on
      the thread.dscr_inherit.
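      
      A hedged sketch of the trap-side handling in
      facility_unavailable_exception() (simplified):
      
      	if (status == FSCR_DSCR_LG) {
      		/* first direct DSCR access: stop restoring the system-wide
      		 * DSCR for this thread, and open the facility up */
      		current->thread.dscr_inherit = 1;
      		current->thread.dscr = mfspr(SPRN_DSCR);
      		mtspr(SPRN_FSCR, mfspr(SPRN_FSCR) | FSCR_DSCR);
      		return;
      	}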
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Cc: <stable@vger.kernel.org> [v3.10]
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 31 Jul 2013, 1 commit
  12. 01 Jul 2013, 2 commits
  13. 30 Jun 2013, 1 commit
  14. 20 Jun 2013, 2 commits
  15. 15 Jun 2013, 1 commit
    • powerpc: Fix emulation of illegal instructions on PowerNV platform · bf593907
      Committed by Paul Mackerras
      Normally, the kernel emulates a few instructions that are unimplemented
      on some processors (e.g. the old dcba instruction), or privileged (e.g.
      mfpvr).  The emulation of unimplemented instructions is currently not
      working on the PowerNV platform.  The reason is that on these machines,
      unimplemented and illegal instructions cause a hypervisor emulation
      assist interrupt, rather than a program interrupt as on older CPUs.
      Our vector for the emulation assist interrupt just calls
      program_check_exception() directly, without setting the bit in SRR1
      that indicates an illegal instruction interrupt.  This fixes it by
      making the emulation assist interrupt set that bit before calling
      program_check_exception().  With this, old programs that use no-longer-
      implemented instructions such as dcba now work again.
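      
      A hedged sketch of the fix as a C entry point for the e40 vector
      (traps.c; simplified):
      
      	void emulation_assist_interrupt(struct pt_regs *regs)
      	{
      		/* fake the SRR1 illegal-instruction bit the 0x700 path expects */
      		regs->msr |= REASON_ILLEGAL;
      		program_check_exception(regs);
      	}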
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  16. 01 Jun 2013, 1 commit
    • powerpc/tm: Abort on emulation and alignment faults · 6ce6c629
      Committed by Michael Neuling
      If we are emulating an instruction inside an active user transaction that
      touches memory, the kernel can't emulate it as it operates in transactional
      suspend context.  We need to abort these transactions and send them back to
      userspace for the hardware to rollback.
      
      We can service these if the user transaction is in suspend mode, since the
      kernel will operate in the same suspend context.
      
      This adds a check to all alignment faults and to specific instruction
      emulations (only string instructions for now).  If the user process is in an
      active (non-suspended) transaction, we abort the transaction and go back to
      userspace, allowing the HW to roll back the transaction and tell the user of
      the failure.  This also adds new TM abort cause codes to report the reason
      for the persistent error to the user.
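      
      A hedged sketch of the check added to the emulation and alignment
      paths (traps.c; simplified):
      
      	static bool tm_abort_check(struct pt_regs *regs, int cause)
      	{
      		/* active transaction: the kernel can't emulate for it, so
      		 * abort and let the hardware roll back */
      		if (MSR_TM_TRANSACTIONAL(regs->msr)) {
      			tm_abort(cause); /* e.g. TM_CAUSE_ALIGNMENT | TM_CAUSE_PERSISTENT */
      			return true;
      		}
      		return false;
      	}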
      
      Crappy test case here http://neuling.org/devel/junkcode/aligntm.c
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Cc: <stable@vger.kernel.org> # v3.9
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  17. 14 May 2013, 1 commit
    • powerpc: Exception hooks for context tracking subsystem · ba12eede
      Committed by Li Zhong
      This adds the exception hooks for the context tracking subsystem, covering
      data access, program check, single step, instruction breakpoint, machine
      check, alignment, FP unavailable, altivec assist and unknown exceptions,
      whose handlers might use RCU.
      
      This patch corresponds to
      [PATCH] x86: Exception hooks for userspace RCU extended QS
        commit 6ba3c97a
      
      But after the exception handling moved to generic code, and with the changes
      in the following two commits:
      
      56dd9470
        context_tracking: Move exception handling to generic code
      6c1e0256
        context_tracking: Restore correct previous context state on exception exit
      
      the exception hooks are able to use the generic code above instead of a
      redundant arch implementation.
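      
      The resulting hook pattern in each handler, as a hedged sketch
      (traps.c; handler body elided):
      
      	void program_check_exception(struct pt_regs *regs)
      	{
      		enum ctx_state prev_state = exception_enter(); /* leaving user QS */
      
      		/* ... existing handler body, now safe to use RCU ... */
      
      		exception_exit(prev_state);	/* restore previous context state */
      	}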
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  18. 06 May 2013, 1 commit
  19. 15 Feb 2013, 4 commits
  20. 21 Jan 2013, 1 commit
  21. 10 Jan 2013, 2 commits
  22. 05 Sep 2012, 2 commits
  23. 09 May 2012, 1 commit
  24. 29 Mar 2012, 1 commit
  25. 09 Mar 2012, 1 commit
    • powerpc: Disable interrupts in 64-bit kernel FP and vector faults · 9f2f79e3
      Committed by Benjamin Herrenschmidt
      If we get a floating point, altivec or VSX unavailable interrupt in the
      kernel, we trigger a kernel error. There is no point preserving the
      interrupt state; in fact, that can even make debugging harder, as the
      processor state might change (we may even preempt) between taking the
      exception and landing in a debugger.
      
      So just make those three handlers disable interrupts unconditionally.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2: On BookE only disable when hitting the kernel unavailable
          path, otherwise it will fail to restore softe as
          fast_exception_return doesn't do it.
  26. 23 Feb 2012, 1 commit