1. 25 Sep 2016 (3 commits)
  2. 19 Sep 2016 (1 commit)
  3. 13 Sep 2016 (1 commit)
    • powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address · f0f558b1
      Committed by Paul Mackerras
      Currently, if userspace or the kernel accesses a completely bogus address,
      for example with any of bits 46-59 set, we first take an SLB miss interrupt,
      install a corresponding SLB entry with VSID 0, retry the instruction, then
      take a DSI/ISI interrupt because there is no HPT entry mapping the address.
      However, by the time of the second interrupt, the Come-From Address Register
      (CFAR) has been overwritten by the rfid instruction at the end of the SLB
      miss interrupt handler.  Since bogus accesses can often be caused by a
      function return after the stack has been overwritten, the CFAR value would
      be very useful as it can indicate which function's return led to the
      bogus address.
      
      This patch adds code to create a full exception frame in the SLB miss handler
      in the case of a bogus address, rather than inserting an SLB entry with a
      zero VSID field.  Then we call a new slb_miss_bad_addr() function in C code,
      which delivers a signal for a user access or creates an oops for a kernel
      access.  In the latter case the oops message will show the CFAR value at the
      time of the access.
      
      In the case of the radix MMU, a segment miss interrupt indicates an access
      outside the ranges mapped by the page tables.  Previously this was handled
      by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is
      not really correct.  With this patch, we now handle these interrupts with
      slb_miss_bad_addr(), which is much more consistent.
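      
      A minimal sketch of the new C-level handler described above (assuming
      the usual traps.c helpers; the actual patch may differ in detail):
      
      	void slb_miss_bad_addr(struct pt_regs *regs)
      	{
      		if (user_mode(regs))
      			/* user access: deliver a signal; the saved frame carries CFAR */
      			_exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
      		else
      			/* kernel access: oops, and the oops output shows the CFAR */
      			bad_page_fault(regs, regs->dar, SIGSEGV);
      	}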
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f0f558b1
  4. 22 Aug 2016 (1 commit)
  5. 21 Jun 2016 (2 commits)
    • powerpc: Load Monitor Register Support · bd3ea317
      Committed by Jack Miller
      This enables new registers, LMRR and LMSER, that can trigger an EBB in
      userspace code when a monitored load (via the new ldmx instruction)
      loads memory from a monitored space. This facility is controlled by a
      new FSCR bit, LM.
      
      This patch disables the FSCR LM control bit on task init and enables
      that bit when a load monitor facility unavailable exception is taken
      for using it. On context switch, this bit is then used to determine
      whether the two relevant registers are saved and restored. This is
      done lazily for performance reasons.
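      
      A sketch of the lazy context-switch path this describes (field and SPR
      names are taken from the commit text, not the exact patch):
      
      	/* in save_sprs(): only touch the load monitor SPRs if the
      	 * task has actually used the facility, i.e. FSCR[LM] is set */
      	if (t->fscr & FSCR_LM) {
      		t->lmrr = mfspr(SPRN_LMRR);
      		t->lmser = mfspr(SPRN_LMSER);
      	}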
      Signed-off-by: Jack Miller <jack@codezen.org>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bd3ea317
    • powerpc: Improve FSCR init and context switching · b57bd2de
      Committed by Michael Neuling
      This fixes a few issues with FSCR init and switching.
      
      In commit 152d523e ("powerpc: Create context switch helpers
      save_sprs() and restore_sprs()") we moved the setting of the FSCR
      register from inside a CPU_FTR_ARCH_207S section to inside just a
      CPU_FTR_ARCH_DSCR section. Hence we were setting the FSCR on POWER6/7,
      where the FSCR doesn't exist. This is harmless but we shouldn't do it.
      
      Also, we can simplify the FSCR context switch. We don't need to go
      through the calculation involving dscr_inherit. We can just restore
      what we saved last time.
      
      We also set an initial value in INIT_THREAD, so that pid 1, which is
      cloned from it, gets a sane value.
      
      Based on patch by Jack Miller.
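      
      A sketch of the simplified switch path (assuming the save_sprs()/
      restore_sprs() helpers introduced in 152d523e):
      
      	/* in restore_sprs(): no dscr_inherit calculation, just put
      	 * back the FSCR value saved when the task last switched out */
      	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
      	    old_thread->fscr != new_thread->fscr)
      		mtspr(SPRN_FSCR, new_thread->fscr);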
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b57bd2de
  6. 20 Jun 2016 (1 commit)
    • KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt · fd7bacbc
      Committed by Mahesh Salgaonkar
      When a guest is assigned to a core, the host Timebase (TB) is converted
      into the guest TB by adding the guest timebase offset before entering
      the guest, and restored to the host TB on guest exit. This means that
      under certain conditions (e.g. guest migration) host TB and guest TB
      can differ.
      
      When we get an HMI for TB-related issues, the OPAL HMI handler tries to
      fix the error and restore the correct host TB value. With no guest
      running there is no problem, but with a guest running on the core we
      run into TB corruption.
      
      If we get an HMI while in the guest, the current HMI handler invokes the
      OPAL HMI handler before forcing the guest to exit. The guest exit path
      then subtracts the guest TB offset from the current TB value, which may
      already have been restored to the host value by the OPAL HMI handler.
      This leaves both the host and guest TB values incorrect.
      
      With split-core, things become more complex: the TB is also split and
      each subcore gets its own TB register. When an HMI handler fixes a TB
      error and restores the TB value, it affects the TB values of all
      sibling subcores on the same core. On a TB error, every thread in the
      core gets an HMI. With the existing code, the individual threads call
      the OPAL HMI handler independently, which can easily throw the TB out
      of sync if guests are running on the subcores. Hence we need to
      coordinate all the threads before making the OPAL HMI handler call,
      followed by a TB resync.
      
      This patch introduces a sibling subcore state structure (shared by all
      threads in the core) in the paca, which records whether each sibling
      subcore is in guest or host mode. An array in_guest[] of size
      MAX_SUBCORE_PER_CORE=4 holds the state of each subcore, indexed by
      subcore id. Only the primary thread entering/exiting the guest is
      responsible for setting/clearing its designated array element.
      
      On a TB error, we get an HMI interrupt on every thread of the core.
      Upon HMI, this patch now forces the guest to vacate the core/subcore.
      The primary thread of each subcore then clears its respective
      in_guest[] entry during the guest exit path, just after the
      guest->host partition switch is complete.
      
      All other threads, whether they have just exited the guest or were
      already in the host, wait until every subcore has cleared its entry.
      Once all subcores have done so, all threads make the call to the OPAL
      HMI handler.
      
      The OPAL HMI handler does not necessarily resync the TB value for every
      HMI interrupt; it does so only for HMIs caused by TB errors, and leaves
      the TB untouched otherwise. Hence, to keep things simple, the primary
      thread calls TB resync explicitly once per core immediately after the
      OPAL HMI handler, instead of subtracting the guest offset from the TB.
      The TB resync call restores the TB to the host value, so we can be sure
      about the TB state.
      
      One of the primary threads exiting the guest takes responsibility for
      calling the TB resync. It uses the top bit (bit 63) of the subcore
      state flags bitmap to make the decision: the first primary thread
      (among the subcores) that manages to set the bit calls the TB resync,
      and all other threads wait until the resync is complete before
      proceeding.
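      
      The shared state described above amounts to a small per-core structure
      (a sketch; the names and array size are taken from the commit text):
      
      	#define MAX_SUBCORE_PER_CORE	4
      
      	struct sibling_subcore_state {
      		unsigned long	flags;	/* bit 63: TB resync responsibility */
      		u8		in_guest[MAX_SUBCORE_PER_CORE];	/* indexed by subcore id */
      	};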
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      fd7bacbc
  7. 16 Jun 2016 (1 commit)
  8. 18 Mar 2016 (1 commit)
  9. 01 Mar 2016 (1 commit)
  10. 24 Feb 2016 (1 commit)
  11. 26 Nov 2015 (1 commit)
  12. 06 Jul 2015 (1 commit)
  13. 07 Jun 2015 (1 commit)
    • powerpc: Fix handling of DSCR related facility unavailable exception · c952c1c4
      Committed by Anshuman Khandual
      Currently the DSCR (Data Stream Control Register) can be accessed with
      mfspr or mtspr instructions inside a thread via two different SPR
      numbers: the user-accessible problem state SPR number 0x03, and the
      privileged state SPR number 0x11. All accesses through the privileged
      state SPR number are emulated through the illegal instruction
      exception. An access through the problem state SPR number raises a
      facility unavailable exception, which sets the per-thread dscr_inherit
      bit and enables the DSCR facility through the FSCR register, allowing
      direct access to the DSCR without going through this exception in the
      future. We set the thread.dscr_inherit bit whether the access was an
      mfspr or an mtspr instruction, which is neither correct nor matches the
      behaviour of the instruction emulation code path driven by the
      privileged state SPR number. Users currently observe two different
      kinds of behaviour when accessing the DSCR through the two SPR numbers.
      The problem can be observed with these two test cases, by replacing
      the privileged state SPR number with the problem state SPR number:
      
      	(1) http://ozlabs.org/~anton/junkcode/dscr_default_test.c
      	(2) http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
      
      This patch fixes the problem by making sure that the behaviour visible
      to the user remains the same irrespective of which SPR number is used.
      In the facility unavailable exception handler, we check whether the
      access was caused by an mfspr or an mtspr instruction. For an mfspr
      instruction we just emulate it; for an mtspr instruction we set the
      per-thread dscr_inherit bit and also enable the facility through the
      FSCR. All user-SPR mfspr instructions are thus emulated until a
      user-SPR mtspr has been executed.
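      
      A sketch of that check inside the facility unavailable handler (the
      instruction-matching macro is illustrative, not the exact patch):
      
      	u32 instword;
      
      	if (get_user(instword, (u32 __user *)regs->nip))
      		return;
      	if (IS_MFSPR_DSCR_USER(instword)) {	/* mfspr rD,0x03 (hypothetical matcher) */
      		/* just emulate the read, don't flip dscr_inherit */
      		regs->gpr[(instword >> 21) & 0x1f] = current->thread.dscr;
      		regs->nip += 4;
      	} else {				/* mtspr 0x03,rS */
      		/* enable the facility for direct access from now on */
      		current->thread.dscr_inherit = 1;
      		mtspr(SPRN_FSCR, mfspr(SPRN_FSCR) | FSCR_DSCR);
      	}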
      Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c952c1c4
  14. 28 Jan 2015 (1 commit)
    • powerpc: Remove some unused functions · 8aa989b8
      Committed by Michael Ellerman
      Remove slice_set_psize() which is not used.
      
      It was added in 3a8247cc "powerpc: Only demote individual slices
      rather than whole process" but was never used.
      
      Remove vsx_assist_exception() which is not used.
      
      It was added in ce48b210 "powerpc: Add VSX context save/restore,
      ptrace and signal support" but was never used.
      
      Remove generic_mach_cpu_die() which is not used.
      
      Its last caller was removed in 375f561a "powerpc/powernv: Always go
      into nap mode when CPU is offline".
      
      Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are
      unused.
      
      These were introduced in c5d56332 "[POWERPC] Add general support for
      mpc7448hpc2 (Taiga) platform" but were never used.
      
      This was partially found by using a static code analysis program called
      cppcheck.
      Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      [mpe: Update changelog with details on when/why they are unused]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8aa989b8
  15. 03 Nov 2014 (1 commit)
    • powerpc: Replace __get_cpu_var uses · 69111bac
      Committed by Christoph Lameter
      This still has not been merged and now powerpc is the only arch that does
      not have this change. Sorry about missing linuxppc-dev before.
      
      V2->V2
        - Fix up to work against 3.18-rc1
      
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as:
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and fewer
      registers are used when the code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these
      operations are used throughout, specialized macros can be defined in
      non-x86 arches as well, in order to optimize per-cpu access, e.g. by
      using a global register that may be set to the per-cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      [mpe: Fix build errors caused by set/or_softirq_pending(), and rework
            assignment in __set_breakpoint() to use memcpy().]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      69111bac
  16. 27 Aug 2014 (2 commits)
    • Revert "powerpc: Replace __get_cpu_var uses" · 23f66e2d
      Committed by Tejun Heo
      This reverts commit 5828f666 due to a build failure after merging with
      pending powerpc changes.
      
      Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      23f66e2d
    • powerpc: Replace __get_cpu_var uses · 5828f666
      Committed by Christoph Lameter
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as:
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and fewer
      registers are used when the code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these
      operations are used throughout, specialized macros can be defined in
      non-x86 arches as well, in order to optimize per-cpu access, e.g. by
      using a global register that may be set to the per-cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      tj: Folded a fix patch.
          http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      5828f666
  17. 05 Aug 2014 (1 commit)
  18. 26 Jun 2014 (1 commit)
  19. 11 Jun 2014 (1 commit)
  20. 09 Apr 2014 (1 commit)
  21. 24 Mar 2014 (1 commit)
  22. 15 Jan 2014 (2 commits)
    • powerpc: Fix transactional FP/VMX/VSX unavailable handlers · 3ac8ff1c
      Committed by Paul Mackerras
      Currently, if a process starts a transaction and then takes an
      exception because the FPU, VMX or VSX unit is unavailable to it,
      we end up corrupting any FP/VMX/VSX state that was valid before
      the interrupt.  For example, if the process starts a transaction
      with the FPU available to it but VMX unavailable, and then does
      a VMX instruction inside the transaction, the FP state gets
      corrupted.
      
      Loading up the desired state generally involves doing a reclaim
      and a recheckpoint.  To avoid corrupting already-valid state, we have
      to be careful not to reload that state from the thread_struct
      between the reclaim and the recheckpoint (since the thread_struct
      values are stale by now), and we have to reload that state from
      the transact_fp/vr arrays after the recheckpoint to get back the
      current transactional values saved there by the reclaim.
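      
      In outline, the safe ordering is (a sketch of the FP case using the
      existing TM helpers; orig_msr is the pre-exception MSR value):
      
      	tm_reclaim(&current->thread, orig_msr, TM_CAUSE_FAC_UNAV);
      	/* don't reload thread.fp_state here: it is stale between the
      	 * reclaim and the recheckpoint */
      	tm_recheckpoint(&current->thread, MSR_FP);
      	/* pull the transactional values the reclaim saved back in */
      	load_fp_state(&current->thread.transact_fp);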
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      3ac8ff1c
    • powerpc: Don't corrupt transactional state when using FP/VMX in kernel · d31626f7
      Committed by Paul Mackerras
      Currently, when we have a process using the transactional memory
      facilities on POWER8 (that is, the processor is in transactional
      or suspended state), and the process enters the kernel and the
      kernel then uses the floating-point or vector (VMX/Altivec) facility,
      we end up corrupting the user-visible FP/VMX/VSX state.  This
      happens, for example, if a page fault causes a copy-on-write
      operation, because the copy_page function will use VMX to do the
      copy on POWER8.  The test program below demonstrates the bug.
      
      The bug happens because when FP/VMX state for a transactional process
      is stored in the thread_struct, we store the checkpointed state in
      .fp_state/.vr_state and the transactional (current) state in
      .transact_fp/.transact_vr.  However, when the kernel wants to use
      FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
      which saves the current state in .fp_state/.vr_state.  Furthermore,
      when we return to the user process we return with FP/VMX/VSX
      disabled.  The next time the process uses FP/VMX/VSX, we don't know
      which set of state (the current register values, .fp_state/.vr_state,
      or .transact_fp/.transact_vr) we should be using, since we have no
      way to tell if we are still in the same transaction, and if not,
      whether the previous transaction succeeded or failed.
      
      Thus it is necessary to strictly adhere to the rule that if FP has
      been enabled at any point in a transaction, we must keep FP enabled
      for the user process with the current transactional state in the
      FP registers, until we detect that it is no longer in a transaction.
      Similarly for VMX; once enabled it must stay enabled until the
      process is no longer transactional.
      
      In order to keep this rule, we add a new thread_info flag which we
      test when returning from the kernel to userspace, called TIF_RESTORE_TM.
      This flag indicates that there is FP/VMX/VSX state to be restored
      before entering userspace, and when it is set the .tm_orig_msr field
      in the thread_struct indicates what state needs to be restored.
      The restoration is done by restore_tm_state().  The TIF_RESTORE_TM
      bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
      which are called from enable_kernel_fp/altivec, giveup_vsx, and
      flush_fp/altivec_to_thread instead of giveup_fpu/altivec.
      
      The other thing to be done is to get the transactional FP/VMX/VSX
      state from .fp_state/.vr_state when doing reclaim, if that state
      has been saved there by giveup_fpu/altivec_maybe_transactional.
      Having done this, we set the FP/VMX bit in the thread's MSR after
      reclaim to indicate that that part of the state is now valid
      (having been reclaimed from the processor's checkpointed state).
      
      Finally, in the signal handling code, we move the clearing of the
      transactional state bits in the thread's MSR a bit earlier, before
      calling flush_fp_to_thread(), so that we don't unnecessarily set
      the TIF_RESTORE_TM bit.
      
      This is the test program:
      
      /* Michael Neuling 4/12/2013
       *
       * See if the altivec state is leaked out of an aborted transaction due to
       * kernel vmx copy loops.
       *
       *   gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
       *
       */
      
      /* We don't use all of these, but for reference: */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <assert.h>
      #include <stdint.h>
      #include <sys/mman.h>
      
      /* TM instruction mnemonics; the definitions were dropped from this
       * listing, these assume an assembler that knows the POWER8 TM
       * instructions */
      #define TBEGIN		"tbegin. ;"
      #define TEND		"tend. ;"
      #define TABORT		"tabort. 0 ;"
      #define TSUSPEND	"tsuspend. ;"
      #define TRESUME		"tresume. ;"
      
      int main(int argc, char *argv[])
      {
      	long double vecin = 1.3;
      	long double vecout;
      	unsigned long pgsize = getpagesize();
      	int i;
      	int fd;
      	int size = pgsize*16;
      	char tmpfile[] = "/tmp/page_faultXXXXXX";
      	char buf[pgsize];
      	char *a;
      	uint64_t aborted = 0;
      
      	fd = mkstemp(tmpfile);
      	assert(fd >= 0);
      
      	memset(buf, 0, pgsize);
      	for (i = 0; i < size; i += pgsize)
      		assert(write(fd, buf, pgsize) == pgsize);
      
      	unlink(tmpfile);
      
      	a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
      	assert(a != MAP_FAILED);
      
      	asm __volatile__(
      		"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
      		TBEGIN
      		"beq	3f ;"
      		TSUSPEND
      		"xxlxor 40,40,40 ; " // set 40 to 0
      		"std	5, 0(%[map]) ;" // cause kernel vmx copy page
      		TABORT
      		TRESUME
      		TEND
      		"li	%[res], 0 ;"
      		"b	5f ;"
      		"3: ;" // Abort handler
      		"li	%[res], 1 ;"
      		"5: ;"
      		"stxvd2x 40,0,%[vecoutptr] ; "
      		: [res]"=r"(aborted)
      		: [vecinptr]"r"(&vecin),
      		  [vecoutptr]"r"(&vecout),
      		  [map]"r"(a)
      		: "memory", "r0", "r3", "r4", "r5", "r6", "r7");
      
      	if (aborted && (vecin != vecout)){
      		printf("FAILED: vector state leaked on abort %f != %f\n",
      		       (double)vecin, (double)vecout);
      		exit(1);
      	}
      
      	munmap(a, size);
      
      	close(fd);
      
      	printf("PASSED!\n");
      	return 0;
      }
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      d31626f7
  23. 05 Dec 2013 (2 commits)
    • powerpc/book3s: Introduce an early machine check hook in cpu_spec. · 4c703416
      Committed by Mahesh Salgaonkar
      This patch adds an early machine check function pointer in cputable for
      CPU-specific early machine check handling. The early machine check
      routine will be called in real mode to handle SLB and TLB errors. We
      cannot reuse the existing machine_check hook because it is always
      invoked in kernel virtual mode, and we would already be in trouble if
      we got SLB or TLB errors there. This patch just sets up the mechanism
      to invoke a CPU-specific handler; subsequent patches will populate the
      function pointer.
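      
      The mechanism is a single new callback in cpu_spec (a sketch of the
      shape; see the actual cputable.h change for the exact placement):
      
      	struct cpu_spec {
      		/* ... existing fields ... */
      
      		/* Real-mode machine check handler, called before the MMU
      		 * is usable; returns nonzero if the error was handled */
      		long (*machine_check_early)(struct pt_regs *regs);
      	};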
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      4c703416
    • powerpc/book3s: handle machine check in Linux host. · 1e9b4507
      Committed by Mahesh Salgaonkar
      Move the machine check entry point into Linux. So far we have depended
      on firmware to decode MCE error details and hand over the high-level
      info to the OS.
      
      This patch introduces an early machine check routine that saves the MCE
      information (srr1, srr0, dar and dsisr) to the emergency stack. We
      allocate a stack frame on the emergency stack and set r1 accordingly,
      which leaves us prepared to take another exception without losing
      context. One thing to note here is that if we get another machine check
      while the ME bit is off, we risk a checkstop; hence we restrict
      ourselves to saving only the MCE information and the registers saved in
      the PACA_EXMC save area before we turn the ME bit on. We use the
      paca->in_mce flag to differentiate between a first entry and a nested
      machine check entry, which helps proper use of the emergency stack: we
      increment paca->in_mce every time we enter the early machine check
      handler and decrement it on leaving. When we enter the early machine
      check handler the first time (paca->in_mce == 0), we know nobody is
      using the MC emergency stack and allocate a stack frame at its start.
      On a subsequent, nested entry (paca->in_mce > 0), we know that r1
      points inside the emergency stack and allocate a separate stack frame
      accordingly. This prevents us from clobbering the MCE information
      during nested machine checks.
      
      The early machine check handler changes are placed under a
      CPU_FTR_HVMODE section, which makes sure the early machine check
      handler is executed only in the hypervisor kernel.
      
      This is the code flow:
      
      		Machine Check Interrupt
      			|
      			V
      		   0x200 vector				  ME=0, IR=0, DR=0
      			|
      			V
      	+-----------------------------------------------+
      	|machine_check_pSeries_early:			| ME=0, IR=0, DR=0
      	|	Alloc frame on emergency stack		|
      	|	Save srr1, srr0, dar and dsisr on stack |
      	+-----------------------------------------------+
      			|
      		(ME=1, IR=0, DR=0, RFID)
      			|
      			V
      		machine_check_handle_early		  ME=1, IR=0, DR=0
      			|
      			V
      	+-----------------------------------------------+
      	|	machine_check_early (r3=pt_regs)	| ME=1, IR=0, DR=0
      	|	Things to do: (in next patches)		|
      	|		Flush SLB for SLB errors	|
      	|		Flush TLB for TLB errors	|
      	|		Decode and save MCE info	|
      	+-----------------------------------------------+
      			|
      	(Fall through existing exception handler routine.)
      			|
      			V
      		machine_check_pSeries			  ME=1, IR=0, DR=0
      			|
      		(ME=1, IR=1, DR=1, RFID)
      			|
      			V
      		machine_check_common			  ME=1, IR=1, DR=1
      			.
      			.
      			.
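      
      Expressed in C for clarity (the real code is assembly in the 0x200
      vector path; mc_emergency_sp being the MC emergency stack pointer in
      the paca), the stack selection driven by paca->in_mce is roughly:
      
      	paca->in_mce++;
      	if (paca->in_mce == 1)
      		/* first entry: start at the top of the MC emergency stack */
      		r1 = paca->mc_emergency_sp - INT_FRAME_SIZE;
      	else
      		/* nested entry: r1 already points into the MC emergency
      		 * stack, so just push another frame below it */
      		r1 = r1 - INT_FRAME_SIZE;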
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1e9b4507
  24. 29 Oct 2013 (2 commits)
  25. 19 Oct 2013 (1 commit)
  26. 17 Oct 2013 (2 commits)
  27. 11 Oct 2013 (1 commit)
    • powerpc: Put FP/VSX and VR state into structures · de79f7b9
      Committed by Paul Mackerras
      This creates new 'thread_fp_state' and 'thread_vr_state' structures
      to store FP/VSX state (including FPSCR) and Altivec/VSX state
      (including VSCR), and uses them in the thread_struct.  In the
      thread_fp_state, the FPRs and VSRs are represented as u64 rather
      than double, since we rarely perform floating-point computations
      on the values, and this will enable the structures to be used
      in KVM code as well.  Similarly FPSCR is now a u64 rather than
      a structure of two 32-bit values.
      
      This takes the offsets out of the macros such as SAVE_32FPRS,
      REST_32FPRS, etc.  This enables the same macros to be used for normal
      and transactional state, enabling us to delete the transactional
      versions of the macros.   This also removes the unused do_load_up_fpu
      and do_load_up_altivec, which were in fact buggy since they didn't
      create large enough stack frames to account for the fact that
      load_up_fpu and load_up_altivec are not designed to be called from C
      and assume that their caller's stack frame is an interrupt frame.
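      
      The new containers look roughly like this (a sketch based on the
      description above; see processor.h for the exact definitions):
      
      	struct thread_fp_state {
      		u64	fpr[32][TS_FPRWIDTH] __attribute__((aligned(16)));
      		u64	fpscr;		/* u64 now, not two u32 fields */
      	};
      
      	struct thread_vr_state {
      		vector128	vr[32] __attribute__((aligned(16)));
      		vector128	vscr __attribute__((aligned(16)));
      	};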
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      de79f7b9
  28. 27 Aug 2013 (2 commits)
    • powerpc: Cleanup handling of the DSCR bit in the FSCR register · bc683a7e
      Committed by Michael Neuling
      As suggested by paulus we can simplify the Data Stream Control Register
      (DSCR) Facility Status and Control Register (FSCR) handling.
      
      Firstly, we simplify the asm by using a rldimi.
      
      Secondly, we now use the FSCR only to control the DSCR facility, rather
      than both the FSCR and HFSCR.  Users will see no functional change from
      this but will get a minor speedup as they will trap into the kernel only
      once (rather than twice) when they first touch the DSCR.  Also, this
      change removes a bunch of ugly FTR_SECTION code.
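      
      For reference, rldimi rotates a source register and inserts it into the
      destination under a mask in a single instruction, so the DSCR bit can
      be placed into the FSCR image without a separate clear/or pair;
      schematically (operands illustrative, not the exact patch):
      
      	/* rotate r11 left by FSCR_DSCR_LG and insert the result into
      	 * r10 under a one-bit mask, touching only FSCR[DSCR] */
      	rldimi	r10,r11,FSCR_DSCR_LG,(63 - FSCR_DSCR_LG)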
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      bc683a7e
    • powerpc: Skip emulating & leave interrupts off for kernel program checks · b3f6a459
      Committed by Michael Ellerman
      In the program check handler we handle some causes with interrupts off
      and others with interrupts on.
      
      We need to enable interrupts to handle the emulation cases, because they
      access userspace memory and might sleep.
      
      For faults in the kernel we don't want to do any emulation, and
      emulate_instruction() enforces that. do_mathemu() doesn't but probably
      should.
      
      The other disadvantage of enabling interrupts for kernel faults is that
      we may take another interrupt, and recurse. As seen below:
      
        --- Exception: e40 at c000000000004ee0 performance_monitor_relon_pSeries_1
        [link register   ] c00000000000f858 .arch_local_irq_restore+0x38/0x90
        [c000000fb185dc10] 0000000000000000 (unreliable)
        [c000000fb185dc80] c0000000007d8558 .program_check_exception+0x298/0x2d0
        [c000000fb185dd00] c000000000002f40 emulation_assist_common+0x140/0x180
        --- Exception: e40 at c000000000004ee0 performance_monitor_relon_pSeries_1
        [link register   ] c00000000000f858 .arch_local_irq_restore+0x38/0x90
        [c000000fb185dff0] 00000000008b9190 (unreliable)
        [c000000fb185e060] c0000000007d8558 .program_check_exception+0x298/0x2d0
      
      So avoid both problems by checking if the fault was in the kernel and
      skipping the enable of interrupts and the emulation. Go straight to
      delivering the SIGILL, which for kernel faults calls die() and so on,
      dropping us in the debugger etc.
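      
      The fix is essentially an early bail-out at the top of the handler (a
      sketch):
      
      	/* kernel faults: no emulation and no irq enable; go straight
      	 * to the SIGILL path, which calls die() for kernel mode */
      	if (!user_mode(regs))
      		goto sigill;
      
      	/* userspace: safe to enable interrupts before emulating, which
      	 * may touch user memory and sleep */
      	local_irq_enable();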
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      b3f6a459
  29. 14 Aug 2013 (3 commits)