1. 15 Feb, 2013 3 commits
    • M
      powerpc: Add transactional memory unavailable exception handler · d0c0c9a1
      Michael Neuling committed
      These should never happen since we always turn on MSR TM when in userspace. We
      don't do lazy TM.
      
      Hence if we hit this, we barf and kill the task as something's gone horribly
      wrong.
      Signed-off-by: Matt Evans <matt@ozlabs.org>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • P
      powerpc: Save CFAR before branching in interrupt entry paths · 1707dd16
      Paul Mackerras committed
      Some of the interrupt vectors on 64-bit POWER server processors are
      only 32 bytes long, which is not enough for the full first-level
      interrupt handler.  For these we currently just have a branch to an
      out-of-line handler.  However, this means that we corrupt the CFAR
      (come-from address register) on POWER7 and later processors.
      
      To fix this, we split the EXCEPTION_PROLOG_1 macro into two pieces:
      EXCEPTION_PROLOG_0 contains the part up to the point where the CFAR
      is saved in the PACA, and EXCEPTION_PROLOG_1 contains the rest.  We
      then put EXCEPTION_PROLOG_0 in the short interrupt vectors before
      we branch to the out-of-line handler, which contains the rest of the
      first-level interrupt handler.  To facilitate this, we define new
      _OOL (out of line) variants of STD_EXCEPTION_PSERIES, etc.
      
      In order to get EXCEPTION_PROLOG_0 to be short enough, i.e., no more
      than 6 instructions, it was necessary to move the stores that move
      the PPR and CFAR values into the PACA into __EXCEPTION_PROLOG_1 and
      to get rid of one of the two HMT_MEDIUM instructions.  Previously
      there was a HMT_MEDIUM_PPR_DISCARD before the prolog, which was
      nop'd out on processors with the PPR (POWER7 and later), and then
      another HMT_MEDIUM inside the HMT_MEDIUM_PPR_SAVE macro call inside
      __EXCEPTION_PROLOG_1, which was nop'd out on processors without PPR.
      Now the HMT_MEDIUM inside EXCEPTION_PROLOG_0 is there unconditionally
      and the HMT_MEDIUM_PPR_DISCARD is not strictly necessary, although
      this leaves it in for the interrupt vectors where there is room for
      it.
      
      Previously we had a handler for hypervisor maintenance interrupts at
      0xe50, which doesn't leave enough room for the vector for hypervisor
      emulation assist interrupts at 0xe40, since we need 8 instructions.
      The 0xe50 vector was only used on POWER6, as the HMI vector was moved
      to 0xe60 on POWER7.  Since we don't support running in hypervisor mode
      on POWER6, we just remove the handler at 0xe50.
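
The spacing argument above can be checked with a little arithmetic. This is a minimal sketch, assuming fixed 4-byte instructions (consistent with the 32-byte vectors described earlier holding 8 instructions); `insns_between` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Assumption: every instruction is a fixed 4 bytes wide. */
enum { INSN_BYTES = 4 };

/* Number of whole instructions that fit between two vector addresses. */
static unsigned insns_between(uint32_t vec, uint32_t next_vec)
{
    return (next_vec - vec) / INSN_BYTES;
}
```

With a handler at 0xe50, `insns_between(0xe40, 0xe50)` gives 4 slots, fewer than the 8 needed; with the next vector at 0xe60 it gives 8.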
      
      This also changes denorm_exception_hv to use EXCEPTION_PROLOG_0
      instead of open-coding it, and removes the HMT_MEDIUM_PPR_DISCARD
      from the relocation-on vectors (since any CPU that supports
      relocation-on interrupts also has the PPR).
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • P
      powerpc: Remove Cell-specific relocation-on interrupt vector code · 6100209b
      Paul Mackerras committed
      The Cell processor doesn't support relocation-on interrupts, so we
      don't need relocation-on versions of the interrupt vectors that are
      purely Cell-specific.  This removes them.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  2. 10 Jan, 2013 6 commits
  3. 15 Nov, 2012 7 commits
  4. 17 Sep, 2012 2 commits
  5. 05 Sep, 2012 1 commit
    • P
      powerpc: Give hypervisor decrementer interrupts their own handler · dabe859e
      Paul Mackerras committed
      At the moment the handler for hypervisor decrementer interrupts is
      the same as for decrementer interrupts, i.e. timer_interrupt().
      This is bogus; if we ever do get a hypervisor decrementer interrupt
      it won't have anything to do with the next timer event.  In fact
      the only time we get hypervisor decrementer interrupts is when one
      is left pending on exit from a KVM guest.
      
      When we get a hypervisor decrementer interrupt we don't need to do
      anything special to clear it, since they are edge-triggered on the
      transition of HDEC from 0 to -1.  Thus this adds an empty handler
      function for them.  We don't need to have them masked when interrupts
      are soft-disabled, so we use STD_EXCEPTION_HV instead of
      MASKABLE_EXCEPTION_HV.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  6. 11 Jul, 2012 2 commits
  7. 09 May, 2012 1 commit
  8. 30 Apr, 2012 1 commit
  9. 08 Apr, 2012 1 commit
    • P
      KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs · f0888f70
      Paul Mackerras committed
      Currently on POWER7, if we are running the guest on a core and we don't
      need all the hardware threads, we do nothing to ensure that the unused
      threads aren't executing in the kernel (other than checking that they
      are offline).  We just assume they're napping and we don't do anything
      to stop them trying to enter the kernel while the guest is running.
      This means that a stray IPI can wake up the hardware thread and it will
      then try to enter the kernel, but since the core is in guest context,
      it will execute code from the guest in hypervisor mode once it turns the
      MMU on, which tends to lead to crashes or hangs in the host.
      
      This fixes the problem by adding two new one-byte flags in the
      kvmppc_host_state structure in the PACA which are used to interlock
      between the primary thread and the unused secondary threads when entering
      the guest.  With these flags, the primary thread can ensure that the
      unused secondaries are not already in kernel mode (i.e. handling a stray
      IPI) and then indicate that they should not try to enter the kernel
      if they do get woken for any reason.  Instead they will go into KVM code,
      find that there is no vcpu to run, acknowledge and clear the IPI and go
      back to nap mode.
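
The decision logic of this interlock can be sketched as follows. This is a single-threaded model only: the field names `req` and `state`, the state values, and both helper functions are hypothetical, and the memory barriers and retry loops that real concurrent code would need are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for where a secondary hardware thread currently is. */
enum hwthread_where { IN_KERNEL = 0, IN_NAP = 1, IN_KVM = 2 };

/* Two one-byte flags, standing in for the pair added to the PACA. */
struct host_state_model {
    uint8_t req;    /* set by the primary: "stay out of the kernel" */
    uint8_t state;  /* set by the secondary: where it is right now */
};

/* Primary thread: post the request, then check the secondary is not
 * already in the kernel. Returns true if it is safe to enter the guest. */
static bool primary_claim(struct host_state_model *hs)
{
    hs->req = 1;
    return hs->state != IN_KERNEL;
}

/* Secondary thread woken from nap: go to KVM code if the primary has
 * claimed the core, otherwise enter the kernel as usual. */
static enum hwthread_where secondary_wake(struct host_state_model *hs)
{
    hs->state = hs->req ? IN_KVM : IN_KERNEL;
    return (enum hwthread_where)hs->state;
}
```

In this model a stray IPI arriving after `primary_claim()` succeeds sends the secondary into KVM code rather than the kernel, which is the behaviour the commit describes.
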
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  10. 09 Mar, 2012 7 commits
    • B
      powerpc: Rework lazy-interrupt handling · 7230c564
      Benjamin Herrenschmidt committed
      The current implementation of lazy interrupts handling has some
      issues that this tries to address.
      
      We don't do the various workarounds we need to do when re-enabling
      interrupts in some cases such as when returning from an interrupt
      and thus we may still lose or get delayed decrementer or doorbell
      interrupts.
      
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      
      The base idea is to replace the "hard_enabled" field with a
      "irq_happened" field in which we store a bit mask of what interrupt
      occurred while soft-disabled.
      
      When re-enabling, either via arch_local_irq_restore() or when returning
      from an interrupt, we can now decide what to do by testing bits in that
      field.
      
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
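
The bit-mask idea described above can be modelled in a few lines of C. This is a minimal sketch, not the kernel implementation: the bit names, `struct cpu_model`, and both functions are hypothetical, and "replay" is reduced to bumping a counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit names for interrupt sources recorded while soft-disabled. */
enum {
    HAPPENED_EE    = 0x01,  /* external interrupt */
    HAPPENED_DEC   = 0x02,  /* decrementer */
    HAPPENED_DBELL = 0x04   /* doorbell */
};

struct cpu_model {
    bool soft_enabled;      /* software view of "interrupts enabled" */
    uint8_t irq_happened;   /* sources that fired while soft-disabled */
    unsigned handled;       /* handlers run, directly or by replay */
};

/* An interrupt arrives: handle it now, or just remember that it happened. */
static void take_interrupt(struct cpu_model *c, uint8_t source)
{
    if (!c->soft_enabled) {
        c->irq_happened |= source;
        return;
    }
    c->handled++;
}

/* Restore the soft-enable state; when enabling, replay each recorded source. */
static void irq_restore(struct cpu_model *c, bool enable)
{
    c->soft_enabled = enable;
    if (!enable)
        return;
    for (unsigned bit = 1; bit <= HAPPENED_DBELL; bit <<= 1) {
        if (c->irq_happened & bit) {
            c->irq_happened &= (uint8_t)~bit;
            c->handled++;
        }
    }
}
```

Because each source is a single bit, two decrementer interrupts that arrive while soft-disabled are replayed as one in this model.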
      
      In addition, this adds a few refinements:
      
       - We no longer hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard interrupts
      enabled, which means that we'll potentially get better sample quality from
      performance monitor interrupts.
      
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
      
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like Book3S (the necessary C code for that to work
      appears to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
      
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      
      Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2:
      
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
      
      v3:
      
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
      
      v4:
      
       - Fix resend of doorbells on return from interrupt on Book3E
      
      v5:
      
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
      
      v6:
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
      
      v7:
       - Fix a bug with hard irq state tracking on native power7
    • B
      powerpc: Replace mfmsr instructions with load from PACA kernel_msr field · d9ada91a
      Benjamin Herrenschmidt committed
      On 64-bit, the mfmsr instruction can be quite slow, slower
      than loading a field from the cache-hot PACA, which happens
      to already contain the value we want in most cases.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • B
      powerpc: Disable interrupts in 64-bit kernel FP and vector faults · 9f2f79e3
      Benjamin Herrenschmidt committed
      If we get a floating point, altivec or vsx unavailable interrupt in
      kernel, we trigger a kernel error. There is no point preserving
      the interrupt state; in fact, that can even make debugging harder
      as the processor state might change (we may even preempt) between
      taking the exception and landing in a debugger.
      
      So just make those 3 disable interrupts unconditionally.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2: On BookE only disable when hitting the kernel unavailable
          path, otherwise it will fail to restore softe as
          fast_exception_return doesn't do it.
    • B
      powerpc: Call do_page_fault() with interrupts off · a546498f
      Benjamin Herrenschmidt committed
      We currently turn interrupts back to their previous state before
      calling do_page_fault(). This can be annoying when debugging as
      a bad fault will potentially have lost some processor state before
      getting into the debugger.
      
      We also end up calling some generic code, such as
      notify_page_fault(), with interrupts enabled, which could
      be unexpected.
      
      This changes our code to behave more like other architectures,
      and makes the assembly entry code call into do_page_fault() with
      interrupts disabled. They are conditionally re-enabled from
      within do_page_fault() in the same spot x86 does it.
      
      While there, add the might_sleep() test in the case of a successful
      trylock of the mmap semaphore, again like x86.
      
      Also fix a bug in the existing assembly where r12 (_MSR) could get
      clobbered by C calls (the DTL accounting in the exception common
      macro and DISABLE_INTS) in some cases.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2. Add the r12 clobber fix
    • B
      powerpc: Rework runlatch code · fe1952fc
      Benjamin Herrenschmidt committed
      This moves the inlines into system.h and changes the runlatch
      code to use the thread local flags (non-atomic) rather than
      the TIF flags (atomic) to keep track of the latch state.
      
      The code to turn it back on in an asynchronous interrupt is
      now simplified and partially inlined.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • B
      powerpc: Use the same interrupt prolog for perfmon as other interrupts · 7450f6f0
      Benjamin Herrenschmidt committed
      The perfmon interrupt is the sole user of a special variant of the
      interrupt prolog which differs from the one used by external and timer
      interrupts in that it saves the non-volatile GPRs and doesn't turn the
      runlatch on.
      
      The former is unnecessary and the latter is arguably incorrect, so
      let's clean that up by using the same prolog. While at it we rename
      that prolog to use the _ASYNC prefix.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • B
      powerpc: Remove legacy iSeries bits from assembly files · 4f8cf36f
      Benjamin Herrenschmidt committed
      This removes the various bits of assembly in the kernel entry,
      exception handling and SLB management code that were specific
      to running under the legacy iSeries hypervisor which is no
      longer supported.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 05 Mar, 2012 1 commit
    • P
      KVM: PPC: Implement MMIO emulation support for Book3S HV guests · 697d3899
      Paul Mackerras committed
      This provides the low-level support for MMIO emulation in Book3S HV
      guests.  When the guest tries to map a page which is not covered by
      any memslot, that page is taken to be an MMIO emulation page.  Instead
      of inserting a valid HPTE, we insert an HPTE that has the valid bit
      clear but another hypervisor software-use bit set, which we call
      HPTE_V_ABSENT, to indicate that this is an absent page.  An
      absent page is treated much like a valid page as far as guest hcalls
      (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
      an absent HPTE doesn't need to be invalidated with tlbie since it
      was never valid as far as the hardware is concerned.
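
The valid/absent distinction described above comes down to two bit tests. The sketch below is illustrative only: the bit values given to `HPTE_V_VALID` and `HPTE_V_ABSENT` are assumptions, and the helper functions are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, chosen for illustration only. */
#define HPTE_V_VALID  0x01ULL   /* the bit the hardware looks at */
#define HPTE_V_ABSENT 0x20ULL   /* hypervisor software-use bit */

/* Turn an HPTE first doubleword into an "absent" entry:
 * valid bit clear, absent bit set. */
static uint64_t make_absent(uint64_t v)
{
    return (v & ~HPTE_V_VALID) | HPTE_V_ABSENT;
}

/* What the hardware sees: only the valid bit counts. */
static bool hw_valid(uint64_t v)
{
    return (v & HPTE_V_VALID) != 0;
}

/* What guest hcalls see: valid and absent entries both count as present. */
static bool hcall_present(uint64_t v)
{
    return (v & (HPTE_V_VALID | HPTE_V_ABSENT)) != 0;
}

/* Only entries the hardware could have cached need a tlbie on removal. */
static bool needs_tlbie(uint64_t v)
{
    return hw_valid(v);
}
```

An entry produced by `make_absent()` is therefore present as far as hcalls are concerned, invisible to the hardware, and can be removed without a tlbie.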
      
      When the guest accesses a page for which there is an absent HPTE, it
      will take a hypervisor data storage interrupt (HDSI) since we now set
      the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
      looks up the hash table and if it finds an absent HPTE mapping the
      requested virtual address, will switch to kernel mode and handle the
      fault in kvmppc_book3s_hv_page_fault(), which at present just calls
      kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
      
      This is based on an earlier patch by Benjamin Herrenschmidt, but since
      heavily reworked.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  12. 22 Feb, 2012 1 commit
  13. 16 Feb, 2012 1 commit
    • B
      powerpc: Disable interrupts early in Program Check · 54321242
      Benjamin Herrenschmidt committed
      Program Check exceptions are the result of WARNs, BUGs, some
      types of breakpoints, kprobes, and other illegal instructions.
      
      We want interrupts (and thus preemption) to remain disabled
      while doing the initial stage of testing the reason and
      branching off to a debugger or kprobe, so we are still on
      the original CPU which makes debugging easier in various cases.
      
      This is how the code was intended, hence the local_irq_enable()
      right in the middle of program_check_exception().
      
      However, the assembly exception prologue for that exception was
      incorrectly marked as enabling interrupts, which defeats that
      (and records a redundant enable with lockdep).
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  14. 08 Dec, 2011 1 commit
    • P
      powerpc/powernv: Fix problems in onlining CPUs · cba313da
      Paul Mackerras committed
      At present, on the powernv platform, if you off-line a CPU that was
      online, and then try to on-line it again, the kernel generates a
      warning message "OPAL Error -1 starting CPU n".  Furthermore, if the
      CPU is a secondary thread that was used by KVM while it was off-line,
      the CPU fails to come online.
      
      The first problem is fixed by only calling OPAL to start the CPU the
      first time it is on-lined, as indicated by the cpu_start field of its
      PACA being zero.  The second problem is fixed by restoring the
      cpu_start field to 1 instead of 0 when using the CPU within KVM.
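
The first fix can be sketched as a simple guard on `cpu_start`. This is a toy model under stated assumptions: `struct paca_model`, `firmware_start_calls`, and `online_cpu` are hypothetical, and the counter merely stands in for the call into OPAL.

```c
/* Toy stand-in for the per-CPU PACA; only the field discussed above. */
struct paca_model {
    int cpu_start;   /* zero until the CPU has been started once */
};

/* Counts how often the (hypothetical) firmware start call would be made. */
static int firmware_start_calls;

/* Bring a CPU online: ask firmware to start it only the first time. */
static void online_cpu(struct paca_model *p)
{
    if (p->cpu_start == 0)
        firmware_start_calls++;   /* stands in for the call into OPAL */
    p->cpu_start = 1;
}
```

Calling `online_cpu()` twice on the same structure reaches the firmware path only once in this model, provided nothing resets `cpu_start` to 0 in between, which is what the second fix ensures for KVM.
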
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  15. 08 Nov, 2011 1 commit
    • A
      powerpc/kvm: Fix build failure with HV KVM and CBE · 5ccf55dd
      Alexander Graf committed
      When running with HV KVM and CBE config options enabled, I get
      build failures like the following:
      
        arch/powerpc/kernel/head_64.o: In function `cbe_system_error_hv':
        (.text+0x1228): undefined reference to `do_kvm_0x1202'
        arch/powerpc/kernel/head_64.o: In function `cbe_maintenance_hv':
        (.text+0x1628): undefined reference to `do_kvm_0x1602'
        arch/powerpc/kernel/head_64.o: In function `cbe_thermal_hv':
        (.text+0x1828): undefined reference to `do_kvm_0x1802'
      
      This is because we jump to a KVM handler when HV is enabled, but we
      only generate the handler with PR KVM mode.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  16. 26 Sep, 2011 1 commit
    • P
      KVM: PPC: Assemble book3s{,_hv}_rmhandlers.S separately · 177339d7
      Paul Mackerras committed
      This makes arch/powerpc/kvm/book3s_rmhandlers.S and
      arch/powerpc/kvm/book3s_hv_rmhandlers.S be assembled as
      separate compilation units rather than having them #included in
      arch/powerpc/kernel/exceptions-64s.S.  We no longer have any
      conditional branches between the exception prologs in
      exceptions-64s.S and the KVM handlers, so there is no need to
      keep their contents close together in the vmlinux image.
      
      In their current location, they are using up part of the limited
      space between the first-level interrupt handlers and the firmware
      NMI data area at offset 0x7000, and with some kernel configurations
      this area will overflow (e.g. allyesconfig), leading to an
      "attempt to .org backwards" error when compiling exceptions-64s.S.
      
      Moving them out requires that we add some #includes that the
      book3s_{,hv_}rmhandlers.S code was previously getting implicitly
      via exceptions-64s.S.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  17. 20 Sep, 2011 1 commit
  18. 12 Jul, 2011 2 commits
    • P
      KVM: PPC: book3s_hv: Add support for PPC970-family processors · 9e368f29
      Paul Mackerras committed
      This adds support for running KVM guests in supervisor mode on those
      PPC970 processors that have a usable hypervisor mode.  Unfortunately,
      Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to
      1), but the YDL PowerStation does have a usable hypervisor mode.
      
      There are several differences between the PPC970 and POWER7 in how
      guests are managed.  These differences are accommodated using the
      CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature
      bits.  Notably, on PPC970:
      
      * The LPCR, LPID or RMOR registers don't exist, and the functions of
        those registers are provided by bits in HID4 and one bit in HID0.
      
      * External interrupts can be directed to the hypervisor, but unlike
        POWER7 they are masked by MSR[EE] in non-hypervisor modes and use
        SRR0/1 not HSRR0/1.
      
      * There is no virtual RMA (VRMA) mode; the guest must use an RMO
        (real mode offset) area.
      
      * The TLB entries are not tagged with the LPID, so it is necessary to
        flush the whole TLB on partition switch.  Furthermore, when switching
        partitions we have to ensure that no other CPU is executing the tlbie
        or tlbsync instructions in either the old or the new partition,
        otherwise undefined behaviour can occur.
      
      * The PMU has 8 counters (PMC registers) rather than 6.
      
      * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist.
      
      * The SLB has 64 entries rather than 32.
      
      * There is no mediated external interrupt facility, so if we switch to
        a guest that has a virtual external interrupt pending but the guest
        has MSR[EE] = 0, we have to arrange to have an interrupt pending for
        it so that we can get control back once it re-enables interrupts.  We
        do that by sending ourselves an IPI with smp_send_reschedule after
        hard-disabling interrupts.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • P
      powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits · 969391c5
      Paul Mackerras committed
      This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to
      indicate that we have a usable hypervisor mode, and another to indicate
      that the processor conforms to PowerISA version 2.06.  We also add
      another bit to indicate that the processor conforms to ISA version 2.01
      and set that for PPC970 and derivatives.
      
      Some PPC970 chips (specifically those in Apple machines) have a
      hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode
      is not useful in the sense that there is no way to run any code in
      supervisor mode (HV=0 PR=0).  On these processors, the LPES0 and LPES1
      bits in HID4 are always 0, and we use that as a way of detecting that
      hypervisor mode is not useful.
      
      Where we have a feature section in assembly code around code that
      only applies on POWER7 in hypervisor mode, we use a construct like
      
      END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
      
      The definition of END_FTR_SECTION_IFSET is such that the code will
      be enabled (not overwritten with nops) only if all bits in the
      provided mask are set.
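
The "all bits set" rule can be written as a one-line predicate. This is a sketch of the semantics only, not of the feature-fixup mechanism itself: the `FTR_*` bit values and `section_enabled_ifset` are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bits; the real values live in the kernel headers. */
#define FTR_HVMODE   (1u << 0)
#define FTR_ARCH_206 (1u << 1)
#define FTR_ARCH_201 (1u << 2)

/* A section guarded by an IFSET mask stays enabled only if every bit
 * in the mask is present in the CPU's feature word. */
static bool section_enabled_ifset(uint32_t cpu_features, uint32_t mask)
{
    return (cpu_features & mask) == mask;
}
```

With the mask `FTR_HVMODE | FTR_ARCH_206`, a feature word with both bits keeps the section, while one with only `FTR_ARCH_206` (POWER7 running as a guest) or with `FTR_HVMODE | FTR_ARCH_201` (PPC970 in hypervisor mode) does not.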
      
      Note that the CPU feature check in __tlbie() only needs to check the
      ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called
      if we are running bare-metal, i.e. in hypervisor mode.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>