1. 09 Oct 2018, 9 commits
    • KVM: PPC: Book3S HV: Handle hypercalls correctly when nested · 4bad7779
      Committed by Paul Mackerras
      When we are running as a nested hypervisor, we use a hypercall to
      enter the guest rather than code in book3s_hv_rmhandlers.S.  This means
      that the hypercall handlers listed in hcall_real_table never get called.
      There are some hypercalls that are handled there and not in
      kvmppc_pseries_do_hcall(), which therefore won't get processed for
      a nested guest.
      
      To fix this, we add cases to kvmppc_pseries_do_hcall() to handle those
      hypercalls, with the following exceptions:
      
      - The HPT hypercalls (H_ENTER, H_REMOVE, etc.) are not handled because
        we only support radix mode for nested guests.
      
      - H_CEDE has to be handled specially because the cede logic in
        kvmhv_run_single_vcpu assumes that it has been processed by the time
        that kvmhv_p9_guest_entry() returns.  Therefore we put a special
        case for H_CEDE in kvmhv_p9_guest_entry().
      
      For the XICS hypercalls, if real-mode processing is enabled, then the
      virtual-mode handlers assume that they are being called only to finish
      up the operation.  Therefore we turn off the real-mode flag in the XICS
      code when running as a nested hypervisor.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      4bad7779
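      A minimal stand-alone C sketch of the idea above: a nested guest's
      hypercalls are dispatched in virtual mode (hcall_real_table is never
      reached), and H_CEDE is handled before the entry wrapper returns.
      This is an illustrative model only, not the kernel code; the struct,
      the helper names and the "unhandled" return value are invented.

        /* Illustrative model of virtual-mode hcall dispatch for a nested
         * guest; not the kernel implementation. */
        #include <stdio.h>

        #define H_SUCCESS     0
        #define H_CEDE        0xE0          /* PAPR hcall number */
        #define RC_UNHANDLED  (-1)          /* invented: "punt to caller" */

        struct toy_vcpu {
            int ceded;                      /* stand-in for vcpu->arch.ceded */
        };

        /* Everything must be handled here when nested, except the HPT
         * hcalls, which are unneeded because nested guests are radix-only. */
        static long do_hcall_virtual_mode(struct toy_vcpu *vcpu, unsigned long nr)
        {
            switch (nr) {
            case H_CEDE:
                /* Handled specially in the entry wrapper in the real code,
                 * because its caller assumes the cede has already been
                 * processed by the time the entry function returns. */
                vcpu->ceded = 1;
                return H_SUCCESS;
            default:
                return RC_UNHANDLED;
            }
        }

        int main(void)
        {
            struct toy_vcpu v = { .ceded = 0 };
            printf("H_CEDE -> rc=%ld, ceded=%d\n",
                   do_hcall_virtual_mode(&v, H_CEDE), v.ceded);
            return 0;
        }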
    • KVM: PPC: Book3S HV: Nested guest entry via hypercall · 360cae31
      Committed by Paul Mackerras
      This adds a new hypercall, H_ENTER_NESTED, which is used by a nested
      hypervisor to enter one of its nested guests.  The hypercall supplies
      register values in two structs.  Those values are copied by the level 0
      (L0) hypervisor (the one which is running in hypervisor mode) into the
      vcpu struct of the L1 guest, and then the guest is run until an
      interrupt or error occurs which needs to be reported to L1 via the
      hypercall return value.
      
      Currently this assumes that the L0 and L1 hypervisors are the same
      endianness, and the structs passed as arguments are in native
      endianness.  If they are of different endianness, the version number
      check will fail and the hcall will be rejected.
      
      Nested hypervisors do not support indep_threads_mode=N, so this adds
      code to print a warning message if the administrator has set
      indep_threads_mode=N, and treat it as Y.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      360cae31
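      The endianness point above can be pictured with a toy version check:
      if L1 fills in the struct in the opposite byte order, the version
      value L0 reads will not match and the hcall is rejected.  The struct
      layout, version number and return values here are invented for
      illustration, not the real hv_guest_state definition.

        /* Toy model of the version check that implicitly rejects a
         * wrong-endian caller of H_ENTER_NESTED. */
        #include <stdint.h>
        #include <stdio.h>

        #define GS_VERSION   1u             /* illustrative version number */
        #define H_SUCCESS    0
        #define H_PARAMETER  (-4)           /* illustrative return value */

        struct toy_guest_state {
            uint64_t version;               /* must match what L0 expects */
            uint64_t lpcr, msr, nip;        /* register image, truncated here */
        };

        static long h_enter_nested_check(const struct toy_guest_state *gs)
        {
            /* A byte-swapped GS_VERSION (0x0100000000000000) fails this
             * test, so a mismatched-endian L1 gets the hcall rejected. */
            if (gs->version != GS_VERSION)
                return H_PARAMETER;
            return H_SUCCESS;
        }

        int main(void)
        {
            struct toy_guest_state ok = { .version = GS_VERSION };
            struct toy_guest_state sw = { .version = (uint64_t)GS_VERSION << 56 };
            printf("native: %ld  byte-swapped: %ld\n",
                   h_enter_nested_check(&ok), h_enter_nested_check(&sw));
            return 0;
        }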
    • KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct · fd0944ba
      Committed by Paul Mackerras
      When the 'regs' field was added to struct kvm_vcpu_arch, the code
      was changed to use several of the fields inside regs (e.g., gpr, lr,
      etc.) but not the ccr field, because the ccr field in struct pt_regs
      is 64 bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is
      only 32 bits.  This changes the code to use the regs.ccr field
      instead of cr, and changes the assembly code on 64-bit platforms to
      use 64-bit loads and stores instead of 32-bit ones.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      fd0944ba
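      A small stand-alone illustration of why the assembly had to move to
      64-bit loads and stores once the value lives in the 64-bit regs.ccr
      slot: on a big-endian 64-bit machine a 32-bit access at the field's
      address would touch the wrong half.  The struct below is a toy
      stand-in, not the kernel's pt_regs.

        /* Toy demonstration of width-matched access to a 64-bit ccr slot
         * (the "std"/"ld" equivalent of what the asm now does). */
        #include <inttypes.h>
        #include <stdio.h>

        struct toy_pt_regs {
            uint64_t ccr;                   /* 64 bits, as on ppc64 */
        };

        int main(void)
        {
            struct toy_pt_regs regs = { 0 };
            uint32_t cr = 0x28000082;       /* some CR image */

            /* Correct: full-width store.  A 32-bit store at &regs.ccr (the
             * old "stw") would land in the high word on big-endian and the
             * value would read back shifted up by 32 bits. */
            regs.ccr = cr;
            printf("ccr = 0x%016" PRIx64 "\n", regs.ccr);
            return 0;
        }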
    • KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests · 95a6432c
      Committed by Paul Mackerras
      This creates an alternative guest entry/exit path which is used for
      radix guests on POWER9 systems when we have indep_threads_mode=Y.  In
      these circumstances there is exactly one vcpu per vcore and there is
      no coordination required between vcpus or vcores; the vcpu can enter
      the guest without needing to synchronize with anything else.
      
      The new fast path is implemented almost entirely in C in book3s_hv.c
      and runs with the MMU on until the guest is entered.  On guest exit
      we use the existing path until the point where we are committed to
      exiting the guest (as distinct from handling an interrupt in the
      low-level code and returning to the guest) and we have pulled the
      guest context from the XIVE.  At that point we check a flag in the
      stack frame to see whether we came in via the old path or the new
      path; if we came in via the new path then we go back to C code to do
      the rest of the process of saving the guest context and restoring the
      host context.
      
      The C code is split into separate functions for handling the
      OS-accessible state and the hypervisor state, with the idea that the
      latter can be replaced by a hypercall when we implement nested
      virtualization.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [mpe: Fix CONFIG_ALTIVEC=n build]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      95a6432c
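      The split described above, with the hypervisor-privileged part kept
      in its own function so that it can later be replaced by a hypercall,
      can be sketched roughly as follows.  Every name below is invented for
      illustration; the real code lives in book3s_hv.c.

        /* Illustrative skeleton of the new MMU-on entry path: the
         * hypervisor-privileged state switch is kept in its own function
         * so a nested (L1) hypervisor can later replace it with a
         * hypercall.  All names are invented. */
        #include <stdio.h>

        struct toy_vcpu { int dummy; };

        static void switch_os_visible_state_to_guest(struct toy_vcpu *v) { (void)v; /* GPRs, FP/VSX, timers */ }
        static void switch_os_visible_state_to_host(struct toy_vcpu *v)  { (void)v; }

        /* LPCR/LPID, partition table, XIVE pull, etc.; becomes
         * H_ENTER_NESTED when running as a nested hypervisor.
         * Returns the exit trap number. */
        static int switch_hv_state_and_run(struct toy_vcpu *v) { (void)v; return 0; }

        static int toy_p9_guest_entry(struct toy_vcpu *vcpu)
        {
            int trap;

            switch_os_visible_state_to_guest(vcpu);
            trap = switch_hv_state_and_run(vcpu);
            switch_os_visible_state_to_host(vcpu);
            return trap;
        }

        int main(void)
        {
            struct toy_vcpu v = { 0 };
            printf("exit trap = %d\n", toy_p9_guest_entry(&v));
            return 0;
        }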
    • KVM: PPC: Book3S: Rework TM save/restore code and make it C-callable · 7854f754
      Committed by Paul Mackerras
      This adds a parameter to __kvmppc_save_tm and __kvmppc_restore_tm
      which allows the caller to indicate whether it wants the nonvolatile
      register state to be preserved across the call, as required by the C
      calling conventions.  This parameter being non-zero also causes the
      MSR bits that enable TM, FP, VMX and VSX to be preserved.  The
      condition register and DSCR are now always preserved.
      
      With this, kvmppc_save_tm_hv and kvmppc_restore_tm_hv can be called
      from C code provided the 3rd parameter is non-zero.  So that these
      functions can be called from modules, they now include code to set
      the TOC pointer (r2) on entry, as they can call other built-in C
      functions which will assume the TOC to have been set.
      
      Also, the fake suspend code in kvmppc_save_tm_hv is modified here to
      assume that treclaim in fake-suspend state does not modify any registers,
      which is the case on POWER9.  This enables the code to be simplified
      quite a bit.
      
      _kvmppc_save_tm_pr and _kvmppc_restore_tm_pr become much simpler with
      this change, since they now only need to save and restore TAR and pass
      1 for the 3rd argument to __kvmppc_{save,restore}_tm.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7854f754
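      The new parameter can be pictured with a toy interface like the one
      below.  The real routines are assembly; the exact argument list shown
      here (vcpu, guest MSR, preserve-nonvolatiles flag) is an assumption
      based on the description above, not the real declaration.

        /* Toy model of a save/restore interface with a "preserve
         * nonvolatile registers" flag, so the same routine can be used
         * both from asm (flag 0) and from C callers that rely on the C
         * calling convention (flag != 0). */
        #include <stdint.h>
        #include <stdio.h>

        struct toy_vcpu { uint64_t msr; };

        static void toy_save_tm_nv(struct toy_vcpu *vcpu, uint64_t guest_msr,
                                   int preserve_nv)
        {
            (void)vcpu; (void)guest_msr;
            if (preserve_nv) {
                /* Save r14-r31 (and keep the TM/FP/VMX/VSX MSR bits) so
                 * that a C caller sees its nonvolatile state intact. */
            }
            /* ... treclaim / checkpointed-state save would go here ... */
        }

        int main(void)
        {
            struct toy_vcpu v = { .msr = 0 };
            toy_save_tm_nv(&v, v.msr, 1);   /* C caller: must pass non-zero */
            puts("saved TM state with nonvolatile registers preserved");
            return 0;
        }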
    • KVM: PPC: Book3S HV: Simplify real-mode interrupt handling · df709a29
      Committed by Paul Mackerras
      This streamlines the first part of the code that handles a hypervisor
      interrupt that occurred in the guest.  With this, all of the real-mode
      handling that occurs is done before the "guest_exit_cont" label; once
      we get to that label we are committed to exiting to host virtual mode.
      Thus the machine check and HMI real-mode handling is moved before that
      label.
      
      Also, the code to handle external interrupts is moved out of line, as
      is the code that calls kvmppc_realmode_hmi_handler().
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      df709a29
    • KVM: PPC: Book3S HV: Extract PMU save/restore operations as C-callable functions · 41f4e631
      Committed by Paul Mackerras
      This pulls out the assembler code that is responsible for saving and
      restoring the PMU state for the host and guest into separate functions
      so they can be used from an alternate entry path.  The calling
      convention is made compatible with C.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      41f4e631
    • KVM: PPC: Book3S HV: Move interrupt delivery on guest entry to C code · f7035ce9
      Committed by Paul Mackerras
      This is based on a patch by Suraj Jitindar Singh.
      
      This moves the code in book3s_hv_rmhandlers.S that generates an
      external, decrementer or privileged doorbell interrupt just before
      entering the guest to C code in book3s_hv_builtin.c.  This is to
      make future maintenance and modification easier.  The algorithm
      expressed in the C code is almost identical to the previous
      algorithm.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f7035ce9
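      The sort of logic that moved into C can be sketched as below: pick
      the pending interrupt to synthesize (external, decrementer or
      privileged doorbell) just before entering the guest.  The names,
      priorities and the MSR[EE] short-cut are all illustrative, not the
      book3s_hv_builtin.c code.

        /* Illustrative stand-alone model of choosing which interrupt to
         * deliver to the guest right before entry. */
        #include <stdio.h>

        enum toy_vec {                  /* illustrative vector identifiers */
            TOY_NONE,
            TOY_EXTERNAL,
            TOY_DECREMENTER,
            TOY_DOORBELL,
        };

        struct toy_vcpu {
            int ext_pending;            /* external interrupt requested */
            int dec_expired;            /* decrementer has expired */
            int dbell_pending;          /* privileged doorbell pending */
            int msr_ee;                 /* guest MSR[EE]: interrupts enabled */
        };

        static enum toy_vec pick_interrupt(const struct toy_vcpu *v)
        {
            if (!v->msr_ee)
                return TOY_NONE;        /* guest has interrupts disabled */
            if (v->ext_pending)
                return TOY_EXTERNAL;
            if (v->dec_expired)
                return TOY_DECREMENTER;
            if (v->dbell_pending)
                return TOY_DOORBELL;
            return TOY_NONE;
        }

        int main(void)
        {
            struct toy_vcpu v = { .dec_expired = 1, .msr_ee = 1 };
            printf("deliver vector %d\n", pick_interrupt(&v));
            return 0;
        }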
    • KVM: PPC: Book3S: Simplify external interrupt handling · d24ea8a7
      Committed by Paul Mackerras
      Currently we use two bits in the vcpu pending_exceptions bitmap to
      indicate that an external interrupt is pending for the guest, one
      for "one-shot" interrupts that are cleared when delivered, and one
      for interrupts that persist until cleared by an explicit action of
      the OS (e.g. an acknowledge to an interrupt controller).  The
      BOOK3S_IRQPRIO_EXTERNAL bit is used for one-shot interrupt requests
      and BOOK3S_IRQPRIO_EXTERNAL_LEVEL is used for persisting interrupts.
      
      In practice BOOK3S_IRQPRIO_EXTERNAL never gets used, because our
      Book3S platforms generally, and pseries in particular, expect
      external interrupt requests to persist until they are acknowledged
      at the interrupt controller.  That combined with the confusion
      introduced by having two bits for what is essentially the same thing
      makes it attractive to simplify things by only using one bit.  This
      patch does that.
      
      With this patch there is only BOOK3S_IRQPRIO_EXTERNAL, and by default
      it has the semantics of a persisting interrupt.  In order to avoid
      breaking the ABI, we introduce a new "external_oneshot" flag which
      preserves the behaviour of the KVM_INTERRUPT ioctl with the
      KVM_INTERRUPT_SET argument.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d24ea8a7
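      The ABI-preserving behaviour can be modelled like this: the single
      BOOK3S_IRQPRIO_EXTERNAL bit is always used, and an extra per-vcpu
      flag remembers that the request came from KVM_INTERRUPT with
      KVM_INTERRUPT_SET, so it is cleared on delivery the way the old
      one-shot bit was.  The bit position and helper names are stand-ins.

        /* Toy model of one pending-external bit plus a one-shot flag that
         * keeps the old KVM_INTERRUPT_SET semantics.  Illustrative only. */
        #include <stdio.h>

        #define IRQPRIO_EXTERNAL_BIT  (1u << 3)   /* illustrative bit position */

        struct toy_vcpu {
            unsigned int pending_exceptions;
            int external_oneshot;                 /* set for KVM_INTERRUPT_SET */
        };

        static void queue_external(struct toy_vcpu *v, int oneshot)
        {
            v->pending_exceptions |= IRQPRIO_EXTERNAL_BIT;
            v->external_oneshot = oneshot;
        }

        static void deliver_external(struct toy_vcpu *v)
        {
            /* A level (persisting) interrupt stays pending until the guest
             * acks it at the interrupt controller; a one-shot request
             * clears here, on delivery. */
            if (v->external_oneshot) {
                v->pending_exceptions &= ~IRQPRIO_EXTERNAL_BIT;
                v->external_oneshot = 0;
            }
        }

        int main(void)
        {
            struct toy_vcpu v = { 0 };
            queue_external(&v, 1);                /* KVM_INTERRUPT_SET style */
            deliver_external(&v);
            printf("still pending: %d\n",
                   !!(v.pending_exceptions & IRQPRIO_EXTERNAL_BIT));
            return 0;
        }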
  2. 30 Jul 2018, 2 commits
  3. 16 Jul 2018, 1 commit
  4. 01 Jun 2018, 2 commits
    • KVM: PPC: Book3S PR: Add C function wrapper for _kvmppc_save/restore_tm() · caa3be92
      Committed by Simon Guo
      Currently the __kvmppc_save/restore_tm() APIs can only be invoked from
      assembly code.  This patch adds C function wrappers for them so that
      they can be safely called from C code.
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      caa3be92
    • KVM: PPC: Book3S PR: Add guest MSR parameter for kvmppc_save_tm()/kvmppc_restore_tm() · 6f597c6b
      Committed by Simon Guo
      HV KVM and PR KVM need different MSR sources to indicate whether
      treclaim or trecheckpoint is necessary.
      
      This patch adds a new parameter (the guest MSR) to the kvmppc_save_tm()/
      kvmppc_restore_tm() APIs:
      - For HV KVM, it is VCPU_MSR
      - For PR KVM, it is the current host MSR or VCPU_SHADOW_SRR1
      
      This enhancement enables these two APIs to be reused by PR KVM later,
      while keeping the HV KVM logic unchanged.
      
      This patch also reworks kvmppc_save_tm()/kvmppc_restore_tm() to
      have a clean ABI: r3 for vcpu and r4 for guest_msr.
      
      During kvmppc_save_tm()/kvmppc_restore_tm(), R1 needs to be saved
      or restored.  Currently R1 is saved into HSTATE_HOST_R1.  In PR
      KVM, we are going to add a C function wrapper for
      kvmppc_save_tm()/kvmppc_restore_tm() in which R1 will be adjusted
      for the added stack frame and saved into HSTATE_HOST_R1.  There are
      several places in HV KVM that load HSTATE_HOST_R1 back into R1, and
      we don't want the TM code to introduce risk or confusion there.

      This patch therefore uses HSTATE_SCRATCH2 to save/restore R1 in
      kvmppc_save_tm()/kvmppc_restore_tm() to avoid that confusion, since
      r1 is really just a temporary/scratch value there.
      
      [paulus@ozlabs.org - rebased on top of 7b0e827c ("KVM: PPC: Book3S HV:
       Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)]
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      6f597c6b
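      The role of the new guest-MSR argument can be sketched like this:
      the transaction-state (TS) bits of whatever MSR value the caller
      passes in decide whether there is checkpointed state to deal with.
      The mask below is a placeholder, not the kernel's MSR_TS definition,
      and the function is a toy, not the real kvmppc_save_tm().

        /* Toy model: the caller supplies the MSR to inspect (VCPU_MSR for
         * HV KVM, the host MSR or shadow SRR1 for PR KVM); its TS bits say
         * whether a treclaim/trecheckpoint is actually needed. */
        #include <stdint.h>
        #include <stdio.h>

        #define TOY_MSR_TS_MASK  (3ull << 32)   /* placeholder for MSR[TS] */

        static int tm_state_is_live(uint64_t guest_msr)
        {
            /* Non-zero TS: transactional or suspended, so there is
             * checkpointed state to reclaim/restore. */
            return (guest_msr & TOY_MSR_TS_MASK) != 0;
        }

        static void toy_save_tm(void *vcpu, uint64_t guest_msr)
        {
            (void)vcpu;
            if (!tm_state_is_live(guest_msr))
                return;                         /* nothing to do */
            /* ... treclaim and checkpointed-register save go here ... */
        }

        int main(void)
        {
            toy_save_tm(NULL, 0);                   /* TM inactive */
            toy_save_tm(NULL, TOY_MSR_TS_MASK);     /* TM suspended/active */
            puts("ok");
            return 0;
        }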
  5. 31 May 2018, 2 commits
    • KVM: PPC: Book3S PR: Move kvmppc_save_tm/kvmppc_restore_tm to separate file · 009c872a
      Committed by Simon Guo
      This is a simple patch that just moves the kvmppc_save_tm()/
      kvmppc_restore_tm() functionality to tm.S.  There is no logic change.
      Those APIs will be restructured in later patches to improve readability.

      This is in preparation for reusing those APIs in both HV and PR PPC KVM.
      
      Some slight changes made while moving the functions:
      - surround some HV KVM specific code with CONFIG_KVM_BOOK3S_HV_POSSIBLE
        so that it still compiles.
      - use _GLOBAL() to define kvmppc_save_tm()/kvmppc_restore_tm()
      
      [paulus@ozlabs.org - rebased on top of 7b0e827c ("KVM: PPC: Book3S HV:
       Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)]
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      009c872a
    • KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm · 7b0e827c
      Committed by Paul Mackerras
      This splits out the handling of "fake suspend" mode, part of the
      hypervisor TM assist code for POWER9, and puts almost all of it in
      new kvmppc_save_tm_hv and kvmppc_restore_tm_hv functions.  The new
      functions branch to kvmppc_save/restore_tm if the CPU does not
      require hypervisor TM assistance.
      
      With this, it will be more straightforward to move kvmppc_save_tm and
      kvmppc_restore_tm to another file and use them for transactional
      memory support in PR KVM.  Additionally, it also makes the code a
      bit clearer and reduces the number of feature sections.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      7b0e827c
  6. 18 May 2018, 2 commits
  7. 17 May 2018, 2 commits
    • KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path · df158189
      Committed by Paul Mackerras
      A radix guest can execute tlbie instructions to invalidate TLB entries.
      After a tlbie or a group of tlbies, it must then do the architected
      sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation
      has been processed by all CPUs in the system before it can rely on
      no CPU using any translation that it just invalidated.
      
      In fact it is the ptesync which does the actual synchronization in
      this sequence, and hardware has a requirement that the ptesync must
      be executed on the same CPU thread as the tlbies which it is expected
      to order.  Thus, if a vCPU gets moved from one physical CPU to
      another after it has done some tlbies but before it can get to do the
      ptesync, the ptesync will not have the desired effect when it is
      executed on the second physical CPU.
      
      To fix this, we do a ptesync in the exit path for radix guests.  If
      there are any pending tlbies, this will wait for them to complete.
      If there aren't, then ptesync will just do the same as sync.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      df158189
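      The fix boils down to executing one ptesync on the exiting hardware
      thread, so that any tlbies the guest issued there are complete before
      the vCPU can migrate.  A hedged sketch (ppc64-only inline assembly;
      the surrounding helper name is invented):

        /* Sketch: one ptesync on the radix guest exit path orders any
         * tlbies the guest executed on this hardware thread. */
        static inline void toy_radix_exit_sync(void)
        {
        #if defined(__powerpc64__)
            asm volatile("ptesync" : : : "memory");
        #endif
        }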
    • KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry · 57b8daa7
      Committed by Paul Mackerras
      Currently, the HV KVM guest entry/exit code adds the timebase offset
      from the vcore struct to the timebase on guest entry, and subtracts
      it on guest exit.  Which is fine, except that it is possible for
      userspace to change the offset using the SET_ONE_REG interface while
      the vcore is running, as there is only one timebase offset per vcore
      but potentially multiple VCPUs in the vcore.  If that were to happen,
      KVM would subtract a different offset on guest exit from that which
      it had added on guest entry, leading to the timebase being out of sync
      between cores in the host, which then leads to bad things happening
      such as hangs and spurious watchdog timeouts.
      
      To fix this, we add a new field 'tb_offset_applied' to the vcore struct
      which stores the offset that is currently applied to the timebase.
      This value is set from the vcore tb_offset field on guest entry, and
      is what is subtracted from the timebase on guest exit.  Since it is
      zero when the timebase offset is not applied, we can simplify the
      logic in kvmhv_start_timing and kvmhv_accumulate_time.
      
      In addition, we had secondary threads reading the timebase while
      running concurrently with code on the primary thread which would
      eventually add or subtract the timebase offset from the timebase.
      This occurred while saving or restoring the DEC register value on
      the secondary threads.  Although no specific incorrect behaviour has
      been observed, this is a race which should be fixed.  To fix it, we
      move the DEC saving code to just before we call kvmhv_commence_exit,
      and the DEC restoring code to after the point where we have waited
      for the primary thread to switch the MMU context and add the timebase
      offset.  That way we are sure that the timebase contains the guest
      timebase value in both cases.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      57b8daa7
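      The race fix can be modelled with a second field recording exactly
      what was added to the timebase, so that exit always subtracts what
      entry added even if userspace changes tb_offset in between.  The
      struct and helpers below are illustrative, not the kernel code.

        /* Toy model of tb_offset vs tb_offset_applied: exit subtracts what
         * entry actually added, even if userspace changed tb_offset in
         * the meantime. */
        #include <stdint.h>
        #include <stdio.h>

        struct toy_vcore {
            int64_t tb_offset;            /* requested by userspace (may change) */
            int64_t tb_offset_applied;    /* currently applied, 0 if none */
        };

        static void guest_entry(struct toy_vcore *vc, int64_t *timebase)
        {
            vc->tb_offset_applied = vc->tb_offset;
            *timebase += vc->tb_offset_applied;
        }

        static void guest_exit(struct toy_vcore *vc, int64_t *timebase)
        {
            *timebase -= vc->tb_offset_applied;   /* not vc->tb_offset! */
            vc->tb_offset_applied = 0;
        }

        int main(void)
        {
            struct toy_vcore vc = { .tb_offset = 1000 };
            int64_t tb = 5000;

            guest_entry(&vc, &tb);
            vc.tb_offset = 2000;          /* userspace SET_ONE_REG mid-run */
            guest_exit(&vc, &tb);
            printf("timebase back to %lld\n", (long long)tb);  /* 5000, not 4000 */
            return 0;
        }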
  8. 03 May 2018, 1 commit
    • powerpc64/ftrace: Disable ftrace during kvm entry/exit · a4bc64d3
      Committed by Naveen N. Rao
      During guest entry/exit, we switch over to/from the guest MMU context
      and we cannot take exceptions in the hypervisor code.
      
      Since ftrace may be enabled and since it can result in us taking a trap,
      disable ftrace by setting paca->ftrace_enabled to zero. There are two
      paths through which we enter/exit a guest:
      1. If we are the vcore runner, then we enter the guest via
      __kvmppc_vcore_entry() and we disable ftrace around this. This is always
      the case for Power9, and for the primary thread on Power8.
      2. If we are a secondary thread in Power8, then we would be in nap due
      to SMT being disabled. We are woken up by an IPI to enter the guest. In
      this scenario, we enter the guest through kvm_start_guest(). We disable
      ftrace at this point. In this scenario, ftrace would only get re-enabled
      on the secondary thread when SMT is re-enabled (via start_secondary()).
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a4bc64d3
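      The mechanism is just a per-CPU flag that the ftrace entry code
      checks; a toy sketch of bracketing the vcore-runner path with it is
      below.  Apart from the ftrace_enabled field itself, the names are
      invented.

        /* Toy sketch of disabling ftrace around guest entry on the vcore
         * runner thread; types and helpers are invented. */
        #include <stdio.h>

        struct toy_paca { unsigned char ftrace_enabled; };
        static struct toy_paca toy_local_paca = { .ftrace_enabled = 1 };

        static int toy_vcore_entry(void) { return 0; }  /* stand-in for the asm entry */

        static int enter_guest(void)
        {
            int trap;

            toy_local_paca.ftrace_enabled = 0;   /* no tracing in HV entry/exit */
            trap = toy_vcore_entry();
            toy_local_paca.ftrace_enabled = 1;   /* safe to trace again */
            return trap;
        }

        int main(void)
        {
            int trap = enter_guest();
            printf("trap=%d, ftrace re-enabled=%d\n",
                   trap, toy_local_paca.ftrace_enabled);
            return 0;
        }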
  9. 31 Mar 2018, 1 commit
    • powerpc/kvm: Fix guest boot failure on Power9 since DAWR changes · ca9a16c3
      Committed by Aneesh Kumar K.V
      SLOF checks for 'sc 1' (hypercall) support by issuing a hcall with
      H_SET_DABR. Since the recent commit e8ebedbf ("KVM: PPC: Book3S
      HV: Return error from h_set_dabr() on POWER9") changed H_SET_DABR to
      return H_UNSUPPORTED on Power9, we see guest boot failures; the
      symptom is that the boot just seems to stop in SLOF, eg:
      
        SLOF ***************************************************************
        QEMU Starting
         Build Date = Sep 24 2017 12:23:07
         FW Version = buildd@ release 20170724
        <no further output>
      
      SLOF can cope if H_SET_DABR returns H_HARDWARE. So switch the return
      value to H_HARDWARE instead of H_UNSUPPORTED so that we don't break
      the guest boot.
      
      That does mean we return a different error to PowerVM in this case,
      but that's probably not a big concern.
      
      Fixes: e8ebedbf ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ca9a16c3
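      The fix is a one-value change in what the hcall returns on Power9;
      a toy version is below.  The return codes and the CPU test are
      illustrative placeholders, not the kernel definitions.

        /* Toy version of the H_SET_DABR decision: SLOF copes with
         * H_HARDWARE but not H_UNSUPPORTED, so Power9 (no DABR) returns
         * H_HARDWARE. */
        #include <stdio.h>

        #define H_SUCCESS    0
        #define H_HARDWARE  (-1)            /* placeholder value */
        #define IS_POWER9    1              /* stand-in for a CPU feature test */

        static long toy_h_set_dabr(unsigned long dabr)
        {
            (void)dabr;
            if (IS_POWER9)
                return H_HARDWARE;          /* was H_UNSUPPORTED; SLOF then hung */
            /* ... program the DABR on pre-Power9 CPUs ... */
            return H_SUCCESS;
        }

        int main(void)
        {
            printf("h_set_dabr -> %ld\n", toy_h_set_dabr(0));
            return 0;
        }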
  10. 30 Mar 2018, 1 commit
  11. 27 Mar 2018, 2 commits
  12. 23 Mar 2018, 4 commits
    • KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state · 681c617b
      Committed by Paul Mackerras
      This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
      where the contents of the TEXASR can get corrupted while a thread is
      in fake suspend state.  The workaround is for the instruction emulation
      code to use the value saved at the most recent guest exit in real
      suspend mode.  We achieve this by simply not saving the TEXASR into
      the vcpu struct on an exit in fake suspend state.  We also have to
      take care to set the orig_texasr field only on guest exit in real
      suspend state.
      
      This also means that on guest entry in fake suspend state, TEXASR
      will be restored to the value it had on the last exit in real suspend
      state, effectively counteracting any hardware-caused corruption.  This
      works because TEXASR may not be written in suspend state.
      
      With this, the guest might see the wrong values in TEXASR if it reads
      it while in suspend state, but will see the correct value in
      non-transactional state (e.g. after a treclaim), and treclaim will
      work correctly.
      
      With this workaround, the code will actually run slightly faster, and
      will operate correctly on systems without the TEXASR bug (since TEXASR
      may not be written in suspend state, and is only changed by failure
      recording, which will have already been done before we get into fake
      suspend state).  Therefore these changes are not made subject to a CPU
      feature bit.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      681c617b
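      The workaround amounts to guarding the TEXASR (and orig_texasr) save
      on guest exit so that it only happens on real-suspend exits.  A toy
      sketch with invented field names (not the vcpu struct layout):

        /* Toy sketch: skip saving TEXASR into the vcpu on exits taken in
         * fake suspend state, so a possibly-corrupted hardware value never
         * overwrites the one captured on the last real-suspend exit. */
        #include <inttypes.h>
        #include <stdio.h>

        struct toy_vcpu {
            uint64_t texasr;                /* value the guest will see */
            uint64_t orig_texasr;           /* used by the emulation code */
        };

        static void save_tm_sprs_on_exit(struct toy_vcpu *v, uint64_t hw_texasr,
                                         int fake_suspend)
        {
            if (!fake_suspend) {
                v->texasr = hw_texasr;      /* real-suspend exit: trustworthy */
                v->orig_texasr = hw_texasr;
            }
            /* Fake suspend: keep the last real-suspend value; TEXASR cannot
             * be written by the guest in suspend state anyway. */
        }

        int main(void)
        {
            struct toy_vcpu v = { .texasr = 0x1234, .orig_texasr = 0x1234 };
            save_tm_sprs_on_exit(&v, 0xdead, 1);    /* fake suspend: ignored */
            printf("texasr kept = 0x%" PRIx64 "\n", v.texasr);
            return 0;
        }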
    • KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode · 87a11bb6
      Committed by Suraj Jitindar Singh
      This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
      where a treclaim performed in fake suspend mode can cause subsequent
      reads from the XER register to return inconsistent values for the SO
      (summary overflow) bit.  The inconsistent SO bit state can potentially
      be observed on any thread in the core.  We have to do the treclaim
      because that is the only way to get the thread out of suspend state
      (fake or real) and into non-transactional state.
      
      The workaround for the bug is to force the core into SMT4 mode before
      doing the treclaim.  This patch adds the code to do that, conditional
      on the CPU_FTR_P9_TM_XER_SO_BUG feature bit.
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      87a11bb6
    • KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9 · 4bb3c7a0
      Committed by Paul Mackerras
      POWER9 has hardware bugs relating to transactional memory and thread
      reconfiguration (changes to hardware SMT mode).  Specifically, the core
      does not have enough storage to store a complete checkpoint of all the
      architected state for all four threads.  The DD2.2 version of POWER9
      includes hardware modifications designed to allow hypervisor software
      to implement workarounds for these problems.  This patch implements
      those workarounds in KVM code so that KVM guests see a full, working
      transactional memory implementation.
      
      The problems center around the use of TM suspended state, where the
      CPU has a checkpointed state but execution is not transactional.  The
      workaround is to implement a "fake suspend" state, which looks to the
      guest like suspended state but the CPU does not store a checkpoint.
      In this state, any instruction that would cause a transition to
      transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
      checkpointed state (treclaim) causes a "soft patch" interrupt (vector
      0x1500) to the hypervisor so that it can be emulated.  The trechkpt
      instruction also causes a soft patch interrupt.
      
      On POWER9 DD2.2, we avoid returning to the guest in any state which
      would require a checkpoint to be present.  The trechkpt in the guest
      entry path which would normally create that checkpoint is replaced by
      either a transition to fake suspend state, if the guest is in suspend
      state, or a rollback to the pre-transactional state if the guest is in
      transactional state.  Fake suspend state is indicated by a flag in the
      PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only and
      reads back as 0.
      
      On exit from the guest, if the guest is in fake suspend state, we still
      do the treclaim instruction as we would in real suspend state, in order
      to get into non-transactional state, but we do not save the resulting
      register state since there was no checkpoint.
      
      Emulation of the instructions that cause a softpatch interrupt is
      handled in two paths.  If the guest is in real suspend mode, we call
      kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
      transitioning to transactional state.  This is called before we do the
      treclaim in the guest exit path; because we haven't done treclaim, we
      can get back to the guest with the transaction still active.  If the
      instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
      handle, or if the guest is in fake suspend state, then we proceed to
      do the complete guest exit path and subsequently call
      kvmhv_p9_tm_emulation() in host context with the MMU on.  This handles
      all the cases including the cases that generate program interrupts
      (illegal instruction or TM Bad Thing) and facility unavailable
      interrupts.
      
      The emulation is reasonably straightforward and is mostly concerned
      with checking for exception conditions and updating the state of
      registers such as MSR and CR0.  The treclaim emulation takes care to
      ensure that the TEXASR register gets updated as if it were the guest
      treclaim instruction that had done failure recording, not the treclaim
      done in hypervisor state in the guest exit path.
      
      With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
      transactional memory is not available to host userspace.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      4bb3c7a0
    • KVM: PPC: Book3S HV: Fix duplication of host SLB entries · cda4a147
      Committed by Paul Mackerras
      Since commit 6964e6a4 ("KVM: PPC: Book3S HV: Do SLB load/unload
      with guest LPCR value loaded", 2018-01-11), we have been seeing
      occasional machine check interrupts on POWER8 systems when running
      KVM guests, due to SLB multihit errors.
      
      This turns out to be due to the guest exit code reloading the host
      SLB entries from the SLB shadow buffer when the SLB was not previously
      cleared in the guest entry path.  This can happen because the path
      which skips from the guest entry code to the guest exit code without
      entering the guest now does the skip before the SLB is cleared and
      loaded with guest values, but the host values are loaded after the
      point in the guest exit path that we skip to.
      
      To fix this, we move the code that reloads the host SLB values up
      so that it occurs just before the point in the guest exit code (the
      label guest_bypass:) where we skip to from the guest entry path.
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Fixes: 6964e6a4 ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded")
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      cda4a147
  13. 14 Mar 2018, 1 commit
    • KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry · a8b48a4d
      Committed by Paul Mackerras
      This fixes a bug where the trap number that is returned by
      __kvmppc_vcore_entry gets corrupted.  The effect of the corruption
      is that IPIs get ignored on POWER9 systems when the IPI is sent via
      a doorbell interrupt to a CPU which is executing in a KVM guest.
      The effect of the IPI being ignored is often that another CPU locks
      up inside smp_call_function_many() (and if that CPU is holding a
      spinlock, other CPUs then lock up inside raw_spin_lock()).
      
      The trap number is currently held in register r12 for most of the
      assembly-language part of the guest exit path.  In that path, we
      call kvmppc_subcore_exit_guest(), which is a C function, without
      restoring r12 afterwards.  Depending on the kernel config and the
      compiler, it may modify r12 or it may not, so some config/compiler
      combinations see the bug and others don't.
      
      To fix this, we arrange for the trap number to be stored on the
      stack from the 'guest_bypass:' label until the end of the function,
      then the trap number is loaded and returned in r12 as before.
      
      Cc: stable@vger.kernel.org # v4.8+
      Fixes: fd7bacbc ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      a8b48a4d
  14. 09 Feb 2018, 1 commit
    • KVM: PPC: Book3S HV: Branch inside feature section · d20fe50a
      Committed by Alexander Graf
      We ended up with code that did a conditional branch inside a feature
      section to code outside of the feature section. Depending on how the
      object file gets organized, that might mean we exceed the 14-bit
      relocation limit for conditional branches:
      
        arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4
      
      So instead of doing a conditional branch outside of the feature section,
      let's just branch to the end of the same section, making the branch very short.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      d20fe50a
  15. 19 Jan 2018, 5 commits
  16. 18 Jan 2018, 1 commit
    • KVM: PPC: Book3S HV: Improve handling of debug-trigger HMIs on POWER9 · d075745d
      Committed by Paul Mackerras
      Hypervisor maintenance interrupts (HMIs) are generated by various
      causes, signalled by bits in the hypervisor maintenance exception
      register (HMER).  In most cases calling OPAL to handle the interrupt
      is the correct thing to do, but the "debug trigger" HMIs signalled by
      PPC bit 17 (bit 46) of HMER are used to invoke software workarounds
      for hardware bugs, and OPAL does not have any code to handle this
      cause.  The debug trigger HMI is used in POWER9 DD2.0 and DD2.1 chips
      to work around a hardware bug in executing vector load instructions to
      cache inhibited memory.  In POWER9 DD2.2 chips, it is generated when
      conditions are detected relating to threads being in TM (transactional
      memory) suspended mode when the core SMT configuration needs to be
      reconfigured.
      
      The kernel currently has code to detect the vector CI load condition,
      but only when the HMI occurs in the host, not when it occurs in a
      guest.  If a HMI occurs in the guest, it is always passed to OPAL, and
      then we always re-sync the timebase, because the HMI cause might have
      been a timebase error, for which OPAL would re-sync the timebase, thus
      removing the timebase offset which KVM applied for the guest.  Since
      we don't know what OPAL did, we don't know whether to subtract the
      timebase offset from the timebase, so instead we re-sync the timebase.
      
      This adds code to determine explicitly what the cause of a debug
      trigger HMI will be.  This is based on a new device-tree property
      under the CPU nodes called ibm,hmi-special-triggers, if it is
      present, or otherwise based on the PVR (processor version register).
      The handling of debug trigger HMIs is pulled out into a separate
      function which can be called from the KVM guest exit code.  If this
      function handles and clears the HMI, and no other HMI causes remain,
      then we skip calling OPAL and we proceed to subtract the guest
      timebase offset from the timebase.
      
      The overall handling for HMIs that occur in the host (i.e. not in a
      KVM guest) is largely unchanged, except that we now don't set the flag
      for the vector CI load workaround on DD2.2 processors.
      
      This also removes a BUG_ON in the KVM code.  BUG_ON is generally not
      useful in KVM guest entry/exit code since it is difficult to handle
      the resulting trap gracefully.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d075745d
  17. 17 Jan 2018, 2 commits
    • KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded · 6964e6a4
      Committed by Paul Mackerras
      This moves the code that loads and unloads the guest SLB values so that
      it is done while the guest LPCR value is loaded in the LPCR register.
      The reason for doing this is that on POWER9, the behaviour of the
      slbmte instruction depends on the LPCR[UPRT] bit.  If UPRT is 1, as
      it is for a radix host (or guest), the SLB index is truncated to
      2 bits.  This means that for a HPT guest on a radix host, the SLB
      was not being loaded correctly, causing the guest to crash.
      
      The SLB is now loaded much later in the guest entry path, after the
      LPCR is loaded, which for a secondary thread is after it sees that
      the primary thread has switched the MMU to the guest.  The loop that
      waits for the primary thread has a branch out to the exit code that
      is taken if it sees that other threads have commenced exiting the
      guest.  Since we have now not loaded the SLB at this point, we make
      this path branch to a new label 'guest_bypass' and we move the SLB
      unload code to before this label.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      6964e6a4
    • KVM: PPC: Book3S HV: Make sure we don't re-enter guest without XIVE loaded · 43ff3f65
      Committed by Paul Mackerras
      This fixes a bug where it is possible to enter a guest on a POWER9
      system without having the XIVE (interrupt controller) context loaded.
      This can happen because we unload the XIVE context from the CPU
      before doing the real-mode handling for machine checks.  After the
      real-mode handler runs, it is possible that we re-enter the guest
      via a fast path which does not load the XIVE context.
      
      To fix this, we move the unloading of the XIVE context to come after
      the real-mode machine check handler is called.
      
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      43ff3f65
  18. 11 Jan 2018, 1 commit