1. 01 4月, 2020 1 次提交
    • N
      powerpc/64s/exception: Move KVM test to common code · 9600f261
      Nicholas Piggin 提交于
      This allows more code to be moved out of unrelocated regions. The
      system call KVMTEST is changed to be open-coded and remain in the
      tramp area to avoid having to move it to entry_64.S. The custom nature
      of the system call entry code means the hcall case can be made more
      streamlined than regular interrupt handlers.
      
      mpe: Incorporate fix from Nick:
      
      Moving KVM test to the common entry code missed the case of HMI and
      MCE, which do not do __GEN_COMMON_ENTRY (because they don't want to
      switch to virt mode).
      
      This means a MCE or HMI exception that is taken while KVM is running a
      guest context will not be switched out of that context, and KVM won't
      be notified. Found by running sigfuz in guest with patched host on
      POWER9 DD2.3, which causes some TM related HMI interrupts (which are
      expected and supposed to be handled by KVM).
      
      This fix adds a __GEN_REALMODE_COMMON_ENTRY for those handlers to add
      the KVM test. This makes them look a little more like other handlers
      that all use __GEN_COMMON_ENTRY.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200225173541.1549955-13-npiggin@gmail.com
      9600f261
  2. 25 1月, 2020 1 次提交
  3. 17 12月, 2019 1 次提交
  4. 14 11月, 2019 1 次提交
    • M
      KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel · af2e8c68
      Michael Ellerman 提交于
      On some systems that are vulnerable to Spectre v2, it is up to
      software to flush the link stack (return address stack), in order to
      protect against Spectre-RSB.
      
      When exiting from a guest we do some house keeping and then
      potentially exit to C code which is several stack frames deep in the
      host kernel. We will then execute a series of returns without
      preceeding calls, opening up the possiblity that the guest could have
      poisoned the link stack, and direct speculative execution of the host
      to a gadget of some sort.
      
      To prevent this we add a flush of the link stack on exit from a guest.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      af2e8c68
  5. 09 10月, 2019 1 次提交
    • J
      powerpc/kvm: Fix kvmppc_vcore->in_guest value in kvmhv_switch_to_host · 7fe4e117
      Jordan Niethe 提交于
      kvmhv_switch_to_host() in arch/powerpc/kvm/book3s_hv_rmhandlers.S
      needs to set kvmppc_vcore->in_guest to 0 to signal secondary CPUs to
      continue. This happens after resetting the PCR. Before commit
      13c7bb3c ("powerpc/64s: Set reserved PCR bits"), r0 would always
      be 0 before it was stored to kvmppc_vcore->in_guest. However because
      of this change in the commit:
      
                /* Reset PCR */
                ld      r0, VCORE_PCR(r5)
        -       cmpdi   r0, 0
        +       LOAD_REG_IMMEDIATE(r6, PCR_MASK)
        +       cmpld   r0, r6
                beq     18f
        -       li      r0, 0
        -       mtspr   SPRN_PCR, r0
        +       mtspr   SPRN_PCR, r6
         18:
                /* Signal secondary CPUs to continue */
                stb     r0,VCORE_IN_GUEST(r5)
      
      We are no longer comparing r0 against 0 and loading it with 0 if it
      contains something else. Hence when we store r0 to
      kvmppc_vcore->in_guest, it might not be 0. This means that secondary
      CPUs will not be signalled to continue. Those CPUs get stuck and
      errors like the following are logged:
      
          KVM: CPU 1 seems to be stuck
          KVM: CPU 2 seems to be stuck
          KVM: CPU 3 seems to be stuck
          KVM: CPU 4 seems to be stuck
          KVM: CPU 5 seems to be stuck
          KVM: CPU 6 seems to be stuck
          KVM: CPU 7 seems to be stuck
      
      This can be reproduced with:
          $ for i in `seq 1 7` ; do chcpu -d $i ; done ;
          $ taskset -c 0 qemu-system-ppc64 -smp 8,threads=8 \
             -M pseries,accel=kvm,kvm-type=HV -m 1G -nographic -vga none \
             -kernel vmlinux -initrd initrd.cpio.xz
      
      Fix by making sure r0 is 0 before storing it to
      kvmppc_vcore->in_guest.
      
      Fixes: 13c7bb3c ("powerpc/64s: Set reserved PCR bits")
      Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NJordan Niethe <jniethe5@gmail.com>
      Reviewed-by: NAlistair Popple <alistair@popple.id.au>
      Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20191004025317.19340-1-jniethe5@gmail.com
      7fe4e117
  6. 21 9月, 2019 1 次提交
    • J
      powerpc/64s: Set reserved PCR bits · 13c7bb3c
      Jordan Niethe 提交于
      Currently the reserved bits of the Processor Compatibility
      Register (PCR) are cleared as per the Programming Note in Section
      1.3.3 of version 3.0B of the Power ISA. This causes all new
      architecture features to be made available when running on newer
      processors with new architecture features added to the PCR as bits
      must be set to disable a given feature.
      
      For example to disable new features added as part of Version 2.07 of
      the ISA the corresponding bit in the PCR needs to be set.
      
      As new processor features generally require explicit kernel support
      they should be disabled until such support is implemented. Therefore
      kernels should set all unknown/reserved bits in the PCR such that any
      new architecture features which the kernel does not currently know
      about get disabled.
      
      An update is planned to the ISA to clarify that the PCR is an
      exception to the Programming Note on reserved bits in Section 1.3.3.
      Signed-off-by: NAlistair Popple <alistair@popple.id.au>
      Signed-off-by: NJordan Niethe <jniethe5@gmail.com>
      Tested-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190917004605.22471-2-alistair@popple.id.au
      13c7bb3c
  7. 30 8月, 2019 1 次提交
  8. 16 8月, 2019 2 次提交
    • P
      KVM: PPC: Book3S HV: Don't push XIVE context when not using XIVE device · 8d4ba9c9
      Paul Mackerras 提交于
      At present, when running a guest on POWER9 using HV KVM but not using
      an in-kernel interrupt controller (XICS or XIVE), for example if QEMU
      is run with the kernel_irqchip=off option, the guest entry code goes
      ahead and tries to load the guest context into the XIVE hardware, even
      though no context has been set up.
      
      To fix this, we check that the "CAM word" is non-zero before pushing
      it to the hardware.  The CAM word is initialized to a non-zero value
      in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(),
      and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu.
      
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190813100100.GC9567@blackberry
      8d4ba9c9
    • P
      KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts · 959c5d51
      Paul Mackerras 提交于
      Escalation interrupts are interrupts sent to the host by the XIVE
      hardware when it has an interrupt to deliver to a guest VCPU but that
      VCPU is not running anywhere in the system.  Hence we disable the
      escalation interrupt for the VCPU being run when we enter the guest
      and re-enable it when the guest does an H_CEDE hypercall indicating
      it is idle.
      
      It is possible that an escalation interrupt gets generated just as we
      are entering the guest.  In that case the escalation interrupt may be
      using a queue entry in one of the interrupt queues, and that queue
      entry may not have been processed when the guest exits with an H_CEDE.
      The existing entry code detects this situation and does not clear the
      vcpu->arch.xive_esc_on flag as an indication that there is a pending
      queue entry (if the queue entry gets processed, xive_esc_irq() will
      clear the flag).  There is a comment in the code saying that if the
      flag is still set on H_CEDE, we have to abort the cede rather than
      re-enabling the escalation interrupt, lest we end up with two
      occurrences of the escalation interrupt in the interrupt queue.
      
      However, the exit code doesn't do that; it aborts the cede in the sense
      that vcpu->arch.ceded gets cleared, but it still enables the escalation
      interrupt by setting the source's PQ bits to 00.  Instead we need to
      set the PQ bits to 10, indicating that an interrupt has been triggered.
      We also need to avoid setting vcpu->arch.xive_esc_on in this case
      (i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because
      xive_esc_irq() will run at some point and clear it, and if we race with
      that we may end up with an incorrect result (i.e. xive_esc_on set when
      the escalation interrupt has just been handled).
      
      It is extremely unlikely that having two queue entries would cause
      observable problems; theoretically it could cause queue overflow, but
      the CPU would have to have thousands of interrupts targetted to it for
      that to be possible.  However, this fix will also make it possible to
      determine accurately whether there is an unhandled escalation
      interrupt in the queue, which will be needed by the following patch.
      
      Fixes: 9b9b13a6 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190813100349.GD9567@blackberry
      959c5d51
  9. 18 6月, 2019 2 次提交
    • S
      KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode · 84b02824
      Suraj Jitindar Singh 提交于
      The hcall H_SET_DAWR is used by a guest to set the data address
      watchpoint register (DAWR). This hcall is handled in the host in
      kvmppc_h_set_dawr() which can be called in either real mode on the
      guest exit path from hcall_try_real_mode() in book3s_hv_rmhandlers.S,
      or in virtual mode when called from kvmppc_pseries_do_hcall() in
      book3s_hv.c.
      
      The function kvmppc_h_set_dawr() updates the dawr and dawrx fields in
      the vcpu struct accordingly and then also writes the respective values
      into the DAWR and DAWRX registers directly. It is necessary to write
      the registers directly here when calling the function in real mode
      since the path to re-enter the guest won't do this. However when in
      virtual mode the host DAWR and DAWRX values have already been
      restored, and so writing the registers would overwrite these.
      Additionally there is no reason to write the guest values here as
      these will be read from the vcpu struct and written to the registers
      appropriately the next time the vcpu is run.
      
      This also avoids the case when handling h_set_dawr for a nested guest
      where the guest hypervisor isn't able to write the DAWR and DAWRX
      registers directly and must rely on the real hypervisor to do this for
      it when it calls H_ENTER_NESTED.
      
      Fixes: c1fe190c ("powerpc: Add force enable of DAWR on P9 option")
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      84b02824
    • M
      KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr() · fabb2efc
      Michael Neuling 提交于
      Commit c1fe190c ("powerpc: Add force enable of DAWR on P9 option")
      screwed up some assembler and corrupted a pointer in r3. This resulted
      in crashes like the below:
      
        BUG: Kernel NULL pointer dereference at 0x000013bf
        Faulting instruction address: 0xc00000000010b044
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3
        NIP:  c00000000010b044 LR: c0080000089dacf4 CTR: c00000000010aff4
        REGS: c00000179b397710 TRAP: 0300   Not tainted  (5.2.0-rc4+)
        MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 42244842  XER: 00000000
        CFAR: c00000000010aff8 DAR: 00000000000013bf DSISR: 42000000 IRQMASK: 0
        GPR00: c0080000089dd6bc c00000179b3979a0 c008000008a04300 ffffffffffffffff
        GPR04: 0000000000000000 0000000000000003 000000002444b05d c0000017f11c45d0
        ...
        NIP kvmppc_h_set_dabr+0x50/0x68
        LR  kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv]
        Call Trace:
          0xc0000017f11c0000 (unreliable)
          kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
          kvmppc_vcpu_run+0x34/0x48 [kvm]
          kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
          kvm_vcpu_ioctl+0x460/0x850 [kvm]
          do_vfs_ioctl+0xe4/0xb40
          ksys_ioctl+0xc4/0x110
          sys_ioctl+0x28/0x80
          system_call+0x5c/0x70
        Instruction dump:
        4082fff4 4c00012c 38600000 4e800020 e96280c0 896b0000 2c2b0000 3860ffff
        4d820020 50852e74 508516f6 78840724 <f88313c0> f8a313c8 7c942ba6 7cbc2ba6
      
      Fix the bug by only changing r3 when we are returning immediately.
      
      Fixes: c1fe190c ("powerpc: Add force enable of DAWR on P9 option")
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reported-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      fabb2efc
  10. 31 5月, 2019 1 次提交
  11. 14 5月, 2019 1 次提交
  12. 30 4月, 2019 5 次提交
    • N
      powerpc/64s: Reimplement book3s idle code in C · 10d91611
      Nicholas Piggin 提交于
      Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
      speific HV idle code to the powernv platform code.
      
      Book3S assembly stubs are kept in common code and used only to save
      the stack frame and non-volatile GPRs before executing architected
      idle instructions, and restoring the stack and reloading GPRs then
      returning to C after waking from idle.
      
      The complex logic dealing with threads and subcores, locking, SPRs,
      HMIs, timebase resync, etc., is all done in C which makes it more
      maintainable.
      
      This is not a strict translation to C code, there are some
      significant differences:
      
      - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
        but saves and restores them itself.
      
      - The optimisation where EC=ESL=0 idle modes did not have to save GPRs
        or change MSR is restored, because it's now simple to do. ESL=1
        sleeps that do not lose GPRs can use this optimization too.
      
      - KVM secondary entry and cede is now more of a call/return style
        rather than branchy. nap_state_lost is not required because KVM
        always returns via NVGPR restoring path.
      
      - KVM secondary wakeup from offline sequence is moved entirely into
        the offline wakeup, which avoids a hwsync in the normal idle wakeup
        path.
      
      Performance measured with context switch ping-pong on different
      threads or cores, is possibly improved a small amount, 1-3% depending
      on stop state and core vs thread test for shallow states. Deep states
      it's in the noise compared with other latencies.
      
      KVM improvements:
      
      - Idle sleepers now always return to caller rather than branch out
        to KVM first.
      
      - This allows optimisations like very fast return to caller when no
        state has been lost.
      
      - KVM no longer requires nap_state_lost because it controls NVGPR
        save/restore itself on the way in and out.
      
      - The heavy idle wakeup KVM request check can be moved out of the
        normal host idle code and into the not-performance-critical offline
        code.
      
      - KVM nap code now returns from where it is called, which makes the
        flow a bit easier to follow.
      Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Squash the KVM changes in]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      10d91611
    • P
      KVM: PPC: Book3S HV: Flush TLB on secondary radix threads · 70ea13f6
      Paul Mackerras 提交于
      When running on POWER9 with kvm_hv.indep_threads_mode = N and the host
      in SMT1 mode, KVM will run guest VCPUs on offline secondary threads.
      If those guests are in radix mode, we fail to load the LPID and flush
      the TLB if necessary, leading to the guest crashing with an
      unsupported MMU fault.  This arises from commit 9a4506e1 ("KVM:
      PPC: Book3S HV: Make radix handle process scoped LPID flush in C,
      with relocation on", 2018-05-17), which didn't consider the case
      where indep_threads_mode = N.
      
      For simplicity, this makes the real-mode guest entry path flush the
      TLB in the same place for both radix and hash guests, as we did before
      9a4506e1, though the code is now C code rather than assembly code.
      We also have the radix TLB flush open-coded rather than calling
      radix__local_flush_tlb_lpid_guest(), because the TLB flush can be
      called in real mode, and in real mode we don't want to invoke the
      tracepoint code.
      
      Fixes: 9a4506e1 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      70ea13f6
    • P
      KVM: PPC: Book3S HV: Move HPT guest TLB flushing to C code · 2940ba0c
      Paul Mackerras 提交于
      This replaces assembler code in book3s_hv_rmhandlers.S that checks
      the kvm->arch.need_tlb_flush cpumask and optionally does a TLB flush
      with C code in book3s_hv_builtin.c.  Note that unlike the radix
      version, the hash version doesn't do an explicit ERAT invalidation
      because we will invalidate and load up the SLB before entering the
      guest, and that will invalidate the ERAT.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      2940ba0c
    • S
      KVM: PPC: Book3S HV: Handle virtual mode in XIVE VCPU push code · 7ae9bda7
      Suraj Jitindar Singh 提交于
      The code in book3s_hv_rmhandlers.S that pushes the XIVE virtual CPU
      context to the hardware currently assumes it is being called in real
      mode, which is usually true.  There is however a path by which it can
      be executed in virtual mode, in the case where indep_threads_mode = N.
      A virtual CPU executing on an offline secondary thread can take a
      hypervisor interrupt in virtual mode and return from the
      kvmppc_hv_entry() call after the kvm_secondary_got_guest label.
      It is possible for it to be given another vcpu to execute before it
      gets to execute the stop instruction.  In that case it will call
      kvmppc_hv_entry() for the second VCPU in virtual mode, and the XIVE
      vCPU push code will be executed in virtual mode.  The result in that
      case will be a host crash due to an unexpected data storage interrupt
      caused by executing the stdcix instruction in virtual mode.
      
      This fixes it by adding a code path for virtual mode, which uses the
      virtual TIMA pointer and normal load/store instructions.
      
      [paulus@ozlabs.org - wrote patch description]
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7ae9bda7
    • S
      KVM: PPC: Book3S HV: Implement real mode H_PAGE_INIT handler · eadfb1c5
      Suraj Jitindar Singh 提交于
      Implement a real mode handler for the H_CALL H_PAGE_INIT which can be
      used to zero or copy a guest page. The page is defined to be 4k and must
      be 4k aligned.
      
      The in-kernel real mode handler halves the time to handle this H_CALL
      compared to handling it in userspace for a hash guest.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      eadfb1c5
  13. 20 4月, 2019 1 次提交
    • M
      powerpc: Add force enable of DAWR on P9 option · c1fe190c
      Michael Neuling 提交于
      This adds a flag so that the DAWR can be enabled on P9 via:
        echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous
      
      The DAWR was previously force disabled on POWER9 in:
        96541531 powerpc: Disable DAWR in the base POWER9 CPU features
      Also see Documentation/powerpc/DAWR-POWER9.txt
      
      This is a dangerous setting, USE AT YOUR OWN RISK.
      
      Some users may not care about a bad user crashing their box
      (ie. single user/desktop systems) and really want the DAWR.  This
      allows them to force enable DAWR.
      
      This flag can also be used to disable DAWR access. Once this is
      cleared, all DAWR access should be cleared immediately and your
      machine once again safe from crashing.
      
      Userspace may get confused by toggling this. If DAWR is force
      enabled/disabled between getting the number of breakpoints (via
      PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an
      inconsistent view of what's available. Similarly for guests.
      
      For the DAWR to be enabled in a KVM guest, the DAWR needs to be force
      enabled in the host AND the guest. For this reason, this won't work on
      POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the
      dawr_enable_dangerous file will fail if the hypervisor doesn't support
      writing the DAWR.
      
      To double check the DAWR is working, run this kernel selftest:
        tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
      Any errors/failures/skips mean something is wrong.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c1fe190c
  14. 22 2月, 2019 2 次提交
    • M
      powerpc/kvm: Save and restore host AMR/IAMR/UAMOR · c3c7470c
      Michael Ellerman 提交于
      When the hash MMU is active the AMR, IAMR and UAMOR are used for
      pkeys. The AMR is directly writable by user space, and the UAMOR masks
      those writes, meaning both registers are effectively user register
      state. The IAMR is used to create an execute only key.
      
      Also we must maintain the value of at least the AMR when running in
      process context, so that any memory accesses done by the kernel on
      behalf of the process are correctly controlled by the AMR.
      
      Although we are correctly switching all registers when going into a
      guest, on returning to the host we just write 0 into all regs, except
      on Power9 where we restore the IAMR correctly.
      
      This could be observed by a user process if it writes the AMR, then
      runs a guest and we then return immediately to it without
      rescheduling. Because we have written 0 to the AMR that would have the
      effect of granting read/write permission to pages that the process was
      trying to protect.
      
      In addition, when using the Radix MMU, the AMR can prevent inadvertent
      kernel access to userspace data, writing 0 to the AMR disables that
      protection.
      
      So save and restore AMR, IAMR and UAMOR.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NPaul Mackerras <paulus@ozlabs.org>
      c3c7470c
    • J
      KVM: PPC: Book3S HV: Fix build failure without IOMMU support · e40542af
      Jordan Niethe 提交于
      Currently trying to build without IOMMU support will fail:
      
        (.text+0x1380): undefined reference to `kvmppc_h_get_tce'
        (.text+0x1384): undefined reference to `kvmppc_rm_h_put_tce'
        (.text+0x149c): undefined reference to `kvmppc_rm_h_stuff_tce'
        (.text+0x14a0): undefined reference to `kvmppc_rm_h_put_tce_indirect'
      
      This happens because turning off IOMMU support will prevent
      book3s_64_vio_hv.c from being built because it is only built when
      SPAPR_TCE_IOMMU is set, which depends on IOMMU support.
      
      Fix it using ifdefs for the undefined references.
      
      Fixes: 76d837a4 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms")
      Signed-off-by: NJordan Niethe <jniethe5@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e40542af
  15. 21 2月, 2019 1 次提交
    • P
      KVM: PPC: Book3S HV: Simplify machine check handling · 884dfb72
      Paul Mackerras 提交于
      This makes the handling of machine check interrupts that occur inside
      a guest simpler and more robust, with less done in assembler code and
      in real mode.
      
      Now, when a machine check occurs inside a guest, we always get the
      machine check event struct and put a copy in the vcpu struct for the
      vcpu where the machine check occurred.  We no longer call
      machine_check_queue_event() from kvmppc_realmode_mc_power7(), because
      on POWER8, when a vcpu is running on an offline secondary thread and
      we call machine_check_queue_event(), that calls irq_work_queue(),
      which doesn't work because the CPU is offline, but instead triggers
      the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which
      fires again and again because nothing clears the condition).
      
      All that machine_check_queue_event() actually does is to cause the
      event to be printed to the console.  For a machine check occurring in
      the guest, we now print the event in kvmppc_handle_exit_hv()
      instead.
      
      The assembly code at label machine_check_realmode now just calls C
      code and then continues exiting the guest.  We no longer either
      synthesize a machine check for the guest in assembly code or return
      to the guest without a machine check.
      
      The code in kvmppc_handle_exit_hv() is extended to handle the case
      where the guest is not FWNMI-capable.  In that case we now always
      synthesize a machine check interrupt for the guest.  Previously, if
      the host thinks it has recovered the machine check fully, it would
      return to the guest without any notification that the machine check
      had occurred.  If the machine check was caused by some action of the
      guest (such as creating duplicate SLB entries), it is much better to
      tell the guest that it has caused a problem.  Therefore we now always
      generate a machine check interrupt for guests that are not
      FWNMI-capable.
      Reviewed-by: NAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      884dfb72
  16. 09 10月, 2018 9 次提交
    • P
      KVM: PPC: Book3S HV: Handle hypercalls correctly when nested · 4bad7779
      Paul Mackerras 提交于
      When we are running as a nested hypervisor, we use a hypercall to
      enter the guest rather than code in book3s_hv_rmhandlers.S.  This means
      that the hypercall handlers listed in hcall_real_table never get called.
      There are some hypercalls that are handled there and not in
      kvmppc_pseries_do_hcall(), which therefore won't get processed for
      a nested guest.
      
      To fix this, we add cases to kvmppc_pseries_do_hcall() to handle those
      hypercalls, with the following exceptions:
      
      - The HPT hypercalls (H_ENTER, H_REMOVE, etc.) are not handled because
        we only support radix mode for nested guests.
      
      - H_CEDE has to be handled specially because the cede logic in
        kvmhv_run_single_vcpu assumes that it has been processed by the time
        that kvmhv_p9_guest_entry() returns.  Therefore we put a special
        case for H_CEDE in kvmhv_p9_guest_entry().
      
      For the XICS hypercalls, if real-mode processing is enabled, then the
      virtual-mode handlers assume that they are being called only to finish
      up the operation.  Therefore we turn off the real-mode flag in the XICS
      code when running as a nested hypervisor.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4bad7779
    • P
      KVM: PPC: Book3S HV: Nested guest entry via hypercall · 360cae31
      Paul Mackerras 提交于
      This adds a new hypercall, H_ENTER_NESTED, which is used by a nested
      hypervisor to enter one of its nested guests.  The hypercall supplies
      register values in two structs.  Those values are copied by the level 0
      (L0) hypervisor (the one which is running in hypervisor mode) into the
      vcpu struct of the L1 guest, and then the guest is run until an
      interrupt or error occurs which needs to be reported to L1 via the
      hypercall return value.
      
      Currently this assumes that the L0 and L1 hypervisors are the same
      endianness, and the structs passed as arguments are in native
      endianness.  If they are of different endianness, the version number
      check will fail and the hcall will be rejected.
      
      Nested hypervisors do not support indep_threads_mode=N, so this adds
      code to print a warning message if the administrator has set
      indep_threads_mode=N, and treat it as Y.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      360cae31
    • P
      KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct · fd0944ba
      Paul Mackerras 提交于
      When the 'regs' field was added to struct kvm_vcpu_arch, the code
      was changed to use several of the fields inside regs (e.g., gpr, lr,
      etc.) but not the ccr field, because the ccr field in struct pt_regs
      is 64 bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is
      only 32 bits.  This changes the code to use the regs.ccr field
      instead of cr, and changes the assembly code on 64-bit platforms to
      use 64-bit loads and stores instead of 32-bit ones.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      fd0944ba
    • P
      KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests · 95a6432c
      Paul Mackerras 提交于
      This creates an alternative guest entry/exit path which is used for
      radix guests on POWER9 systems when we have indep_threads_mode=Y.  In
      these circumstances there is exactly one vcpu per vcore and there is
      no coordination required between vcpus or vcores; the vcpu can enter
      the guest without needing to synchronize with anything else.
      
      The new fast path is implemented almost entirely in C in book3s_hv.c
      and runs with the MMU on until the guest is entered.  On guest exit
      we use the existing path until the point where we are committed to
      exiting the guest (as distinct from handling an interrupt in the
      low-level code and returning to the guest) and we have pulled the
      guest context from the XIVE.  At that point we check a flag in the
      stack frame to see whether we came in via the old path and the new
      path; if we came in via the new path then we go back to C code to do
      the rest of the process of saving the guest context and restoring the
      host context.
      
      The C code is split into separate functions for handling the
      OS-accessible state and the hypervisor state, with the idea that the
      latter can be replaced by a hypercall when we implement nested
      virtualization.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      [mpe: Fix CONFIG_ALTIVEC=n build]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      95a6432c
    • P
      KVM: PPC: Book3S: Rework TM save/restore code and make it C-callable · 7854f754
      Paul Mackerras 提交于
      This adds a parameter to __kvmppc_save_tm and __kvmppc_restore_tm
      which allows the caller to indicate whether it wants the nonvolatile
      register state to be preserved across the call, as required by the C
      calling conventions.  This parameter being non-zero also causes the
      MSR bits that enable TM, FP, VMX and VSX to be preserved.  The
      condition register and DSCR are now always preserved.
      
      With this, kvmppc_save_tm_hv and kvmppc_restore_tm_hv can be called
      from C code provided the 3rd parameter is non-zero.  So that these
      functions can be called from modules, they now include code to set
      the TOC pointer (r2) on entry, as they can call other built-in C
      functions which will assume the TOC to have been set.
      
      Also, the fake suspend code in kvmppc_save_tm_hv is modified here to
      assume that treclaim in fake-suspend state does not modify any registers,
      which is the case on POWER9.  This enables the code to be simplified
      quite a bit.
      
      _kvmppc_save_tm_pr and _kvmppc_restore_tm_pr become much simpler with
      this change, since they now only need to save and restore TAR and pass
      1 for the 3rd argument to __kvmppc_{save,restore}_tm.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7854f754
    • P
      KVM: PPC: Book3S HV: Simplify real-mode interrupt handling · df709a29
      Paul Mackerras 提交于
      This streamlines the first part of the code that handles a hypervisor
      interrupt that occurred in the guest.  With this, all of the real-mode
      handling that occurs is done before the "guest_exit_cont" label; once
      we get to that label we are committed to exiting to host virtual mode.
      Thus the machine check and HMI real-mode handling is moved before that
      label.
      
      Also, the code to handle external interrupts is moved out of line, as
      is the code that calls kvmppc_realmode_hmi_handler().
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      df709a29
    • P
      KVM: PPC: Book3S HV: Extract PMU save/restore operations as C-callable functions · 41f4e631
      Paul Mackerras 提交于
      This pulls out the assembler code that is responsible for saving and
      restoring the PMU state for the host and guest into separate functions
      so they can be used from an alternate entry path.  The calling
      convention is made compatible with C.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      41f4e631
    • P
      KVM: PPC: Book3S HV: Move interrupt delivery on guest entry to C code · f7035ce9
      Paul Mackerras 提交于
      This is based on a patch by Suraj Jitindar Singh.
      
      This moves the code in book3s_hv_rmhandlers.S that generates an
      external, decrementer or privileged doorbell interrupt just before
      entering the guest to C code in book3s_hv_builtin.c.  This is to
      make future maintenance and modification easier.  The algorithm
      expressed in the C code is almost identical to the previous
      algorithm.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f7035ce9
    • P
      KVM: PPC: Book3S: Simplify external interrupt handling · d24ea8a7
      Paul Mackerras 提交于
      Currently we use two bits in the vcpu pending_exceptions bitmap to
      indicate that an external interrupt is pending for the guest, one
      for "one-shot" interrupts that are cleared when delivered, and one
      for interrupts that persist until cleared by an explicit action of
      the OS (e.g. an acknowledge to an interrupt controller).  The
      BOOK3S_IRQPRIO_EXTERNAL bit is used for one-shot interrupt requests
      and BOOK3S_IRQPRIO_EXTERNAL_LEVEL is used for persisting interrupts.
      
      In practice BOOK3S_IRQPRIO_EXTERNAL never gets used, because our
      Book3S platforms generally, and pseries in particular, expect
      external interrupt requests to persist until they are acknowledged
      at the interrupt controller.  That combined with the confusion
      introduced by having two bits for what is essentially the same thing
      makes it attractive to simplify things by only using one bit.  This
      patch does that.
      
      With this patch there is only BOOK3S_IRQPRIO_EXTERNAL, and by default
      it has the semantics of a persisting interrupt.  In order to avoid
      breaking the ABI, we introduce a new "external_oneshot" flag which
      preserves the behaviour of the KVM_INTERRUPT ioctl with the
      KVM_INTERRUPT_SET argument.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d24ea8a7
  17. 30 7月, 2018 2 次提交
  18. 16 7月, 2018 1 次提交
  19. 01 6月, 2018 2 次提交
    • S
      KVM: PPC: Book3S PR: Add C function wrapper for _kvmppc_save/restore_tm() · caa3be92
      Simon Guo 提交于
      Currently __kvmppc_save/restore_tm() APIs can only be invoked from
      assembly function. This patch adds C function wrappers for them so
      that they can be safely called from C function.
      Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      caa3be92
    • S
      KVM: PPC: Book3S PR: Add guest MSR parameter for kvmppc_save_tm()/kvmppc_restore_tm() · 6f597c6b
      Simon Guo 提交于
      HV KVM and PR KVM need different MSR source to indicate whether
      treclaim. or trecheckpoint. is necessary.
      
      This patch add new parameter (guest MSR) for these kvmppc_save_tm/
      kvmppc_restore_tm() APIs:
      - For HV KVM, it is VCPU_MSR
      - For PR KVM, it is current host MSR or VCPU_SHADOW_SRR1
      
      This enhancement enables these 2 APIs to be reused by PR KVM later.
      And the patch keeps HV KVM logic unchanged.
      
      This patch also reworks kvmppc_save_tm()/kvmppc_restore_tm() to
      have a clean ABI: r3 for vcpu and r4 for guest_msr.
      
      During kvmppc_save_tm/kvmppc_restore_tm(), the R1 need to be saved
      or restored. Currently the R1 is saved into HSTATE_HOST_R1. In PR
      KVM, we are going to add a C function wrapper for
      kvmppc_save_tm/kvmppc_restore_tm() where the R1 will be incremented
      with added stackframe and save into HSTATE_HOST_R1. There are several
      places in HV KVM to load HSTATE_HOST_R1 as R1, and we don't want to
      bring risk or confusion by TM code.
      
      This patch will use HSTATE_SCRATCH2 to save/restore R1 in
      kvmppc_save_tm/kvmppc_restore_tm() to avoid future confusion, since
      the r1 is actually a temporary/scratch value to be saved/stored.
      
      [paulus@ozlabs.org - rebased on top of 7b0e827c ("KVM: PPC: Book3S HV:
       Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)]
      Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      6f597c6b
  20. 31 5月, 2018 2 次提交
    • S
      KVM: PPC: Book3S PR: Move kvmppc_save_tm/kvmppc_restore_tm to separate file · 009c872a
      Simon Guo 提交于
      It is a simple patch just for moving kvmppc_save_tm/kvmppc_restore_tm()
      functionalities to tm.S. There is no logic change. The reconstruct of
      those APIs will be done in later patches to improve readability.
      
      It is for preparation of reusing those APIs on both HV/PR PPC KVM.
      
      Some slight change during move the functions includes:
      - surrounds some HV KVM specific code with CONFIG_KVM_BOOK3S_HV_POSSIBLE
      for compilation.
      - use _GLOBAL() to define kvmppc_save_tm/kvmppc_restore_tm()
      
      [paulus@ozlabs.org - rebased on top of 7b0e827c ("KVM: PPC: Book3S HV:
       Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)]
      Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      009c872a
    • P
      KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm · 7b0e827c
      Paul Mackerras 提交于
      This splits out the handling of "fake suspend" mode, part of the
      hypervisor TM assist code for POWER9, and puts almost all of it in
      new kvmppc_save_tm_hv and kvmppc_restore_tm_hv functions.  The new
      functions branch to kvmppc_save/restore_tm if the CPU does not
      require hypervisor TM assistance.
      
      With this, it will be more straightforward to move kvmppc_save_tm and
      kvmppc_restore_tm to another file and use them for transactional
      memory support in PR KVM.  Additionally, it also makes the code a
      bit clearer and reduces the number of feature sections.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7b0e827c
  21. 18 5月, 2018 2 次提交