1. 24 11月, 2016 12 次提交
    • P
      KVM: PPC: Book3S HV: Fix compilation with unusual configurations · e2702871
      Paul Mackerras 提交于
      This adds the "again" parameter to the dummy version of
      kvmppc_check_passthru(), so that it matches the real version.
      This fixes compilation with CONFIG_BOOK3S_64_HV set but
      CONFIG_KVM_XICS=n.
      
      This includes asm/smp.h in book3s_hv_builtin.c to fix compilation
      with CONFIG_SMP=n.  The explicit inclusion is necessary to provide
      definitions of hard_smp_processor_id() and get_hard_smp_processor_id()
      in UP configs.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e2702871
    • S
      KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00 · 2ee13be3
      Suraj Jitindar Singh 提交于
      The function kvmppc_set_arch_compat() is used to determine the value of the
      processor compatibility register (PCR) for a guest running in a given
      compatibility mode. There is currently no support for v3.00 of the ISA.
      
      Add support for v3.00 of the ISA which adds an ISA v2.07 compatilibity mode
      to the PCR.
      
      We also add a check to ensure the processor we are running on is capable of
      emulating the chosen processor (for example a POWER7 cannot emulate a
      POWER8, similarly with a POWER8 and a POWER9).
      
      Based on work by: Paul Mackerras <paulus@ozlabs.org>
      
      [paulus@ozlabs.org - moved dummy PCR_ARCH_300 definition here; set
       guest_pcr_bit when arch_compat == 0, added comment.]
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      2ee13be3
    • P
      KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores · 45c940ba
      Paul Mackerras 提交于
      With POWER9, each CPU thread has its own MMU context and can be
      in the host or a guest independently of the other threads; there is
      still however a restriction that all threads must use the same type
      of address translation, either radix tree or hashed page table (HPT).
      
      Since we only support HPT guests on a HPT host at this point, we
      can treat the threads as being independent, and avoid all of the
      work of coordinating the CPU threads.  To make this simpler, we
      introduce a new threads_per_vcore() function that returns 1 on
      POWER9 and threads_per_subcore on POWER7/8, and use that instead
      of threads_per_subcore or threads_per_core in various places.
      
      This also changes the value of the KVM_CAP_PPC_SMT capability on
      POWER9 systems from 4 to 1, so that userspace will not try to
      create VMs with multiple vcpus per vcore.  (If userspace did create
      a VM that thought it was in an SMT mode, the VM might try to use
      the msgsndp instruction, which will not work as expected.  In
      future it may be possible to trap and emulate msgsndp in order to
      allow VMs to think they are in an SMT mode, if only for the purpose
      of allowing migration from POWER8 systems.)
      
      With all this, we can now run guests on POWER9 as long as the host
      is running with HPT translation.  Since userspace currently has no
      way to request radix tree translation for the guest, the guest has
      no choice but to use HPT translation.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      45c940ba
    • P
      KVM: PPC: Book3S HV: Enable hypervisor virtualization interrupts while in guest · 84f7139c
      Paul Mackerras 提交于
      The new XIVE interrupt controller on POWER9 can direct external
      interrupts to the hypervisor or the guest.  The interrupts directed to
      the hypervisor are controlled by an LPCR bit called LPCR_HVICE, and
      come in as a "hypervisor virtualization interrupt".  This sets the
      LPCR bit so that hypervisor virtualization interrupts can occur while
      we are in the guest.  We then also need to cope with exiting the guest
      because of a hypervisor virtualization interrupt.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      84f7139c
    • P
      KVM: PPC: Book3S HV: Use stop instruction rather than nap on POWER9 · bf53c88e
      Paul Mackerras 提交于
      POWER9 replaces the various power-saving mode instructions on POWER8
      (doze, nap, sleep and rvwinkle) with a single "stop" instruction, plus
      a register, PSSCR, which controls the depth of the power-saving mode.
      This replaces the use of the nap instruction when threads are idle
      during guest execution with the stop instruction, and adds code to
      set PSSCR to a value which will allow an SMT mode switch while the
      thread is idle (given that the core as a whole won't be idle in these
      cases).
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      bf53c88e
    • P
      KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9 · f725758b
      Paul Mackerras 提交于
      POWER9 includes a new interrupt controller, called XIVE, which is
      quite different from the XICS interrupt controller on POWER7 and
      POWER8 machines.  KVM-HV accesses the XICS directly in several places
      in order to send and clear IPIs and handle interrupts from PCI
      devices being passed through to the guest.
      
      In order to make the transition to XIVE easier, OPAL firmware will
      include an emulation of XICS on top of XIVE.  Access to the emulated
      XICS is via OPAL calls.  The one complication is that the EOI
      (end-of-interrupt) function can now return a value indicating that
      another interrupt is pending; in this case, the XIVE will not signal
      an interrupt in hardware to the CPU, and software is supposed to
      acknowledge the new interrupt without waiting for another interrupt
      to be delivered in hardware.
      
      This adapts KVM-HV to use the OPAL calls on machines where there is
      no XICS hardware.  When there is no XICS, we look for a device-tree
      node with "ibm,opal-intc" in its compatible property, which is how
      OPAL indicates that it provides XICS emulation.
      
      In order to handle the EOI return value, kvmppc_read_intr() has
      become kvmppc_read_one_intr(), with a boolean variable passed by
      reference which can be set by the EOI functions to indicate that
      another interrupt is pending.  The new kvmppc_read_intr() keeps
      calling kvmppc_read_one_intr() until there are no more interrupts
      to process.  The return value from kvmppc_read_intr() is the
      largest non-zero value of the returns from kvmppc_read_one_intr().
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f725758b
    • P
      KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9 · 1704a81c
      Paul Mackerras 提交于
      On POWER9, the msgsnd instruction is able to send interrupts to
      other cores, as well as other threads on the local core.  Since
      msgsnd is generally simpler and faster than sending an IPI via the
      XICS, we use msgsnd for all IPIs sent by KVM on POWER9.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1704a81c
    • P
      KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9 · 7c5b06ca
      Paul Mackerras 提交于
      POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
      and tlbiel (local tlbie) instructions.  Both instructions get a
      set of new parameters (RIC, PRS and R) which appear as bits in the
      instruction word.  The tlbiel instruction now has a second register
      operand, which contains a PID and/or LPID value if needed, and
      should otherwise contain 0.
      
      This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
      as well as older processors.  Since we only handle HPT guests so
      far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
      word as on previous processors, so we don't need to conditionally
      execute different instructions depending on the processor.
      
      The local flush on first entry to a guest in book3s_hv_rmhandlers.S
      is a loop which depends on the number of TLB sets.  Rather than
      using feature sections to set the number of iterations based on
      which CPU we're on, we now work out this number at VM creation time
      and store it in the kvm_arch struct.  That will make it possible to
      get the number from the device tree in future, which will help with
      compatibility with future processors.
      
      Since mmu_partition_table_set_entry() does a global flush of the
      whole LPID, we don't need to do the TLB flush on first entry to the
      guest on each processor.  Therefore we don't set all bits in the
      tlb_need_flush bitmap on VM startup on POWER9.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7c5b06ca
    • P
      KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs · e9cf1e08
      Paul Mackerras 提交于
      This adds code to handle two new guest-accessible special-purpose
      registers on POWER9: TIDR (thread ID register) and PSSCR (processor
      stop status and control register).  They are context-switched
      between host and guest, and the guest values can be read and set
      via the one_reg interface.
      
      The PSSCR contains some fields which are guest-accessible and some
      which are only accessible in hypervisor mode.  We only allow the
      guest-accessible fields to be read or set by userspace.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e9cf1e08
    • P
      KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9 · 83677f55
      Paul Mackerras 提交于
      Some special-purpose registers that were present and accessible
      by guests on POWER8 no longer exist on POWER9, so this adds
      feature sections to ensure that we don't try to context-switch
      them when going into or out of a guest on POWER9.  These are
      all relatively obscure, rarely-used registers, but we had to
      context-switch them on POWER8 to avoid creating a covert channel.
      They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      83677f55
    • P
      KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9 · 7a84084c
      Paul Mackerras 提交于
      On POWER9, the SDR1 register (hashed page table base address) is no
      longer used, and instead the hardware reads the HPT base address
      and size from the partition table.  The partition table entry also
      contains the bits that specify the page size for the VRMA mapping,
      which were previously in the LPCR.  The VPM0 bit of the LPCR is
      now reserved; the processor now always uses the VRMA (virtual
      real-mode area) mechanism for guest real-mode accesses in HPT mode,
      and the RMO (real-mode offset) mechanism has been dropped.
      
      When entering or exiting the guest, we now only have to set the
      LPIDR (logical partition ID register), not the SDR1 register.
      There is also no requirement now to transition via a reserved
      LPID value.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7a84084c
    • P
      KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9 · abb7c7dd
      Paul Mackerras 提交于
      This adapts the KVM-HV hashed page table (HPT) code to read and write
      HPT entries in the new format defined in Power ISA v3.00 on POWER9
      machines.  The new format moves the B (segment size) field from the
      first doubleword to the second, and trims some bits from the AVA
      (abbreviated virtual address) and ARPN (abbreviated real page number)
      fields.  As far as possible, the conversion is done when reading or
      writing the HPT entries, and the rest of the code continues to use
      the old format.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      abb7c7dd
  2. 21 11月, 2016 8 次提交
  3. 22 10月, 2016 1 次提交
    • M
      KVM: PPC: Book3S HV: Fix build error when SMP=n · 62623d5f
      Michael Ellerman 提交于
      Commit 5d375199 ("KVM: PPC: Book3S HV: Set server for passed-through
      interrupts") broke the SMP=n build:
      
        arch/powerpc/kvm/book3s_hv_rm_xics.c:758:2: error: implicit declaration of function 'get_hard_smp_processor_id'
      
      That is because we lost the implicit include of asm/smp.h, so include it
      explicitly to get the definition for get_hard_smp_processor_id().
      
      Fixes: 5d375199 ("KVM: PPC: Book3S HV: Set server for passed-through interrupts")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      62623d5f
  4. 27 9月, 2016 5 次提交
    • T
      KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register · fa73c3b2
      Thomas Huth 提交于
      The MMCR2 register is available twice, one time with number 785
      (privileged access), and one time with number 769 (unprivileged,
      but it can be disabled completely). In former times, the Linux
      kernel was using the unprivileged register 769 only, but since
      commit 8dd75ccb ("powerpc: Use privileged SPR number
      for MMCR2"), it uses the privileged register 785 instead.
      The KVM-PR code then of course also switched to use the SPR 785,
      but this is causing older guest kernels to crash, since these
      kernels still access 769 instead. So to support older kernels
      with KVM-PR again, we have to support register 769 in KVM-PR, too.
      
      Fixes: 8dd75ccb
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      fa73c3b2
    • T
      KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL · 2365f6b6
      Thomas Huth 提交于
      On POWER8E and POWER8NVL, KVM-PR does not announce support for
      64kB page sizes and 1TB segments yet. Looks like this has just
      been forgotton so far, since there is no reason why this should
      be different to the normal POWER8 CPUs.
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      2365f6b6
    • D
      KVM: PPC: BookE: Fix a sanity check · ac0e89bb
      Dan Carpenter 提交于
      We use logical negate where bitwise negate was intended.  It means that
      we never return -EINVAL here.
      
      Fixes: ce11e48b ('KVM: PPC: E500: Add userspace debug stub support')
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ac0e89bb
    • P
      KVM: PPC: Book3S HV: Take out virtual core piggybacking code · b009031f
      Paul Mackerras 提交于
      This takes out the code that arranges to run two (or more) virtual
      cores on a single subcore when possible, that is, when both vcores
      are from the same VM, the VM is configured with one CPU thread per
      virtual core, and all the per-subcore registers have the same value
      in each vcore.  Since the VTB (virtual timebase) is a per-subcore
      register, and will almost always differ between vcores, this code
      is disabled on POWER8 machines, meaning that it is only usable on
      POWER7 machines (which don't have VTB).  Given the tiny number of
      POWER7 machines which have firmware that allows them to run HV KVM,
      the benefit of simplifying the code outweighs the loss of this
      feature on POWER7 machines.
      Tested-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      b009031f
    • P
      KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread · 88b02cf9
      Paul Mackerras 提交于
      POWER8 has one virtual timebase (VTB) register per subcore, not one
      per CPU thread.  The HV KVM code currently treats VTB as a per-thread
      register, which can lead to spurious soft lockup messages from guests
      which use the VTB as the time source for the soft lockup detector.
      (CPUs before POWER8 did not have the VTB register.)
      
      For HV KVM, this fixes the problem by making only the primary thread
      in each virtual core save and restore the VTB value.  With this,
      the VTB state becomes part of the kvmppc_vcore structure.  This
      also means that "piggybacking" of multiple virtual cores onto one
      subcore is not possible on POWER8, because then the virtual cores
      would share a single VTB register.
      
      PR KVM emulates a VTB register, which is per-vcpu because PR KVM
      has no notion of CPU threads or SMT.  For PR KVM we move the VTB
      state into the kvmppc_vcpu_book3s struct.
      
      Cc: stable@vger.kernel.org # v3.14+
      Reported-by: NThomas Huth <thuth@redhat.com>
      Tested-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      88b02cf9
  5. 16 9月, 2016 1 次提交
  6. 13 9月, 2016 1 次提交
  7. 12 9月, 2016 12 次提交
    • M
      KVM: PPC: e500: Use kmalloc_array() in kvmppc_e500_tlb_init() · 90235dc1
      Markus Elfring 提交于
      * A multiplication for the size determination of a memory allocation
        indicated that an array data structure should be processed.
        Thus use the corresponding function "kmalloc_array".
      
      * Replace the specification of a data structure by a pointer dereference
        to make the corresponding size determination a bit safer according to
        the Linux coding style convention.
      Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      90235dc1
    • M
      KVM: PPC: e500: Replace kzalloc() calls by kcalloc() in two functions · b0ac477b
      Markus Elfring 提交于
      * A multiplication for the size determination of a memory allocation
        indicated that an array data structure should be processed.
        Thus use the corresponding function "kcalloc".
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      
        This issue was detected also by using the Coccinelle software.
      
      * Replace the specification of data structures by pointer dereferences
        to make the corresponding size determination a bit safer according to
        the Linux coding style convention.
      Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      b0ac477b
    • M
      KVM: PPC: e500: Delete an unnecessary initialisation in kvm_vcpu_ioctl_config_tlb() · cfb60813
      Markus Elfring 提交于
      The local variable "g2h_bitmap" will be set to an appropriate value
      a bit later. Thus omit the explicit initialisation at the beginning.
      Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      cfb60813
    • M
      KVM: PPC: e500: Less function calls in kvm_vcpu_ioctl_config_tlb() after error detection · 46d4e747
      Markus Elfring 提交于
      The kfree() function was called in two cases by the
      kvm_vcpu_ioctl_config_tlb() function during error handling
      even if the passed data structure element contained a null pointer.
      
      * Split a condition check for memory allocation failures.
      
      * Adjust jump targets according to the Linux coding style convention.
      Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      46d4e747
    • M
      KVM: PPC: e500: Use kmalloc_array() in kvm_vcpu_ioctl_config_tlb() · f3c0ce86
      Markus Elfring 提交于
      * A multiplication for the size determination of a memory allocation
        indicated that an array data structure should be processed.
        Thus use the corresponding function "kmalloc_array".
      
        This issue was detected by using the Coccinelle software.
      
      * Replace the specification of a data type by a pointer dereference
        to make the corresponding size determination a bit safer according to
        the Linux coding style convention.
      Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f3c0ce86
    • S
      KVM: PPC: Book3S HV: Counters for passthrough IRQ stats · 65e7026a
      Suresh Warrier 提交于
      Add VCPU stat counters to track affinity for passthrough
      interrupts.
      
      pthru_all: Counts all passthrough interrupts whose IRQ mappings are
                 in the kvmppc_passthru_irq_map structure.
      pthru_host: Counts all cached passthrough interrupts that were injected
      	    from the host through kvm_set_irq (i.e. not handled in
      	    real mode).
      pthru_bad_aff: Counts how many cached passthrough interrupts have
                     bad affinity (receiving CPU is not running VCPU that is
      	       the target of the virtual interrupt in the guest).
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      65e7026a
    • P
      KVM: PPC: Book3S HV: Set server for passed-through interrupts · 5d375199
      Paul Mackerras 提交于
      When a guest has a PCI pass-through device with an interrupt, it
      will direct the interrupt to a particular guest VCPU.  In fact the
      physical interrupt might arrive on any CPU, and then get
      delivered to the target VCPU in the emulated XICS (guest interrupt
      controller), and eventually delivered to the target VCPU.
      
      Now that we have code to handle device interrupts in real mode
      without exiting to the host kernel, there is an advantage to having
      the device interrupt arrive on the same sub(core) as the target
      VCPU is running on.  In this situation, the interrupt can be
      delivered to the target VCPU without any exit to the host kernel
      (using a hypervisor doorbell interrupt between threads if
      necessary).
      
      This patch aims to get passed-through device interrupts arriving
      on the correct core by setting the interrupt server in the real
      hardware XICS for the interrupt to the first thread in the (sub)core
      where its target VCPU is running.  We do this in the real-mode H_EOI
      code because the H_EOI handler already needs to look at the
      emulated ICS state for the interrupt (whereas the H_XIRR handler
      doesn't), and we know we are running in the target VCPU context
      at that point.
      
      We set the server CPU in hardware using an OPAL call, regardless of
      what the IRQ affinity mask for the interrupt says, and without
      updating the affinity mask.  This amounts to saying that when an
      interrupt is passed through to a guest, as a matter of policy we
      allow the guest's affinity for the interrupt to override the host's.
      
      This is inspired by an earlier patch from Suresh Warrier, although
      none of this code came from that earlier patch.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      5d375199
    • S
      KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode · 366274f5
      Suresh Warrier 提交于
      When a passthrough IRQ is handled completely within KVM real
      mode code, it has to also update the IRQ stats since this
      does not go through the generic IRQ handling code.
      
      However, the per CPU kstat_irqs field is an allocated (not static)
      field and so cannot be directly accessed in real mode safely.
      
      The function this_cpu_inc_rm() is introduced to safely increment
      per CPU fields (currently coded for unsigned integers only) that
      are allocated and could thus be vmalloced also.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      366274f5
    • S
      KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass · 644abbb2
      Suresh Warrier 提交于
      Add a  module parameter kvm_irq_bypass for kvm_hv.ko to
      disable IRQ bypass for passthrough interrupts. The default
      value of this tunable is 1 - that is enable the feature.
      
      Since the tunable is used by built-in kernel code, we use
      the module_param_cb macro to achieve this.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      644abbb2
    • S
      KVM: PPC: Book3S HV: Dump irqmap in debugfs · af893c7d
      Suresh Warrier 提交于
      Dump the passthrough irqmap structure associated with a
      guest as part of /sys/kernel/debug/powerpc/kvm-xics-*.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      af893c7d
    • S
      KVM: PPC: Book3S HV: Complete passthrough interrupt in host · f7af5209
      Suresh Warrier 提交于
      In existing real mode ICP code, when updating the virtual ICP
      state, if there is a required action that cannot be completely
      handled in real mode, as for instance, a VCPU needs to be woken
      up, flags are set in the ICP to indicate the required action.
      This is checked when returning from hypercalls to decide whether
      the call needs switch back to the host where the action can be
      performed in virtual mode. Note that if h_ipi_redirect is enabled,
      real mode code will first try to message a free host CPU to
      complete this job instead of returning the host to do it ourselves.
      
      Currently, the real mode PCI passthrough interrupt handling code
      checks if any of these flags are set and simply returns to the host.
      This is not good enough as the trap value (0x500) is treated as an
      external interrupt by the host code. It is only when the trap value
      is a hypercall that the host code searches for and acts on unfinished
      work by calling kvmppc_xics_rm_complete.
      
      This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
      which is returned by KVM if there is unfinished business to be
      completed in host virtual mode after handling a PCI passthrough
      interrupt. The host checks for this special interrupt condition
      and calls into the kvmppc_xics_rm_complete, which is made an
      exported function for this reason.
      
      [paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
       in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f7af5209
    • S
      KVM: PPC: Book3S HV: Handle passthrough interrupts in guest · e3c13e56
      Suresh Warrier 提交于
      Currently, KVM switches back to the host to handle any external
      interrupt (when the interrupt is received while running in the
      guest). This patch updates real-mode KVM to check if an interrupt
      is generated by a passthrough adapter that is owned by this guest.
      If so, the real mode KVM will directly inject the corresponding
      virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
      in hardware. In short, the interrupt is handled entirely in real
      mode in the guest context without switching back to the host.
      
      In some rare cases, the interrupt cannot be completely handled in
      real mode, for instance, a VCPU that is sleeping needs to be woken
      up. In this case, KVM simply switches back to the host with trap
      reason set to 0x500. This works, but it is clearly not very efficient.
      A following patch will distinguish this case and handle it
      correctly in the host. Note that we can use the existing
      check_too_hard() routine even though we are not in a hypercall to
      determine if there is unfinished business that needs to be
      completed in host virtual mode.
      
      The patch assumes that the mapping between hardware interrupt IRQ
      and virtual IRQ to be injected to the guest already exists for the
      PCI passthrough interrupts that need to be handled in real mode.
      If the mapping does not exist, KVM falls back to the default
      existing behavior.
      
      The KVM real mode code reads mappings from the mapped array in the
      passthrough IRQ map without taking any lock.  We carefully order the
      loads and stores of the fields in the kvmppc_irq_map data structure
      using memory barriers to avoid an inconsistent mapping being seen by
      the reader. Thus, although it is possible to miss a map entry, it is
      not possible to read a stale value.
      
      [paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
       pulled out powernv eoi change into a separate patch, made
       kvmppc_read_intr get the vcpu from the paca rather than being
       passed in, rewrote the logic at the end of kvmppc_read_intr to
       avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
       since we were always restoring SRR0/1 anyway, get rid of the cached
       array (just use the mapped array), removed the kick_all_cpus_sync()
       call, clear saved_xirr PACA field when we handle the interrupt in
       real mode, fix compilation with CONFIG_KVM_XICS=n.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e3c13e56