1. 24 11月, 2016 1 次提交
    • P
      KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9 · f725758b
      Paul Mackerras 提交于
      POWER9 includes a new interrupt controller, called XIVE, which is
      quite different from the XICS interrupt controller on POWER7 and
      POWER8 machines.  KVM-HV accesses the XICS directly in several places
      in order to send and clear IPIs and handle interrupts from PCI
      devices being passed through to the guest.
      
      In order to make the transition to XIVE easier, OPAL firmware will
      include an emulation of XICS on top of XIVE.  Access to the emulated
      XICS is via OPAL calls.  The one complication is that the EOI
      (end-of-interrupt) function can now return a value indicating that
      another interrupt is pending; in this case, the XIVE will not signal
      an interrupt in hardware to the CPU, and software is supposed to
      acknowledge the new interrupt without waiting for another interrupt
      to be delivered in hardware.
      
      This adapts KVM-HV to use the OPAL calls on machines where there is
      no XICS hardware.  When there is no XICS, we look for a device-tree
      node with "ibm,opal-intc" in its compatible property, which is how
      OPAL indicates that it provides XICS emulation.
      
      In order to handle the EOI return value, kvmppc_read_intr() has
      become kvmppc_read_one_intr(), with a boolean variable passed by
      reference which can be set by the EOI functions to indicate that
      another interrupt is pending.  The new kvmppc_read_intr() keeps
      calling kvmppc_read_one_intr() until there are no more interrupts
      to process.  The return value from kvmppc_read_intr() is the
      largest non-zero value of the returns from kvmppc_read_one_intr().
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f725758b
  2. 21 11月, 2016 1 次提交
  3. 22 10月, 2016 1 次提交
    • M
      KVM: PPC: Book3S HV: Fix build error when SMP=n · 62623d5f
      Michael Ellerman 提交于
      Commit 5d375199 ("KVM: PPC: Book3S HV: Set server for passed-through
      interrupts") broke the SMP=n build:
      
        arch/powerpc/kvm/book3s_hv_rm_xics.c:758:2: error: implicit declaration of function 'get_hard_smp_processor_id'
      
      That is because we lost the implicit include of asm/smp.h, so include it
      explicitly to get the definition for get_hard_smp_processor_id().
      
      Fixes: 5d375199 ("KVM: PPC: Book3S HV: Set server for passed-through interrupts")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      62623d5f
  4. 12 9月, 2016 6 次提交
    • S
      KVM: PPC: Book3S HV: Counters for passthrough IRQ stats · 65e7026a
      Suresh Warrier 提交于
      Add VCPU stat counters to track affinity for passthrough
      interrupts.
      
      pthru_all: Counts all passthrough interrupts whose IRQ mappings are
                 in the kvmppc_passthru_irq_map structure.
      pthru_host: Counts all cached passthrough interrupts that were injected
      	    from the host through kvm_set_irq (i.e. not handled in
      	    real mode).
      pthru_bad_aff: Counts how many cached passthrough interrupts have
                     bad affinity (receiving CPU is not running VCPU that is
      	       the target of the virtual interrupt in the guest).
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      65e7026a
    • P
      KVM: PPC: Book3S HV: Set server for passed-through interrupts · 5d375199
      Paul Mackerras 提交于
      When a guest has a PCI pass-through device with an interrupt, it
      will direct the interrupt to a particular guest VCPU.  In fact the
      physical interrupt might arrive on any CPU, and then get
      delivered to the target VCPU in the emulated XICS (guest interrupt
      controller), and eventually delivered to the target VCPU.
      
      Now that we have code to handle device interrupts in real mode
      without exiting to the host kernel, there is an advantage to having
      the device interrupt arrive on the same sub(core) as the target
      VCPU is running on.  In this situation, the interrupt can be
      delivered to the target VCPU without any exit to the host kernel
      (using a hypervisor doorbell interrupt between threads if
      necessary).
      
      This patch aims to get passed-through device interrupts arriving
      on the correct core by setting the interrupt server in the real
      hardware XICS for the interrupt to the first thread in the (sub)core
      where its target VCPU is running.  We do this in the real-mode H_EOI
      code because the H_EOI handler already needs to look at the
      emulated ICS state for the interrupt (whereas the H_XIRR handler
      doesn't), and we know we are running in the target VCPU context
      at that point.
      
      We set the server CPU in hardware using an OPAL call, regardless of
      what the IRQ affinity mask for the interrupt says, and without
      updating the affinity mask.  This amounts to saying that when an
      interrupt is passed through to a guest, as a matter of policy we
      allow the guest's affinity for the interrupt to override the host's.
      
      This is inspired by an earlier patch from Suresh Warrier, although
      none of this code came from that earlier patch.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      5d375199
    • S
      KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode · 366274f5
      Suresh Warrier 提交于
      When a passthrough IRQ is handled completely within KVM real
      mode code, it has to also update the IRQ stats since this
      does not go through the generic IRQ handling code.
      
      However, the per CPU kstat_irqs field is an allocated (not static)
      field and so cannot be directly accessed in real mode safely.
      
      The function this_cpu_inc_rm() is introduced to safely increment
      per CPU fields (currently coded for unsigned integers only) that
      are allocated and could thus be vmalloced also.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      366274f5
    • S
      KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass · 644abbb2
      Suresh Warrier 提交于
      Add a  module parameter kvm_irq_bypass for kvm_hv.ko to
      disable IRQ bypass for passthrough interrupts. The default
      value of this tunable is 1 - that is enable the feature.
      
      Since the tunable is used by built-in kernel code, we use
      the module_param_cb macro to achieve this.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      644abbb2
    • S
      KVM: PPC: Book3S HV: Complete passthrough interrupt in host · f7af5209
      Suresh Warrier 提交于
      In existing real mode ICP code, when updating the virtual ICP
      state, if there is a required action that cannot be completely
      handled in real mode, as for instance, a VCPU needs to be woken
      up, flags are set in the ICP to indicate the required action.
      This is checked when returning from hypercalls to decide whether
      the call needs switch back to the host where the action can be
      performed in virtual mode. Note that if h_ipi_redirect is enabled,
      real mode code will first try to message a free host CPU to
      complete this job instead of returning the host to do it ourselves.
      
      Currently, the real mode PCI passthrough interrupt handling code
      checks if any of these flags are set and simply returns to the host.
      This is not good enough as the trap value (0x500) is treated as an
      external interrupt by the host code. It is only when the trap value
      is a hypercall that the host code searches for and acts on unfinished
      work by calling kvmppc_xics_rm_complete.
      
      This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
      which is returned by KVM if there is unfinished business to be
      completed in host virtual mode after handling a PCI passthrough
      interrupt. The host checks for this special interrupt condition
      and calls into the kvmppc_xics_rm_complete, which is made an
      exported function for this reason.
      
      [paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
       in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f7af5209
    • S
      KVM: PPC: Book3S HV: Handle passthrough interrupts in guest · e3c13e56
      Suresh Warrier 提交于
      Currently, KVM switches back to the host to handle any external
      interrupt (when the interrupt is received while running in the
      guest). This patch updates real-mode KVM to check if an interrupt
      is generated by a passthrough adapter that is owned by this guest.
      If so, the real mode KVM will directly inject the corresponding
      virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
      in hardware. In short, the interrupt is handled entirely in real
      mode in the guest context without switching back to the host.
      
      In some rare cases, the interrupt cannot be completely handled in
      real mode, for instance, a VCPU that is sleeping needs to be woken
      up. In this case, KVM simply switches back to the host with trap
      reason set to 0x500. This works, but it is clearly not very efficient.
      A following patch will distinguish this case and handle it
      correctly in the host. Note that we can use the existing
      check_too_hard() routine even though we are not in a hypercall to
      determine if there is unfinished business that needs to be
      completed in host virtual mode.
      
      The patch assumes that the mapping between hardware interrupt IRQ
      and virtual IRQ to be injected to the guest already exists for the
      PCI passthrough interrupts that need to be handled in real mode.
      If the mapping does not exist, KVM falls back to the default
      existing behavior.
      
      The KVM real mode code reads mappings from the mapped array in the
      passthrough IRQ map without taking any lock.  We carefully order the
      loads and stores of the fields in the kvmppc_irq_map data structure
      using memory barriers to avoid an inconsistent mapping being seen by
      the reader. Thus, although it is possible to miss a map entry, it is
      not possible to read a stale value.
      
      [paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
       pulled out powernv eoi change into a separate patch, made
       kvmppc_read_intr get the vcpu from the paca rather than being
       passed in, rewrote the logic at the end of kvmppc_read_intr to
       avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
       since we were always restoring SRR0/1 anyway, get rid of the cached
       array (just use the mapped array), removed the kick_all_cpus_sync()
       call, clear saved_xirr PACA field when we handle the interrupt in
       real mode, fix compilation with CONFIG_KVM_XICS=n.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e3c13e56
  5. 29 2月, 2016 3 次提交
    • S
      KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection · 520fe9c6
      Suresh E. Warrier 提交于
      Redirecting the wakeup of a VCPU from the H_IPI hypercall to
      a core running in the host is usually a good idea, most workloads
      seemed to benefit. However, in one heavily interrupt-driven SMT1
      workload, some regression was observed. This patch adds a kvm_hv
      module parameter called h_ipi_redirect to control this feature.
      
      The default value for this tunable is 1 - that is enable the feature.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      520fe9c6
    • S
      KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU · e17769eb
      Suresh E. Warrier 提交于
      This patch adds support to real-mode KVM to search for a core
      running in the host partition and send it an IPI message with
      VCPU to be woken. This avoids having to switch to the host
      partition to complete an H_IPI hypercall when the VCPU which
      is the target of the the H_IPI is not loaded (is not running
      in the guest).
      
      The patch also includes the support in the IPI handler running
      in the host to do the wakeup by calling kvmppc_xics_ipi_action
      for the PPC_MSG_RM_HOST_ACTION message.
      
      When a guest is being destroyed, we need to ensure that there
      are no pending IPIs waiting to wake up a VCPU before we free
      the VCPUs of the guest. This is accomplished by:
      - Forces a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs
        before freeing any VCPUs in kvm_arch_destroy_vm().
      - Any PPC_MSG_RM_HOST_ACTION messages must be executed first
        before any other PPC_MSG_CALL_FUNCTION messages.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      e17769eb
    • S
      KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM · 0c2a6606
      Suresh Warrier 提交于
      This patch adds the support for the kick VCPU operation for
      kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function
      provides the function to be invoked for a host side operation
      when poked by the real mode KVM. This is initiated by KVM by
      sending an IPI to any free host core.
      
      KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and
      rm_data to point to the VCPU to be woken up before sending the IPI.
      Note that we have allocated one kvmppc_host_rm_core structure
      per core. The above values need to be set in the structure
      corresponding to the core to which the IPI will be sent.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      0c2a6606
  6. 22 8月, 2015 1 次提交
    • P
      KVM: PPC: Book3S HV: Make use of unused threads when running guests · ec257165
      Paul Mackerras 提交于
      When running a virtual core of a guest that is configured with fewer
      threads per core than the physical cores have, the extra physical
      threads are currently unused.  This makes it possible to use them to
      run one or more other virtual cores from the same guest when certain
      conditions are met.  This applies on POWER7, and on POWER8 to guests
      with one thread per virtual core.  (It doesn't apply to POWER8 guests
      with multiple threads per vcore because they require a 1-1 virtual to
      physical thread mapping in order to be able to use msgsndp and the
      TIR.)
      
      The idea is that we maintain a list of preempted vcores for each
      physical cpu (i.e. each core, since the host runs single-threaded).
      Then, when a vcore is about to run, it checks to see if there are
      any vcores on the list for its physical cpu that could be
      piggybacked onto this vcore's execution.  If so, those additional
      vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
      threads are started as well as the original vcore, which is called
      the master vcore.
      
      After the vcores have exited the guest, the extra ones are put back
      onto the preempted list if any of their VCPUs are still runnable and
      not idle.
      
      This means that vcpu->arch.ptid is no longer necessarily the same as
      the physical thread that the vcpu runs on.  In order to make it easier
      for code that wants to send an IPI to know which CPU to target, we
      now store that in a new field in struct vcpu_arch, called thread_cpu.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Tested-by: NLaurent Vivier <lvivier@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ec257165
  7. 21 4月, 2015 3 次提交
    • P
      KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C · eddb60fb
      Paul Mackerras 提交于
      This replaces the assembler code for kvmhv_commence_exit() with C code
      in book3s_hv_builtin.c.  It also moves the IPI sending code that was
      in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
      can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      eddb60fb
    • S
      KVM: PPC: Book3S HV: Add ICP real mode counters · 6e0365b7
      Suresh Warrier 提交于
      Add two counters to count how often we generate real-mode ICS resend
      and reject events. The counters provide some performance statistics
      that could be used in the future to consider if the real mode functions
      need further optimizing. The counters are displayed as part of IPC and
      ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM.
      
      Also added two counters that count (approximately) how many times we
      don't find an ICP or ICS we're looking for. These are not currently
      exposed through sysfs, but can be useful when debugging crashes.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      6e0365b7
    • S
      KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode · b0221556
      Suresh Warrier 提交于
      Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
      to switch to the host to complete the rest of hypercall function in
      virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
      functions to be runnable in hypervisor real mode, thus avoiding the need
      to switch to the host to execute these functions in virtual mode. However,
      the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
      events - these events cannot be done in real mode and they will still need
      a switch to host virtual mode.
      
      There are sufficient differences between the real mode code and the
      virtual mode code for the ICS/ICP resend and reject functions that
      for now the code has been duplicated instead of sharing common code.
      In the future, we can look at creating common functions.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b0221556
  8. 19 1月, 2015 1 次提交
  9. 15 12月, 2014 1 次提交
    • S
      KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI · 5b88cda6
      Suresh E. Warrier 提交于
      This fixes some inaccuracies in the state machine for the virtualized
      ICP when implementing the H_IPI hcall (Set_MFFR and related states):
      
      1. The old code wipes out any pending interrupts when the new MFRR is
         more favored than the CPPR but less favored than a pending
         interrupt (by always modifying xisr and the pending_pri). This can
         cause us to lose a pending external interrupt.
      
         The correct code here is to only modify the pending_pri and xisr in
         the ICP if the MFRR is equal to or more favored than the current
         pending pri (since in this case, it is guaranteed that that there
         cannot be a pending external interrupt). The code changes are
         required in both kvmppc_rm_h_ipi and kvmppc_h_ipi.
      
      2. Again, in both kvmppc_rm_h_ipi and kvmppc_h_ipi, there is a check
         for whether MFRR is being made less favored AND further if new MFFR
         is also less favored than the current CPPR, we check for any
         resends pending in the ICP. These checks look like they are
         designed to cover the case where if the MFRR is being made less
         favored, we opportunistically trigger a resend of any interrupts
         that had been previously rejected. Although, this is not a state
         described by PAPR, this is an action we actually need to do
         especially if the CPPR is already at 0xFF.  Because in this case,
         the resend bit will stay on until another ICP state change which
         may be a long time coming and the interrupt stays pending until
         then. The current code which checks for MFRR < CPPR is broken when
         CPPR is 0xFF since it will not get triggered in that case.
      
         Ideally, we would want to do a resend only if
      
         	prio(pending_interrupt) < mfrr && prio(pending_interrupt) < cppr
      
         where pending interrupt is the one that was rejected. But we don't
         have the priority of the pending interrupt state saved, so we
         simply trigger a resend whenever the MFRR is made less favored.
      
      3. In kvmppc_rm_h_ipi, where we save state to pass resends to the
         virtual mode, we also need to save the ICP whose need_resend we
         reset since this does not need to be my ICP (vcpu->arch.icp) as is
         incorrectly assumed by the current code. A new field rm_resend_icp
         is added to the kvmppc_icp structure for this purpose.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5b88cda6
  10. 05 8月, 2014 1 次提交
    • P
      KVM: PPC: Enable IRQFD support for the XICS interrupt controller · 25a2150b
      Paul Mackerras 提交于
      This makes it possible to use IRQFDs on platforms that use the XICS
      interrupt controller.  To do this we implement kvm_irq_map_gsi() and
      kvm_irq_map_chip_pin() in book3s_xics.c, so as to provide a 1-1 mapping
      between global interrupt numbers and XICS interrupt source numbers.
      For now, all interrupts are mapped as "IRQCHIP" interrupts, and no
      MSI support is provided.
      
      This means that kvm_set_irq can now get called with level == 0 or 1
      as well as the powerpc-specific values KVM_INTERRUPT_SET,
      KVM_INTERRUPT_UNSET and KVM_INTERRUPT_SET_LEVEL.  We change
      ics_deliver_irq() to accept all those values, and remove its
      report_status argument, as it is always false, given that we don't
      support KVM_IRQ_LINE_STATUS.
      
      This also adds support for interrupt ack notifiers to the XICS code
      so that the IRQFD resampler functionality can be supported.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Tested-by: NEric Auger <eric.auger@linaro.org>
      Tested-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      25a2150b
  11. 27 4月, 2013 1 次提交