1. 12 9月, 2016 2 次提交
    • S
      KVM: PPC: Book3S HV: Complete passthrough interrupt in host · f7af5209
      Suresh Warrier 提交于
      In existing real mode ICP code, when updating the virtual ICP
      state, if there is a required action that cannot be completely
      handled in real mode, as for instance, a VCPU needs to be woken
      up, flags are set in the ICP to indicate the required action.
      This is checked when returning from hypercalls to decide whether
      the call needs switch back to the host where the action can be
      performed in virtual mode. Note that if h_ipi_redirect is enabled,
      real mode code will first try to message a free host CPU to
      complete this job instead of returning the host to do it ourselves.
      
      Currently, the real mode PCI passthrough interrupt handling code
      checks if any of these flags are set and simply returns to the host.
      This is not good enough as the trap value (0x500) is treated as an
      external interrupt by the host code. It is only when the trap value
      is a hypercall that the host code searches for and acts on unfinished
      work by calling kvmppc_xics_rm_complete.
      
      This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
      which is returned by KVM if there is unfinished business to be
      completed in host virtual mode after handling a PCI passthrough
      interrupt. The host checks for this special interrupt condition
      and calls into the kvmppc_xics_rm_complete, which is made an
      exported function for this reason.
      
      [paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
       in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f7af5209
    • S
      KVM: PPC: Book3S HV: Handle passthrough interrupts in guest · e3c13e56
      Suresh Warrier 提交于
      Currently, KVM switches back to the host to handle any external
      interrupt (when the interrupt is received while running in the
      guest). This patch updates real-mode KVM to check if an interrupt
      is generated by a passthrough adapter that is owned by this guest.
      If so, the real mode KVM will directly inject the corresponding
      virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
      in hardware. In short, the interrupt is handled entirely in real
      mode in the guest context without switching back to the host.
      
      In some rare cases, the interrupt cannot be completely handled in
      real mode, for instance, a VCPU that is sleeping needs to be woken
      up. In this case, KVM simply switches back to the host with trap
      reason set to 0x500. This works, but it is clearly not very efficient.
      A following patch will distinguish this case and handle it
      correctly in the host. Note that we can use the existing
      check_too_hard() routine even though we are not in a hypercall to
      determine if there is unfinished business that needs to be
      completed in host virtual mode.
      
      The patch assumes that the mapping between hardware interrupt IRQ
      and virtual IRQ to be injected to the guest already exists for the
      PCI passthrough interrupts that need to be handled in real mode.
      If the mapping does not exist, KVM falls back to the default
      existing behavior.
      
      The KVM real mode code reads mappings from the mapped array in the
      passthrough IRQ map without taking any lock.  We carefully order the
      loads and stores of the fields in the kvmppc_irq_map data structure
      using memory barriers to avoid an inconsistent mapping being seen by
      the reader. Thus, although it is possible to miss a map entry, it is
      not possible to read a stale value.
      
      [paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
       pulled out powernv eoi change into a separate patch, made
       kvmppc_read_intr get the vcpu from the paca rather than being
       passed in, rewrote the logic at the end of kvmppc_read_intr to
       avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
       since we were always restoring SRR0/1 anyway, get rid of the cached
       array (just use the mapped array), removed the kick_all_cpus_sync()
       call, clear saved_xirr PACA field when we handle the interrupt in
       real mode, fix compilation with CONFIG_KVM_XICS=n.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e3c13e56
  2. 09 9月, 2016 6 次提交
    • S
      KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap · 8daaafc8
      Suresh Warrier 提交于
      This patch introduces an IRQ mapping structure, the
      kvmppc_passthru_irqmap structure that is to be used
      to map the real hardware IRQ in the host with the virtual
      hardware IRQ (gsi) that is injected into a guest by KVM for
      passthrough adapters.
      
      Currently, we assume a separate IRQ mapping structure for
      each guest. Each kvmppc_passthru_irqmap has a mapping arrays,
      containing all defined real<->virtual IRQs.
      
      [paulus@ozlabs.org - removed irq_chip field from struct
       kvmppc_passthru_irqmap; changed parameter for
       kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct
       kvm *, removed small cached array.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      8daaafc8
    • S
      KVM: PPC: select IRQ_BYPASS_MANAGER · 9576730d
      Suresh Warrier 提交于
      Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set.
      Add the PPC producer functions for add and del producer.
      
      [paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c
       so booke compiles; added kvm_arch_has_irq_bypass implementation.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      9576730d
    • P
      powerpc: move hmi.c to arch/powerpc/kvm/ · 3f257774
      Paolo Bonzini 提交于
      hmi.c functions are unused unless sibling_subcore_state is nonzero, and
      that in turn happens only if KVM is in use.  So move the code to
      arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
      rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
      included in struct paca_struct only if KVM is supported by the kernel.
      
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: kvm-ppc@vger.kernel.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      3f257774
    • S
      powerpc/powernv: Provide facilities for EOI, usable from real mode · 4ee11c1a
      Suresh Warrier 提交于
      This adds a new function pnv_opal_pci_msi_eoi() which does the part of
      end-of-interrupt (EOI) handling of an MSI which involves doing an
      OPAL call.  This function can be called in real mode.  This doesn't
      just export pnv_ioda2_msi_eoi() because that does a call to
      icp_native_eoi(), which does not work in real mode.
      
      This also adds a function, is_pnv_opal_msi(), which KVM can call to
      check whether an interrupt is one for which we should be calling
      pnv_opal_pci_msi_eoi() when we need to do an EOI.
      
      [paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
       from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
       interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4ee11c1a
    • S
      powerpc: Add simple cache inhibited MMIO accessors · 07b1fdf5
      Suresh Warrier 提交于
      Add simple cache inhibited accessors for memory mapped I/O.
      Unlike the accessors built from the DEF_MMIO_* macros, these
      don't include any hardware memory barriers, callers need to
      manage memory barriers on their own. These can only be called
      in hypervisor real mode.
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      [paulus@ozlabs.org - added line to comment]
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      07b1fdf5
    • P
      powerpc/mm: Speed up computation of base and actual page size for a HPTE · 0eeede0c
      Paul Mackerras 提交于
      This replaces a 2-D search through an array with a simple 8-bit table
      lookup for determining the actual and/or base page size for a HPT entry.
      
      The encoding in the second doubleword of the HPTE is designed to encode
      the actual and base page sizes without using any more bits than would be
      needed for a 4k page number, by using between 1 and 8 low-order bits of
      the RPN (real page number) field to encode the page sizes.  A single
      "large page" bit in the first doubleword indicates that these low-order
      bits are to be interpreted like this.
      
      We can determine the page sizes by using the low-order 8 bits of the RPN
      to look up a 256-entry table.  For actual page sizes less than 1MB, some
      of the upper bits of these 8 bits are going to be real address bits, but
      we can cope with that by replicating the entries for those smaller page
      sizes.
      
      While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
      functions from a KVM-specific header to a header for 64-bit HPT systems,
      since this computation doesn't have anything specifically to do with KVM.
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      0eeede0c
  3. 08 9月, 2016 5 次提交
    • S
      KVM: PPC: Implement existing and add new halt polling vcpu stats · 2a27f514
      Suraj Jitindar Singh 提交于
      vcpu stats are used to collect information about a vcpu which can be viewed
      in the debugfs. For example halt_attempted_poll and halt_successful_poll
      are used to keep track of the number of times the vcpu attempts to and
      successfully polls. These stats are currently not used on powerpc.
      
      Implement incrementation of the halt_attempted_poll and
      halt_successful_poll vcpu stats for powerpc. Since these stats are summed
      over all the vcpus for all running guests it doesn't matter which vcpu
      they are attributed to, thus we choose the current runner vcpu of the
      vcore.
      
      Also add new vcpu stats: halt_poll_success_ns, halt_poll_fail_ns and
      halt_wait_ns to be used to accumulate the total time spend polling
      successfully, polling unsuccessfully and waiting respectively, and
      halt_successful_wait to accumulate the number of times the vcpu waits.
      Given that halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns are
      expressed in nanoseconds it is necessary to represent these as 64-bit
      quantities, otherwise they would overflow after only about 4 seconds.
      
      Given that the total time spend either polling or waiting will be known and
      the number of times that each was done, it will be possible to determine
      the average poll and wait times. This will give the ability to tune the kvm
      module parameters based on the calculated average wait and poll times.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      2a27f514
    • S
      KVM: Add provisioning for ulong vm stats and u64 vcpu stats · 8a7e75d4
      Suraj Jitindar Singh 提交于
      vms and vcpus have statistics associated with them which can be viewed
      within the debugfs. Currently it is assumed within the vcpu_stat_get() and
      vm_stat_get() functions that all of these statistics are represented as
      u32s, however the next patch adds some u64 vcpu statistics.
      
      Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
      Since vcpu statistics are per vcpu, they will only be updated by a single
      vcpu at a time so this shouldn't present a problem on 32-bit machines
      which can't atomically increment 64-bit numbers. However vm statistics
      could potentially be updated by multiple vcpus from that vm at a time.
      To avoid the overhead of atomics make all vm statistics ulong such that
      they are 64-bit on 64-bit systems where they can be atomically incremented
      and are 32-bit on 32-bit systems which may not be able to atomically
      increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      8a7e75d4
    • S
      KVM: PPC: Book3S HV: Implement halt polling · 0cda69dd
      Suraj Jitindar Singh 提交于
      This patch introduces new halt polling functionality into the kvm_hv kernel
      module. When a vcore is idle it will poll for some period of time before
      scheduling itself out.
      
      When all of the runnable vcpus on a vcore have ceded (and thus the vcore is
      idle) we schedule ourselves out to allow something else to run. In the
      event that we need to wake up very quickly (for example an interrupt
      arrives), we are required to wait until we get scheduled again.
      
      Implement halt polling so that when a vcore is idle, and before scheduling
      ourselves, we poll for vcpus in the runnable_threads list which have
      pending exceptions or which leave the ceded state. If we poll successfully
      then we can get back into the guest very quickly without ever scheduling
      ourselves, otherwise we schedule ourselves out as before.
      
      There exists generic halt_polling code in virt/kvm_main.c, however on
      powerpc the polling conditions are different to the generic case. It would
      be nice if we could just implement an arch specific kvm_check_block()
      function, but there is still other arch specific things which need to be
      done for kvm_hv (for example manipulating vcore states) which means that a
      separate implementation is the best option.
      
      Testing of this patch with a TCP round robin test between two guests with
      virtio network interfaces has found a decrease in round trip time of ~15us
      on average. A performance gain is only seen when going out of and
      back into the guest often and quickly, otherwise there is no net benefit
      from the polling. The polling interval is adjusted such that when we are
      often scheduled out for long periods of time it is reduced, and when we
      often poll successfully it is increased. The rate at which the polling
      interval increases or decreases, and the maximum polling interval, can
      be set through module parameters.
      
      Based on the implementation in the generic kvm module by Wanpeng Li and
      Paolo Bonzini, and on direction from Paul Mackerras.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      0cda69dd
    • S
      KVM: PPC: Book3S HV: Change vcore element runnable_threads from linked-list to array · 7b5f8272
      Suraj Jitindar Singh 提交于
      The struct kvmppc_vcore is a structure used to store various information
      about a virtual core for a kvm guest. The runnable_threads element of the
      struct provides a list of all of the currently runnable vcpus on the core
      (those in the KVMPPC_VCPU_RUNNABLE state). The previous implementation of
      this list was a linked_list. The next patch requires that the list be able
      to be iterated over without holding the vcore lock.
      
      Reimplement the runnable_threads list in the kvmppc_vcore struct as an
      array. Implement function to iterate over valid entries in the array and
      update access sites accordingly.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7b5f8272
    • S
      KVM: PPC: Book3S HV: Move struct kvmppc_vcore from kvm_host.h to kvm_book3s.h · e64fb7e2
      Suraj Jitindar Singh 提交于
      The next commit will introduce a member to the kvmppc_vcore struct which
      references MAX_SMT_THREADS which is defined in kvm_book3s_asm.h, however
      this file isn't included in kvm_host.h directly. Thus compiling for
      certain platforms such as pmac32_defconfig and ppc64e_defconfig with KVM
      fails due to MAX_SMT_THREADS not being defined.
      
      Move the struct kvmppc_vcore definition to kvm_book3s.h which explicitly
      includes kvm_book3s_asm.h.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e64fb7e2
  4. 19 8月, 2016 1 次提交
  5. 10 8月, 2016 2 次提交
    • B
      powerpc/32: Fix crash during static key init · 97f6e0cc
      Benjamin Herrenschmidt 提交于
      We cannot do those initializations from apply_feature_fixups() as
      this function runs in a very restricted environment on 32-bit where
      the kernel isn't running at its linked address and the PTRRELOC()
      macro must be used for any global accesss.
      
      Instead, split them into a separtate steup_feature_keys() function
      which is called in a more suitable spot on ppc32.
      
      Fixes: 309b315b ("powerpc: Call jump_label_init() in apply_feature_fixups()")
      Reported-and-tested-by: NChristian Kujau <lists@nerdbynature.de>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      97f6e0cc
    • C
      powerpc/ptrace: Fix coredump since ptrace TM changes · c7a318ba
      Cyril Bur 提交于
      Commit 8d460f61 ("powerpc/process: Add the function
      flush_tmregs_to_thread") added flush_tmregs_to_thread() and included
      the assumption that it would only be called for a task which is not
      current.
      
      Although this is correct for ptrace, when generating a core dump, some
      of the routines which call flush_tmregs_to_thread() are called. This
      leads to a WARNing such as:
      
        Not expecting ptrace on self: TM regs may be incorrect
        ------------[ cut here ]------------
        WARNING: CPU: 123 PID: 7727 at arch/powerpc/kernel/process.c:1088 flush_tmregs_to_thread+0x78/0x80
        CPU: 123 PID: 7727 Comm: libvirtd Not tainted 4.8.0-rc1-gcc6x-g61e8a0d5 #1
        task: c000000fe631b600 task.stack: c000000fe63b0000
        NIP: c00000000001a1a8 LR: c00000000001a1a4 CTR: c000000000717780
        REGS: c000000fe63b3420 TRAP: 0700   Not tainted  (4.8.0-rc1-gcc6x-g61e8a0d5)
        MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28004222  XER: 20000000
        ...
        NIP [c00000000001a1a8] flush_tmregs_to_thread+0x78/0x80
        LR [c00000000001a1a4] flush_tmregs_to_thread+0x74/0x80
        Call Trace:
         flush_tmregs_to_thread+0x74/0x80 (unreliable)
         vsr_get+0x64/0x1a0
         elf_core_dump+0x604/0x1430
         do_coredump+0x5fc/0x1200
         get_signal+0x398/0x740
         do_signal+0x54/0x2b0
         do_notify_resume+0x98/0xb0
         ret_from_except_lite+0x70/0x74
      
      So fix flush_tmregs_to_thread() to detect the case where it is called on
      current, and a transaction is active, and in that case flush the TM regs
      to the thread_struct.
      
      This patch also moves flush_tmregs_to_thread() into ptrace.c as it is
      only called from that file.
      
      Fixes: 8d460f61 ("powerpc/process: Add the function flush_tmregs_to_thread")
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      [mpe: Flesh out change log]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c7a318ba
  6. 09 8月, 2016 2 次提交
    • M
      powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h · 98d8821a
      Mahesh Salgaonkar 提交于
      Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h so that MCE handler changes
      in subsequent patch can use it.
      
      No functionality change.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      98d8821a
    • B
      powerpc/xics: Properly set Edge/Level type and enable resend · 880a3d6a
      Benjamin Herrenschmidt 提交于
      This sets the type of the interrupt appropriately. We set it as follow:
      
       - If not mapped from the device-tree, we use edge. This is the case
      of the virtual interrupts and PCI MSIs for example.
      
       - If mapped from the device-tree and #interrupt-cells is 2 (PAPR
      compliant), we use the second cell to set the appropriate type
      
       - If mapped from the device-tree and #interrupt-cells is 1 (current
      OPAL on P8 does that), we assume level sensitive since those are
      typically going to be the PSI LSIs which are level sensitive.
      
      Additionally, we mark the interrupts requested via the opal_interrupts
      property all level. This is a bit fishy but the best we can do until we
      fix OPAL to properly expose them with a complete descriptor. It is also
      correct for the current HW anyway as OPAL interrupts are currently PCI
      error and PSI interrupts which are level.
      
      Finally now that edge interrupts are properly identified, we can enable
      CONFIG_HARDIRQS_SW_RESEND which will make the core re-send them if
      they occur while masked, which some drivers rely upon.
      
      This fixes issues with lost interrupts on some Mellanox adapters.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      880a3d6a
  7. 04 8月, 2016 3 次提交
  8. 03 8月, 2016 2 次提交
  9. 01 8月, 2016 17 次提交