1. 11 May 2016 (2 commits)
  2. 22 Mar 2016 (3 commits)
  3. 08 Mar 2016 (1 commit)
  4. 03 Mar 2016 (1 commit)
  5. 02 Mar 2016 (3 commits)
  6. 01 Mar 2016 (1 commit)
  7. 29 Feb 2016 (6 commits)
    • KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection · 520fe9c6
      Committed by Suresh E. Warrier
      Redirecting the wakeup of a VCPU from the H_IPI hypercall to
      a core running in the host is usually a good idea; most workloads
      seemed to benefit. However, in one heavily interrupt-driven SMT1
      workload, some regression was observed. This patch adds a kvm_hv
      module parameter called h_ipi_redirect to control this feature.

      The default value for this tunable is 1, i.e. the feature is enabled.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
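
      As a rough illustration of the tunable described above, a hedged sketch
      follows; only the parameter name h_ipi_redirect and its default of 1
      come from the text, while the type, permissions and description string
      are assumptions:

        #include <linux/moduleparam.h>

        /* Sketch only: 1 (the default) enables redirection of H_IPI VCPU
         * wakeups to a free core running in the host. */
        static int h_ipi_redirect = 1;
        module_param(h_ipi_redirect, int, 0644);
        MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup of a VCPU to a host core");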
    • KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU · e17769eb
      Committed by Suresh E. Warrier
      This patch adds support to real-mode KVM to search for a core
      running in the host partition and send it an IPI message with the
      VCPU to be woken. This avoids having to switch to the host
      partition to complete an H_IPI hypercall when the VCPU which
      is the target of the H_IPI is not loaded (i.e. is not running
      in the guest).
      
      The patch also includes the support in the IPI handler running
      in the host to do the wakeup by calling kvmppc_xics_ipi_action
      for the PPC_MSG_RM_HOST_ACTION message.
      
      When a guest is being destroyed, we need to ensure that there
      are no pending IPIs waiting to wake up a VCPU before we free
      the VCPUs of the guest. This is accomplished by:
      - forcing a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs
        before freeing any VCPUs in kvm_arch_destroy_vm();
      - ensuring that any PPC_MSG_RM_HOST_ACTION messages are executed
        before any other PPC_MSG_CALL_FUNCTION messages.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
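
      A minimal sketch of the teardown ordering described above; the guard
      conditions and exact placement are assumptions, only
      kvm_arch_destroy_vm() and the forced call-function IPI come from the
      message:

        #include <linux/kvm_host.h>
        #include <linux/smp.h>

        void kvm_arch_destroy_vm(struct kvm *kvm)
        {
                /*
                 * kick_all_cpus_sync() sends a call-function IPI to all other
                 * online CPUs and waits for it to complete, so any pending
                 * PPC_MSG_RM_HOST_ACTION wakeups have finished before the
                 * VCPU structures are freed below.
                 */
                kick_all_cpus_sync();

                /* ... free the guest's VCPU structures ... */
        }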
    • KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM · 0c2a6606
      Committed by Suresh Warrier
      This patch adds support for the kick VCPU operation in
      kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function
      is the function to be invoked for a host-side operation
      when poked by real-mode KVM, which initiates it by
      sending an IPI to any free host core.
      
      KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and
      rm_data to point to the VCPU to be woken up before sending the IPI.
      Note that we have allocated one kvmppc_host_rm_core structure
      per core. The above values need to be set in the structure
      corresponding to the core to which the IPI will be sent.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
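
      A hedged sketch of the handshake described above; rm_data,
      XICS_RM_KICK_VCPU, kvmppc_host_rm_core and kvmppc_xics_ipi_action()
      come from the message, while the exact field layout and the helper
      below are illustrative assumptions:

        /* Sketch only: fill in the per-core slot for the core that will
         * receive the IPI, then let the caller send it. */
        static void poke_host_core(struct kvmppc_host_rm_core *rm_core,
                                   struct kvm_vcpu *vcpu)
        {
                rm_core->rm_state.rm_action = XICS_RM_KICK_VCPU; /* requested op */
                rm_core->rm_data = vcpu;                         /* VCPU to wake */

                /* The caller now sends the PPC_MSG_RM_HOST_ACTION IPI to a
                 * thread of this core; the host-side handler calls
                 * kvmppc_xics_ipi_action(), which performs the actual kick. */
        }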
    • KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs · 6f3bb809
      Committed by Suresh Warrier
      The kvmppc_host_rm_ops structure keeps track of which cores
      are in the host by maintaining a bitmask of active/runnable
      online CPUs that have not entered the guest. This patch adds
      support to manage the bitmask when a CPU is offlined or onlined
      in the host.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • KVM: PPC: Book3S HV: Manage core host state · b8e6a87c
      Committed by Suresh Warrier
      Update the core host state in kvmppc_host_rm_ops whenever
      the primary thread of the core enters the guest or returns
      to the host.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • KVM: PPC: Book3S HV: Host-side RM data structures · 79b6c247
      Committed by Suresh Warrier
      This patch defines the data structures to support the setting up
      of host-side operations while running in real mode in the guest,
      and also the functions to allocate and free them.
      
      The operations are for now limited to virtual XICS operations.
      Currently, we have only defined one operation in the data
      structure:
               - Wake up a VCPU sleeping in the host when it
                 receives a virtual interrupt
      
      The operations are assigned at the core level because PowerKVM
      requires that the host run in SMT off mode. For each core,
      we will need to manage its state atomically - where the state
      is defined by:
      1. Is the core running in the host?
      2. Is there a Real Mode (RM) operation pending on the host?
      
      Currently, core state is only managed at the whole-core level
      even when the system is in split-core mode. This just limits
      the number of free or "available" cores in the host to perform
      any host-side operations.
      
      The kvmppc_host_rm_core.rm_data allows any data to be passed by
      KVM in real mode to the host core along with the operation to
      be performed.
      
      The kvmppc_host_rm_ops structure is allocated the very first time
      a guest VM is started. Initial core state is also set - all online
      cores are in the host. This structure is never deleted, not even
      when there are no active guests. However, it needs to be freed
      when the module is unloaded because the kvmppc_host_rm_ops_hv
      can contain function pointers to kvm-hv.ko functions for the
      different supported host operations.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
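
      A hedged sketch of the per-core bookkeeping described above; the field
      names and types below are assumptions reconstructed from the message
      text, not the exact definitions:

        #include <linux/types.h>

        struct kvmppc_host_rm_core {
                /* core state, updated atomically: is the core in the host,
                 * and which real-mode-requested action (if any) is pending */
                union {
                        u64 raw;
                        struct {
                                u32 in_host;
                                u32 rm_action;
                        };
                } rm_state;
                void *rm_data;  /* payload for the action, e.g. the VCPU to
                                 * wake up for XICS_RM_KICK_VCPU             */
        };

        struct kvmppc_host_rm_ops {
                struct kvmppc_host_rm_core *rm_core;      /* one entry per core */
                void (*vcpu_kick)(struct kvm_vcpu *vcpu); /* host-side kick op  */
        };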
  8. 25 Feb 2016 (1 commit)
    • KVM: Use simple waitqueue for vcpu->wq · 8577370f
      Committed by Marcelo Tosatti
      The problem:
      
      On -rt, an emulated LAPIC timer instance has the following path:
      
      1) hard interrupt
      2) ksoftirqd is scheduled
      3) ksoftirqd wakes up vcpu thread
      4) vcpu thread is scheduled
      
      This extra context switch introduces unnecessary latency in the
      LAPIC path for a KVM guest.
      
      The solution:
      
      Allow waking up vcpu thread from hardirq context,
      thus avoiding the need for ksoftirqd to be scheduled.
      
      Normal waitqueues make use of spinlocks, which on -RT
      are sleepable locks. Therefore, waking up a waitqueue
      waiter involves locking a sleeping lock, which
      is not allowed from hard interrupt context.
      
      cyclictest command line:
      
      This patch reduces the average latency in my tests from 14us to 11us.
      
      Daniel writes:
      Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
      benchmark on mainline. The test was run 1000 times on
      tip/sched/core 4.4.0-rc8-01134-g0905f04e:
      
        ./x86-run x86/tscdeadline_latency.flat -cpu host
      
      with idle=poll.
      
      The test does not seem to deliver really stable numbers, though most of
      them are smaller. Paolo writes:
      
      "Anything above ~10000 cycles means that the host went to C1 or
      lower---the number means more or less nothing in that case.
      
      The mean shows an improvement indeed."
      
      Before:
      
                     min             max         mean           std
      count  1000.000000     1000.000000  1000.000000   1000.000000
      mean   5162.596000  2019270.084000  5824.491541  20681.645558
      std      75.431231   622607.723969    89.575700   6492.272062
      min    4466.000000    23928.000000  5537.926500    585.864966
      25%    5163.000000  16132529.750000  5790.132275  16683.745433
      50%    5175.000000  2281919.000000  5834.654000  23151.990026
      75%    5190.000000  2382865.750000  5861.412950  24148.206168
      max    5228.000000  4175158.000000  6254.827300  46481.048691
      
      After
                     min            max         mean           std
      count  1000.000000     1000.00000  1000.000000   1000.000000
      mean   5143.511000  2076886.10300  5813.312474  21207.357565
      std      77.668322   610413.09583    86.541500   6331.915127
      min    4427.000000    25103.00000  5529.756600    559.187707
      25%    5148.000000  1691272.75000  5784.889825  17473.518244
      50%    5160.000000  2308328.50000  5832.025000  23464.837068
      75%    5172.000000  2393037.75000  5853.177675  24223.969976
      max    5222.000000  3922458.00000  6186.720500  42520.379830
      
      [Patch was originally based on the swait implementation found in the -rt
       tree. Daniel ported it to mainline's version and gathered the
       benchmark numbers for the tscdeadline_latency test.]
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
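
      A hedged sketch of the simple-waitqueue pattern that vcpu->wq moves to;
      the API names are those of <linux/swait.h> as introduced around this
      time, and the trivial 'woken' condition is only an example:

        #include <linux/swait.h>

        static DECLARE_SWAIT_QUEUE_HEAD(wq);
        static bool woken;

        static void sleeper(void)       /* e.g. the vcpu thread */
        {
                swait_event_interruptible(wq, woken);
        }

        static void waker(void)         /* legal even from hardirq context */
        {
                woken = true;
                /* swake_up() only takes a raw spinlock internally, so unlike
                 * wake_up() on a normal waitqueue it never acquires a lock
                 * that is sleepable on -rt. */
                swake_up(&wq);
        }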
  9. 16 Feb 2016 (6 commits)
  10. 15 Feb 2016 (1 commit)
    • vfio: Enable VFIO device for powerpc · 178a7875
      Committed by David Gibson
      Commit ec53500f ("kvm: Add VFIO device") added a special KVM pseudo-device
      which is used to handle any necessary interactions between KVM and VFIO.
      
      Currently that device is built on x86 and ARM, but not powerpc, although
      powerpc does support both KVM and VFIO.  This makes things awkward in
      userspace.
      
      Currently qemu prints an alarming error message if you attempt to use VFIO
      and it can't initialize the KVM VFIO device.  We don't want to remove the
      warning, because lack of the KVM VFIO device could mean coherency problems
      on x86.  On powerpc, however, the error is harmless but looks disturbing,
      and a test based on host architecture in qemu would be ugly, and break if
      we do need the KVM VFIO device for something important in future.
      
      There's nothing preventing the KVM VFIO device from being built for
      powerpc, so this patch turns it on.  It won't actually do anything, since
      we don't define any of the arch_*() hooks, but it will make qemu happy and
      we can extend it in future if we need to.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Eric Auger <eric.auger@linaro.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  11. 16 Jan 2016 (1 commit)
    • kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Committed by Dan Williams
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given that the memmap array
      for a large pool of persistent memory may exhaust available DRAM,
      introduce a mechanism to allocate the memmap from persistent memory.
      The new "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
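
      On the KVM side the rename boils down to a typedef plus a tree-wide
      substitution; a minimal sketch (the underlying integer type is assumed
      here):

        /* include/linux/kvm_types.h (sketch) */
        typedef u64 kvm_pfn_t;  /* formerly KVM's own pfn_t, renamed so the
                                 * core kernel can introduce its pfn_t type */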
  12. 14 Jan 2016 (1 commit)
  13. 10 Dec 2015 (1 commit)
    • KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR · c20875a3
      Committed by Paul Mackerras
      Currently it is possible for userspace (e.g. QEMU) to set a value
      for the MSR for a guest VCPU which has both of the TS bits set,
      which is an illegal combination.  The result of this is that when
      we execute a hrfid (hypervisor return from interrupt doubleword)
      instruction to enter the guest, the CPU will take a TM Bad Thing
      type of program interrupt (vector 0x700).
      
      Now, if PR KVM is configured in the kernel along with HV KVM, we
      actually handle this without crashing the host or giving hypervisor
      privilege to the guest; instead what happens is that we deliver a
      program interrupt to the guest, with SRR0 reflecting the address
      of the hrfid instruction and SRR1 containing the MSR value at that
      point.  If PR KVM is not configured in the kernel, then we try to
      run the host's program interrupt handler with the MMU set to the
      guest context, which almost certainly causes a host crash.
      
      This closes the hole by making kvmppc_set_msr_hv() check for the
      illegal combination and force the TS field to a safe value (00,
      meaning non-transactional).
      
      Cc: stable@vger.kernel.org # v3.9+
      Signed-off-by: Paul Mackerras <paulus@samba.org>
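
      A hedged sketch of the check described above; MSR_TS_MASK is the usual
      mask covering both transaction-state bits, and the rest of the function
      body is elided:

        static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
        {
                /*
                 * MSR[TS] = 0b11 is an illegal transaction state; force the
                 * field to 00 (non-transactional) so the hrfid that enters
                 * the guest cannot take a TM Bad Thing program interrupt.
                 */
                if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
                        msr &= ~MSR_TS_MASK;

                /* ... store msr into the VCPU state as before ... */
        }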
  14. 09 Dec 2015 (3 commits)
    • KVM: PPC: Book3S PR: Remove unused variable 'vcpu_book3s' · edfaff26
      Committed by Geyslan G. Bem
      The vcpu_book3s variable is assigned but never used. So remove it.
      Found using cppcheck.
      Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8 · 760a7364
      Committed by Thomas Huth
      In the old DABR register, the BT (Breakpoint Translation) bit
      is bit number 61. In the new DAWRX register, the WT (Watchpoint
      Translation) bit is bit number 59. So to move the DABR-BT bit
      into the position of the DAWRX-WT bit, it has to be shifted by
      two, not only by one. This fixes hardware watchpoints in gdb of
      older guests that only use the H_SET_DABR/X interface instead
      of the new H_SET_MODE interface.
      
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
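
      As a quick check of the shift amount: in the IBM bit numbering used
      here (bit 0 is the most significant bit of the 64-bit register), DABR
      bit 61 corresponds to 2^(63-61) = 2^2 and DAWRX bit 59 to
      2^(63-59) = 2^4, so moving BT into the WT position takes a left shift
      by two bit positions, not one.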
    • KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better · 1c9e3d51
      Committed by Paul Mackerras
      As we saw with the TM Bad Thing type of program interrupt occurring
      on the hrfid that enters the guest, it is not completely impossible
      to have a trap occurring in the guest entry/exit code, despite the
      fact that the code has been written to avoid taking any traps.
      
      This adds a check in the kvmppc_handle_exit_hv() function to detect
      the case when a trap has occurred in the hypervisor-mode code, and
      instead of treating it just like a trap in guest code, we now print
      a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR
      exit reason.
      
      Of the various interrupts that get handled in the assembly code in
      the guest exit path and that can return directly to the guest, the
      only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check
      (other than system call, which we can avoid just by not doing a sc
      instruction).  Therefore this adds code to the machine check path to
      ensure that if the MCE occurred in hypervisor mode, we exit to the
      host rather than trying to continue the guest.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
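
      A hedged sketch of the check added to kvmppc_handle_exit_hv(); the
      exact diagnostics printed are assumptions, and MSR_HV being set in the
      guest MSR image is what identifies a trap taken in hypervisor-mode
      code:

        /* Early in kvmppc_handle_exit_hv(): */
        if (vcpu->arch.shregs.msr & MSR_HV) {
                printk(KERN_EMERG "KVM trap in HV mode!\n");
                /* ... dump the trap number, PC, MSR and so on ... */
                run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
                return RESUME_HOST;
        }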
  15. 02 Dec 2015 (2 commits)
  16. 01 Dec 2015 (1 commit)
  17. 30 Nov 2015 (1 commit)
  18. 26 Nov 2015 (1 commit)
  19. 06 Nov 2015 (2 commits)
    • KVM: PPC: Book3S HV: Don't dynamically split core when already split · f74f2e2e
      Committed by Paul Mackerras
      In static micro-threading modes, the dynamic micro-threading code
      is supposed to be disabled, because subcores can't make independent
      decisions about what micro-threading mode to put the core in - there is
      only one micro-threading mode for the whole core.  The code that
      implements dynamic micro-threading checks for this, except that the
      check was missed in one case.  This means that it is possible for a
      subcore in static 2-way micro-threading mode to try to put the core
      into 4-way micro-threading mode, which usually leads to stuck CPUs,
      spinlock lockups, and other stalls in the host.
      
      The problem was in the can_split_piggybacked_subcores() function, which
      should always return false if the system is in a static micro-threading
      mode.  This fixes the problem by making can_split_piggybacked_subcores()
      use subcore_config_ok() for its checks, as subcore_config_ok() includes
      the necessary check for the static micro-threading modes.
      
      Credit to Gautham Shenoy for working out that the reason for the hangs
      and stalls we were seeing was that we were trying to do dynamic 4-way
      micro-threading while we were in static 2-way mode.
      
      Fixes: b4deba5c
      Cc: stable@vger.kernel.org # v4.3
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails · cf29b215
      Committed by Paul Mackerras
      When handling a hypervisor data or instruction storage interrupt (HDSI
      or HISI), we look up the SLB entry for the address being accessed in
      order to translate the effective address to a virtual address which can
      be looked up in the guest HPT.  This lookup can occasionally fail due
      to the guest replacing an SLB entry without invalidating the evicted
      SLB entry.  In this situation an ERAT (effective to real address
      translation cache) entry can persist and be used by the hardware even
      though there is no longer a corresponding SLB entry.
      
      Previously we would just deliver a data or instruction storage interrupt
      (DSI or ISI) to the guest in this case.  However, this is not correct
      and has been observed to cause guests to crash, typically with a
      data storage protection interrupt on a store to the vmemmap area.
      
      Instead, what we do now is to synthesize a data or instruction segment
      interrupt.  That should cause the guest to reload an appropriate entry
      into the SLB and retry the faulting instruction.  If it still faults,
      we should find an appropriate SLB entry next time and be able to handle
      the fault.
      Tested-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
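
      For reference, the interrupt vectors involved (Power server
      environment) are:

        0x300  data storage (DSI)          0x380  data segment
        0x400  instruction storage (ISI)   0x480  instruction segment

      so the change amounts to queueing 0x380/0x480 for the guest instead of
      0x300/0x400 when the SLB lookup fails.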
  20. 21 Oct 2015 (2 commits)
    • powerpc: Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8" · 23316316
      Committed by Paul Mackerras
      This reverts commit 9678cdaa ("Use the POWER8 Micro Partition
      Prefetch Engine in KVM HV on POWER8") because the original commit had
      multiple, partly self-cancelling bugs that could cause occasional
      memory corruption.
      
      In fact the logmpp instruction was incorrectly using register r0 as the
      source of the buffer address and operation code, and depending on what
      was in r0, it would either do nothing or corrupt the 64k page pointed to
      by r0.
      
      The logmpp instruction encoding and the operation code definitions could
      be corrected, but then there is the problem that there is no clearly
      defined way to know when the hardware has finished writing to the
      buffer.
      
      The original commit attempted to work around this by aborting the
      write-out before starting the prefetch, but this is ineffective in the
      case where the virtual core is now executing on a different physical
      core from the one where the write-out was initiated.
      
      These problems, plus advice from the hardware designers not to use the
      facility (since the measured performance improvement from using the
      feature was actually mostly negative), mean that reverting the code is
      the best option.
      
      Fixes: 9678cdaa ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8")
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path · 70aa3961
      Committed by Gautham R. Shenoy
      Currently a CPU running a guest can receive a H_DOORBELL in the
      following two cases:
      1) When the CPU is napping due to CEDE or there not being a guest
      vcpu.
      2) The CPU is running the guest vcpu.
      
      In Case 1), the doorbell message is not cleared since we were waking up
      from nap. Hence when the EE bit gets set on transition from guest to
      host, the H_DOORBELL interrupt is delivered to the host and the
      corresponding handler is invoked.
      
      However in Case 2), the message gets cleared by the action of taking
      the H_DOORBELL interrupt. Since the CPU was running a guest, instead
      of invoking the doorbell handler, the code invokes the second-level
      interrupt handler to switch the context from the guest to the host. At
      this point the setting of the EE bit doesn't result in the CPU getting
      the doorbell interrupt since it has already been delivered once. So,
      the handler for this doorbell is never invoked!
      
      This causes softlockups if the missed DOORBELL was an IPI sent from a
      sibling subcore on the same CPU.
      
      This patch fixes it by explicitly invoking the doorbell handler on the
      exit path if the exit reason is H_DOORBELL, similar to the way an
      EXTERNAL interrupt is handled. Since this will also handle Case 1), we
      can unconditionally clear the doorbell message in
      kvmppc_check_wake_reason().
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
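
      An illustrative sketch of the fix; the real change is in the assembly
      guest-exit path, so the C below is only a hedged rendering of the idea
      (the constant and handler names are the usual kernel ones, but the
      exact call site is an assumption):

        /* On the guest-exit path, once the exit reason is known: */
        if (trap == BOOK3S_INTERRUPT_H_DOORBELL)
                /* The doorbell message was cleared by taking the interrupt,
                 * so run the host doorbell handler now, as is already done
                 * for EXTERNAL interrupts. */
                doorbell_exception(regs);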