1. 27 4月, 2017 2 次提交
    • P
      KVM: mark requests that need synchronization · 7a97cec2
      Paolo Bonzini 提交于
      kvm_make_all_requests() provides a synchronization that waits until all
      kicked VCPUs have acknowledged the kick.  This is important for
      KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is
      underway.
      
      This patch adds the synchronization property into all requests that are
      currently being used with kvm_make_all_requests() in order to preserve
      the current behavior and only introduce a new framework.  Removing it
      from requests where it is not necessary is left for future patches.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7a97cec2
    • R
      KVM: mark requests that do not need a wakeup · 930f7fd6
      Radim Krčmář 提交于
      Some operations must ensure that the guest is not running with stale
      data, but if the guest is halted, then the update can wait until another
      event happens.  kvm_make_all_requests() currently doesn't wake up, so we
      can mark all requests used with it.
      
      First 8 bits were arbitrarily reserved for request numbers.
      
      Most uses of requests have the request type as a constant, so a compiler
      will optimize the '&'.
      
      An alternative would be to have an inline function that would return
      whether the request needs a wake-up or not, but I like this one better
      even though it might produce worse assembly.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NAndrew Jones <drjones@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      930f7fd6
  2. 07 4月, 2017 1 次提交
  3. 09 3月, 2017 1 次提交
  4. 08 2月, 2017 1 次提交
  5. 05 11月, 2016 1 次提交
    • M
      arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a single CPU · 94d0e598
      Marc Zyngier 提交于
      Architecturally, TLBs are private to the (physical) CPU they're
      associated with. But when multiple vcpus from the same VM are
      being multiplexed on the same CPU, the TLBs are not private
      to the vcpus (and are actually shared across the VMID).
      
      Let's consider the following scenario:
      
      - vcpu-0 maps PA to VA
      - vcpu-1 maps PA' to VA
      
      If run on the same physical CPU, vcpu-1 can hit TLB entries generated
      by vcpu-0 accesses, and access the wrong physical page.
      
      The solution to this is to keep a per-VM map of which vcpu ran last
      on each given physical CPU, and invalidate local TLBs when switching
      to a different vcpu from the same VM.
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      94d0e598
  6. 22 9月, 2016 1 次提交
  7. 08 9月, 2016 1 次提交
    • S
      KVM: Add provisioning for ulong vm stats and u64 vcpu stats · 8a7e75d4
      Suraj Jitindar Singh 提交于
      vms and vcpus have statistics associated with them which can be viewed
      within the debugfs. Currently it is assumed within the vcpu_stat_get() and
      vm_stat_get() functions that all of these statistics are represented as
      u32s, however the next patch adds some u64 vcpu statistics.
      
      Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
      Since vcpu statistics are per vcpu, they will only be updated by a single
      vcpu at a time so this shouldn't present a problem on 32-bit machines
      which can't atomically increment 64-bit numbers. However vm statistics
      could potentially be updated by multiple vcpus from that vm at a time.
      To avoid the overhead of atomics make all vm statistics ulong such that
      they are 64-bit on 64-bit systems where they can be atomically incremented
      and are 32-bit on 32-bit systems which may not be able to atomically
      increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      8a7e75d4
  8. 19 7月, 2016 1 次提交
  9. 04 7月, 2016 3 次提交
  10. 20 5月, 2016 2 次提交
    • C
      KVM: arm/arm64: vgic-new: Synchronize changes to active state · 35a2d585
      Christoffer Dall 提交于
      When modifying the active state of an interrupt via the MMIO interface,
      we should ensure that the write has the intended effect.
      
      If a guest sets an interrupt to active, but that interrupt is already
      flushed into a list register on a running VCPU, then that VCPU will
      write the active state back into the struct vgic_irq upon returning from
      the guest and syncing its state.  This is a non-benign race, because the
      guest can observe that an interrupt is not active, and it can have a
      reasonable expectations that other VCPUs will not ack any IRQs, and then
      set the state to active, and expect it to stay that way.  Currently we
      are not honoring this case.
      
      Thefore, change both the SACTIVE and CACTIVE mmio handlers to stop the
      world, change the irq state, potentially queue the irq if we're setting
      it to active, and then continue.
      
      We take this chance to slightly optimize these functions by not stopping
      the world when touching private interrupts where there is inherently no
      possible race.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      35a2d585
    • C
      KVM: arm/arm64: Provide functionality to pause and resume a guest · b13216cf
      Christoffer Dall 提交于
      For some rare corner cases in our VGIC emulation later we have to stop
      the guest to make sure the VGIC state is consistent.
      Provide the necessary framework to pause and resume a guest.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
      b13216cf
  11. 13 5月, 2016 1 次提交
    • C
      KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger 提交于
      Some wakeups should not be considered a sucessful poll. For example on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable - letting all vCPUs poll all the time for
      transactional like workload, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups if they
      should be considered a good/bad wakeups in regard to polls.
      
      For s390 the implementation will fence of halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken up CPU by marking the poll of this CPU as "good" poll.
      This code will also mark several other wakeup reasons like IPI or
      expired timers as "good". This will of course also mark some events as
      not sucessful. As  KVM on z runs always as a 2nd level hypervisor,
      we prefer to not poll, unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like uperf 1byte
      transactional ping pong workload or wakeup heavy workload like OLTP
      while still providing a proper speedup.
      
      This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3491caf2
  12. 28 4月, 2016 1 次提交
    • A
      arm64: kvm: allows kvm cpu hotplug · 67f69197
      AKASHI Takahiro 提交于
      The current kvm implementation on arm64 does cpu-specific initialization
      at system boot, and has no way to gracefully shutdown a core in terms of
      kvm. This prevents kexec from rebooting the system at EL2.
      
      This patch adds a cpu tear-down function and also puts an existing cpu-init
      code into a separate function, kvm_arch_hardware_disable() and
      kvm_arch_hardware_enable() respectively.
      We don't need the arm64 specific cpu hotplug hook any more.
      
      Since this patch modifies common code between arm and arm64, one stub
      definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
      compilation errors.
      Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
      [Rebase, added separate VHE init/exit path, changed resets use of
       kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(),
       added icache maintenance to __kvm_hyp_reset() and removed lr restore, removed
       guest-enter after teardown handling]
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      67f69197
  13. 01 3月, 2016 9 次提交
  14. 14 12月, 2015 1 次提交
  15. 23 10月, 2015 4 次提交
    • E
      KVM: arm/arm64: implement kvm_arm_[halt,resume]_guest · 3b92830a
      Eric Auger 提交于
      We introduce kvm_arm_halt_guest and resume functions. They
      will be used for IRQ forward state change.
      
      Halt is synchronous and prevents the guest from being re-entered.
      We use the same mechanism put in place for PSCI former pause,
      now renamed power_off. A new flag is introduced in arch vcpu state,
      pause, only meant to be used by those functions.
      Signed-off-by: NEric Auger <eric.auger@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3b92830a
    • E
      KVM: arm/arm64: rename pause into power_off · 3781528e
      Eric Auger 提交于
      The kvm_vcpu_arch pause field is renamed into power_off to prepare
      for the introduction of a new pause field. Also vcpu_pause is renamed
      into vcpu_sleep since we will sleep until both power_off and pause are
      false.
      Signed-off-by: NEric Auger <eric.auger@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3781528e
    • C
      arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block · d35268da
      Christoffer Dall 提交于
      We currently schedule a soft timer every time we exit the guest if the
      timer did not expire while running the guest.  This is really not
      necessary, because the only work we do in the timer work function is to
      kick the vcpu.
      
      Kicking the vcpu does two things:
      (1) If the vpcu thread is on a waitqueue, make it runnable and remove it
      from the waitqueue.
      (2) If the vcpu is running on a different physical CPU from the one
      doing the kick, it sends a reschedule IPI.
      
      The second case cannot happen, because the soft timer is only ever
      scheduled when the vcpu is not running.  The first case is only relevant
      when the vcpu thread is on a waitqueue, which is only the case when the
      vcpu thread has called kvm_vcpu_block().
      
      Therefore, we only need to make sure a timer is scheduled for
      kvm_vcpu_block(), which we do by encapsulating all calls to
      kvm_vcpu_block() with kvm_timer_{un}schedule calls.
      
      Additionally, we only schedule a soft timer if the timer is enabled and
      unmasked, since it is useless otherwise.
      
      Note that theoretically userspace can use the SET_ONE_REG interface to
      change registers that should cause the timer to fire, even if the vcpu
      is blocked without a scheduled timer, but this case was not supported
      before this patch and we leave it for future work for now.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      d35268da
    • C
      KVM: Add kvm_arch_vcpu_{un}blocking callbacks · 3217f7c2
      Christoffer Dall 提交于
      Some times it is useful for architecture implementations of KVM to know
      when the VCPU thread is about to block or when it comes back from
      blocking (arm/arm64 needs to know this to properly implement timers, for
      example).
      
      Therefore provide a generic architecture callback function in line with
      what we do elsewhere for KVM generic-arch interactions.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3217f7c2
  16. 25 9月, 2015 1 次提交
  17. 17 9月, 2015 1 次提交
    • M
      arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' · ef748917
      Ming Lei 提交于
      This patch removes config option of KVM_ARM_MAX_VCPUS,
      and like other ARCHs, just choose the maximum allowed
      value from hardware, and follows the reasons:
      
      1) from distribution view, the option has to be
      defined as the max allowed value because it need to
      meet all kinds of virtulization applications and
      need to support most of SoCs;
      
      2) using a bigger value doesn't introduce extra memory
      consumption, and the help text in Kconfig isn't accurate
      because kvm_vpu structure isn't allocated until request
      of creating VCPU is sent from QEMU;
      
      3) the main effect is that the field of vcpus[] in 'struct kvm'
      becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
      lines to hold the structure, but 'struct kvm' is one generic struct,
      and it has worked well on other ARCHs already in this way. Also,
      the world switch frequecy is often low, for example, it is ~2000
      when running kernel building load in VM from APM xgene KVM host,
      so the effect is very small, and the difference can't be observed
      in my test at all.
      
      Cc: Dann Frazier <dann.frazier@canonical.com>
      Signed-off-by: NMing Lei <ming.lei@canonical.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      ef748917
  18. 16 9月, 2015 1 次提交
    • P
      KVM: add halt_attempted_poll to VCPU stats · 62bea5bf
      Paolo Bonzini 提交于
      This new statistic can help diagnosing VCPUs that, for any reason,
      trigger bad behavior of halt_poll_ns autotuning.
      
      For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
      like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
      10+20+40+80+160+320+480 = 1110 microseconds out of every
      479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
      is consuming about 30% more CPU than it would use without
      polling.  This would show as an abnormally high number of
      attempted polling compared to the successful polls.
      
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      62bea5bf
  19. 21 7月, 2015 2 次提交
    • A
      KVM: arm64: introduce vcpu->arch.debug_ptr · 84e690bf
      Alex Bennée 提交于
      This introduces a level of indirection for the debug registers. Instead
      of using the sys_regs[] directly we store registers in a structure in
      the vcpu. The new kvm_arm_reset_debug_ptr() sets the debug ptr to the
      guest context.
      
      Because we no longer give the sys_regs offset for the sys_reg_desc->reg
      field, but instead the index into a debug-specific struct we need to
      add a number of additional trap functions for each register. Also as the
      generic generic user-space access code no longer works we have
      introduced a new pair of function pointers to the sys_reg_desc structure
      to override the generic code when needed.
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      84e690bf
    • A
      KVM: arm: introduce kvm_arm_init/setup/clear_debug · 56c7f5e7
      Alex Bennée 提交于
      This is a precursor for later patches which will need to do more to
      setup debug state before entering the hyp.S switch code. The existing
      functionality for setting mdcr_el2 has been moved out of hyp.S and now
      uses the value kept in vcpu->arch.mdcr_el2.
      
      As the assembler used to previously mask and preserve MDCR_EL2.HPMN I've
      had to add a mechanism to save the value of mdcr_el2 as a per-cpu
      variable during the initialisation code. The kernel never sets this
      number so we are assuming the bootcode has set up the correct value
      here.
      
      This also moves the conditional setting of the TDA bit from the hyp code
      into the C code which is currently used for the lazy debug register
      context switch code.
      Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      56c7f5e7
  20. 12 6月, 2015 1 次提交
  21. 13 3月, 2015 1 次提交
    • M
      arm/arm64: KVM: Implement Stage-2 page aging · 35307b9a
      Marc Zyngier 提交于
      Until now, KVM/arm didn't care much for page aging (who was swapping
      anyway?), and simply provided empty hooks to the core KVM code. With
      server-type systems now being available, things are quite different.
      
      This patch implements very simple support for page aging, by clearing
      the Access flag in the Stage-2 page tables. On access fault, the current
      fault handling will write the PTE or PMD again, putting the Access flag
      back on.
      
      It should be possible to implement a much faster handling for Access
      faults, but that's left for a later patch.
      
      With this in place, performance in VMs is degraded much more gracefully.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      35307b9a
  22. 12 3月, 2015 1 次提交
  23. 06 2月, 2015 1 次提交
    • P
      kvm: add halt_poll_ns module parameter · f7819512
      Paolo Bonzini 提交于
      This patch introduces a new module parameter for the KVM module; when it
      is present, KVM attempts a bit of polling on every HLT before scheduling
      itself out via kvm_vcpu_block.
      
      This parameter helps a lot for latency-bound workloads---in particular
      I tested it with O_DSYNC writes with a battery-backed disk in the host.
      In this case, writes are fast (because the data doesn't have to go all
      the way to the platters) but they cannot be merged by either the host or
      the guest.  KVM's performance here is usually around 30% of bare metal,
      or 50% if you use cache=directsync or cache=writethrough (these
      parameters avoid that the guest sends pointless flush requests, and
      at the same time they are not slow because of the battery-backed cache).
      The bad performance happens because on every halt the host CPU decides
      to halt itself too.  When the interrupt comes, the vCPU thread is then
      migrated to a new physical CPU, and in general the latency is horrible
      because the vCPU thread has to be scheduled back in.
      
      With this patch performance reaches 60-65% of bare metal and, more
      important, 99% of what you get if you use idle=poll in the guest.  This
      means that the tunable gets rid of this particular bottleneck, and more
      work can be done to improve performance in the kernel or QEMU.
      
      Of course there is some price to pay; every time an otherwise idle vCPUs
      is interrupted by an interrupt, it will poll unnecessarily and thus
      impose a little load on the host.  The above results were obtained with
      a mostly random value of the parameter (500000), and the load was around
      1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
      
      The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
      that can be used to tune the parameter.  It counts how many HLT
      instructions received an interrupt during the polling period; each
      successful poll avoids that Linux schedules the VCPU thread out and back
      in, and may also avoid a likely trip to C1 and back for the physical CPU.
      
      While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
      Of these halts, almost all are failed polls.  During the benchmark,
      instead, basically all halts end within the polling period, except a more
      or less constant stream of 50 per second coming from vCPUs that are not
      running the benchmark.  The wasted time is thus very low.  Things may
      be slightly different for Windows VMs, which have a ~10 ms timer tick.
      
      The effect is also visible on Marcelo's recently-introduced latency
      test for the TSC deadline timer.  Though of course a non-RT kernel has
      awful latency bounds, the latency of the timer is around 8000-10000 clock
      cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
      deadline timer, thus, the effect is both a smaller average latency and
      a smaller variance.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f7819512
  24. 30 1月, 2015 1 次提交
    • M
      arm/arm64: KVM: Use set/way op trapping to track the state of the caches · 3c1e7165
      Marc Zyngier 提交于
      Trying to emulate the behaviour of set/way cache ops is fairly
      pointless, as there are too many ways we can end-up missing stuff.
      Also, there is some system caches out there that simply ignore
      set/way operations.
      
      So instead of trying to implement them, let's convert it to VA ops,
      and use them as a way to re-enable the trapping of VM ops. That way,
      we can detect the point when the MMU/caches are turned off, and do
      a full VM flush (which is what the guest was trying to do anyway).
      
      This allows a 32bit zImage to boot on the APM thingy, and will
      probably help bootloaders in general.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3c1e7165