1. 06 2月, 2015 1 次提交
    • P
      kvm: add halt_poll_ns module parameter · f7819512
      Paolo Bonzini 提交于
      This patch introduces a new module parameter for the KVM module; when it
      is present, KVM attempts a bit of polling on every HLT before scheduling
      itself out via kvm_vcpu_block.
      
      This parameter helps a lot for latency-bound workloads---in particular
      I tested it with O_DSYNC writes with a battery-backed disk in the host.
      In this case, writes are fast (because the data doesn't have to go all
      the way to the platters) but they cannot be merged by either the host or
      the guest.  KVM's performance here is usually around 30% of bare metal,
      or 50% if you use cache=directsync or cache=writethrough (these
      parameters avoid that the guest sends pointless flush requests, and
      at the same time they are not slow because of the battery-backed cache).
      The bad performance happens because on every halt the host CPU decides
      to halt itself too.  When the interrupt comes, the vCPU thread is then
      migrated to a new physical CPU, and in general the latency is horrible
      because the vCPU thread has to be scheduled back in.
      
      With this patch performance reaches 60-65% of bare metal and, more
      important, 99% of what you get if you use idle=poll in the guest.  This
      means that the tunable gets rid of this particular bottleneck, and more
      work can be done to improve performance in the kernel or QEMU.
      
      Of course there is some price to pay; every time an otherwise idle vCPUs
      is interrupted by an interrupt, it will poll unnecessarily and thus
      impose a little load on the host.  The above results were obtained with
      a mostly random value of the parameter (500000), and the load was around
      1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
      
      The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
      that can be used to tune the parameter.  It counts how many HLT
      instructions received an interrupt during the polling period; each
      successful poll avoids that Linux schedules the VCPU thread out and back
      in, and may also avoid a likely trip to C1 and back for the physical CPU.
      
      While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
      Of these halts, almost all are failed polls.  During the benchmark,
      instead, basically all halts end within the polling period, except a more
      or less constant stream of 50 per second coming from vCPUs that are not
      running the benchmark.  The wasted time is thus very low.  Things may
      be slightly different for Windows VMs, which have a ~10 ms timer tick.
      
      The effect is also visible on Marcelo's recently-introduced latency
      test for the TSC deadline timer.  Though of course a non-RT kernel has
      awful latency bounds, the latency of the timer is around 8000-10000 clock
      cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
      deadline timer, thus, the effect is both a smaller average latency and
      a smaller variance.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f7819512
  2. 29 8月, 2014 3 次提交
  3. 30 6月, 2014 2 次提交
  4. 30 5月, 2014 9 次提交
    • J
      MIPS: KVM: Whitespace fixes in kvm_mips_callbacks · 2dca3725
      James Hogan 提交于
      Fix whitespace in struct kvm_mips_callbacks function pointers.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2dca3725
    • J
      MIPS: KVM: Add count frequency KVM register · f74a8e22
      James Hogan 提交于
      Expose the KVM guest CP0_Count frequency to userland via a new
      KVM_REG_MIPS_COUNT_HZ register accessible with the KVM_{GET,SET}_ONE_REG
      ioctls.
      
      When the frequency is altered the bias is adjusted such that the guest
      CP0_Count doesn't jump discontinuously or lose any timer interrupts.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f74a8e22
    • J
      MIPS: KVM: Add master disable count interface · f8239342
      James Hogan 提交于
      Expose two new virtual registers to userland via the
      KVM_{GET,SET}_ONE_REG ioctls.
      
      KVM_REG_MIPS_COUNT_CTL is for timer configuration fields and just
      contains a master disable count bit. This can be used by userland to
      freeze the timer in order to read a consistent state from the timer
      count value and timer interrupt pending bit. This cannot be done with
      the CP0_Cause.DC bit because the timer interrupt pending bit (TI) is
      also in CP0_Cause so it would be impossible to stop the timer without
      also risking a race with an hrtimer interrupt and having to explicitly
      check whether an interrupt should have occurred.
      
      When the timer is re-enabled it resumes without losing time, i.e. the
      CP0_Count value jumps to what it would have been had the timer not been
      disabled, which would also be impossible to do from userland with
      CP0_Cause.DC. The timer interrupt also cannot be lost, i.e. if a timer
      interrupt would have occurred had the timer not been disabled it is
      queued when the timer is re-enabled.
      
      This works by storing the nanosecond monotonic time when the master
      disable is set, and using it for various operations instead of the
      current monotonic time (e.g. when recalculating the bias when the
      CP0_Count is set), until the master disable is cleared again, i.e. the
      timer state is read/written as it would have been at that time. This
      state is exposed to userland via the read-only KVM_REG_MIPS_COUNT_RESUME
      virtual register so that userland can determine the exact time the
      master disable took effect.
      
      This should allow userland to atomically save the state of the timer,
      and later restore it.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f8239342
    • J
      MIPS: KVM: Rewrite count/compare timer emulation · e30492bb
      James Hogan 提交于
      Previously the emulation of the CPU timer was just enough to get a Linux
      guest running but some shortcuts were taken:
       - The guest timer interrupt was hard coded to always happen every 10 ms
         rather than being timed to when CP0_Count would match CP0_Compare.
       - The guest's CP0_Count register was based on the host's CP0_Count
         register. This isn't very portable and fails on cores without a
         CP_Count register implemented such as Ingenic XBurst. It also meant
         that the guest's CP0_Cause.DC bit to disable the CP0_Count register
         took no effect.
       - The guest's CP0_Count register was emulated by just dividing the
         host's CP0_Count register by 4. This resulted in continuity problems
         when used as a clock source, since when the host CP0_Count overflows
         from 0x7fffffff to 0x80000000, the guest CP0_Count transitions
         discontinuously from 0x1fffffff to 0xe0000000.
      
      Therefore rewrite & fix emulation of the guest timer based on the
      monotonic kernel time (i.e. ktime_get()). Internally a 32-bit count_bias
      value is added to the frequency scaled nanosecond monotonic time to get
      the guest's CP0_Count. The frequency of the timer is initialised to
      100MHz and cannot yet be changed, but a later patch will allow the
      frequency to be configured via the KVM_{GET,SET}_ONE_REG ioctl
      interface.
      
      The timer can now be stopped via the CP0_Cause.DC bit (by the guest or
      via the KVM_SET_ONE_REG ioctl interface), at which point the current
      CP0_Count is stored and can be read directly. When it is restarted the
      bias is recalculated such that the CP0_Count value is continuous.
      
      Due to the nature of hrtimer interrupts any read of the guest's
      CP0_Count register while it is running triggers a check for whether the
      hrtimer has expired, so that the guest/userland cannot observe the
      CP0_Count passing CP0_Compare without queuing a timer interrupt. This is
      also taken advantage of when stopping the timer to ensure that a pending
      timer interrupt is queued.
      
      This replaces the implementation of:
       - Guest read of CP0_Count
       - Guest write of CP0_Count
       - Guest write of CP0_Compare
       - Guest write of CP0_Cause
       - Guest read of HWR 2 (CC) with RDHWR
       - Host read of CP0_Count via KVM_GET_ONE_REG ioctl interface
       - Host write of CP0_Count via KVM_SET_ONE_REG ioctl interface
       - Host write of CP0_Compare via KVM_SET_ONE_REG ioctl interface
       - Host write of CP0_Cause via KVM_SET_ONE_REG ioctl interface
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e30492bb
    • J
      MIPS: KVM: Fix timer race modifying guest CP0_Cause · c73c99b0
      James Hogan 提交于
      The hrtimer callback for guest timer timeouts sets the guest's
      CP0_Cause.TI bit to indicate to the guest that a timer interrupt is
      pending, however there is no mutual exclusion implemented to prevent
      this occurring while the guest's CP0_Cause register is being
      read-modify-written elsewhere.
      
      When this occurs the setting of the CP0_Cause.TI bit is undone and the
      guest misses the timer interrupt and doesn't reprogram the CP0_Compare
      register for the next timeout. Currently another timer interrupt will be
      triggered again in another 10ms anyway due to the way timers are
      emulated, but after the MIPS timer emulation is fixed this would result
      in Linux guest time standing still and the guest scheduler not being
      invoked until the guest CP0_Count has looped around again, which at
      100MHz takes just under 43 seconds.
      
      Currently this is the only asynchronous modification of guest registers,
      therefore it is fixed by adjusting the implementations of the
      kvm_set_c0_guest_cause(), kvm_clear_c0_guest_cause(), and
      kvm_change_c0_guest_cause() macros which are used for modifying the
      guest CP0_Cause register to use ll/sc to ensure atomic modification.
      This should work in both UP and SMP cases without requiring interrupts
      to be disabled.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c73c99b0
    • J
      MIPS: KVM: Add CP0_UserLocal KVM register access · 7767b7d2
      James Hogan 提交于
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      UserLocal register. This is so that userland can save and restore its
      value.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7767b7d2
    • J
      MIPS: KVM: Add CP0_Count/Compare KVM register access · f8be02da
      James Hogan 提交于
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      Count and Compare registers. These registers are special in that writing
      to them has side effects (adjusting the time until the next timer
      interrupt) and reading of Count depends on the time. Therefore add a
      couple of callbacks so that different implementations (trap & emulate or
      VZ) can implement them differently depending on what the hardware
      provides.
      
      The trap & emulate versions mostly duplicate what happens when a T&E
      guest reads or writes these registers, so it inherits the same
      limitations which can be fixed in later patches.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f8be02da
    • J
      MIPS: KVM: Move KVM_{GET,SET}_ONE_REG definitions into kvm_host.h · 48a3c4e4
      James Hogan 提交于
      Move the KVM_{GET,SET}_ONE_REG MIPS register id definitions out of
      kvm_mips.c to kvm_host.h so that they can be shared between multiple
      source files. This allows register access to be indirected depending on
      the underlying implementation (trap & emulate or VZ).
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      48a3c4e4
    • J
      MIPS: KVM: Use local_flush_icache_range to fix RI on XBurst · facaaec1
      James Hogan 提交于
      MIPS KVM uses mips32_SyncICache to synchronise the icache with the
      dcache after dynamically modifying guest instructions or writing guest
      exception vector. However this uses rdhwr to get the SYNCI step, which
      causes a reserved instruction exception on Ingenic XBurst cores.
      
      It would seem to make more sense to use local_flush_icache_range()
      instead which does the same thing but is more portable.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      facaaec1
  5. 20 3月, 2014 2 次提交
    • J
      MIPS: KVM: Consult HWREna before emulating RDHWR · 26f4f3b5
      James Hogan 提交于
      The ability to read hardware registers from userland with the RDHWR
      instruction should depend upon the corresponding bit of the HWREna
      register being set, otherwise a reserved instruction exception should be
      generated.
      
      However KVM's current emulation ignores the guest's HWREna and always
      emulates RDHWR instructions even if the guest OS has disallowed them.
      
      Therefore rework the RDHWR emulation code to check for privilege or the
      corresponding bit in the guest HWREna bit. Also remove the #if 0 case
      for the UserLocal register. I presume it was there for debug purposes
      but it seems unnecessary now that the guest can control whether it
      causes a guest exception.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      26f4f3b5
    • J
      MIPS: KVM: asm/kvm_host.h: Clean up whitespace · 22027945
      James Hogan 提交于
      The whitespace in asm/kvm_host.h is quite inconsistent in places. Clean
      up the whole file to use tabs more consistently.
      
      When you use the --ignore-space-change argument to git diff this patch
      only changes line wrapping in TLB_IS_GLOBAL and TLB_IS_VALID macros.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      22027945
  6. 25 1月, 2014 1 次提交
    • J
      MIPS: KVM: remove shadow_tlb code · 08596b0a
      James Hogan 提交于
      The kvm_mips_init_shadow_tlb() function is called from
      kvm_arch_vcpu_init() and initialises entries 0 to
      current_cpu_data.tlbsize-1 of the virtual cpu's shadow_tlb[64] array.
      
      However newer cores with FTLBs can have a tlbsize > 64, for example the
      ProAptiv I'm testing on has a total tlbsize of 576. This causes
      kvm_mips_init_shadow_tlb() to overflow the shadow_tlb[64] array and
      overwrite the comparecount_timer among other things, causing a lock up
      when starting a KVM guest.
      
      Aside from kvm_mips_init_shadow_tlb() which only initialises it, the
      shadow_tlb[64] array is only actually used by the following functions:
       - kvm_shadow_tlb_put() & kvm_shadow_tlb_load()
           These are never called. The only call sites are #if 0'd out.
       - kvm_mips_dump_shadow_tlbs()
           This is never called.
      
      It was originally added for trap & emulate, but turned out to be
      unnecessary so it was disabled.
      
      So instead of fixing the shadow_tlb initialisation code, lets just
      remove the shadow_tlb[64] array and the above functions entirely. The
      only functional change here is the removal of broken shadow_tlb
      initialisation. The rest just deletes dead code.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJohn Crispin <blogic@openwrt.org>
      Patchwork: http://patchwork.linux-mips.org/patch/6384/
      08596b0a
  7. 14 10月, 2013 1 次提交
  8. 03 6月, 2013 1 次提交
  9. 17 5月, 2013 1 次提交
  10. 08 5月, 2013 1 次提交