1. 13 5月, 2016 1 次提交
    • C
      KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger 提交于
      Some wakeups should not be considered a sucessful poll. For example on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable - letting all vCPUs poll all the time for
      transactional like workload, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups if they
      should be considered a good/bad wakeups in regard to polls.
      
      For s390 the implementation will fence of halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken up CPU by marking the poll of this CPU as "good" poll.
      This code will also mark several other wakeup reasons like IPI or
      expired timers as "good". This will of course also mark some events as
      not sucessful. As  KVM on z runs always as a 2nd level hypervisor,
      we prefer to not poll, unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like uperf 1byte
      transactional ping pong workload or wakeup heavy workload like OLTP
      while still providing a proper speedup.
      
      This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3491caf2
  2. 10 5月, 2016 1 次提交
    • J
      MIPS: KVM: Fix timer IRQ race when writing CP0_Compare · b45bacd2
      James Hogan 提交于
      Writing CP0_Compare clears the timer interrupt pending bit
      (CP0_Cause.TI), but this wasn't being done atomically. If a timer
      interrupt raced with the write of the guest CP0_Compare, the timer
      interrupt could end up being pending even though the new CP0_Compare is
      nowhere near CP0_Count.
      
      We were already updating the hrtimer expiry with
      kvm_mips_update_hrtimer(), which used both kvm_mips_freeze_hrtimer() and
      kvm_mips_resume_hrtimer(). Close the race window by expanding out
      kvm_mips_update_hrtimer(), and clearing CP0_Cause.TI and setting
      CP0_Compare between the freeze and resume. Since the pending timer
      interrupt should not be cleared when CP0_Compare is written via the KVM
      user API, an ack argument is added to distinguish the source of the
      write.
      
      Fixes: e30492bb ("MIPS: KVM: Rewrite count/compare timer emulation")
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.16.x-
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b45bacd2
  3. 24 1月, 2016 5 次提交
  4. 16 1月, 2016 1 次提交
    • D
      kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba049e93
  5. 23 10月, 2015 1 次提交
  6. 25 9月, 2015 1 次提交
  7. 16 9月, 2015 1 次提交
    • P
      KVM: add halt_attempted_poll to VCPU stats · 62bea5bf
      Paolo Bonzini 提交于
      This new statistic can help diagnosing VCPUs that, for any reason,
      trigger bad behavior of halt_poll_ns autotuning.
      
      For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
      like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
      10+20+40+80+160+320+480 = 1110 microseconds out of every
      479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
      is consuming about 30% more CPU than it would use without
      polling.  This would show as an abnormally high number of
      attempted polling compared to the successful polls.
      
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      62bea5bf
  8. 26 5月, 2015 1 次提交
  9. 28 3月, 2015 11 次提交
    • J
      MIPS: KVM: Add MSA exception handling · c2537ed9
      James Hogan 提交于
      Add guest exception handling for MIPS SIMD Architecture (MSA) floating
      point exceptions and MSA disabled exceptions.
      
      MSA floating point exceptions from the guest need passing to the guest
      kernel, so for these a guest MSAFPE is emulated.
      
      MSA disabled exceptions are normally handled by passing a reserved
      instruction exception to the guest (because no guest MSA was supported),
      but the hypervisor can now handle them if the guest has MSA by passing
      an MSA disabled exception to the guest, or if the guest has MSA enabled
      by transparently restoring the guest MSA context and enabling MSA and
      the FPU.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      c2537ed9
    • J
      MIPS: KVM: Add base guest MSA support · 539cb89f
      James Hogan 提交于
      Add base code for supporting the MIPS SIMD Architecture (MSA) in MIPS
      KVM guests. MSA cannot yet be enabled in the guest, we're just laying
      the groundwork.
      
      As with the FPU, whether the guest's MSA context is loaded is stored in
      another bit in the fpu_inuse vcpu member. This allows MSA to be disabled
      when the guest disables it, but keeping the MSA context loaded so it
      doesn't have to be reloaded if the guest re-enables it.
      
      New assembly code is added for saving and restoring the MSA context,
      restoring only the upper half of the MSA context (for if the FPU context
      is already loaded) and for saving/clearing and restoring MSACSR (which
      can itself cause an MSA FP exception depending on the value). The MSACSR
      is restored before returning to the guest if MSA is already enabled, and
      the existing FP exception die notifier is extended to catch the possible
      MSA FP exception and step over the ctcmsa instruction.
      
      The helper function kvm_own_msa() is added to enable MSA and restore
      the MSA context if it isn't already loaded, which will be used in a
      later patch when the guest attempts to use MSA for the first time and
      triggers an MSA disabled exception.
      
      The existing FPU helpers are extended to handle MSA. kvm_lose_fpu()
      saves the full MSA context if it is loaded (which includes the FPU
      context) and both kvm_lose_fpu() and kvm_drop_fpu() disable MSA.
      
      kvm_own_fpu() also needs to lose any MSA context if FR=0, since there
      would be a risk of getting reserved instruction exceptions if CU1 is
      enabled and we later try and save the MSA context. We shouldn't usually
      hit this case since it will be handled when emulating CU1 changes,
      however there's nothing to stop the guest modifying the Status register
      directly via the comm page, which will cause this case to get hit.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      539cb89f
    • J
      MIPS: KVM: Add FP exception handling · 1c0cd66a
      James Hogan 提交于
      Add guest exception handling for floating point exceptions and
      coprocessor 1 unusable exceptions.
      
      Floating point exceptions from the guest need passing to the guest
      kernel, so for these a guest FPE is emulated.
      
      Also, coprocessor 1 unusable exceptions are normally passed straight
      through to the guest (because no guest FPU was supported), but the
      hypervisor can now handle them if the guest has its FPU enabled by
      restoring the guest FPU context and enabling the FPU.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      1c0cd66a
    • J
      MIPS: KVM: Add base guest FPU support · 98e91b84
      James Hogan 提交于
      Add base code for supporting FPU in MIPS KVM guests. The FPU cannot yet
      be enabled in the guest, we're just laying the groundwork.
      
      Whether the guest's FPU context is loaded is stored in a bit in the
      fpu_inuse vcpu member. This allows the FPU to be disabled when the guest
      disables it, but keeping the FPU context loaded so it doesn't have to be
      reloaded if the guest re-enables it.
      
      An fpu_enabled vcpu member stores whether userland has enabled the FPU
      capability (which will be wired up in a later patch).
      
      New assembly code is added for saving and restoring the FPU context, and
      for saving/clearing and restoring FCSR (which can itself cause an FP
      exception depending on the value). The FCSR is restored before returning
      to the guest if the FPU is already enabled, and a die notifier is
      registered to catch the possible FP exception and step over the ctc1
      instruction.
      
      The helper function kvm_lose_fpu() is added to save FPU context and
      disable the FPU, which is used when saving hardware state before a
      context switch or KVM exit (the vcpu_get_regs() callback).
      
      The helper function kvm_own_fpu() is added to enable the FPU and restore
      the FPU context if it isn't already loaded, which will be used in a
      later patch when the guest attempts to use the FPU for the first time
      and triggers a co-processor unusable exception.
      
      The helper function kvm_drop_fpu() is added to discard the FPU context
      and disable the FPU, which will be used in a later patch when the FPU
      state will become architecturally UNPREDICTABLE (change of FR mode) to
      force a reload of [stale] context in the new FR mode.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      98e91b84
    • J
      MIPS: KVM: Add vcpu_get_regs/vcpu_set_regs callback · b86ecb37
      James Hogan 提交于
      Add a vcpu_get_regs() and vcpu_set_regs() callbacks for loading and
      restoring context which may be in hardware registers. This may include
      floating point and MIPS SIMD Architecture (MSA) state which may be
      accessed directly by the guest (but restored lazily by the hypervisor),
      and also dedicated guest registers as provided by the VZ ASE.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      b86ecb37
    • J
      MIPS: KVM: Add Config4/5 and writing of Config registers · c771607a
      James Hogan 提交于
      Add Config4 and Config5 co-processor 0 registers, and add capability to
      write the Config1, Config3, Config4, and Config5 registers using the KVM
      API.
      
      Only supported bits can be written, to minimise the chances of the guest
      being given a configuration from e.g. QEMU that is inconsistent with
      that being emulated, and as such the handling is in trap_emul.c as it
      may need to be different for VZ. Currently the only modification
      permitted is to make Config4 and Config5 exist via the M bits, but other
      bits will be added for FPU and MSA support in future patches.
      
      Care should be taken by userland not to change bits without fully
      handling the possible extra state that may then exist and which the
      guest may begin to use and depend on.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      c771607a
    • J
      MIPS: KVM: Simplify default guest Config registers · 2211ee81
      James Hogan 提交于
      Various semi-used definitions exist in kvm_host.h for the default guest
      config registers. Remove them and use the appropriate values directly
      when initialising the Config registers.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      2211ee81
    • J
      MIPS: KVM: Clean up register definitions a little · 7bd4acec
      James Hogan 提交于
      Clean up KVM_GET_ONE_REG / KVM_SET_ONE_REG register definitions for
      MIPS, to prepare for adding a new group for FPU & MSA vector registers.
      
      Definitions are added for common bits in each group of registers, e.g.
      KVM_REG_MIPS_CP0 = KVM_REG_MIPS | 0x10000, for the coprocessor 0
      registers.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      7bd4acec
    • J
      MIPS: KVM: Implement PRid CP0 register access · 1068eaaf
      James Hogan 提交于
      Implement access to the guest Processor Identification CP0 register
      using the KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctls. This allows the
      owning process to modify and read back the value that is exposed to the
      guest in this register.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      1068eaaf
    • J
      MIPS: KVM: Handle TRAP exceptions from guest kernel · 0a560427
      James Hogan 提交于
      Trap instructions are used by Linux to implement BUG_ON(), however KVM
      doesn't pass trap exceptions on to the guest if they occur in guest
      kernel mode, instead triggering an internal error "Exception Code: 13,
      not yet handled". The guest kernel then doesn't get a chance to print
      the usual BUG message and stack trace.
      
      Implement handling of the trap exception so that it gets passed to the
      guest and the user is left with a more useful log message.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      0a560427
    • J
      MIPS: KVM: Handle MSA Disabled exceptions from guest · 98119ad5
      James Hogan 提交于
      Guest user mode can generate a guest MSA Disabled exception on an MSA
      capable core by simply trying to execute an MSA instruction. Since this
      exception is unknown to KVM it will be passed on to the guest kernel.
      However guest Linux kernels prior to v3.15 do not set up an exception
      handler for the MSA Disabled exception as they don't support any MSA
      capable cores. This results in a guest OS panic.
      
      Since an older processor ID may be being emulated, and MSA support is
      not advertised to the guest, the correct behaviour is to generate a
      Reserved Instruction exception in the guest kernel so it can send the
      guest process an illegal instruction signal (SIGILL), as would happen
      with a non-MSA-capable core.
      
      Fix this as minimally as reasonably possible by preventing
      kvm_mips_check_privilege() from relaying MSA Disabled exceptions from
      guest user mode to the guest kernel, and handling the MSA Disabled
      exception by emulating a Reserved Instruction exception in the guest,
      via a new handle_msa_disabled() KVM callback.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v3.15+
      98119ad5
  10. 06 2月, 2015 1 次提交
    • P
      kvm: add halt_poll_ns module parameter · f7819512
      Paolo Bonzini 提交于
      This patch introduces a new module parameter for the KVM module; when it
      is present, KVM attempts a bit of polling on every HLT before scheduling
      itself out via kvm_vcpu_block.
      
      This parameter helps a lot for latency-bound workloads---in particular
      I tested it with O_DSYNC writes with a battery-backed disk in the host.
      In this case, writes are fast (because the data doesn't have to go all
      the way to the platters) but they cannot be merged by either the host or
      the guest.  KVM's performance here is usually around 30% of bare metal,
      or 50% if you use cache=directsync or cache=writethrough (these
      parameters avoid that the guest sends pointless flush requests, and
      at the same time they are not slow because of the battery-backed cache).
      The bad performance happens because on every halt the host CPU decides
      to halt itself too.  When the interrupt comes, the vCPU thread is then
      migrated to a new physical CPU, and in general the latency is horrible
      because the vCPU thread has to be scheduled back in.
      
      With this patch performance reaches 60-65% of bare metal and, more
      important, 99% of what you get if you use idle=poll in the guest.  This
      means that the tunable gets rid of this particular bottleneck, and more
      work can be done to improve performance in the kernel or QEMU.
      
      Of course there is some price to pay; every time an otherwise idle vCPUs
      is interrupted by an interrupt, it will poll unnecessarily and thus
      impose a little load on the host.  The above results were obtained with
      a mostly random value of the parameter (500000), and the load was around
      1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
      
      The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
      that can be used to tune the parameter.  It counts how many HLT
      instructions received an interrupt during the polling period; each
      successful poll avoids that Linux schedules the VCPU thread out and back
      in, and may also avoid a likely trip to C1 and back for the physical CPU.
      
      While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
      Of these halts, almost all are failed polls.  During the benchmark,
      instead, basically all halts end within the polling period, except a more
      or less constant stream of 50 per second coming from vCPUs that are not
      running the benchmark.  The wasted time is thus very low.  Things may
      be slightly different for Windows VMs, which have a ~10 ms timer tick.
      
      The effect is also visible on Marcelo's recently-introduced latency
      test for the TSC deadline timer.  Though of course a non-RT kernel has
      awful latency bounds, the latency of the timer is around 8000-10000 clock
      cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
      deadline timer, thus, the effect is both a smaller average latency and
      a smaller variance.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f7819512
  11. 29 8月, 2014 3 次提交
  12. 30 6月, 2014 2 次提交
  13. 30 5月, 2014 9 次提交
    • J
      MIPS: KVM: Whitespace fixes in kvm_mips_callbacks · 2dca3725
      James Hogan 提交于
      Fix whitespace in struct kvm_mips_callbacks function pointers.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2dca3725
    • J
      MIPS: KVM: Add count frequency KVM register · f74a8e22
      James Hogan 提交于
      Expose the KVM guest CP0_Count frequency to userland via a new
      KVM_REG_MIPS_COUNT_HZ register accessible with the KVM_{GET,SET}_ONE_REG
      ioctls.
      
      When the frequency is altered the bias is adjusted such that the guest
      CP0_Count doesn't jump discontinuously or lose any timer interrupts.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f74a8e22
    • J
      MIPS: KVM: Add master disable count interface · f8239342
      James Hogan 提交于
      Expose two new virtual registers to userland via the
      KVM_{GET,SET}_ONE_REG ioctls.
      
      KVM_REG_MIPS_COUNT_CTL is for timer configuration fields and just
      contains a master disable count bit. This can be used by userland to
      freeze the timer in order to read a consistent state from the timer
      count value and timer interrupt pending bit. This cannot be done with
      the CP0_Cause.DC bit because the timer interrupt pending bit (TI) is
      also in CP0_Cause so it would be impossible to stop the timer without
      also risking a race with an hrtimer interrupt and having to explicitly
      check whether an interrupt should have occurred.
      
      When the timer is re-enabled it resumes without losing time, i.e. the
      CP0_Count value jumps to what it would have been had the timer not been
      disabled, which would also be impossible to do from userland with
      CP0_Cause.DC. The timer interrupt also cannot be lost, i.e. if a timer
      interrupt would have occurred had the timer not been disabled it is
      queued when the timer is re-enabled.
      
      This works by storing the nanosecond monotonic time when the master
      disable is set, and using it for various operations instead of the
      current monotonic time (e.g. when recalculating the bias when the
      CP0_Count is set), until the master disable is cleared again, i.e. the
      timer state is read/written as it would have been at that time. This
      state is exposed to userland via the read-only KVM_REG_MIPS_COUNT_RESUME
      virtual register so that userland can determine the exact time the
      master disable took effect.
      
      This should allow userland to atomically save the state of the timer,
      and later restore it.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f8239342
    • J
      MIPS: KVM: Rewrite count/compare timer emulation · e30492bb
      James Hogan 提交于
      Previously the emulation of the CPU timer was just enough to get a Linux
      guest running but some shortcuts were taken:
       - The guest timer interrupt was hard coded to always happen every 10 ms
         rather than being timed to when CP0_Count would match CP0_Compare.
       - The guest's CP0_Count register was based on the host's CP0_Count
         register. This isn't very portable and fails on cores without a
         CP_Count register implemented such as Ingenic XBurst. It also meant
         that the guest's CP0_Cause.DC bit to disable the CP0_Count register
         took no effect.
       - The guest's CP0_Count register was emulated by just dividing the
         host's CP0_Count register by 4. This resulted in continuity problems
         when used as a clock source, since when the host CP0_Count overflows
         from 0x7fffffff to 0x80000000, the guest CP0_Count transitions
         discontinuously from 0x1fffffff to 0xe0000000.
      
      Therefore rewrite & fix emulation of the guest timer based on the
      monotonic kernel time (i.e. ktime_get()). Internally a 32-bit count_bias
      value is added to the frequency scaled nanosecond monotonic time to get
      the guest's CP0_Count. The frequency of the timer is initialised to
      100MHz and cannot yet be changed, but a later patch will allow the
      frequency to be configured via the KVM_{GET,SET}_ONE_REG ioctl
      interface.
      
      The timer can now be stopped via the CP0_Cause.DC bit (by the guest or
      via the KVM_SET_ONE_REG ioctl interface), at which point the current
      CP0_Count is stored and can be read directly. When it is restarted the
      bias is recalculated such that the CP0_Count value is continuous.
      
      Due to the nature of hrtimer interrupts any read of the guest's
      CP0_Count register while it is running triggers a check for whether the
      hrtimer has expired, so that the guest/userland cannot observe the
      CP0_Count passing CP0_Compare without queuing a timer interrupt. This is
      also taken advantage of when stopping the timer to ensure that a pending
      timer interrupt is queued.
      
      This replaces the implementation of:
       - Guest read of CP0_Count
       - Guest write of CP0_Count
       - Guest write of CP0_Compare
       - Guest write of CP0_Cause
       - Guest read of HWR 2 (CC) with RDHWR
       - Host read of CP0_Count via KVM_GET_ONE_REG ioctl interface
       - Host write of CP0_Count via KVM_SET_ONE_REG ioctl interface
       - Host write of CP0_Compare via KVM_SET_ONE_REG ioctl interface
       - Host write of CP0_Cause via KVM_SET_ONE_REG ioctl interface
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e30492bb
    • J
      MIPS: KVM: Fix timer race modifying guest CP0_Cause · c73c99b0
      James Hogan 提交于
      The hrtimer callback for guest timer timeouts sets the guest's
      CP0_Cause.TI bit to indicate to the guest that a timer interrupt is
      pending, however there is no mutual exclusion implemented to prevent
      this occurring while the guest's CP0_Cause register is being
      read-modify-written elsewhere.
      
      When this occurs the setting of the CP0_Cause.TI bit is undone and the
      guest misses the timer interrupt and doesn't reprogram the CP0_Compare
      register for the next timeout. Currently another timer interrupt will be
      triggered again in another 10ms anyway due to the way timers are
      emulated, but after the MIPS timer emulation is fixed this would result
      in Linux guest time standing still and the guest scheduler not being
      invoked until the guest CP0_Count has looped around again, which at
      100MHz takes just under 43 seconds.
      
      Currently this is the only asynchronous modification of guest registers,
      therefore it is fixed by adjusting the implementations of the
      kvm_set_c0_guest_cause(), kvm_clear_c0_guest_cause(), and
      kvm_change_c0_guest_cause() macros which are used for modifying the
      guest CP0_Cause register to use ll/sc to ensure atomic modification.
      This should work in both UP and SMP cases without requiring interrupts
      to be disabled.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c73c99b0
    • J
      MIPS: KVM: Add CP0_UserLocal KVM register access · 7767b7d2
      James Hogan 提交于
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      UserLocal register. This is so that userland can save and restore its
      value.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7767b7d2
    • J
      MIPS: KVM: Add CP0_Count/Compare KVM register access · f8be02da
      James Hogan 提交于
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      Count and Compare registers. These registers are special in that writing
      to them has side effects (adjusting the time until the next timer
      interrupt) and reading of Count depends on the time. Therefore add a
      couple of callbacks so that different implementations (trap & emulate or
      VZ) can implement them differently depending on what the hardware
      provides.
      
      The trap & emulate versions mostly duplicate what happens when a T&E
      guest reads or writes these registers, so it inherits the same
      limitations which can be fixed in later patches.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f8be02da
    • J
      MIPS: KVM: Move KVM_{GET,SET}_ONE_REG definitions into kvm_host.h · 48a3c4e4
      James Hogan 提交于
      Move the KVM_{GET,SET}_ONE_REG MIPS register id definitions out of
      kvm_mips.c to kvm_host.h so that they can be shared between multiple
      source files. This allows register access to be indirected depending on
      the underlying implementation (trap & emulate or VZ).
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      48a3c4e4
    • J
      MIPS: KVM: Use local_flush_icache_range to fix RI on XBurst · facaaec1
      James Hogan 提交于
      MIPS KVM uses mips32_SyncICache to synchronise the icache with the
      dcache after dynamically modifying guest instructions or writing guest
      exception vector. However this uses rdhwr to get the SYNCI step, which
      causes a reserved instruction exception on Ingenic XBurst cores.
      
      It would seem to make more sense to use local_flush_icache_range()
      instead which does the same thing but is more portable.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      facaaec1
  14. 20 3月, 2014 2 次提交
    • J
      MIPS: KVM: Consult HWREna before emulating RDHWR · 26f4f3b5
      James Hogan 提交于
      The ability to read hardware registers from userland with the RDHWR
      instruction should depend upon the corresponding bit of the HWREna
      register being set, otherwise a reserved instruction exception should be
      generated.
      
      However KVM's current emulation ignores the guest's HWREna and always
      emulates RDHWR instructions even if the guest OS has disallowed them.
      
      Therefore rework the RDHWR emulation code to check for privilege or the
      corresponding bit in the guest HWREna bit. Also remove the #if 0 case
      for the UserLocal register. I presume it was there for debug purposes
      but it seems unnecessary now that the guest can control whether it
      causes a guest exception.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      26f4f3b5
    • J
      MIPS: KVM: asm/kvm_host.h: Clean up whitespace · 22027945
      James Hogan 提交于
      The whitespace in asm/kvm_host.h is quite inconsistent in places. Clean
      up the whole file to use tabs more consistently.
      
      When you use the --ignore-space-change argument to git diff this patch
      only changes line wrapping in TLB_IS_GLOBAL and TLB_IS_VALID macros.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      22027945