1. 11 Jul, 2012 6 commits
  2. 30 May, 2012 2 commits
    • B
      KVM: PPC: booke: Added DECAR support · 21bd000a
      Bharat Bhushan committed
      Added the decrementer auto-reload support. DECAR is readable
      on e500v2/e500mc and later CPUs.
      Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      21bd000a
    • P
      KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281
      Paul Mackerras committed
      This adds a new ioctl to enable userspace to control the size of the guest
      hashed page table (HPT) and to clear it out when resetting the guest.
      The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
      a pointer to a u32 containing the desired order of the HPT (log base 2
      of the size in bytes), which is updated on successful return to the
      actual order of the HPT which was allocated.
      
      There must be no vcpus running at the time of this ioctl.  To enforce
      this, we now keep a count of the number of vcpus running in
      kvm->arch.vcpus_running.
      
      If the ioctl is called when a HPT has already been allocated, we don't
      reallocate the HPT but just clear it out.  We first clear the
      kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
      the kvm->lock mutex, it will prevent any vcpus from starting to run until
      we're done, and (b) it means that the first vcpu to run after we're done
      will re-establish the VRMA if necessary.
      
      If userspace doesn't call this ioctl before running the first vcpu, the
      kernel will allocate a default-sized HPT at that point.  We do it then
      rather than when creating the VM, as the code did previously, so that
      userspace has a chance to do the ioctl if it wants.
      
      When allocating the HPT, we can allocate either from the kernel page
      allocator, or from the preallocated pool.  If userspace is asking for
      a different size from the preallocated HPTs, we first try to allocate
      using the kernel page allocator.  Then we try to allocate from the
      preallocated pool, and then if that fails, we try allocating decreasing
      sizes from the kernel page allocator, down to the minimum size allowed
      (256kB).  Note that the kernel page allocator limits allocations to
      1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
      16MB (on 64-bit powerpc, at least).
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix module compilation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      32fad281
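The allocation fallback order described in this commit message can be sketched as a small userspace simulation. This is only an illustration under stated assumptions, not the kernel code: `struct hpt_env`, `sim_page_alloc`, `sim_pool_alloc` and `hpt_alloc_order` are hypothetical names, the allocators are stand-ins that succeed or fail based on simple flags, and rejecting requests below the minimum order is a choice made for this sketch.

```c
#include <stdbool.h>

/* 256kB is the minimum HPT size named in the commit message: 2^18 bytes. */
#define HPT_MIN_ORDER 18

/* Hypothetical model of the two allocation sources. */
struct hpt_env {
    unsigned int max_page_order; /* largest order the simulated page allocator can satisfy */
    unsigned int pool_order;     /* order of the preallocated HPTs */
    bool pool_has_free;          /* whether the preallocated pool still holds an HPT */
};

/* Stand-in for the kernel page allocator. */
static bool sim_page_alloc(const struct hpt_env *env, unsigned int order)
{
    return order <= env->max_page_order;
}

/* Stand-in for the preallocated pool. */
static bool sim_pool_alloc(const struct hpt_env *env)
{
    return env->pool_has_free;
}

/*
 * Returns the order of the HPT that was "allocated", or 0 on failure.
 * Rejecting want < HPT_MIN_ORDER is an assumption of this sketch.
 */
unsigned int hpt_alloc_order(const struct hpt_env *env, unsigned int want)
{
    unsigned int order;

    if (want < HPT_MIN_ORDER)
        return 0;

    /* 1. A size different from the pool's: try the page allocator first. */
    if (want != env->pool_order && sim_page_alloc(env, want))
        return want;

    /* 2. Then try the preallocated pool, which yields its own fixed order. */
    if (sim_pool_alloc(env))
        return env->pool_order;

    /* 3. Finally try decreasing sizes from the page allocator. */
    for (order = want; order >= HPT_MIN_ORDER; order--)
        if (sim_page_alloc(env, order))
            return order;

    return 0;
}
```

The returned order plays the role of the value written back to userspace, which the commit message says is updated to the actual order of the HPT that was allocated.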
  3. 16 May, 2012 5 commits
    • P
      KVM: PPC: Book3S HV: Fix bug leading to deadlock in guest HPT updates · 51bfd299
      Paul Mackerras committed
      When handling the H_BULK_REMOVE hypercall, we were forgetting to
      invalidate and unlock the hashed page table entry (HPTE) in the case
      where the page had been paged out.  This fixes it by clearing the
      first doubleword of the HPTE in that case.
      
      This fixes a regression introduced in commit a92bce95 ("KVM: PPC:
      Book3S HV: Keep HPTE locked when invalidating").  The effect of the
      regression is that the host kernel will sometimes hang when under
      memory pressure.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      51bfd299
    • B
      powerpc/kvm: Fix VSID usage in 64-bit "PR" KVM · ffe36492
      Benjamin Herrenschmidt committed
      The code forgot to scramble the VSIDs the way we normally do
      and was basically using the "proto VSID" directly with the MMU.
      
      This means that in practice, KVM used random VSIDs that could
      collide with segments used by other user space programs.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [agraf: simplify ppc32 case]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      ffe36492
    • A
      KVM: PPC: Book3S: PR: Fix hsrr code · 32c7dbfd
      Alexander Graf committed
      When jumping back into the kernel to code that knows that it would be
      using HSRR registers instead of SRR registers, we need to make sure we
      pass it all information on where to jump to in HSRR registers.
      
      Unfortunately, we used r10 to store the information to distinguish between
      the HSRR and SRR case. That register got clobbered in between though,
      rendering the later comparison invalid.
      
      Instead, let's use cr1 to store this information. That way we don't
      need yet another register and everyone's happy.
      
      This fixes PR KVM on POWER7 bare metal for me.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      32c7dbfd
    • A
      KVM: PPC: Fix PR KVM on POWER7 bare metal · 56e13dba
      Alexander Graf committed
      When running on a system that is HV capable, some interrupts use HSRR
      SPRs instead of the normal SRR SPRs. These are also used in the Linux
      handlers to jump back to code after an interrupt got processed.
      
      Unfortunately, in our "jump back to the real host handler after we've
      done the context switch" code, we were only setting the SRR SPRs,
      causing Linux to jump back to some invalid IP after it has processed
      the interrupt.
      
      This fixes random crashes on p7 opal mode with PR KVM for me.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      56e13dba
    • A
      KVM: PPC: Book3S: PR: Handle EMUL_ASSIST · 7ef4e985
      Alexander Graf committed
      In addition to normal "privileged instruction" traps, we can also receive
      "emulation assist" traps on newer hardware that has the HV bit set.
      
      Handle that one the same way as a privileged instruction, including the
      instruction fetching. That way we don't execute old instructions that we
      happen to still leave in that field when an emul assist trap comes.
      
      This fixes -M mac99 / -M g3beige on p7 bare metal for me.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      7ef4e985
  4. 08 May, 2012 1 commit
    • D
      KVM: PPC: Book3S HV: Fix refcounting of hugepages · de6c0b02
      David Gibson committed
      The H_REGISTER_VPA hcall implementation in HV Power KVM needs to pin some
      guest memory pages into host memory so that they can be safely accessed
      from usermode.  It does this using get_user_pages_fast().  When the VPA is
      unregistered, or the VCPUs are cleaned up, these pages are released using
      put_page().
      
      However, the get_user_pages() is invoked on the specific memory area of the
      VPA which could lie within hugepages.  In case the pinned page is huge,
      we explicitly find the head page of the compound page before calling
      put_page() on it.
      
      At least with the latest kernel, this is not correct.  put_page() already
      handles finding the correct head page of a compound, and also deals with
      various counts on the individual tail page which are important for
      transparent huge pages.  We don't support transparent hugepages on Power,
      but even so, bypassing this count maintenance can lead (when the VM ends)
      to a hugepage being released back to the pool with a non-zero mapcount on
      one of the tail pages.  This can then lead to a bad_page() when the page
      is released from the hugepage pool.
      
      This removes the explicit compound_head() call to correct this bug.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      de6c0b02
  5. 06 May, 2012 15 commits
  6. 08 Apr, 2012 11 commits
    • B
      powerpc/kvm: Fix magic page vs. 32-bit RTAS on ppc64 · bbcc9c06
      Benjamin Herrenschmidt committed
      When the kernel calls into RTAS, it switches to 32-bit mode. The
      magic page is no longer accessible in that case, causing the
      patched instructions in the RTAS call wrapper to crash.
      
      This fixes it by making available a 32-bit mapping of the magic
      page in that case. This mapping is flushed whenever we switch
      the kernel back to 64-bit mode.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [agraf: add a check if the magic page is mapped]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      bbcc9c06
    • A
      KVM: PPC: Ignore unhalt request from kvm_vcpu_block · 966cd0f3
      Alexander Graf committed
      When kvm_vcpu_block runs and realizes that the CPU is actually good
      to run, we get a request bit set for KVM_REQ_UNHALT. Right now, there's
      nothing we can do with that bit, so let's unset it right after the call
      again so we don't get confused in our later checks for pending work.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      966cd0f3
    • A
      KVM: PPC: Book3s: PR: Add HV traps so we can run in HV=1 mode on p7 · 4f225ae0
      Alexander Graf committed
      When running PR KVM on a p7 system in bare metal, we get HV exits instead
      of normal supervisor traps. Semantically they are identical though and the
      HSRR vs SRR difference is already taken care of in the exit code.
      
      So all we need to do is handle them in addition to our normal exits.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      4f225ae0
    • A
      KVM: PPC: Emulate tw and td instructions · 6df79df5
      Alexander Graf committed
      There are 4 conditional trapping instructions: tw, twi, td, tdi. The
      ones with an i take an immediate comparison, the others compare two
      registers. All of them arrive in the emulator when the condition to
      trap was successfully fulfilled.
      
      Unfortunately, we were only implementing the i versions so far, so
      let's also add support for the other two.
      
      This fixes kernel booting with recent book3s_32 guest kernels.
      Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      6df79df5
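To make the four trap instructions named above concrete, here is a minimal decoding sketch. The opcode numbers follow the Power ISA encodings (tdi is primary opcode 2, twi is primary opcode 3, and tw and td are extended opcodes 4 and 68 under primary opcode 31). The helper and enum names are hypothetical and are not the identifiers used in the kernel's emulator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcodes (top 6 bits of the instruction word). */
enum {
    SK_OP_TDI = 2,   /* trap doubleword immediate */
    SK_OP_TWI = 3,   /* trap word immediate */
    SK_OP_31  = 31   /* X-form group that holds tw and td */
};

/* Extended opcodes within primary opcode 31 (bits 1..10 of the word). */
enum {
    SK_XOP_TW = 4,   /* trap word, register/register compare */
    SK_XOP_TD = 68   /* trap doubleword, register/register compare */
};

static unsigned int sk_primary_opcode(uint32_t inst)
{
    return inst >> 26;
}

static unsigned int sk_extended_opcode(uint32_t inst)
{
    return (inst >> 1) & 0x3ff;
}

/* True if inst is any of tw, twi, td, tdi. */
bool sk_is_conditional_trap(uint32_t inst)
{
    switch (sk_primary_opcode(inst)) {
    case SK_OP_TDI:
    case SK_OP_TWI:
        return true;    /* the immediate forms */
    case SK_OP_31:
        /* the register forms that this commit adds handling for */
        return sk_extended_opcode(inst) == SK_XOP_TW ||
               sk_extended_opcode(inst) == SK_XOP_TD;
    default:
        return false;
    }
}
```

Since the commit message notes that these instructions only reach the emulator once the trap condition has been met, a decoder like this only needs to recognise the instruction, not re-evaluate the comparison.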
    • A
      KVM: PPC: Pass EA to updating emulation ops · 6020c0f6
      Alexander Graf committed
      When emulating updating load/store instructions (lwzu, stwu, ...) we need to
      write the effective address of the load/store into a register.
      
      Currently, we write the physical address in there, which is very wrong. So
      instead let's save off where the virtual fault was on MMIO and use that
      information as value to put into the register.
      
      While at it, also move the XOP variants of the above instructions to the new
      scheme of using the already known vaddr instead of calculating it themselves.
      Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      6020c0f6
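The difference between the effective and the physical address in an updating load/store can be illustrated with a small sketch. The struct and function names below are hypothetical and only model the idea from the commit message: the update step must write the guest effective address (where the virtual fault occurred), not the physical address the MMIO access resolved to.

```c
#include <stdint.h>

/* Hypothetical, reduced vcpu state for this illustration. */
struct sk_mmio_vcpu {
    uint64_t gpr[32];
    uint64_t vaddr_accessed; /* guest effective address of the MMIO access */
    uint64_t paddr_accessed; /* guest physical address it translated to */
};

/*
 * Effective address of a D-form access such as "lwzu rD, d(rA)":
 * the 16-bit displacement is sign-extended and added to rA.
 */
uint64_t sk_dform_ea(uint64_t ra_val, uint16_t d)
{
    return ra_val + (uint64_t)(int64_t)(int16_t)d;
}

/*
 * Update step of an updating load/store: rA receives the effective
 * address that was saved when the MMIO fault was taken.
 */
void sk_emulate_update(struct sk_mmio_vcpu *vcpu, unsigned int ra)
{
    vcpu->gpr[ra] = vcpu->vaddr_accessed;
}
```

With a displacement of -8 and rA holding 0x1000, `sk_dform_ea` yields 0xff8, and that is the value the guest expects to find in rA afterwards, regardless of where the page is mapped physically.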
    • P
      KVM: PPC: Work around POWER7 DABR corruption problem · 8943633c
      Paul Mackerras committed
      It turns out that on POWER7, writing to the DABR can cause a corrupted
      value to be written if the PMU is active and updating SDAR in continuous
      sampling mode.  To work around this, we make sure that the PMU is inactive
      and SDAR updates are disabled (via MMCRA) when we are context-switching
      DABR.
      
      When the guest sets DABR via the H_SET_DABR hypercall, we use a slightly
      different workaround, which is to read back the DABR and write it again
      if it got corrupted.
      
      While we are at it, make it consistent that the saving and restoring
      of the guest's non-volatile GPRs and the FPRs are done with the guest
      setup of the PMU active.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      8943633c
    • B
      Restore guest CR after exit timing calculation · c0fe7b09
      Bharat Bhushan committed
      No instruction that can change the Condition Register (CR) should be executed
      after the guest CR is loaded. The exit timing code in lightweight_exit executes
      cmpw, which can clobber CR, so the guest CR is now restored after it.
      Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      c0fe7b09
    • P
      KVM: PPC: Book3S HV: Report stolen time to guest through dispatch trace log · 0456ec4f
      Paul Mackerras committed
      This adds code to measure "stolen" time per virtual core in units of
      timebase ticks, and to report the stolen time to the guest using the
      dispatch trace log (DTL).  The guest can register an area of memory
      for the DTL for a given vcpu.  The DTL is a ring buffer where KVM
      fills in one entry every time it enters the guest for that vcpu.
      
      Stolen time is measured as time when the virtual core is not running,
      either because the vcore is not runnable (e.g. some of its vcpus are
      executing elsewhere in the kernel or in userspace), or when the vcpu
      thread that is running the vcore is preempted.  This includes time
      when all the vcpus are idle (i.e. have executed the H_CEDE hypercall),
      which is OK because the guest accounts stolen time while idle as idle
      time.
      
      Each vcpu keeps a record of how much stolen time has been reported to
      the guest for that vcpu so far.  When we are about to enter the guest,
      we create a new DTL entry (if the guest vcpu has a DTL) and report the
      difference between total stolen time for the vcore and stolen time
      reported so far for the vcpu as the "enqueue to dispatch" time in the
      DTL entry.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      0456ec4f
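The per-vcpu bookkeeping described in this commit message can be sketched as follows. This is a simplified model under stated assumptions: the struct layouts, the four-entry ring and all names are hypothetical, and a real DTL entry carries more fields than the single "enqueue to dispatch" value kept here.

```c
#include <stdint.h>

#define SK_DTL_ENTRIES 4 /* illustrative ring size, not the real one */

/* Hypothetical virtual core: accumulates stolen time in timebase ticks. */
struct sk_vcore {
    uint64_t stolen_tb;
};

/* Hypothetical vcpu: remembers how much stolen time it has reported. */
struct sk_dtl_vcpu {
    uint64_t stolen_reported;
    uint64_t enqueue_to_dispatch[SK_DTL_ENTRIES]; /* simplified DTL ring */
    unsigned int next;                            /* next ring slot to fill */
};

/*
 * Called when about to enter the guest: log the stolen time accumulated
 * by the vcore since this vcpu's previous report, then advance the ring.
 */
void sk_dtl_log_dispatch(const struct sk_vcore *vc, struct sk_dtl_vcpu *vcpu)
{
    uint64_t delta = vc->stolen_tb - vcpu->stolen_reported;

    vcpu->stolen_reported = vc->stolen_tb;
    vcpu->enqueue_to_dispatch[vcpu->next] = delta;
    vcpu->next = (vcpu->next + 1) % SK_DTL_ENTRIES; /* ring buffer wrap */
}
```

Tracking the reported total per vcpu while the stolen total lives on the vcore means each vcpu of the same vcore reports only the portion it has not yet seen.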
    • P
      KVM: PPC: Book3S HV: Make virtual processor area registration more robust · 2e25aa5f
      Paul Mackerras committed
      The PAPR API allows three sorts of per-virtual-processor areas to be
      registered (VPA, SLB shadow buffer, and dispatch trace log), and
      furthermore, these can be registered and unregistered for another
      virtual CPU.  Currently we just update the vcpu fields pointing to
      these areas at the time of registration or unregistration.  If this
      is done on another vcpu, there is the possibility that the target vcpu
      is using those fields at the time and could end up using a bogus
      pointer and corrupting memory.
      
      This fixes the race by making the target cpu itself do the update, so
      we can be sure that the update happens at a time when the fields
      aren't being used.  Each area now has a struct kvmppc_vpa which is
      used to manage these updates.  There is also a spinlock which protects
      access to all of the kvmppc_vpa structs, other than to the pinned_addr
      fields.  (We could have just taken the spinlock when using the vpa,
      slb_shadow or dtl fields, but that would mean taking the spinlock on
      every guest entry and exit.)
      
      This also changes 'struct dtl' (which was undefined) to 'struct dtl_entry',
      which is what the rest of the kernel uses.
      
      Thanks to Michael Ellerman <michael@ellerman.id.au> for pointing out
      the need to initialize vcpu->arch.vpa_update_lock.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      2e25aa5f
    • P
      KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs · f0888f70
      Paul Mackerras committed
      Currently on POWER7, if we are running the guest on a core and we don't
      need all the hardware threads, we do nothing to ensure that the unused
      threads aren't executing in the kernel (other than checking that they
      are offline).  We just assume they're napping and we don't do anything
      to stop them trying to enter the kernel while the guest is running.
      This means that a stray IPI can wake up the hardware thread and it will
      then try to enter the kernel, but since the core is in guest context,
      it will execute code from the guest in hypervisor mode once it turns the
      MMU on, which tends to lead to crashes or hangs in the host.
      
      This fixes the problem by adding two new one-byte flags in the
      kvmppc_host_state structure in the PACA which are used to interlock
      between the primary thread and the unused secondary threads when entering
      the guest.  With these flags, the primary thread can ensure that the
      unused secondaries are not already in kernel mode (i.e. handling a stray
      IPI) and then indicate that they should not try to enter the kernel
      if they do get woken for any reason.  Instead they will go into KVM code,
      find that there is no vcpu to run, acknowledge and clear the IPI and go
      back to nap mode.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      f0888f70
    • A
      KVM: PPC: Save/Restore CR over vcpu_run · f6127716
      Alexander Graf committed
      On PPC, CR2-CR4 are nonvolatile and thus have to be preserved across function calls.
      We didn't respect that for any architecture until Paul spotted it in his
      patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      f6127716