1. 30 May, 2012 1 commit
    • KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281
      Committed by Paul Mackerras
      This adds a new ioctl to enable userspace to control the size of the guest
      hashed page table (HPT) and to clear it out when resetting the guest.
      The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
      a pointer to a u32 containing the desired order of the HPT (log base 2
      of the size in bytes), which is updated on successful return to the
      actual order of the HPT which was allocated.
      
      There must be no vcpus running at the time of this ioctl.  To enforce
      this, we now keep a count of the number of vcpus running in
      kvm->arch.vcpus_running.
      
      If the ioctl is called when an HPT has already been allocated, we don't
      reallocate the HPT but just clear it out.  We first clear the
      kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
      the kvm->lock mutex, it will prevent any vcpus from starting to run until
      we're done, and (b) it means that the first vcpu to run after we're done
      will re-establish the VRMA if necessary.
      
      If userspace doesn't call this ioctl before running the first vcpu, the
      kernel will allocate a default-sized HPT at that point.  We do it then
      rather than when creating the VM, as the code did previously, so that
      userspace has a chance to do the ioctl if it wants.
      
      When allocating the HPT, we can allocate either from the kernel page
      allocator, or from the preallocated pool.  If userspace is asking for
      a different size from the preallocated HPTs, we first try to allocate
      using the kernel page allocator.  Then we try to allocate from the
      preallocated pool, and then if that fails, we try allocating decreasing
      sizes from the kernel page allocator, down to the minimum size allowed
      (256kB).  Note that the kernel page allocator limits allocations to
      1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
      16MB (on 64-bit powerpc, at least).
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix module compilation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
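The allocation order described in the commit message above can be sketched as a small user-space model in C. This is not the kernel code: `struct hpt_env`, `page_alloc_ok`, `hpt_pick_order`, and `hpt_size_bytes` are hypothetical names, the page allocator is reduced to "succeeds up to some maximum order", and clamping a too-small request up to the minimum is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum HPT order: 2^18 bytes = 256kB, the minimum named in the commit message. */
#define HPT_MIN_ORDER 18u

/* Hypothetical model of the two allocation sources. */
struct hpt_env {
    uint32_t page_alloc_max_order; /* largest order the modelled page allocator can satisfy (0 = nothing) */
    bool     pool_has_free;        /* whether the preallocated pool still holds an HPT */
    uint32_t pool_order;           /* order of the preallocated HPTs */
};

/* Size in bytes of an HPT of the given order (order = log base 2 of the size in bytes). */
static uint64_t hpt_size_bytes(uint32_t order)
{
    assert(order < 64);
    return UINT64_C(1) << order;
}

/* Modelled page allocator: succeeds only up to the configured maximum order. */
static bool page_alloc_ok(const struct hpt_env *env, uint32_t order)
{
    return order <= env->page_alloc_max_order;
}

/* Returns the order that would be allocated, or 0 if every attempt fails. */
static uint32_t hpt_pick_order(const struct hpt_env *env, uint32_t want)
{
    assert(env != NULL);

    /* Assumption of this sketch: requests below the minimum are clamped up. */
    if (want < HPT_MIN_ORDER)
        want = HPT_MIN_ORDER;

    /* 1. A size different from the preallocated HPTs: try the page allocator first. */
    if (want != env->pool_order && page_alloc_ok(env, want))
        return want;

    /* 2. Next, try the preallocated pool, which only offers its own size. */
    if (env->pool_has_free)
        return env->pool_order;

    /* 3. Finally, try decreasing sizes from the page allocator, down to the minimum. */
    for (uint32_t order = want; order >= HPT_MIN_ORDER; order--)
        if (page_alloc_ok(env, order))
            return order;

    return 0;
}
```

In this model, with a pool of order-24 (16MB) HPTs and a page allocator that can satisfy at most order 22, a request for order 28 ends up with the pool's order 24. With the pool empty, the same request falls back to order 22. The returned value corresponds to the "actual order" that the ioctl writes back to userspace.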
  2. 16 May, 2012 5 commits
    • KVM: PPC: Book3S HV: Fix bug leading to deadlock in guest HPT updates · 51bfd299
      Committed by Paul Mackerras
      When handling the H_BULK_REMOVE hypercall, we were forgetting to
      invalidate and unlock the hashed page table entry (HPTE) in the case
      where the page had been paged out.  This fixes it by clearing the
      first doubleword of the HPTE in that case.
      
      This fixes a regression introduced in commit a92bce95 ("KVM: PPC:
      Book3S HV: Keep HPTE locked when invalidating").  The effect of the
      regression is that the host kernel will sometimes hang when under
      memory pressure.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • powerpc/kvm: Fix VSID usage in 64-bit "PR" KVM · ffe36492
      Committed by Benjamin Herrenschmidt
      The code forgot to scramble the VSIDs the way we normally do
      and was basically using the "proto VSID" directly with the MMU.
      
      This means that in practice, KVM used random VSIDs that could
      collide with segments used by other user space programs.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [agraf: simplify ppc32 case]
      Signed-off-by: Alexander Graf <agraf@suse.de>
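To illustrate what "scrambling" a proto-VSID means in the commit above, here is a minimal sketch of a multiplicative scramble in C. The function name `vsid_scramble` and the `SKETCH_` constants are hypothetical. The multiplier and bit width are assumed values, recalled from the 256MB-segment definitions in powerpc kernel headers of roughly that era, and should be checked against the headers of the kernel version in question.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; verify against the kernel's powerpc MMU headers. */
#define SKETCH_VSID_BITS       36
#define SKETCH_VSID_MULTIPLIER UINT64_C(200730139)
#define SKETCH_VSID_MODULUS    ((UINT64_C(1) << SKETCH_VSID_BITS) - 1)

/*
 * Scramble a proto-VSID into the VSID handed to the MMU:
 *   vsid = (proto_vsid * multiplier) mod (2^bits - 1)
 * The product fits in 64 bits because proto_vsid < 2^36 and the multiplier < 2^28.
 */
static uint64_t vsid_scramble(uint64_t proto_vsid)
{
    assert(proto_vsid < SKETCH_VSID_MODULUS);
    return (proto_vsid * SKETCH_VSID_MULTIPLIER) % SKETCH_VSID_MODULUS;
}
```

With these assumed constants the multiplier is coprime to the modulus, so distinct proto-VSIDs below the modulus map to distinct VSIDs. Code that skips this step and feeds the proto-VSID to the MMU directly, as the commit message describes, produces values that can coincide with the scrambled VSIDs of unrelated address spaces.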
    • KVM: PPC: Book3S: PR: Fix hsrr code · 32c7dbfd
      Committed by Alexander Graf
      When jumping back into the kernel to code that knows that it would be
      using HSRR registers instead of SRR registers, we need to make sure we
      pass it all information on where to jump to in HSRR registers.
      
      Unfortunately, we used r10 to store the information to distinguish between
      the HSRR and SRR case. That register got clobbered in between though,
      rendering the later comparison invalid.
      
      Instead, let's use cr1 to store this information. That way we don't
      need yet another register and everyone's happy.
      
      This fixes PR KVM on POWER7 bare metal for me.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Fix PR KVM on POWER7 bare metal · 56e13dba
      Committed by Alexander Graf
      When running on a system that is HV capable, some interrupts use HSRR
      SPRs instead of the normal SRR SPRs. These are also used in the Linux
      handlers to jump back to code after an interrupt got processed.
      
      Unfortunately, in our "jump back to the real host handler after we've
      done the context switch" code, we were only setting the SRR SPRs,
      causing Linux to jump back to some invalid IP after it's processed
      the interrupt.
      
      This fixes random crashes on p7 opal mode with PR KVM for me.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S: PR: Handle EMUL_ASSIST · 7ef4e985
      Committed by Alexander Graf
      In addition to normal "privileged instruction" traps, we can also receive
      "emulation assist" traps on newer hardware that has the HV bit set.
      
      Handle that one the same way as a privileged instruction, including the
      instruction fetching. That way we don't execute old instructions that we
      happen to still leave in that field when an emul assist trap comes.
      
      This fixes -M mac99 / -M g3beige on p7 bare metal for me.
      Signed-off-by: Alexander Graf <agraf@suse.de>
  3. 08 May, 2012 1 commit
    • KVM: PPC: Book3S HV: Fix refcounting of hugepages · de6c0b02
      Committed by David Gibson
      The H_REGISTER_VPA hcall implementation in HV Power KVM needs to pin some
      guest memory pages into host memory so that they can be safely accessed
      from usermode.  It does this using get_user_pages_fast().  When the VPA is
      unregistered, or the VCPUs are cleaned up, these pages are released using
      put_page().
      
      However, the get_user_pages() is invoked on the specific memory area of the
      VPA which could lie within hugepages.  In case the pinned page is huge,
      we explicitly find the head page of the compound page before calling
      put_page() on it.
      
      At least with the latest kernel, this is not correct.  put_page() already
      handles finding the correct head page of a compound, and also deals with
      various counts on the individual tail page which are important for
      transparent huge pages.  We don't support transparent hugepages on Power,
      but even so, bypassing this count maintenance can lead (when the VM ends)
      to a hugepage being released back to the pool with a non-zero mapcount on
      one of the tail pages.  This can then lead to a bad_page() when the page
      is released from the hugepage pool.
      
      This removes the explicit compound_head() call to correct this bug.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  4. 06 May, 2012 15 commits
  5. 08 April, 2012 18 commits