1. 21 4月, 2015 3 次提交
    • P
      KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C · eddb60fb
      Paul Mackerras 提交于
      This replaces the assembler code for kvmhv_commence_exit() with C code
      in book3s_hv_builtin.c.  It also moves the IPI sending code that was
      in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
      can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      eddb60fb
    • P
      KVM: PPC: Book3S HV: Use bitmap of active threads rather than count · 7d6c40da
      Paul Mackerras 提交于
      Currently, the entry_exit_count field in the kvmppc_vcore struct
      contains two 8-bit counts, one of the threads that have started entering
      the guest, and one of the threads that have started exiting the guest.
      This changes it to an entry_exit_map field which contains two bitmaps
      of 8 bits each.  The advantage of doing this is that it gives us a
      bitmap of which threads need to be signalled when exiting the guest.
      That means that we no longer need to use the trick of setting the
      HDEC to 0 to pull the other threads out of the guest, which led in
      some cases to a spurious HDEC interrupt on the next guest entry.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      7d6c40da
    • M
      KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation. · e928e9cb
      Michael Ellerman 提交于
      Some PowerNV systems include a hardware random-number generator.
      This HWRNG is present on POWER7+ and POWER8 chips and is capable of
      generating one 64-bit random number every microsecond.  The random
      numbers are produced by sampling a set of 64 unstable high-frequency
      oscillators and are almost completely entropic.
      
      PAPR defines an H_RANDOM hypercall which guests can use to obtain one
      64-bit random sample from the HWRNG.  This adds a real-mode
      implementation of the H_RANDOM hypercall.  This hypercall was
      implemented in real mode because the latency of reading the HWRNG is
      generally small compared to the latency of a guest exit and entry for
      all the threads in the same virtual core.
      
      Userspace can detect the presence of the HWRNG and the H_RANDOM
      implementation by querying the KVM_CAP_PPC_HWRNG capability.  The
      H_RANDOM hypercall implementation will only be invoked when the guest
      does an H_RANDOM hypercall if userspace first enables the in-kernel
      H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e928e9cb
  2. 17 12月, 2014 2 次提交
    • S
      KVM: PPC: Book3S HV: Improve H_CONFER implementation · 90fd09f8
      Sam Bobroff 提交于
      Currently the H_CONFER hcall is implemented in kernel virtual mode,
      meaning that whenever a guest thread does an H_CONFER, all the threads
      in that virtual core have to exit the guest.  This is bad for
      performance because it interrupts the other threads even if they
      are doing useful work.
      
      The H_CONFER hcall is called by a guest VCPU when it is spinning on a
      spinlock and it detects that the spinlock is held by a guest VCPU that
      is currently not running on a physical CPU.  The idea is to give this
      VCPU's time slice to the holder VCPU so that it can make progress
      towards releasing the lock.
      
      To avoid having the other threads exit the guest unnecessarily,
      we add a real-mode implementation of H_CONFER that checks whether
      the other threads are doing anything.  If all the other threads
      are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER),
      it returns H_TOO_HARD which causes a guest exit and allows the
      H_CONFER to be handled in virtual mode.
      
      Otherwise it spins for a short time (up to 10 microseconds) to give
      other threads the chance to observe that this thread is trying to
      confer.  The spin loop also terminates when any thread exits the guest
      or when all other threads are idle or trying to confer.  If the
      timeout is reached, the H_CONFER returns H_SUCCESS.  In this case the
      guest VCPU will recheck the spinlock word and most likely call
      H_CONFER again.
      
      This also improves the implementation of the H_CONFER virtual mode
      handler.  If the VCPU is part of a virtual core (vcore) which is
      runnable, there will be a 'runner' VCPU which has taken responsibility
      for running the vcore.  In this case we yield to the runner VCPU
      rather than the target VCPU.
      
      We also introduce a check on the target VCPU's yield count: if it
      differs from the yield count passed to H_CONFER, the target VCPU
      has run since H_CONFER was called and may have already released
      the lock.  This check is required by PAPR.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      90fd09f8
    • P
      KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Paul Mackerras 提交于
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c17b98cf
  3. 10 11月, 2014 2 次提交
  4. 29 9月, 2014 1 次提交
  5. 19 8月, 2014 1 次提交
  6. 07 8月, 2014 2 次提交
  7. 28 7月, 2014 1 次提交
  8. 28 5月, 2014 1 次提交
  9. 08 7月, 2013 2 次提交
  10. 06 10月, 2012 1 次提交
  11. 30 5月, 2012 1 次提交
    • P
      KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281
      Paul Mackerras 提交于
      This adds a new ioctl to enable userspace to control the size of the guest
      hashed page table (HPT) and to clear it out when resetting the guest.
      The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
      a pointer to a u32 containing the desired order of the HPT (log base 2
      of the size in bytes), which is updated on successful return to the
      actual order of the HPT which was allocated.
      
      There must be no vcpus running at the time of this ioctl.  To enforce
      this, we now keep a count of the number of vcpus running in
      kvm->arch.vcpus_running.
      
      If the ioctl is called when a HPT has already been allocated, we don't
      reallocate the HPT but just clear it out.  We first clear the
      kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
      the kvm->lock mutex, it will prevent any vcpus from starting to run until
      we're done, and (b) it means that the first vcpu to run after we're done
      will re-establish the VRMA if necessary.
      
      If userspace doesn't call this ioctl before running the first vcpu, the
      kernel will allocate a default-sized HPT at that point.  We do it then
      rather than when creating the VM, as the code did previously, so that
      userspace has a chance to do the ioctl if it wants.
      
      When allocating the HPT, we can allocate either from the kernel page
      allocator, or from the preallocated pool.  If userspace is asking for
      a different size from the preallocated HPTs, we first try to allocate
      using the kernel page allocator.  Then we try to allocate from the
      preallocated pool, and then if that fails, we try allocating decreasing
      sizes from the kernel page allocator, down to the minimum size allowed
      (256kB).  Note that the kernel page allocator limits allocations to
      1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
      16MB (on 64-bit powerpc, at least).
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix module compilation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      32fad281
  12. 03 4月, 2012 1 次提交
  13. 05 3月, 2012 3 次提交
  14. 27 12月, 2011 1 次提交
  15. 01 11月, 2011 1 次提交
  16. 12 7月, 2011 3 次提交
    • P
      KVM: PPC: book3s_hv: Add support for PPC970-family processors · 9e368f29
      Paul Mackerras 提交于
      This adds support for running KVM guests in supervisor mode on those
      PPC970 processors that have a usable hypervisor mode.  Unfortunately,
      Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to
      1), but the YDL PowerStation does have a usable hypervisor mode.
      
      There are several differences between the PPC970 and POWER7 in how
      guests are managed.  These differences are accommodated using the
      CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature
      bits.  Notably, on PPC970:
      
      * The LPCR, LPID or RMOR registers don't exist, and the functions of
        those registers are provided by bits in HID4 and one bit in HID0.
      
      * External interrupts can be directed to the hypervisor, but unlike
        POWER7 they are masked by MSR[EE] in non-hypervisor modes and use
        SRR0/1 not HSRR0/1.
      
      * There is no virtual RMA (VRMA) mode; the guest must use an RMO
        (real mode offset) area.
      
      * The TLB entries are not tagged with the LPID, so it is necessary to
        flush the whole TLB on partition switch.  Furthermore, when switching
        partitions we have to ensure that no other CPU is executing the tlbie
        or tlbsync instructions in either the old or the new partition,
        otherwise undefined behaviour can occur.
      
      * The PMU has 8 counters (PMC registers) rather than 6.
      
      * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist.
      
      * The SLB has 64 entries rather than 32.
      
      * There is no mediated external interrupt facility, so if we switch to
        a guest that has a virtual external interrupt pending but the guest
        has MSR[EE] = 0, we have to arrange to have an interrupt pending for
        it so that we can get control back once it re-enables interrupts.  We
        do that by sending ourselves an IPI with smp_send_reschedule after
        hard-disabling interrupts.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9e368f29
    • P
      powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits · 969391c5
      Paul Mackerras 提交于
      This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to
      indicate that we have a usable hypervisor mode, and another to indicate
      that the processor conforms to PowerISA version 2.06.  We also add
      another bit to indicate that the processor conforms to ISA version 2.01
      and set that for PPC970 and derivatives.
      
      Some PPC970 chips (specifically those in Apple machines) have a
      hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode
      is not useful in the sense that there is no way to run any code in
      supervisor mode (HV=0 PR=0).  On these processors, the LPES0 and LPES1
      bits in HID4 are always 0, and we use that as a way of detecting that
      hypervisor mode is not useful.
      
      Where we have a feature section in assembly code around code that
      only applies on POWER7 in hypervisor mode, we use a construct like
      
      END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
      
      The definition of END_FTR_SECTION_IFSET is such that the code will
      be enabled (not overwritten with nops) only if all bits in the
      provided mask are set.
      
      Note that the CPU feature check in __tlbie() only needs to check the
      ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called
      if we are running bare-metal, i.e. in hypervisor mode.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      969391c5
    • P
      KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests · aa04b4cc
      Paul Mackerras 提交于
      This adds infrastructure which will be needed to allow book3s_hv KVM to
      run on older POWER processors, including PPC970, which don't support
      the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
      Offset (RMO) facility.  These processors require a physically
      contiguous, aligned area of memory for each guest.  When the guest does
      an access in real mode (MMU off), the address is compared against a
      limit value, and if it is lower, the address is ORed with an offset
      value (from the Real Mode Offset Register (RMOR)) and the result becomes
      the real address for the access.  The size of the RMA has to be one of
      a set of supported values, which usually includes 64MB, 128MB, 256MB
      and some larger powers of 2.
      
      Since we are unlikely to be able to allocate 64MB or more of physically
      contiguous memory after the kernel has been running for a while, we
      allocate a pool of RMAs at boot time using the bootmem allocator.  The
      size and number of the RMAs can be set using the kvm_rma_size=xx and
      kvm_rma_count=xx kernel command line options.
      
      KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
      of the pool of preallocated RMAs.  The capability value is 1 if the
      processor can use an RMA but doesn't require one (because it supports
      the VRMA facility), or 2 if the processor requires an RMA for each guest.
      
      This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
      pool and returns a file descriptor which can be used to map the RMA.  It
      also returns the size of the RMA in the argument structure.
      
      Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
      ioctl calls from userspace.  To cope with this, we now preallocate the
      kvm->arch.ram_pginfo array when the VM is created with a size sufficient
      for up to 64GB of guest memory.  Subsequently we will get rid of this
      array and use memory associated with each memslot instead.
      
      This moves most of the code that translates the user addresses into
      host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
      to kvmppc_core_prepare_memory_region.  Also, instead of having to look
      up the VMA for each page in order to check the page size, we now check
      that the pages we get are compound pages of 16MB.  However, if we are
      adding memory that is mapped to an RMA, we don't bother with calling
      get_user_pages_fast and instead just offset from the base pfn for the
      RMA.
      
      Typically the RMA gets added after vcpus are created, which makes it
      inconvenient to have the LPCR (logical partition control register) value
      in the vcpu->arch struct, since the LPCR controls whether the processor
      uses RMA or VRMA for the guest.  This moves the LPCR value into the
      kvm->arch struct and arranges for the MER (mediated external request)
      bit, which is the only bit that varies between vcpus, to be set in
      assembly code when going into the guest if there is a pending external
      interrupt request.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      aa04b4cc