1. 12 2月, 2013 1 次提交
  2. 24 1月, 2013 3 次提交
  3. 10 1月, 2013 1 次提交
    • A
      KVM: PPC: BookE: Implement EPR exit · 1c810636
      Alexander Graf 提交于
      The External Proxy Facility in FSL BookE chips allows the interrupt
      controller to automatically acknowledge an interrupt as soon as a
      core gets its pending external interrupt delivered.
      
      Today, user space implements the interrupt controller, so we need to
      check on it during such a cycle.
      
      This patch implements logic for user space to enable EPR exiting,
      disable EPR exiting and EPR exiting itself, so that user space can
      acknowledge an interrupt when an external interrupt has successfully
      been delivered into the guest vcpu.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1c810636
  4. 08 1月, 2013 3 次提交
  5. 06 12月, 2012 1 次提交
    • P
      KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT · a2932923
      Paul Mackerras 提交于
      A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
      this fd return the contents of the HPT (hashed page table), writes
      create and/or remove entries in the HPT.  There is a new capability,
      KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
      takes an argument structure with the index of the first HPT entry to
      read out and a set of flags.  The flags indicate whether the user is
      intending to read or write the HPT, and whether to return all entries
      or only the "bolted" entries (those with the bolted bit, 0x10, set in
      the first doubleword).
      
      This is intended for use in implementing qemu's savevm/loadvm and for
      live migration.  Therefore, on reads, the first pass returns information
      about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
      end of the HPT, it returns from the read.  Subsequent reads only return
      information about HPTEs that have changed since they were last read.
      A read that finds no changed HPTEs in the HPT following where the last
      read finished will return 0 bytes.
      
      The format of the data provides a simple run-length compression of the
      invalid entries.  Each block of data starts with a header that indicates
      the index (position in the HPT, which is just an array), the number of
      valid entries starting at that index (may be zero), and the number of
      invalid entries following those valid entries.  The valid entries, 16
      bytes each, follow the header.  The invalid entries are not explicitly
      represented.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix documentation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a2932923
  6. 13 10月, 2012 1 次提交
  7. 06 10月, 2012 3 次提交
  8. 23 9月, 2012 1 次提交
    • A
      KVM: Add resampling irqfds for level triggered interrupts · 7a84428a
      Alex Williamson 提交于
      To emulate level triggered interrupts, add a resample option to
      KVM_IRQFD.  When specified, a new resamplefd is provided that notifies
      the user when the irqchip has been resampled by the VM.  This may, for
      instance, indicate an EOI.  Also in this mode, posting of an interrupt
      through an irqfd only asserts the interrupt.  On resampling, the
      interrupt is automatically de-asserted prior to user notification.
      This enables level triggered interrupts to be posted and re-enabled
      from vfio with no userspace intervention.
      
      All resampling irqfds can make use of a single irq source ID, so we
      reserve a new one for this interface.
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7a84428a
  9. 22 8月, 2012 2 次提交
  10. 30 5月, 2012 1 次提交
    • P
      KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281
      Paul Mackerras 提交于
      This adds a new ioctl to enable userspace to control the size of the guest
      hashed page table (HPT) and to clear it out when resetting the guest.
      The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
      a pointer to a u32 containing the desired order of the HPT (log base 2
      of the size in bytes), which is updated on successful return to the
      actual order of the HPT which was allocated.
      
      There must be no vcpus running at the time of this ioctl.  To enforce
      this, we now keep a count of the number of vcpus running in
      kvm->arch.vcpus_running.
      
      If the ioctl is called when a HPT has already been allocated, we don't
      reallocate the HPT but just clear it out.  We first clear the
      kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
      the kvm->lock mutex, it will prevent any vcpus from starting to run until
      we're done, and (b) it means that the first vcpu to run after we're done
      will re-establish the VRMA if necessary.
      
      If userspace doesn't call this ioctl before running the first vcpu, the
      kernel will allocate a default-sized HPT at that point.  We do it then
      rather than when creating the VM, as the code did previously, so that
      userspace has a chance to do the ioctl if it wants.
      
      When allocating the HPT, we can allocate either from the kernel page
      allocator, or from the preallocated pool.  If userspace is asking for
      a different size from the preallocated HPTs, we first try to allocate
      using the kernel page allocator.  Then we try to allocate from the
      preallocated pool, and then if that fails, we try allocating decreasing
      sizes from the kernel page allocator, down to the minimum size allowed
      (256kB).  Note that the kernel page allocator limits allocations to
      1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
      16MB (on 64-bit powerpc, at least).
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix module compilation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      32fad281
  11. 18 5月, 2012 1 次提交
  12. 06 5月, 2012 1 次提交
  13. 24 4月, 2012 1 次提交
  14. 08 4月, 2012 1 次提交
  15. 08 3月, 2012 1 次提交
  16. 05 3月, 2012 10 次提交
  17. 26 12月, 2011 1 次提交
    • J
      KVM: Don't automatically expose the TSC deadline timer in cpuid · 4d25a066
      Jan Kiszka 提交于
      Unlike all of the other cpuid bits, the TSC deadline timer bit is set
      unconditionally, regardless of what userspace wants.
      
      This is broken in several ways:
       - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
         deadline timer feature, a guest that uses the feature will break
       - live migration to older host kernels that don't support the TSC deadline
         timer will cause the feature to be pulled from under the guest's feet;
         breaking it
       - guests that are broken wrt the feature will fail.
      
      Fix by not enabling the feature automatically; instead report it to userspace.
      Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
      will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
      KVM_GET_SUPPORTED_CPUID.
      
      Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.
      
      [avi: add the KVM_CAP + documentation]
      Reported-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com>
      Tested-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com>
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      4d25a066
  18. 17 11月, 2011 1 次提交
  19. 30 10月, 2011 1 次提交
  20. 26 9月, 2011 3 次提交
    • A
      KVM: PPC: Enable the PAPR CAP for Book3S · 930b412a
      Alexander Graf 提交于
      Now that Book3S PV mode can also run PAPR guests, we can add a PAPR cap and
      enable it for all Book3S targets. Enabling that CAP switches KVM into PAPR
      mode.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      930b412a
    • A
      KVM: PPC: Add support for explicit HIOR setting · a15bd354
      Alexander Graf 提交于
      Until now, we always set HIOR based on the PVR, but this is just wrong.
      Instead, we should be setting HIOR explicitly, so user space can decide
      what the initial HIOR value is - just like on real hardware.
      
      We keep the old PVR based way around for backwards compatibility, but
      once user space uses the SREGS based method, we drop the PVR logic.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a15bd354
    • S
      KVM: x86: Raise the hard VCPU count limit · 8c3ba334
      Sasha Levin 提交于
      The patch raises the hard limit of VCPU count to 254.
      
      This will allow developers to easily work on scalability
      and will allow users to test high VCPU setups easily without
      patching the kernel.
      
      To prevent possible issues with current setups, KVM_CAP_NR_VCPUS
      now returns the recommended VCPU limit (which is still 64) - this
      should be a safe value for everybody, while a new KVM_CAP_MAX_VCPUS
      returns the hard limit which is now 254.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Suggested-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NSasha Levin <levinsasha928@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      8c3ba334
  21. 20 9月, 2011 1 次提交
  22. 12 7月, 2011 1 次提交
    • P
      KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests · aa04b4cc
      Paul Mackerras 提交于
      This adds infrastructure which will be needed to allow book3s_hv KVM to
      run on older POWER processors, including PPC970, which don't support
      the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
      Offset (RMO) facility.  These processors require a physically
      contiguous, aligned area of memory for each guest.  When the guest does
      an access in real mode (MMU off), the address is compared against a
      limit value, and if it is lower, the address is ORed with an offset
      value (from the Real Mode Offset Register (RMOR)) and the result becomes
      the real address for the access.  The size of the RMA has to be one of
      a set of supported values, which usually includes 64MB, 128MB, 256MB
      and some larger powers of 2.
      
      Since we are unlikely to be able to allocate 64MB or more of physically
      contiguous memory after the kernel has been running for a while, we
      allocate a pool of RMAs at boot time using the bootmem allocator.  The
      size and number of the RMAs can be set using the kvm_rma_size=xx and
      kvm_rma_count=xx kernel command line options.
      
      KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
      of the pool of preallocated RMAs.  The capability value is 1 if the
      processor can use an RMA but doesn't require one (because it supports
      the VRMA facility), or 2 if the processor requires an RMA for each guest.
      
      This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
      pool and returns a file descriptor which can be used to map the RMA.  It
      also returns the size of the RMA in the argument structure.
      
      Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
      ioctl calls from userspace.  To cope with this, we now preallocate the
      kvm->arch.ram_pginfo array when the VM is created with a size sufficient
      for up to 64GB of guest memory.  Subsequently we will get rid of this
      array and use memory associated with each memslot instead.
      
      This moves most of the code that translates the user addresses into
      host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
      to kvmppc_core_prepare_memory_region.  Also, instead of having to look
      up the VMA for each page in order to check the page size, we now check
      that the pages we get are compound pages of 16MB.  However, if we are
      adding memory that is mapped to an RMA, we don't bother with calling
      get_user_pages_fast and instead just offset from the base pfn for the
      RMA.
      
      Typically the RMA gets added after vcpus are created, which makes it
      inconvenient to have the LPCR (logical partition control register) value
      in the vcpu->arch struct, since the LPCR controls whether the processor
      uses RMA or VRMA for the guest.  This moves the LPCR value into the
      kvm->arch struct and arranges for the MER (mediated external request)
      bit, which is the only bit that varies between vcpus, to be set in
      assembly code when going into the guest if there is a pending external
      interrupt request.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      aa04b4cc