1. 01 6月, 2020 3 次提交
  2. 16 5月, 2020 1 次提交
    • K
      KVM: arm64: Support enabling dirty log gradually in small chunks · c862626e
      Keqian Zhu 提交于
      There is already support of enabling dirty log gradually in small chunks
      for x86 in commit 3c9bd400 ("KVM: x86: enable dirty log gradually in
      small chunks"). This adds support for arm64.
      
      x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET
      is enabled. However, for arm64, both huge pages and normal pages can be
      write protected gradually by userspace.
      
      Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G
      Linux VMs with different page size. The memory pressure is 127G in each
      case. The time taken of memory_global_dirty_log_start in QEMU is listed
      below:
      
      Page Size      Before    After Optimization
        4K            650ms         1.8ms
        2M             4ms          1.8ms
        1G             2ms          1.8ms
      
      Besides the time reduction, the biggest improvement is that we will minimize
      the performance side effect (because of dissolving huge pages and marking
      memslots dirty) on guest after enabling dirty log.
      Signed-off-by: NKeqian Zhu <zhukeqian1@huawei.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200413122023.52583-1-zhukeqian1@huawei.com
      c862626e
  3. 05 5月, 2020 1 次提交
  4. 25 4月, 2020 1 次提交
  5. 27 3月, 2020 1 次提交
  6. 26 3月, 2020 1 次提交
    • P
      KVM: PPC: Book3S HV: Add a capability for enabling secure guests · 9a5788c6
      Paul Mackerras 提交于
      At present, on Power systems with Protected Execution Facility
      hardware and an ultravisor, a KVM guest can transition to being a
      secure guest at will.  Userspace (QEMU) has no way of knowing
      whether a host system is capable of running secure guests.  This
      will present a problem in future when the ultravisor is capable of
      migrating secure guests from one host to another, because
      virtualization management software will have no way to ensure that
      secure guests only run in domains where all of the hosts can
      support secure guests.
      
      This adds a VM capability which has two functions: (a) userspace
      can query it to find out whether the host can support secure guests,
      and (b) userspace can enable it for a guest, which allows that
      guest to become a secure guest.  If userspace does not enable it,
      KVM will return an error when the ultravisor does the hypercall
      that indicates that the guest is starting to transition to a
      secure guest.  The ultravisor will then abort the transition and
      the guest will terminate.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NRam Pai <linuxram@us.ibm.com>
      9a5788c6
  7. 17 3月, 2020 2 次提交
    • S
      KVM: x86: Remove stateful CPUID handling · 7ff6c035
      Sean Christopherson 提交于
      Remove the code for handling stateful CPUID 0x2 and mark the associated
      flags as deprecated.  WARN if host CPUID 0x2.0.AL > 1, i.e. if by some
      miracle a host with stateful CPUID 0x2 is encountered.
      
      No known CPU exists that supports hardware accelerated virtualization
      _and_ a stateful CPUID 0x2.  Barring an extremely contrived nested
      virtualization scenario, stateful CPUID support is dead code.
      Suggested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7ff6c035
    • J
      KVM: x86: enable dirty log gradually in small chunks · 3c9bd400
      Jay Zhou 提交于
      It could take kvm->mmu_lock for an extended period of time when
      enabling dirty log for the first time. The main cost is to clear
      all the D-bits of last level SPTEs. This situation can benefit from
      manual dirty log protect as well, which can reduce the mmu_lock
      time taken. The sequence is like this:
      
      1. Initialize all the bits of the dirty bitmap to 1 when enabling
         dirty log for the first time
      2. Only write protect the huge pages
      3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
      4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
         SPTEs gradually in small chunks
      
      Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
      I did some tests with a 128G windows VM and counted the time taken
      of memory_global_dirty_log_start, here is the numbers:
      
      VM Size        Before    After optimization
      128G           460ms     10ms
      Signed-off-by: NJay Zhou <jianjay.zhou@huawei.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3c9bd400
  8. 28 2月, 2020 2 次提交
  9. 25 2月, 2020 1 次提交
  10. 13 2月, 2020 1 次提交
  11. 31 1月, 2020 1 次提交
  12. 23 1月, 2020 1 次提交
  13. 30 11月, 2019 1 次提交
  14. 28 11月, 2019 1 次提交
  15. 22 10月, 2019 2 次提交
    • C
      KVM: arm/arm64: Allow user injection of external data aborts · da345174
      Christoffer Dall 提交于
      In some scenarios, such as buggy guest or incorrect configuration of the
      VMM and firmware description data, userspace will detect a memory access
      to a portion of the IPA, which is not mapped to any MMIO region.
      
      For this purpose, the appropriate action is to inject an external abort
      to the guest.  The kernel already has functionality to inject an
      external abort, but we need to wire up a signal from user space that
      lets user space tell the kernel to do this.
      
      It turns out, we already have the set event functionality which we can
      perfectly reuse for this.
      Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      da345174
    • C
      KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace · c726200d
      Christoffer Dall 提交于
      For a long time, if a guest accessed memory outside of a memslot using
      any of the load/store instructions in the architecture which doesn't
      supply decoding information in the ESR_EL2 (the ISV bit is not set), the
      kernel would print the following message and terminate the VM as a
      result of returning -ENOSYS to userspace:
      
        load/store instruction decoding not implemented
      
      The reason behind this message is that KVM assumes that all accesses
      outside a memslot is an MMIO access which should be handled by
      userspace, and we originally expected to eventually implement some sort
      of decoding of load/store instructions where the ISV bit was not set.
      
      However, it turns out that many of the instructions which don't provide
      decoding information on abort are not safe to use for MMIO accesses, and
      the remaining few that would potentially make sense to use on MMIO
      accesses, such as those with register writeback, are not used in
      practice.  It also turns out that fetching an instruction from guest
      memory can be a pretty horrible affair, involving stopping all CPUs on
      SMP systems, handling multiple corner cases of address translation in
      software, and more.  It doesn't appear likely that we'll ever implement
      this in the kernel.
      
      What is much more common is that a user has misconfigured his/her guest
      and is actually not accessing an MMIO region, but just hitting some
      random hole in the IPA space.  In this scenario, the error message above
      is almost misleading and has led to a great deal of confusion over the
      years.
      
      It is, nevertheless, ABI to userspace, and we therefore need to
      introduce a new capability that userspace explicitly enables to change
      behavior.
      
      This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
      which does exactly that, and introduces a new exit reason to report the
      event to userspace.  User space can then emulate an exception to the
      guest, restart the guest, suspend the guest, or take any other
      appropriate action as per the policy of the running system.
      Reported-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: NAlexander Graf <graf@amazon.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      c726200d
  16. 21 10月, 2019 1 次提交
    • F
      KVM: PPC: Report single stepping capability · 1a9167a2
      Fabiano Rosas 提交于
      When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
      the next instruction to be single stepped via the
      KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.
      
      This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
      to inform userspace about the state of single stepping support.
      
      We currently don't have support for guest single stepping implemented
      in Book3S HV so the capability is only present for Book3S PR and
      BookE.
      Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1a9167a2
  17. 24 9月, 2019 1 次提交
  18. 11 9月, 2019 1 次提交
  19. 09 9月, 2019 1 次提交
    • M
      KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75
      Marc Zyngier 提交于
      While parts of the VGIC support a large number of vcpus (we
      bravely allow up to 512), other parts are more limited.
      
      One of these limits is visible in the KVM_IRQ_LINE ioctl, which
      only allows 256 vcpus to be signalled when using the CPU or PPI
      types. Unfortunately, we've cornered ourselves badly by allocating
      all the bits in the irq field.
      
      Since the irq_type subfield (8 bit wide) is currently only taking
      the values 0, 1 and 2 (and we have been careful not to allow anything
      else), let's reduce this field to only 4 bits, and allocate the
      remaining 4 bits to a vcpu2_index, which acts as a multiplier:
      
        vcpu_id = 256 * vcpu2_index + vcpu_index
      
      With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
      allowing this to be discovered, it becomes possible to inject
      PPIs to up to 4096 vcpus. But please just don't.
      
      Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
      on arm, which is not completely conditionned by KVM_CAP_IRQCHIP.
      Reported-by: NZenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      92f35b75
  20. 29 8月, 2019 1 次提交
  21. 24 7月, 2019 1 次提交
  22. 20 7月, 2019 1 次提交
  23. 11 7月, 2019 1 次提交
    • E
      KVM: x86: PMU Event Filter · 66bb8a06
      Eric Hankland 提交于
      Some events can provide a guest with information about other guests or the
      host (e.g. L3 cache stats); providing the capability to restrict access
      to a "safe" set of events would limit the potential for the PMU to be used
      in any side channel attacks. This change introduces a new VM ioctl that
      sets an event filter. If the guest attempts to program a counter for
      any blacklisted or non-whitelisted event, the kernel counter won't be
      created, so any RDPMC/RDMSR will show 0 instances of that event.
      Signed-off-by: NEric Hankland <ehankland@google.com>
      [Lots of changes. All remaining bugs are probably mine. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      66bb8a06
  24. 19 6月, 2019 1 次提交
    • L
      KVM: x86: Modify struct kvm_nested_state to have explicit fields for data · 6ca00dfa
      Liran Alon 提交于
      Improve the KVM_{GET,SET}_NESTED_STATE structs by detailing the format
      of VMX nested state data in a struct.
      
      In order to avoid changing the ioctl values of
      KVM_{GET,SET}_NESTED_STATE, there is a need to preserve
      sizeof(struct kvm_nested_state). This is done by defining the data
      struct as "data.vmx[0]". It was the most elegant way I found to
      preserve struct size while still keeping struct readable and easy to
      maintain. It does have a misfortunate side-effect that now it has to be
      accessed as "data.vmx[0]" rather than just "data.vmx".
      
      Because we are already modifying these structs, I also modified the
      following:
      * Define the "format" field values as macros.
      * Rename vmcs_pa to vmcs12_pa for better readability.
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      [Remove SVM stubs, add KVM_STATE_NESTED_VMX_VMCS12_SIZE. - Paolo]
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6ca00dfa
  25. 18 6月, 2019 1 次提交
  26. 15 6月, 2019 1 次提交
  27. 05 6月, 2019 2 次提交
  28. 08 5月, 2019 1 次提交
  29. 01 5月, 2019 3 次提交
  30. 30 4月, 2019 2 次提交
  31. 29 4月, 2019 1 次提交