  1. 30 Nov 2022, 1 commit
    • KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured · d8ba8ba4
      Committed by David Woodhouse
      Closer inspection of the Xen code shows that we aren't supposed to be
      using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
      explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
      If we randomly set the top bit of ->state_entry_time for a guest that
      hasn't asked for it and doesn't expect it, that could make the runtimes
      fail to add up and confuse the guest. Without the flag it's perfectly
      safe for a vCPU to read its own vcpu_runstate_info; just not for one
      vCPU to read *another's*.
      
      I briefly pondered adding a word for the whole set of VMASST_TYPE_*
      flags but the only one we care about for HVM guests is this, so it
      seemed a bit pointless.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
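      To see why the flag matters for cross-vCPU reads, here is a minimal
      reader-side sketch. It assumes the Xen convention that
      XEN_RUNSTATE_UPDATE is bit 63 of state_entry_time and that the guest
      has opted in through HYPERVISOR_vm_assist. The reader retries while the
      bit is set or while state_entry_time changes underneath it, much like a
      sequence lock. struct runstate_info is a hypothetical local mirror of
      vcpu_runstate_info, and runstate_snapshot() is an illustrative helper,
      not Xen or Linux code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match Xen's definition: the top bit of state_entry_time. */
#define RUNSTATE_UPDATE_FLAG (UINT64_C(1) << 63)

/* Hypothetical local mirror of Xen's struct vcpu_runstate_info. */
struct runstate_info {
    int state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/*
 * Copy another vCPU's runstate area into *dst. Returns true on a
 * consistent snapshot, false if every attempt raced with an update.
 * The acquire fences approximate the read barriers real code would use.
 */
static bool runstate_snapshot(const volatile struct runstate_info *src,
                              struct runstate_info *dst, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        uint64_t before = src->state_entry_time;
        if (before & RUNSTATE_UPDATE_FLAG)
            continue;               /* writer is mid-update: retry */
        __atomic_thread_fence(__ATOMIC_ACQUIRE);

        dst->state = src->state;
        for (int j = 0; j < 4; j++)
            dst->time[j] = src->time[j];

        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        if (src->state_entry_time == before) {
            dst->state_entry_time = before;
            return true;            /* nothing changed while copying */
        }
    }
    return false;
}
```

      Without the flag, a reader has no way to tell that it caught the
      writer halfway through, which is why it is only safe for a vCPU to
      read its own area in that mode.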
  2. 23 Nov 2022, 2 commits
    • KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE · 8c516b25
      Committed by Claudio Imbrenda
      Add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE to signal that the
      KVM_PV_ASYNC_DISABLE and KVM_PV_ASYNC_DISABLE_PREPARE commands for the
      KVM_S390_PV_COMMAND ioctl are available.
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
      Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221111170632.77622-4-imbrenda@linux.ibm.com
      Message-Id: <20221111170632.77622-4-imbrenda@linux.ibm.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    • KVM: s390: pv: asynchronous destroy for reboot · fb491d55
      Committed by Claudio Imbrenda
      Until now, destroying a protected guest was an entirely synchronous
      operation that could potentially take a very long time, depending on
      the size of the guest, due to the time needed to clean up the address
      space from protected pages.
      
      This patch implements an asynchronous destroy mechanism that allows a
      protected guest to reboot significantly faster than before.
      
      This is achieved by clearing the pages of the old guest in the
      background. In case of reboot, the new guest will be able to run in the
      same address space almost immediately.
      
      The old protected guest is then only destroyed when all of its memory
      has been destroyed or otherwise made non-protected.
      
      Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl:
      
      KVM_PV_ASYNC_CLEANUP_PREPARE: set aside the current protected VM for
      later asynchronous teardown. The current KVM VM will then continue
      immediately as non-protected. If a protected VM had already been
      set aside for asynchronous teardown, but without starting the teardown
      process, this call will fail. There can be at most one VM set aside at
      any time. Once it is set aside, the protected VM only exists in the
      context of the Ultravisor; it is not associated with the KVM VM
      anymore. Its protected CPUs have already been destroyed, but not its
      memory. This command can be issued again immediately after starting
      KVM_PV_ASYNC_CLEANUP_PERFORM, without having to wait for completion.
      
      KVM_PV_ASYNC_CLEANUP_PERFORM: tears down the protected VM previously
      set aside using KVM_PV_ASYNC_CLEANUP_PREPARE. Ideally the
      KVM_PV_ASYNC_CLEANUP_PERFORM PV command should be issued by userspace
      from a separate thread. If a fatal signal is received (or if the
      process terminates naturally), the command will terminate immediately
      without completing. All protected VMs whose teardown was interrupted
      will be put in the need_cleanup list. The rest of the normal KVM
      teardown process will take care of properly cleaning up all remaining
      protected VMs, including the ones on the need_cleanup list.
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221111170632.77622-2-imbrenda@linux.ibm.com
      Message-Id: <20221111170632.77622-2-imbrenda@linux.ibm.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
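      The rules for the two commands can be modeled as a tiny state machine,
      which makes the "at most one VM set aside" constraint explicit. This is
      a hypothetical model, not kernel code: struct pv_async_model and its
      functions are illustrative, and the assumption that PERFORM fails when
      nothing has been set aside belongs to the sketch.

```c
#include <stdbool.h>

/* Hypothetical model of the PREPARE/PERFORM rules; not kernel structures. */
struct pv_async_model {
    bool set_aside;    /* a protected VM is parked, teardown not started */
    int in_teardown;   /* teardowns started with PERFORM, not yet finished */
};

/* Models KVM_PV_ASYNC_CLEANUP_PREPARE: park the current protected VM. */
static bool model_prepare(struct pv_async_model *m)
{
    if (m->set_aside)
        return false;      /* at most one VM may be set aside at a time */
    m->set_aside = true;   /* the KVM VM now continues as non-protected */
    return true;
}

/* Models starting KVM_PV_ASYNC_CLEANUP_PERFORM on the parked VM. */
static bool model_perform_start(struct pv_async_model *m)
{
    if (!m->set_aside)
        return false;      /* sketch assumption: nothing to tear down */
    m->set_aside = false;  /* slot is free again without waiting */
    m->in_teardown++;
    return true;
}
```

      Because the slot is released as soon as PERFORM starts, a VMM can
      reboot a protected guest repeatedly while older teardowns are still
      running in another thread.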
  3. 10 Nov 2022, 1 commit
  4. 29 Sep 2022, 1 commit
  5. 29 Jul 2022, 1 commit
    • RISC-V: KVM: Add extensible CSR emulation framework · 8a061562
      Committed by Anup Patel
      We add an extensible CSR emulation framework which is based upon the
      existing system instruction emulation. This will be useful for upcoming
      AIA, PMU, Nested and other virtualization features.
      
      The CSR emulation framework also has provision to emulate CSRs in user
      space, but this will be used only in very specific cases such as AIA
      IMSIC CSR emulation in user space or vendor-specific CSR emulation
      in user space.
      
      By default, all CSRs not handled by KVM RISC-V will be redirected back
      to the Guest VCPU as an illegal instruction trap.
      Signed-off-by: Anup Patel <apatel@ventanamicro.com>
      Signed-off-by: Anup Patel <anup@brainfault.org>
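      The dispatch idea can be sketched as a range table with a fallback,
      under illustrative assumptions: the CSR numbers, struct csr_range and
      the handler names are hypothetical and are not KVM RISC-V's actual
      code. A trapped CSR access is matched against the table; a handler
      either emulates it in the kernel or requests an exit to user space,
      and anything unmatched is reported back as an illegal instruction.

```c
#include <stddef.h>

/* Hypothetical outcomes for a CSR access trapped by the hypervisor. */
enum csr_result { CSR_HANDLED, CSR_EXIT_TO_USER, CSR_ILLEGAL_INSN };

typedef enum csr_result (*csr_handler)(unsigned int csr_num,
                                       unsigned long *val);

/* One table entry covers a contiguous range of CSR numbers. */
struct csr_range {
    unsigned int base;
    unsigned int count;
    csr_handler func;
};

static enum csr_result emulate_in_kernel(unsigned int csr_num,
                                         unsigned long *val)
{
    *val = csr_num;            /* placeholder "emulation" for the sketch */
    return CSR_HANDLED;
}

static enum csr_result forward_to_user(unsigned int csr_num,
                                       unsigned long *val)
{
    (void)csr_num;
    (void)val;
    return CSR_EXIT_TO_USER;   /* user space completes the access */
}

/* Illustrative CSR numbers, not taken from the RISC-V specification. */
static const struct csr_range csr_table[] = {
    { 0x100, 4, emulate_in_kernel },
    { 0x200, 2, forward_to_user },
};

static enum csr_result dispatch_csr(unsigned int csr_num, unsigned long *val)
{
    for (size_t i = 0; i < sizeof(csr_table) / sizeof(csr_table[0]); i++) {
        const struct csr_range *r = &csr_table[i];
        if (csr_num >= r->base && csr_num < r->base + r->count)
            return r->func(csr_num, val);
    }
    return CSR_ILLEGAL_INSN;   /* default: redirect as illegal instruction */
}
```

      Adding support for a new feature then amounts to adding a table entry
      rather than touching the trap path itself.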
  6. 20 Jul 2022, 1 commit
  7. 19 Jul 2022, 1 commit
  8. 14 Jul 2022, 1 commit
    • kvm: stats: tell userspace which values are boolean · 1b870fa5
      Committed by Paolo Bonzini
      Some of the statistics values exported by KVM are always only 0 or 1.
      It can be useful to export this fact to userspace so that it can track
      them specially (for example by polling the value every now and then to
      compute a % of time spent in a specific state).
      
      Therefore, add "boolean value" as a new "unit".  While it is not exactly
      a unit, it walks and quacks like one.  In particular, using the type
      would be wrong because boolean values could be instantaneous or peak
      values (e.g. "is the rmap allocated?") or even two-bucket histograms
      (e.g. "number of posted vs. non-posted interrupt injections").
      Suggested-by: Amneesh Singh <natto@weirdnatto.in>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
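      The polling use case mentioned above fits in a few lines, assuming the
      boolean samples have already been read from KVM's binary statistics at
      a fixed interval; bool_stat_percent() is a hypothetical helper, and
      reading the stats file descriptor is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Estimate the share of time a boolean statistic was 1, given values
 * sampled at a fixed polling interval. Returns a percentage in [0, 100].
 */
static double bool_stat_percent(const uint64_t *samples, size_t n)
{
    if (n == 0)
        return 0.0;
    size_t ones = 0;
    for (size_t i = 0; i < n; i++)
        ones += samples[i] != 0;   /* any non-zero value counts as "true" */
    return 100.0 * (double)ones / (double)n;
}
```

      Knowing that a value is boolean is what makes this averaging
      meaningful; applied to a counter, the same arithmetic would be
      nonsense.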
  9. 11 Jul 2022, 1 commit
  10. 29 Jun 2022, 1 commit
  11. 24 Jun 2022, 1 commit
    • KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis · 084cc29f
      Committed by Ben Gardon
      In some cases, the NX hugepage mitigation for iTLB multihit is not
      needed for all guests on a host. Allow disabling the mitigation on a
      per-VM basis to avoid the performance hit of NX hugepages on trusted
      workloads.
      
      In order to disable NX hugepages on a VM, ensure that the userspace
      actor has permission to reboot the system. Since disabling NX hugepages
      would allow a guest to crash the system, it is similar to reboot
      permissions.
      
      Ideally, KVM would require userspace to prove it has access to KVM's
      nx_huge_pages module param, e.g. so that userspace can opt out without
      needing full reboot permissions.  But getting access to the module param
      file info is difficult because it is buried in layers of sysfs and module
      glue. Requiring CAP_SYS_BOOT is sufficient for all known use cases.
      Suggested-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20220613212523.3436117-9-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 08 Jun 2022, 2 commits
    • KVM: VMX: Enable Notify VM exit · 2f4073e0
      Committed by Tao Xu
      There are cases in which a malicious virtual machine can cause the CPU
      to get stuck (because event windows never open up), e.g., an infinite
      loop in microcode on nested #AC (CVE-2015-5307). No event window means
      no event (NMI, SMI or IRQ) can be delivered, which leaves the CPU
      unavailable to the host and to other VMs.
      
      A VMM can enable notify VM exits, so that a VM exit is generated if no
      event window occurs in VM non-root mode for a specified amount of time
      (the notify window).
      
      Feature enabling:
      - The new vmcs control SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced
        to enable this feature. The VMM can set the NOTIFY_WINDOW vmcs field
        to adjust the expected notify window.
      - Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space
        can query and enable this feature in per-VM scope. The argument is a
        64-bit value: bits 63:32 are used for the notify window, and bits 31:0
        are for flags. Currently supported flags:
        - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify
          window provided.
        - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace whenever a notify VM
          exit happens.
      - It is safe even to set the notify window to zero, since an internal
        hardware threshold is added to vmcs.notify_window.
      
      VM exit handling:
      - Introduce a vcpu state notify_window_exits to record the count of
        notify VM exits and expose it through debugfs.
      - Notify VM exit can happen incident to delivery of a vector event.
        Allow it in KVM.
      - Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID
        bit is set.
      
      Nested handling:
      - Nested notify VM exits are not supported yet. Keep the same notify
        window control in vmcs02 as in vmcs01, so that L1 can't escape the
        restriction of notify VM exits by launching an L2 VM.
      
      Notify VM exit is defined in the latest Intel Architecture Instruction
      Set Extensions Programming Reference, chapter 9.2.
      Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Tao Xu <tao3.xu@intel.com>
      Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
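      Packing the 64-bit capability argument described above is simple bit
      arithmetic. In this sketch the flag macros are local stand-ins assumed
      to match KVM_X86_NOTIFY_VMEXIT_ENABLED and KVM_X86_NOTIFY_VMEXIT_USER,
      and notify_vmexit_arg() and its unpacking helpers are hypothetical.

```c
#include <stdint.h>

/* Local stand-ins, assumed to match the flag values in the KVM uAPI. */
#define NOTIFY_VMEXIT_ENABLED (UINT64_C(1) << 0)
#define NOTIFY_VMEXIT_USER    (UINT64_C(1) << 1)

/* Bits 63:32 carry the notify window, bits 31:0 carry the flags. */
static uint64_t notify_vmexit_arg(uint32_t window, uint32_t flags)
{
    return ((uint64_t)window << 32) | flags;
}

static uint32_t notify_arg_window(uint64_t arg) { return (uint32_t)(arg >> 32); }
static uint32_t notify_arg_flags(uint64_t arg)  { return (uint32_t)arg; }
```

      A VMM would then presumably pass the packed value as args[0] of
      struct kvm_enable_cap when enabling KVM_CAP_X86_NOTIFY_VMEXIT on the VM
      file descriptor.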
    • KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault · ed235117
      Committed by Chenyi Qiang
      For the triple fault synthesized by KVM, e.g. the RSM path or
      nested_vmx_abort(), if KVM exits to userspace before the request is
      serviced, userspace could migrate the VM and lose the triple fault.
      
      Extend KVM_{G,S}ET_VCPU_EVENTS to support a pending triple fault with a
      new event KVM_VCPUEVENT_VALID_TRIPLE_FAULT so that userspace can save
      and restore the triple fault event. This extension is guarded by a new
      KVM capability KVM_CAP_TRIPLE_FAULT_EVENT.
      
      Note that in the set_vcpu_events path, userspace is able to set/clear
      the triple fault request through the triple_fault.pending field.
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
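      A minimal model of the save/restore round trip, assuming a simplified
      events structure: the "get" side marks the triple-fault field valid
      and records whether a fault is pending, and the "set" side only
      touches the request when that flag is present, so userspace that never
      sets the flag keeps its old behaviour. struct vcpu_events_model and
      EVT_VALID_TRIPLE_FAULT are hypothetical stand-ins for the real uAPI
      structure and flag, and the capability check is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "triple fault field is valid" flag bit. */
#define EVT_VALID_TRIPLE_FAULT (1u << 5)

/* Hypothetical subset of the vCPU events structure. */
struct vcpu_events_model {
    uint32_t flags;
    struct { uint8_t pending; } triple_fault;
};

/* "Get" side: report a pending triple fault so it survives migration. */
static void save_events(bool tf_pending, struct vcpu_events_model *ev)
{
    ev->flags = EVT_VALID_TRIPLE_FAULT;
    ev->triple_fault.pending = tf_pending;
}

/* "Set" side: only set or clear the request if the field is marked valid. */
static void restore_events(const struct vcpu_events_model *ev,
                           bool *tf_pending)
{
    if (ev->flags & EVT_VALID_TRIPLE_FAULT)
        *tf_pending = ev->triple_fault.pending != 0;
}
```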
  13. 01 Jun 2022, 5 commits
  14. 04 May 2022, 2 commits
    • KVM: arm64: Implement PSCI SYSTEM_SUSPEND · bfbab445
      Committed by Oliver Upton
      ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
      software to request that a system be placed in the deepest possible
      low-power state. Effectively, software can use this to suspend itself to
      RAM.
      
      Unfortunately, there really is no good way to implement a system-wide
      PSCI call in KVM. Any precondition checks done in the kernel will need
      to be repeated by userspace since there is no good way to protect a
      critical section that spans an exit to userspace. SYSTEM_RESET and
      SYSTEM_OFF are equally plagued by this issue, although no users have
      seemingly cared for the relatively long time these calls have been
      supported.
      
      The solution is to just make the whole implementation userspace's
      problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that
      indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND.
      Additionally, add a CAP to get buy-in from userspace for this new exit
      type.
      
      Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in.
      If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide
      explicit documentation of userspace's responsibilities for the exit and
      point to the PSCI specification to describe the actual PSCI call.
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
    • KVM: arm64: Add support for userspace to suspend a vCPU · 7b33a09d
      Committed by Oliver Upton
      Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
      is in a suspended state. In the suspended state the vCPU will block
      until a wakeup event (pending interrupt) is recognized.
      
      Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
      userspace that KVM has recognized one such wakeup event. It is the
      responsibility of userspace to then make the vCPU runnable, or leave it
      suspended until the next wakeup event.
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
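      The userspace side of the two arm64 exits can be sketched as a small
      policy function. The enums are hypothetical local stand-ins for
      KVM_SYSTEM_EVENT_SUSPEND, KVM_SYSTEM_EVENT_WAKEUP and the MP states.
      Parking the vCPU in the suspended state on SYSTEM_SUSPEND is only one
      possible way for a VMM to honour the call, assumed here for
      illustration; applying the chosen state would go through
      KVM_SET_MP_STATE.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the system event types and MP states. */
enum sys_event { EVT_SUSPEND, EVT_WAKEUP, EVT_OTHER };
enum mp_state  { MP_RUNNABLE, MP_SUSPENDED };

/*
 * Decide the vCPU's next MP state after a system event exit.
 * allow_wakeup models the VMM's policy: KVM only reports the wakeup,
 * userspace decides whether the vCPU becomes runnable.
 */
static enum mp_state next_mp_state(enum sys_event ev, enum mp_state cur,
                                   bool allow_wakeup)
{
    switch (ev) {
    case EVT_SUSPEND:
        return MP_SUSPENDED;   /* assumed policy: park the calling vCPU */
    case EVT_WAKEUP:
        return allow_wakeup ? MP_RUNNABLE : MP_SUSPENDED;
    default:
        return cur;            /* other events leave the state alone */
    }
}
```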
  15. 30 Apr 2022, 1 commit
    • KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT · d495f942
      Committed by Paolo Bonzini
      When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
      member that at the time was unused.  Unfortunately this extensibility
      mechanism has several issues:
      
      - x86 is not writing the member, so it would not be possible to use it
        on x86 except for new events
      
      - the member is not aligned to 64 bits, so the definition of the
        uAPI struct is incorrect for 32- on 64-bit userspace.  This is a
        problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
        usage of flags was only introduced in 5.18.
      
      Since padding has to be introduced, place a new field in there
      that tells if the flags field is valid.  To allow further extensibility,
      in fact, change flags to an array of 16 values, and store how many
      of the values are valid.  The availability of the new ndata field
      is tied to a system capability; all architectures are changed to
      fill in the field.
      
      To avoid breaking compilation of userspace that was using the flags
      field, provide a userspace-only union to overlap flags with data[0].
      The new field is placed at the same offset for both 32- and 64-bit
      userspace.
      
      Cc: Will Deacon <will@kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Message-Id: <20220422103013.34832-1-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
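      The resulting layout can be checked at compile time with a
      hypothetical local mirror of the struct (not the real uAPI header):
      ndata fills the former padding, so the 16-element data array starts at
      offset 8 for both 32- and 64-bit userspace, and the userspace-only
      flags name aliases data[0]. By contrast, a u64 placed directly after a
      u32 lands at offset 4 on i386 but at offset 8 on x86-64, because the
      two ABIs align 64-bit members differently inside structs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the reworked system_event layout. */
struct system_event_model {
    uint32_t type;
    uint32_t ndata;            /* occupies what used to be padding */
    union {
        uint64_t flags;        /* old name, kept for source compatibility */
        uint64_t data[16];
    };
};

/* ndata fills the gap, so data[] starts on a 64-bit boundary everywhere. */
_Static_assert(offsetof(struct system_event_model, data) == 8,
               "data must start at offset 8");
_Static_assert(offsetof(struct system_event_model, flags) ==
               offsetof(struct system_event_model, data),
               "flags must overlap data[0]");
```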
  16. 14 Apr 2022, 1 commit
  17. 02 Apr 2022, 8 commits
  18. 21 Mar 2022, 1 commit
    • KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 · 6d849191
      Committed by Oliver Upton
      KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
      advertise the set of quirks which may be disabled to userspace, so it is
      impossible to predict the behavior of KVM. Worse yet,
      KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
      it fails to reject attempts to set invalid quirk bits.
      
      The only valid workaround for the quirky quirks API is to add a new CAP.
      Actually advertise the set of quirks that can be disabled to userspace
      so it can predict KVM's behavior. Reject values for cap->args[0] that
      contain invalid bits.
      
      Finally, add documentation for the new capability and describe the
      existing quirks.
      Signed-off-by: Oliver Upton <oupton@google.com>
      Message-Id: <20220301060351.442881-5-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
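      A minimal sketch of the validation described above, assuming the
      advertised mask is the value KVM_CHECK_EXTENSION reports for the new
      capability: any requested bit outside the mask is rejected instead of
      being silently accepted. disable_quirks2() is hypothetical and returns
      -EINVAL in this sketch; the kernel's exact error path may differ.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical check in the spirit of KVM_CAP_DISABLE_QUIRKS2:
 * 'supported' is the advertised quirk mask, 'requested' is cap->args[0].
 */
static int disable_quirks2(uint64_t supported, uint64_t requested,
                           uint64_t *disabled)
{
    if (requested & ~supported)
        return -EINVAL;        /* reject unknown quirk bits */
    *disabled |= requested;    /* record which quirks are now off */
    return 0;
}
```

      On the userspace side, masking the wanted quirks with the advertised
      set before enabling the capability avoids the error entirely and makes
      KVM's resulting behaviour predictable.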
  19. 25 Feb 2022, 1 commit
  20. 22 Feb 2022, 1 commit
  21. 14 Feb 2022, 4 commits
  22. 31 Jan 2022, 1 commit
  23. 28 Jan 2022, 1 commit
    • KVM: x86: add system attribute to retrieve full set of supported xsave states · dd6e6312
      Committed by Paolo Bonzini
      Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded
      VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that
      have not been enabled.  Probing those, for example so that they can be
      passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl.
      The latter is in fact worse, even though that is what the rest of the
      API uses, because it would require supported_xcr0 to be moved from the
      KVM module to the kernel just for this use.  In addition, the value
      would be nonsensical (or an error would have to be returned) until
      the KVM module is loaded in.
      
      Therefore, to limit the growth of system ioctls, add a /dev/kvm
      variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86
      with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP).
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
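      A minimal userspace sketch, assuming Linux x86 headers new enough to
      define KVM_X86_XCOMP_GUEST_SUPP: it opens /dev/kvm and issues
      KVM_GET_DEVICE_ATTR with group 0, as the commit describes.
      get_xcomp_guest_supp() is a hypothetical helper that simply fails when
      the headers, the device or the attribute are unavailable; a production
      VMM would keep the /dev/kvm descriptor open and probe support first.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#if defined(__linux__) && defined(__has_include)
#  if __has_include(<linux/kvm.h>)
#    include <linux/kvm.h>
#  endif
#endif

/*
 * Read the full mask of XSAVE components KVM can offer guests via the
 * /dev/kvm variant of KVM_GET_DEVICE_ATTR. Returns 0 on success, -1 on
 * any failure (no headers, no /dev/kvm, or kernel without the attribute).
 */
static int get_xcomp_guest_supp(uint64_t *mask)
{
#ifdef KVM_X86_XCOMP_GUEST_SUPP
    struct kvm_device_attr attr = {
        .group = 0,                          /* the only group on x86 */
        .attr  = KVM_X86_XCOMP_GUEST_SUPP,
        .addr  = (uint64_t)(uintptr_t)mask,  /* KVM writes the u64 here */
    };
    int fd = open("/dev/kvm", O_RDWR);
    if (fd < 0)
        return -1;
    int r = ioctl(fd, KVM_GET_DEVICE_ATTR, &attr);
    close(fd);
    return r < 0 ? -1 : 0;
#else
    (void)mask;
    return -1;
#endif
}
```

      The mask obtained this way is what a VMM would consult before asking
      for dynamic xsave features with ARCH_REQ_XCOMP_GUEST_PERM, without
      having to pass unrequested states to KVM_SET_CPUID2.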