1. 14 4月, 2022 2 次提交
    • S
      KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2 · 45846661
      Sean Christopherson 提交于
      Remove WARNs that sanity check that KVM never lets a triple fault for L2
      escape and incorrectly end up in L1.  In normal operation, the sanity
      check is perfectly valid, but it incorrectly assumes that it's impossible
      for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through
      KVM_RUN (which guarantees kvm_check_nested_state() will see and handle
      the triple fault).
      
      The WARN can currently be triggered if userspace injects a machine check
      while L2 is active and CR4.MCE=0.  And a future fix to allow save/restore
      of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't
      lost on migration, will make it trivially easy for userspace to trigger
      the WARN.
      
      Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is
      tempting, but wrong, especially if/when the request is saved/restored,
      e.g. if userspace restores events (including a triple fault) and then
      restores nested state (which may forcibly leave guest mode).  Ignoring
      the fact that KVM doesn't currently provide the necessary APIs, it's
      userspace's responsibility to manage pending events during save/restore.
      
        ------------[ cut here ]------------
        WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Modules linked in: kvm_intel kvm irqbypass
        CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Call Trace:
         <TASK>
         vmx_leave_nested+0x30/0x40 [kvm_intel]
         vmx_set_nested_state+0xca/0x3e0 [kvm_intel]
         kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm]
         kvm_vcpu_ioctl+0x4b9/0x660 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        ---[ end trace 0000000000000000 ]---
      
      Fixes: cb6a32c2 ("KVM: x86: Handle triple fault in L2 without killing L1")
      Cc: stable@vger.kernel.org
      Cc: Chenyi Qiang <chenyi.qiang@intel.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220407002315.78092-2-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      45846661
    • L
      KVM: x86: Move .pmu_ops to kvm_x86_init_ops and tag as __initdata · 34886e79
      Like Xu 提交于
      The pmu_ops should be moved to kvm_x86_init_ops and tagged as __initdata.
      That'll save those precious few bytes, and more importantly make
      the original ops unreachable, i.e. make it harder to sneak in post-init
      modification bugs.
      Suggested-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NLike Xu <likexu@tencent.com>
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220329235054.3534728-4-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      34886e79
  2. 12 4月, 2022 1 次提交
  3. 05 4月, 2022 1 次提交
  4. 02 4月, 2022 21 次提交
  5. 21 3月, 2022 1 次提交
  6. 08 3月, 2022 1 次提交
    • S
      KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255 · 4a204f78
      Suravee Suthikulpanit 提交于
      Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined
      by the architecture.  The number of bits consumed by hardware is model
      specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM
      to enumerate the "true" size.  So, KVM must allow using all bits, else it
      risks rejecting completely legal x2APIC IDs on newer CPUs.
      
      This means KVM relies on hardware to not assign x2APIC IDs that exceed the
      "true" width of the field, but presumably hardware is smart enough to tie
      the width to the max x2APIC ID.  KVM also relies on hardware to support at
      least 8 bits, as the legacy xAPIC ID is writable by software.  But, those
      assumptions are unavoidable due to the lack of any way to enumerate the
      "true" width.
      
      Cc: stable@vger.kernel.org
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Suggested-by: NSean Christopherson <seanjc@google.com>
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Fixes: 44a95dae ("KVM: x86: Detect and Initialize AVIC support")
      Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Message-Id: <20220211000851.185799-1-suravee.suthikulpanit@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4a204f78
  7. 02 3月, 2022 1 次提交
  8. 01 3月, 2022 3 次提交
    • S
      KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors · aa9f5841
      Sean Christopherson 提交于
      Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
      copying data from guest (userspace) memory failed/faulted.  The OOM
      scenario is clearcut, it's userspace's decision as to whether it should
      terminate the guest, free memory, etc...
      
      As for -EFAULT, arguably, any guest issue is a violation of the guest's
      contract with userspace, and thus userspace needs to decide how to
      proceed.  E.g. userspace defines what is RAM vs. MMIO and communicates
      that directly to the guest, KVM is not involved in deciding what is/isn't
      RAM nor in communicating that information to the guest.  If the scratch
      GPA doesn't resolve to a memslot, then the guest is not honoring the
      memory configuration as defined by userspace.
      
      And if userspace unmaps an hva for whatever reason, then exiting to
      userspace with -EFAULT is absolutely the right thing to do.  KVM's ABI
      currently sucks and doesn't provide enough information to act on the
      -EFAULT, but that will hopefully be remedied in the future as there are
      multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
      require any work in KVM beyond returning -EFAULT with a small amount of
      metadata.
      
      KVM could define its ABI such that failure to access the scratch area is
      reflected into the guest, i.e. establish a contract with userspace, but
      that's undesirable as it limits KVM's options in the future, e.g. in the
      potential uffd case any failure on a uaccess needs to kick out to
      userspace.  KVM does have several cases where it reflects these errors
      into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
      KVM would preferably "fix" those instead of propagating the falsehood
      that any memory failure is the guest's fault.
      
      Lastly, returning a boolean as an "error" for that a helper that isn't
      named accordingly never works out well.
      
      Fixes: ad5b3532 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
      Cc: Alper Gun <alpergun@google.com>
      Cc: Peter Gonda <pgonda@google.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220225205209.3881130-1-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      aa9f5841
    • S
      KVM: SVM: Don't rewrite guest ICR on AVIC IPI virtualization failure · b51818af
      Sean Christopherson 提交于
      Don't bother rewriting the ICR value into the vAPIC page on an AVIC IPI
      virtualization failure, the access is a trap, i.e. the value has already
      been written to the vAPIC page.  The one caveat is if hardware left the
      BUSY flag set (which appears to happen somewhat arbitrarily), in which
      case go through the "nodecode" APIC-write path in order to clear the BUSY
      flag.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-6-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b51818af
    • S
      KVM: SVM: Use common kvm_apic_write_nodecode() for AVIC write traps · ed60920e
      Sean Christopherson 提交于
      Use the common kvm_apic_write_nodecode() to handle AVIC/APIC-write traps
      instead of open coding the same exact code.  This will allow making the
      low level lapic helpers inaccessible outside of lapic.c code.
      
      Opportunistically clean up the params to eliminate a bunch of svm=>vcpu
      reflection.
      
      No functional change intended.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-5-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ed60920e
  9. 25 2月, 2022 3 次提交
    • P
      KVM: x86/mmu: load new PGD after the shadow MMU is initialized · 3cffc89d
      Paolo Bonzini 提交于
      Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and
      shadow_root_level anymore, pull the PGD load after the initialization of
      the shadow MMUs.
      
      Besides being more intuitive, this enables future simplifications
      and optimizations because it's not necessary anymore to compute the
      role outside kvm_init_mmu.  In particular, kvm_mmu_reset_context was not
      attempting to use a cached PGD to avoid having to figure out the new role.
      With this change, it could follow what nested_{vmx,svm}_load_cr3 are doing,
      and avoid unloading all the cached roots.
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3cffc89d
    • D
      KVM: x86: Provide per VM capability for disabling PMU virtualization · ba7bb663
      David Dunn 提交于
      Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of
      settings/features to allow userspace to configure PMU virtualization on
      a per-VM basis.  For now, support a single flag, KVM_PMU_CAP_DISABLE,
      to allow disabling PMU virtualization for a VM even when KVM is configured
      with enable_pmu=true a module level.
      
      To keep KVM simple, disallow changing VM's PMU configuration after vCPUs
      have been created.
      Signed-off-by: NDavid Dunn <daviddunn@google.com>
      Message-Id: <20220223225743.2703915-2-daviddunn@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ba7bb663
    • M
      KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non... · e910a53f
      Maxim Levitsky 提交于
      KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled
      
      If nested tsc scaling is disabled, MSR_AMD64_TSC_RATIO should
      never have non default value.
      
      Due to way nested tsc scaling support was implmented in qemu,
      it would set this msr to 0 when nested tsc scaling was disabled.
      Ignore that value for now, as it causes no harm.
      
      Fixes: 5228eb96 ("KVM: x86: nSVM: implement nested TSC scaling")
      Cc: stable@vger.kernel.org
      Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220223115649.319134-1-mlevitsk@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e910a53f
  10. 19 2月, 2022 3 次提交
  11. 18 2月, 2022 1 次提交
    • P
      KVM: SEV: Allow SEV intra-host migration of VM with mirrors · b2125513
      Peter Gonda 提交于
      For SEV-ES VMs with mirrors to be intra-host migrated they need to be
      able to migrate with the mirror. This is due to that fact that all VMSAs
      need to be added into the VM with LAUNCH_UPDATE_VMSA before
      lAUNCH_FINISH. Allowing migration with mirrors allows users of SEV-ES to
      keep the mirror VMs VMSAs during migration.
      
      Adds a list of mirror VMs for the original VM iterate through during its
      migration. During the iteration the owner pointers can be updated from
      the source to the destination. This fixes the ASID leaking issue which
      caused the blocking of migration of VMs with mirrors.
      Signed-off-by: NPeter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20220211193634.3183388-1-pgonda@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b2125513
  12. 14 2月, 2022 1 次提交
  13. 12 2月, 2022 1 次提交
    • M
      KVM: SVM: fix race between interrupt delivery and AVIC inhibition · 66fa226c
      Maxim Levitsky 提交于
      If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
      inhibited, it might read a stale value of vcpu->arch.apicv_active
      which can lead to the target vCPU not noticing the interrupt.
      
      To fix this use load-acquire/store-release so that, if the target vCPU
      is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
      AVIC.  If AVIC has been disabled in the meanwhile, proceed with the
      KVM_REQ_EVENT-based delivery.
      
      Incomplete IPI vmexit has the same races as svm_deliver_avic_intr, and
      in fact it can be handled in exactly the same way; the only difference
      lies in who has set IRR, whether svm_deliver_interrupt or the processor.
      Therefore, svm_complete_interrupt_delivery can be used to fix incomplete
      IPI vmexits as well.
      Co-developed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      66fa226c