1. 20 January 2022 (1 commit)
    • KVM: VMX: Reject KVM_RUN if emulation is required with pending exception · fc4fad79
      Authored by Sean Christopherson
      Reject KVM_RUN if emulation is required (because VMX is running without
      unrestricted guest) and an exception is pending, as KVM doesn't support
      emulating exceptions except when emulating real mode via vm86.  The vCPU
      is hosed either way, but letting KVM_RUN proceed triggers a WARN due to
      the impossible condition.  Alternatively, the WARN could be removed, but
      then userspace and/or KVM bugs would result in the vCPU silently running
      in a bad state, which isn't very friendly to users.
      
      Originally, the bug was hit by syzkaller with a nested guest as that
      doesn't require kvm_intel.unrestricted_guest=0.  That particular flavor
      is likely fixed by commit cd0e615c ("KVM: nVMX: Synthesize
      TRIPLE_FAULT for L2 if emulation is required"), but it's trivial to
      trigger the WARN with a non-nested guest, and userspace can likely force
      bad state via ioctls() for a nested guest as well.
      
      Checking for the impossible condition needs to be deferred until KVM_RUN
      because KVM can't force specific ordering between ioctls.  E.g. clearing
      exception.pending in KVM_SET_SREGS doesn't prevent userspace from setting
      it in KVM_SET_VCPU_EVENTS, and disallowing KVM_SET_VCPU_EVENTS with
      emulation_required would prevent userspace from queuing an exception and
      then stuffing sregs.  Note, if KVM were to try and detect/prevent the
      condition prior to KVM_RUN, handle_invalid_guest_state() and/or
      handle_emulation_failure() would need to be modified to clear the pending
      exception prior to exiting to userspace.
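
      A minimal sketch of such a deferred check, as a pre-run hook (the hook
      name, return convention and exact fields are illustrative, not the
      upstream diff):

        static int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu)
        {
                /*
                 * Emulation (no unrestricted guest) combined with a pending
                 * exception cannot be handled; refuse to enter the guest
                 * instead of tripping the WARN in vmx_queue_exception().
                 */
                if (to_vmx(vcpu)->emulation_required &&
                    vcpu->arch.exception.pending)
                        return 0;       /* caller aborts KVM_RUN */

                return 1;
        }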
      
       ------------[ cut here ]------------
       WARNING: CPU: 6 PID: 137812 at arch/x86/kvm/vmx/vmx.c:1623 vmx_queue_exception+0x14f/0x160 [kvm_intel]
       CPU: 6 PID: 137812 Comm: vmx_invalid_nes Not tainted 5.15.2-7cc36c3e14ae-pop #279
       Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
       RIP: 0010:vmx_queue_exception+0x14f/0x160 [kvm_intel]
       Code: <0f> 0b e9 fd fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
       RSP: 0018:ffffa45c83577d38 EFLAGS: 00010202
       RAX: 0000000000000003 RBX: 0000000080000006 RCX: 0000000000000006
       RDX: 0000000000000000 RSI: 0000000000010002 RDI: ffff9916af734000
       RBP: ffff9916af734000 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000006
       R13: 0000000000000000 R14: ffff9916af734038 R15: 0000000000000000
       FS:  00007f1e1a47c740(0000) GS:ffff99188fb80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f1e1a6a8008 CR3: 000000026f83b005 CR4: 00000000001726e0
       Call Trace:
        kvm_arch_vcpu_ioctl_run+0x13a2/0x1f20 [kvm]
        kvm_vcpu_ioctl+0x279/0x690 [kvm]
        __x64_sys_ioctl+0x83/0xb0
        do_syscall_64+0x3b/0xc0
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Reported-by: syzbot+82112403ace4cbd780d8@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211228232437.1875318-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 18 January 2022 (1 commit)
    • KVM: x86: Making the module parameter of vPMU more common · 4732f244
      Authored by Like Xu
      The new module parameter to control PMU virtualization should apply
      to Intel as well as AMD, for situations where userspace is not trusted.
      If the module parameter allows PMU virtualization, there could be a
      new KVM_CAP or guest CPUID bits whereby userspace can enable/disable
      PMU virtualization on a per-VM basis.
      
      If the module parameter does not allow PMU virtualization, there
      should be no userspace override, since we have no precedent for
      authorizing that kind of override. If it's false, other counter-based
      profiling features (such as LBR including the associated CPUID bits
      if any) will not be exposed.
      
      Change its name from "pmu" to "enable_pmu" as we have temporary
      variables with the same name in our code like "struct kvm_pmu *pmu".
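
      As a sketch, the renamed knob in the common x86 code would look roughly
      like this (placement and permissions are illustrative):

        /* Module-wide switch for PMU virtualization, shared by VMX and SVM. */
        bool __read_mostly enable_pmu = true;
        EXPORT_SYMBOL_GPL(enable_pmu);
        module_param(enable_pmu, bool, 0444);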
      
      Fixes: b1d66dad ("KVM: x86/svm: Add module param to control PMU virtualization")
      Suggested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20220111073823.21885-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 07 January 2022 (3 commits)
    • KVM: SVM: include CR3 in initial VMSA state for SEV-ES guests · 405329fc
      Authored by Michael Roth
      Normally guests will set up CR3 themselves, but some guests, such as
      kselftests, and potentially CONFIG_PVH guests, rely on being booted
      with paging enabled and CR3 initialized to a pre-allocated page table.
      
      Currently CR3 updates via KVM_SET_SREGS* are not loaded into the guest
      VMCB until just prior to entering the guest. For SEV-ES/SEV-SNP, this
      is too late, since it will have switched over to using the VMSA page
      prior to that point, with the VMSA CR3 copied from the VMCB initial
      CR3 value: 0.
      
      Address this by syncing the CR3 value into the VMCB save area
      immediately when KVM_SET_SREGS* is issued, so it will find its way into
      the initial VMSA.
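
      A minimal sketch of that sync, assuming an SVM-side hook invoked from
      the KVM_SET_SREGS* path (names are illustrative):

        static void svm_post_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
        {
                struct vcpu_svm *svm = to_svm(vcpu);

                /* The save area feeds the initial VMSA, so keep CR3 current. */
                if (sev_es_guest(vcpu->kvm)) {
                        svm->vmcb->save.cr3 = cr3;
                        vmcb_mark_dirty(svm->vmcb, VMCB_CR);
                }
        }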
      Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20211216171358.61140-10-michael.roth@amd.com>
      [Remove vmx_post_set_cr3; add a remark about kvm_set_cr3 not calling the
       new hook. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event() · 6ed1298e
      Authored by Like Xu
      Since we set the same semantic event value for the fixed counter in
      pmc->eventsel, returning the perf_hw_id for the fixed counter via
      find_fixed_event() can be painlessly replaced by pmc_perf_hw_id()
      with the help of a pmc_is_fixed() check.
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20211130074221.93635-4-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id() · 7c174f30
      Authored by Like Xu
      find_arch_event() returns an "unsigned int" value,
      which is used by pmc_reprogram_counter() to
      program a PERF_TYPE_HARDWARE type perf_event.
      
      The returned value is actually the kernel-defined generic
      perf_hw_id, so rename the helper to pmc_perf_hw_id() and give it
      simpler incoming parameters to make it more self-explanatory.
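
      For illustration, the reshaped Intel helper could look roughly like this
      (the event table and field names mirror KVM's PMU code; treat it as a
      sketch, not the exact patch):

        static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
        {
                u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
                u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
                int i;

                for (i = 0; i < ARRAY_SIZE(intel_arch_events); i++)
                        if (intel_arch_events[i].eventsel == event_select &&
                            intel_arch_events[i].unit_mask == unit_mask)
                                return intel_arch_events[i].event_type;

                /* Not one of the generic architectural events. */
                return PERF_COUNT_HW_MAX;
        }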
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20211130074221.93635-3-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 20 December 2021 (1 commit)
    • KVM: x86: Always set kvm_run->if_flag · c5063551
      Authored by Marc Orr
      The kvm_run struct's if_flag is a part of the userspace/kernel API. The
      SEV-ES patches failed to set this flag because it's no longer needed by
      QEMU (according to the comment in the source code). However, other
      hypervisors may make use of this flag. Therefore, set the flag for
      guests with encrypted registers (i.e., with guest_state_protected set).
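
      Conceptually the fix is a one-liner in the exit-to-userspace path; a
      sketch, assuming a vendor hook reached through KVM's static-call
      machinery:

        /* Let vendor code decide how to report IF: SEV-ES register state is
         * encrypted, so RFLAGS cannot simply be read here. */
        kvm_run->if_flag = static_call(kvm_x86_get_if_flag)(vcpu);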
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Signed-off-by: Marc Orr <marcorr@google.com>
      Message-Id: <20211209155257.128747-1-marcorr@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
  5. 08 December 2021 (17 commits)
  6. 05 December 2021 (3 commits)
  7. 02 December 2021 (2 commits)
    • KVM: ensure APICv is considered inactive if there is no APIC · ef8b4b72
      Authored by Paolo Bonzini
      kvm_vcpu_apicv_active() returns false if a virtual machine has no in-kernel
      local APIC; however, kvm_apicv_activated() might still be true if there are
      no reasons to disable APICv.  In fact it is quite likely that there is none,
      because APICv is inhibited only by specific configurations of the local APIC,
      and without a local APIC those configurations cannot be programmed.  This
      triggers a WARN:
      
         WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
      
      To avoid this, introduce another cause for APICv inhibition, namely the
      absence of an in-kernel local APIC.  This cause is enabled by default,
      and is dropped by either KVM_CREATE_IRQCHIP or the enabling of
      KVM_CAP_SPLIT_IRQCHIP.
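
      A sketch of the shape of the change (field and helper names follow the
      APICv code of that era and are not guaranteed to match the final patch):

        /* At VM creation: no in-kernel local APIC yet, keep APICv inhibited. */
        set_bit(APICV_INHIBIT_REASON_ABSENT, &kvm->arch.apicv_inhibit_reasons);

        /* Dropped once KVM_CREATE_IRQCHIP or KVM_CAP_SPLIT_IRQCHIP installs one: */
        kvm_request_apicv_update(kvm, true, APICV_INHIBIT_REASON_ABSENT);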
      Reported-by: Ignat Korchagin <ignat@cloudflare.com>
      Fixes: ee49a893 ("KVM: x86: Move SVM's APICv sanity check to common x86", 2021-10-22)
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Ignat Korchagin <ignat@cloudflare.com>
      Message-Id: <20211130123746.293379-1-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register · cb1d220d
      Authored by Like Xu
      If we run the following perf command in an AMD Milan guest:
      
        perf stat \
        -e cpu/event=0x1d0/ \
        -e cpu/event=0x1c7/ \
        -e cpu/umask=0x1f,event=0x18e/ \
        -e cpu/umask=0x7,event=0x18e/ \
        -e cpu/umask=0x18,event=0x18e/ \
        ./workload
      
      dmesg will report a #GP warning from an unchecked MSR access
      error on MSR_F15H_PERF_CTLx.
      
      This is because, according to APM (Revision: 4.03) Figure 13-7,
      bits [35:32] of the AMD PerfEvtSeln register are part of the
      event select encoding, which extends the EVENT_SELECT field
      from 8 bits to 12 bits.
      
      Opportunistically update pmu->reserved_bits for reserved bit 19.
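
      In the AMD PMU init code this boils down to adjusting the reserved-bit
      mask; an illustrative sketch (the exact constant may differ):

        /* Bits 35:32 now extend EVENT_SELECT; bit 19 becomes reserved. */
        pmu->reserved_bits = 0xfffffff000280000ull;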
      Reported-by: Jim Mattson <jmattson@google.com>
      Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20211118130320.95997-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 30 November 2021 (10 commits)
    • KVM: fix avic_set_running for preemptable kernels · 7cfc5c65
      Authored by Paolo Bonzini
      avic_set_running() passes the current CPU to avic_vcpu_load(), albeit
      via vcpu->cpu rather than smp_processor_id().  If the thread is migrated
      while avic_set_running runs, the call to avic_vcpu_load() can use a stale
      value for the processor id.  Avoid this by blocking preemption over the
      entire execution of avic_set_running().
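
      A sketch of the resulting shape (close to, though not necessarily
      identical to, the final code):

        static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
        {
                int cpu = get_cpu();    /* preemption off for the whole span */

                WARN_ON(cpu != vcpu->cpu);

                if (is_run)
                        avic_vcpu_load(vcpu, cpu);
                else
                        avic_vcpu_put(vcpu);

                put_cpu();
        }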
      Reported-by: Sean Christopherson <seanjc@google.com>
      Fixes: 8221c137 ("svm: Manage vcpu load/unload when enable AVIC")
      Cc: stable@vger.kernel.org
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: accept signals in sev_lock_two_vms · c9d61dcb
      Authored by Paolo Bonzini
      Generally, kvm->lock is not taken for a long time, but
      sev_lock_two_vms is different: it takes vCPU locks
      inside, so userspace can hold it back just by calling
      a vCPU ioctl.  Play it safe and use mutex_lock_killable.
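
      In other words, the two lock acquisitions inside sev_lock_two_vms become
      killable; a sketch (error labels are illustrative):

        r = mutex_lock_killable(&dst_kvm->lock);
        if (r)
                goto release_tokens;
        r = mutex_lock_killable(&src_kvm->lock);
        if (r)
                goto unlock_dst;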
      
      Message-Id: <20211123005036.2954379-13-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: do not take kvm->lock when destroying · 10a37929
      Authored by Paolo Bonzini
      Taking the lock is useless since there are no other references,
      and there are already accesses (e.g. to sev->enc_context_owner)
      that do not take it.  So get rid of it.
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-12-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Prohibit migration of a VM that has mirrors · 17d44a96
      Authored by Paolo Bonzini
      VMs that mirror an encryption context rely on the owner to keep the
      ASID allocated.  Performing a KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
      would cause a dangling ASID:
      
      1. copy context from A to B (gets ref to A)
      2. move context from A to L (moves ASID from A to L)
      3. close L (releases ASID from L, B still references it)
      
      The right way to do the handoff instead is to create a fresh mirror VM
      on the destination first:
      
      1. copy context from A to B (gets ref to A)
      [later] 2. close B (releases ref to A)
      3. move context from A to L (moves ASID from A to L)
      4. copy context from L to M
      
      So, catch the situation by adding a count of how many VMs are
      mirroring this one's encryption context.
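
      The check itself is small; a sketch (the counter name follows the
      description above and may differ in the final code):

        /* MOVE_ENC_CONTEXT_FROM: refuse to migrate a VM whose encryption
         * context is still mirrored by other VMs. */
        if (src_sev->num_mirrored_vms) {
                ret = -EBUSY;
                goto out_unlock;
        }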
      
      Fixes: 0b020f5a ("KVM: SEV: Add support for SEV-ES intra host migration")
      Message-Id: <20211123005036.2954379-11-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Do COPY_ENC_CONTEXT_FROM with both VMs locked · bf42b02b
      Authored by Paolo Bonzini
      Now that we have a facility to lock two VMs with deadlock
      protection, use it for the creation of mirror VMs as well.  One of
      COPY_ENC_CONTEXT_FROM(dst, src) and COPY_ENC_CONTEXT_FROM(src, dst)
      would always fail, so the combination is nonsensical and it is okay to
      return -EBUSY if it is attempted.
      
      This sidesteps the question of what happens if a VM is
      MOVE_ENC_CONTEXT_FROM'd at the same time as it is
      COPY_ENC_CONTEXT_FROM'd: the locking prevents that from
      happening.
      
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-10-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: move mirror status to destination of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM · 642525e3
      Authored by Paolo Bonzini
      Allow intra-host migration of a mirror VM; the destination VM will be
      a mirror of the same ASID as the source.
      
      Fixes: b5663931 ("KVM: SEV: Add support for SEV intra host migration")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-8-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: initialize regions_list of a mirror VM · 2b347a38
      Authored by Paolo Bonzini
      This was broken before the introduction of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM,
      but technically harmless because the region list was unused for a mirror
      VM.  However, it is untidy and it now causes a NULL pointer access when
      attempting to move the encryption context of a mirror VM.
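
      The fix is essentially a one-liner in the mirror-creation path; a sketch
      (the variable name is illustrative):

        /* Give the mirror VM a valid, empty region list so later traversals
         * (e.g. during intra-host migration) do not hit an uninitialized head. */
        INIT_LIST_HEAD(&mirror_sev->regions_list);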
      
      Fixes: 54526d1f ("KVM: x86: Support KVM VMs sharing SEV context")
      Message-Id: <20211123005036.2954379-7-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: cleanup locking for KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM · 501b580c
      Authored by Paolo Bonzini
      Encapsulate the handling of the migration_in_progress flag for both VMs in
      two functions sev_lock_two_vms and sev_unlock_two_vms.  It does not matter
      if KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM locks the destination struct kvm a bit
      later, and this change 1) keeps the cleanup chain of labels smaller 2)
      makes it possible for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM to reuse the logic.
      
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-6-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: do not use list_replace_init on an empty list · 4674164f
      Authored by Paolo Bonzini
      list_replace_init cannot be used if the source is an empty list,
      because "new->next->prev = new" will overwrite "old->next":
      
      				new				old
      				prev = new, next = new		prev = old, next = old
      new->next = old->next		prev = new, next = old		prev = old, next = old
      new->next->prev = new		prev = new, next = old		prev = old, next = new
      new->prev = old->prev		prev = old, next = old		prev = old, next = old
      new->next->prev = new		prev = old, next = old		prev = new, next = new
      
      The desired outcome instead would be to leave both old and new the same
      as they were (two empty circular lists).  Use list_cut_before, which
      already has the necessary check and is documented to discard the
      previous contents of the list that will hold the result.
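
      Concretely, the migration path can then move the whole list with a single
      call; a sketch (list names follow the SEV code):

        /* Moves every region from src to dst; tolerates an empty source and
         * discards whatever dst previously contained. */
        list_cut_before(&dst->regions_list, &src->regions_list,
                        &src->regions_list);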
      
      Fixes: b5663931 ("KVM: SEV: Add support for SEV intra host migration")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-5-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: check PIR even for vCPUs with disabled APICv · 37c4dbf3
      Authored by Paolo Bonzini
      The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even
      if APICv is disabled on the vCPU that receives it.  In that case, the
      interrupt will just cause a vmexit and leave the ON bit set together
      with the PIR bit corresponding to the interrupt.
      
      Right now, the interrupt would not be delivered until APICv is re-enabled.
      However, fixing this is just a matter of always doing the PIR->IRR
      synchronization, even if the vCPU has temporarily disabled APICv.
      
      This is not a problem for performance, or if anything it is an
      improvement.  First, in the common case where vcpu->arch.apicv_active is
      true, one fewer check has to be performed.  Second, static_call_cond will
      elide the function call if APICv is not present or disabled.  Finally,
      in the case of AMD hardware we can remove the sync_pir_to_irr callback:
      it is only needed for apic_has_interrupt_for_ppr, and that function
      already has a fallback for !APICv.
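
      At the call site this reduces to an unconditional, conditionally-elided
      static call; a sketch:

        /* Drain posted interrupts into the IRR even if APICv is currently
         * inhibited for this vCPU; this is a no-op when no callback is
         * installed (e.g. SVM once its sync_pir_to_irr hook is removed). */
        static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);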
      
      Cc: stable@vger.kernel.org
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Message-Id: <20211123004311.2954158-4-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 18 November 2021 (2 commits)