1. 20 December 2021, 1 commit
    • KVM: x86: Always set kvm_run->if_flag · c5063551
      Authored by Marc Orr
      The kvm_run struct's if_flag is a part of the userspace/kernel API. The
      SEV-ES patches failed to set this flag because it's no longer needed by
      QEMU (according to the comment in the source code). However, other
      hypervisors may make use of this flag. Therefore, set the flag for
      guests with encrypted registers (i.e., with guest_state_protected set).
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Signed-off-by: Marc Orr <marcorr@google.com>
      Message-Id: <20211209155257.128747-1-marcorr@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      c5063551
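
      A minimal sketch of the shape of this fix, assuming it routes if_flag
      through a new get_if_flag vendor callback; the SVM mask name below
      (SVM_GUEST_INTERRUPT_MASK) is an assumption, not verbatim upstream code:

          /* x86.c: report IF via the vendor hook instead of RFLAGS directly. */
          static void post_kvm_run_save(struct kvm_vcpu *vcpu)
          {
              struct kvm_run *kvm_run = vcpu->run;

              kvm_run->if_flag = static_call(kvm_x86_get_if_flag)(vcpu);
              /* ... */
          }

          /* svm.c: with encrypted registers RFLAGS is unreadable, so derive
           * the interruptibility state from what the hardware still exposes. */
          static bool svm_get_if_flag(struct kvm_vcpu *vcpu)
          {
              struct vcpu_svm *svm = to_svm(vcpu);

              if (vcpu->arch.guest_state_protected)
                  return !!(svm->vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK);

              return !!(kvm_get_rflags(vcpu) & X86_EFLAGS_IF);
          }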
  2. 10 December 2021, 1 commit
    • KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall · 1ebfaa11
      Authored by Vitaly Kuznetsov
      Prior to commit 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use
      tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH |
      KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs
      and KVM_REQ_TLB_FLUSH is/was defined as:
      
       (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
      
      so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that
      "This call guarantees that by the time control returns back to the
      caller, the observable effects of all flushes on the specified virtual
      processors have occurred." and without KVM_REQUEST_WAIT there's a small
      chance that the vCPU making the TLB flush will resume running before
      all IPIs get delivered to other vCPUs and a stale mapping can get read
      there.
      
      Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST:
      kvm_hv_flush_tlb() is the sole caller which uses it for
      kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where
      KVM_REQUEST_WAIT makes a difference.
      
      Cc: stable@kernel.org
      Fixes: 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211209102937.584397-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1ebfaa11
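
      The fix itself is a one-line flag change in arch/x86/include/asm/kvm_host.h.
      A sketch (the request number is illustrative; the added KVM_REQUEST_WAIT
      bit is the point):

          #define KVM_REQ_TLB_FLUSH_GUEST \
              KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)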
  3. 08 December 2021, 10 commits
  4. 02 December 2021, 1 commit
  5. 18 November 2021, 1 commit
    • KVM: x86/mmu: include EFER.LMA in extended mmu role · b8453cdc
      Authored by Maxim Levitsky
      Incorporate EFER.LMA into kvm_mmu_extended_role, as it is used to compute
      the guest root level and is not reflected in kvm_mmu_page_role.level when
      TDP is in use.  When simply running the guest, it is impossible for EFER.LMA
      and kvm_mmu.root_level to get out of sync, as the guest cannot transition
      from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
      first bouncing through a different MMU context.  And stuffing guest state
      via KVM_SET_SREGS{,2} also ensures a full MMU context reset.
      
      However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
      set guest state when migrating the VM while L2 is active, the vCPU state
      will reflect L2, not L1.  If L1 is using TDP for L2, then root_mmu will
      have been configured using L2's state, despite not being used for L2.  If
      L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
      be configured for guest PAE paging, but will match the mmu_role for 64-bit
      paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.
      
      Alternatively, the root_mmu's role could be invalidated after a successful
      KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
      i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
      tricky, and not even needed if L1 and L2 do have the same role (e.g., they
      are both 64-bit guests and run with the same CR4).
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b8453cdc
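
      A sketch of the change, assuming the new bit lands in kvm_mmu_extended_role
      and is filled from the vCPU register snapshot (surrounding fields abridged;
      the ____is_efer_lma() helper name follows the mmu code of that era):

          struct kvm_mmu_extended_role {
              /* ... existing CR0/CR4 capture bits ... */
              unsigned int efer_lma:1;
          };

          static union kvm_mmu_extended_role
          kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
                                const struct kvm_mmu_role_regs *regs)
          {
              union kvm_mmu_extended_role ext = {0};

              /* ... */
              /* New: PAE vs. 64-bit guest roots now yield different roles. */
              ext.efer_lma = ____is_efer_lma(regs);
              return ext;
          }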
  6. 11 November 2021, 5 commits
    • KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS · da1bfd52
      Authored by Vitaly Kuznetsov
      KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
      VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
      either num_online_cpus() or num_present_cpus(), s390 has multiple
      constants depending on hardware features. On x86, KVM reports an
      arbitrary value of '710', which is supposed to be the maximum tested
      value, but it's possible to test all KVM_MAX_VCPUS even when fewer
      physical CPUs are available.
      
      Drop the arbitrary '710' value and return num_online_cpus() on x86 as
      well. The recommendation will match other architectures and will mean
      'no CPU overcommit'.
      
      For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
      when the requested vCPU number exceeds it. The static limit of '710'
      is quite weird, as smaller systems with just a few physical CPUs should
      certainly "recommend" fewer.
      Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      da1bfd52
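
      The resulting check-extension logic is trivial; a sketch of x86's
      kvm_vm_ioctl_check_extension() cases after this change:

          case KVM_CAP_NR_VCPUS:
              r = num_online_cpus();  /* "recommended" max: no CPU overcommit */
              break;
          case KVM_CAP_MAX_VCPUS:
              r = KVM_MAX_VCPUS;      /* hard limit, unchanged */
              break;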
    • KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES · 760849b1
      Authored by Paul Durrant
      Currently when kvm_update_cpuid_runtime() runs, it assumes that the
      KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
      however, if Hyper-V support is enabled. In this case the KVM leaves will
      be offset.
      
      This patch introduces a new 'kvm_cpuid_base' field in struct
      kvm_vcpu_arch to track the location of the KVM leaves, and a function
      kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
      leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
      definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
      target the correct leaf.
      
      NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is introduced
            into processor.h to avoid having duplicate code for the iteration
            over possible hypervisor base leaves.
      Signed-off-by: Paul Durrant <pdurrant@amazon.com>
      Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      760849b1
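
      A sketch of the scan, assuming the 0x100-stride iteration over the
      hypervisor CPUID range described in the NOTE, with KVM_SIGNATURE being
      the new kvm_para.h definition of the 'KVMKVMKVM\0\0\0' string:

          #define for_each_possible_hypervisor_cpuid_base(function) \
              for (function = 0x40000000; function < 0x40010000; function += 0x100)

          static void kvm_update_kvm_cpuid_base(struct kvm_vcpu *vcpu)
          {
              struct kvm_cpuid_entry2 *entry;
              u32 function;

              vcpu->arch.kvm_cpuid_base = 0;

              for_each_possible_hypervisor_cpuid_base(function) {
                  entry = kvm_find_cpuid_entry(vcpu, function, 0);
                  if (entry) {
                      u32 signature[3] = { entry->ebx, entry->ecx, entry->edx };

                      if (!memcmp(signature, KVM_SIGNATURE, sizeof(signature))) {
                          vcpu->arch.kvm_cpuid_base = function;
                          break;
                      }
                  }
              }
          }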
    • KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc
      Authored by Maxim Levitsky
      KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected via KVM's
      standard inject_pending_event path, and not via APICv/AVIC.
      
      Since this is a debug feature, just inhibit APICv/AVIC while
      KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
      
      Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cae72dcc
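
      A sketch of the inhibit logic, assuming a helper (the name below is
      hypothetical) that rescans all vCPUs whenever guest_debug changes; the
      inhibit reason APICV_INHIBIT_REASON_BLOCKIRQ is new in this patch, and
      locking around the update is elided:

          static void update_guestdbg_apicv_inhibit(struct kvm *kvm)
          {
              struct kvm_vcpu *vcpu;
              bool set = false;
              int i;

              kvm_for_each_vcpu(i, vcpu, kvm) {
                  if (vcpu->guest_debug & KVM_GUESTDBG_BLOCKIRQ) {
                      set = true;
                      break;
                  }
              }

              /* Inhibit APICv/AVIC so interrupts flow through
               * inject_pending_event and honor the debug block. */
              kvm_request_apicv_update(kvm, !set, APICV_INHIBIT_REASON_BLOCKIRQ);
          }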
    • KVM: x86: Fix recording of guest steal time / preempted status · 7e2175eb
      Authored by David Woodhouse
      In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
      not missed") we switched to using a gfn_to_pfn_cache for accessing the
      guest steal time structure in order to allow for an atomic xchg of the
      preempted field. This has a couple of problems.
      
      Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
      atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
      guest vCPU using an IOMEM page for its steal time would never have its
      preempted field set.
      
      Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
      should have been. There are two stages to the GFN->PFN conversion;
      first the GFN is converted to a userspace HVA, and then that HVA is
      looked up in the process page tables to find the underlying host PFN.
      Correct invalidation of the latter would require being hooked up to the
      MMU notifiers, but that doesn't happen, so the cache just keeps mapping
      and unmapping the *wrong* PFN after the userspace page tables change.
      
      In the !IOMEM case at least the stale page *is* pinned all the time it's
      cached, so it won't be freed and reused by anyone else while still
      receiving the steal time updates. The map/unmap dance only takes care
      of the KVM administrivia such as marking the page dirty.
      
      Until the gfn_to_pfn cache handles the remapping automatically by
      integrating with the MMU notifiers, we might as well not get a
      kernel mapping of it, and use the perfectly serviceable userspace HVA
      that we already have.  We just need to implement the atomic xchg on
      the userspace address with appropriate exception handling, which is
      fairly trivial.
      
      Cc: stable@vger.kernel.org
      Fixes: b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
      [I didn't entirely agree with David's assessment of the
       usefulness of the gfn_to_pfn cache, and integrated the outcome
       of the discussion in the above commit message. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7e2175eb
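
      A sketch of the atomic-xchg-on-user-memory idea, modeled on the patch
      rather than copied verbatim: a fault on the user address is fixed up via
      the exception table and simply leaves err set, so the slow path never
      oopses. Here st is the struct kvm_steal_time __user pointer derived from
      the cached HVA:

          u8 st_preempted = 0;
          int err = -EFAULT;

          if (!user_access_begin(st, sizeof(*st)))
              return;

          /* Single atomic xchg on the userspace mapping; "1:" may fault, in
           * which case the _ASM_EXTABLE_UA fixup jumps to "2:" without
           * clearing err. */
          asm volatile("1: xchgb %0, %2\n"
                       "xor %1, %1\n"
                       "2:\n"
                       _ASM_EXTABLE_UA(1b, 2b)
                       : "+q" (st_preempted),
                         "+&r" (err),
                         "+m" (st->preempted));

          user_access_end();
          if (err)
              return;
          /* ... st_preempted holds the old flags; handle KVM_VCPU_FLUSH_TLB ... */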
    • KVM: SEV: Add support for SEV intra host migration · b5663931
      Authored by Peter Gonda
      For SEV to work with intra host migration, contents of the SEV info struct
      such as the ASID (used to index the encryption key in the AMD SP) and
      the list of memory regions need to be transferred to the target VM.
      This change adds a command for a target VMM to get a source SEV VM's SEV
      info.
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20211021174303.385706-3-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b5663931
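
      A userspace sketch of how a target VMM consumes this, assuming the
      capability merged from this series as KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM:

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          static int sev_intra_host_migrate(int dst_vm_fd, int src_vm_fd)
          {
              struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM,
                  .args = { (__u64)src_vm_fd },
              };

              /* On success the ASID and the encrypted-region list now
               * belong to the destination VM. */
              return ioctl(dst_vm_fd, KVM_ENABLE_CAP, &cap);
          }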
  7. 25 October 2021, 3 commits
    • KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock · 8228c77d
      Authored by David Woodhouse
      On the preemption path when updating a Xen guest's runstate times, this
      lock is taken inside the scheduler rq->lock, which is a raw spinlock.
      This was shown in a lockdep warning:
      
      [   89.138354] =============================
      [   89.138356] [ BUG: Invalid wait context ]
      [   89.138358] 5.15.0-rc5+ #834 Tainted: G S        I E
      [   89.138360] -----------------------------
      [   89.138361] xen_shinfo_test/2575 is trying to lock:
      [   89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138442] other info that might help us debug this:
      [   89.138444] context-{5:5}
      [   89.138445] 4 locks held by xen_shinfo_test/2575:
      [   89.138447]  #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
      [   89.138483]  #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
      [   89.138526]  #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
      [   89.138534]  #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
      ...
      [   89.138695]  get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138734]  kvm_xen_update_runstate+0x14/0x90 [kvm]
      [   89.138783]  kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
      [   89.138830]  kvm_arch_vcpu_put+0xe6/0x170 [kvm]
      [   89.138870]  kvm_sched_out+0x2f/0x40 [kvm]
      [   89.138900]  __schedule+0x5de/0xbd0
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8228c77d
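
      A sketch of the type flip (raw spinlocks never sleep, so they may nest
      inside the scheduler's raw rq->lock even on PREEMPT_RT); call sites
      switch to the raw_* variants accordingly:

          struct kvm_arch {
              /* ... */
              raw_spinlock_t pvclock_gtod_sync_lock;  /* was: spinlock_t */
          };

          raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
          /* ... update master clock state ... */
          raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);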
    • KVM: x86: On emulation failure, convey the exit reason, etc. to userspace · e615e355
      Authored by David Edmondson
      Should instruction emulation fail, include the VM exit reason, etc. in
      the emulation_failure data passed to userspace, in order that the VMM
      can report it as a debugging aid when describing the failure.
      Suggested-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: David Edmondson <david.edmondson@oracle.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e615e355
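
      A userspace sketch of consuming the richer payload; after this change
      the internal-error data includes the hardware exit reason and
      qualification, so a VMM can dump it when emulation fails:

          /* run = mmap'ed struct kvm_run for the vCPU */
          __u32 i;

          if (run->exit_reason == KVM_EXIT_INTERNAL_ERROR &&
              run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
              for (i = 0; i < run->internal.ndata; i++)
                  fprintf(stderr, "  data[%u] = 0x%llx\n", i,
                          (unsigned long long)run->internal.data[i]);
          }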
    • KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info · 0a62a031
      Authored by David Edmondson
      Extend the get_exit_info static call to provide the reason for the VM
      exit. Modify relevant trace points to use this rather than extracting
      the reason in the caller.
      Signed-off-by: David Edmondson <david.edmondson@oracle.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0a62a031
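
      A sketch of the widened hook (parameter order as best understood from
      the series; treat as illustrative):

          struct kvm_x86_ops {
              /* ... */
              void (*get_exit_info)(struct kvm_vcpu *vcpu, u32 *reason,
                                    u64 *info1, u64 *info2,
                                    u32 *intr_info, u32 *error_code);
          };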
  8. 23 October 2021, 1 commit
  9. 22 October 2021, 6 commits
  10. 19 October 2021, 3 commits
    • KVM: x86: Expose TSC offset controls to userspace · 828ca896
      Authored by Oliver Upton
      To date, VMM-directed TSC synchronization and migration has been a bit
      messy. KVM has some baked-in heuristics around TSC writes to infer if
      the VMM is attempting to synchronize. This is problematic, as it depends
      on host userspace writing to the guest's TSC within 1 second of the last
      write.
      
      A much cleaner approach to configuring the guest's views of the TSC is to
      simply migrate the TSC offset for every vCPU. Offsets are idempotent,
      and thus not subject to change depending on when the VMM actually
      reads/writes values from/to KVM. The VMM can then read the TSC once with
      KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
      the guest is paused.
      
      Cc: David Matlack <dmatlack@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-8-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      828ca896
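
      A userspace sketch of reading a vCPU's TSC offset via the vCPU device
      attribute group this series adds (group/attr names per the series'
      documentation):

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          static int get_tsc_offset(int vcpu_fd, __s64 *offset)
          {
              struct kvm_device_attr attr = {
                  .group = KVM_VCPU_TSC_CTRL,
                  .attr  = KVM_VCPU_TSC_OFFSET,
                  .addr  = (__u64)(unsigned long)offset,
              };

              /* KVM_SET_DEVICE_ATTR with the same attr writes the offset. */
              return ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr);
          }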
    • kvm: x86: protect masterclock with a seqcount · 869b4421
      Authored by Paolo Bonzini
      Protect the reference point for kvmclock with a seqcount, so that
      kvmclock updates for all vCPUs can proceed in parallel.  Xen runstate
      updates will also run in parallel and not bounce the kvmclock cacheline.
      
      Of the variables that were protected by pvclock_gtod_sync_lock,
      nr_vcpus_matched_tsc is different because it is updated outside
      pvclock_update_vm_gtod_copy and read inside it.  Therefore, we
      need to keep it protected by a spinlock.  In fact it must now
      be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
      write-side of a seqcount, is non-preemptible.  Since we already
      have tsc_write_lock which is a raw spinlock, we can just use
      tsc_write_lock as the lock that protects the write-side of the
      seqcount.
      Co-developed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20210916181538.968978-6-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      869b4421
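
      A sketch of the read side, assuming a seqcount field named pvclock_sc
      whose write side runs under tsc_write_lock as described above (the field
      name is an assumption); readers retry instead of serializing on a lock:

          unsigned seq;

          do {
              seq = read_seqcount_begin(&ka->pvclock_sc);
              use_master_clock = ka->use_master_clock;
              /* ... snapshot master_kernel_ns / master_cycle_now ... */
          } while (read_seqcount_retry(&ka->pvclock_sc, seq));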
    • KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK · c68dc1b5
      Authored by Oliver Upton
      Handling the migration of TSCs correctly is difficult, in part because
      Linux does not provide userspace with the ability to retrieve a (TSC,
      realtime) clock pair for a single instant in time. In lieu of a more
      convenient facility, KVM can report similar information in the kvm_clock
      structure.
      
      Provide userspace with a host TSC & realtime pair iff the realtime clock
      is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
      realtime value, advance the KVM clock by the amount of elapsed time. Do
      not step the KVM clock backwards, though, as it is a monotonic
      oscillator.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-5-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c68dc1b5
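
      A userspace sketch of capturing a (realtime, host_tsc) pair at pause
      time; field and flag names follow this series' extension of struct
      kvm_clock_data, and save_clock_pair() is a hypothetical VMM helper:

          struct kvm_clock_data data;

          if (ioctl(vm_fd, KVM_GET_CLOCK, &data) == 0 &&
              (data.flags & KVM_CLOCK_REALTIME) &&
              (data.flags & KVM_CLOCK_HOST_TSC)) {
              /* data.realtime and data.host_tsc describe the same instant,
               * so the pair can be replayed via KVM_SET_CLOCK on the target. */
              save_clock_pair(data.realtime, data.host_tsc);
          }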
  11. 01 October 2021, 4 commits
  12. 30 September 2021, 3 commits
  13. 06 September 2021, 1 commit