  1. 17 Nov 2021 (2 commits)
  2. 11 Nov 2021 (5 commits)
    • KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS · da1bfd52
      Authored by Vitaly Kuznetsov
      KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
      VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
      either num_online_cpus() or num_present_cpus(), s390 has multiple
      constants depending on hardware features. On x86, KVM reports an
      arbitrary value of '710', which is supposed to be the maximum tested
      value, but it's possible to test all KVM_MAX_VCPUS even when fewer
      physical CPUs are available.
      
      Drop the arbitrary '710' value and return num_online_cpus() on x86 as
      well. The recommendation will match other architectures and will mean
      'no CPU overcommit'.
      
      For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
      when the requested vCPU number exceeds it. The static limit of '710'
      is quite odd, as smaller systems with just a few physical CPUs should
      certainly "recommend" fewer.
      Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES · 760849b1
      Authored by Paul Durrant
      Currently when kvm_update_cpuid_runtime() runs, it assumes that the
      KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
      however, if Hyper-V support is enabled. In this case the KVM leaves will
      be offset.
      
      This patch introduces a new 'kvm_cpuid_base' field into struct
      kvm_vcpu_arch to track the location of the KVM leaves, and a function
      kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
      leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
      definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence
      now target the correct leaf.
      
      NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is introduced
            into processor.h to avoid having duplicate code for the iteration
            over possible hypervisor base leaves.
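      
      A hedged sketch of the signature scan described above (the 0x100 leaf
      stride follows the hypervisor CPUID convention; the helper name is
      illustrative, not the patch's):
      
          #include <cpuid.h>
          #include <stdint.h>
          #include <string.h>
          
          static uint32_t find_kvm_cpuid_base(void)
          {
              for (uint32_t base = 0x40000000; base < 0x40010000; base += 0x100) {
                  uint32_t eax, sig[3];
          
                  __cpuid(base, eax, sig[0], sig[1], sig[2]);
                  /* The vendor signature is returned in EBX, ECX, EDX. */
                  if (!memcmp(sig, "KVMKVMKVM\0\0\0", 12))
                      return base; /* KVM_CPUID_FEATURES is then base + 1 */
              }
              return 0;
          }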
      Signed-off-by: Paul Durrant <pdurrant@amazon.com>
      Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc
      Authored by Maxim Levitsky
      KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected via KVM's
      standard inject_pending_event() path, not via APICv/AVIC.
      
      Since this is a debug feature, just inhibit APICv/AVIC while
      KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
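      
      A hedged sketch of the shape of the fix; the per-VM counter is a
      hypothetical stand-in, while kvm_request_apicv_update() and
      APICV_INHIBIT_REASON_BLOCKIRQ are the names this series uses:
      
          /* Called when KVM_SET_GUEST_DEBUG toggles KVM_GUESTDBG_BLOCKIRQ on a
           * vCPU. 'num_blockirq_vcpus' is illustrative, not the real field. */
          static void kvm_update_blockirq_inhibit(struct kvm *kvm)
          {
              bool in_use = atomic_read(&kvm->arch.num_blockirq_vcpus) > 0;
          
              /* Keep APICv active only while no vCPU needs IRQs blocked. */
              kvm_request_apicv_update(kvm, !in_use,
                                       APICV_INHIBIT_REASON_BLOCKIRQ);
          }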
      
      Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix recording of guest steal time / preempted status · 7e2175eb
      Authored by David Woodhouse
      In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
      not missed") we switched to using a gfn_to_pfn_cache for accessing the
      guest steal time structure in order to allow for an atomic xchg of the
      preempted field. This has a couple of problems.
      
      Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
      atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
      guest vCPU using an IOMEM page for its steal time would never have its
      preempted field set.
      
      Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
      should have been. There are two stages to the GFN->PFN conversion;
      first the GFN is converted to a userspace HVA, and then that HVA is
      looked up in the process page tables to find the underlying host PFN.
      Correct invalidation of the latter would require being hooked up to the
      MMU notifiers, but that doesn't happen---so it just keeps mapping and
      unmapping the *wrong* PFN after the userspace page tables change.
      
      In the !IOMEM case at least the stale page *is* pinned all the time it's
      cached, so it won't be freed and reused by anyone else while still
      receiving the steal time updates. The map/unmap dance only takes care
      of the KVM administrivia such as marking the page dirty.
      
      Until the gfn_to_pfn cache handles the remapping automatically by
      integrating with the MMU notifiers, we might as well not get a
      kernel mapping of it, and use the perfectly serviceable userspace HVA
      that we already have.  We just need to implement the atomic xchg on
      the userspace address with appropriate exception handling, which is
      fairly trivial.
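      
      A hedged sketch of that xchg, modeled on the pattern this patch uses
      (user_access_begin()/user_access_end() plus an extable-annotated xchg;
      simplified, not the verbatim fix):
      
          /* 'st' is the userspace HVA of the guest's steal-time struct;
           * needs <linux/uaccess.h> and <asm/asm.h>. */
          static void set_preempted_at_hva(struct kvm_steal_time __user *st)
          {
              u8 preempted = KVM_VCPU_PREEMPTED;
              int err = -EFAULT;
          
              if (!user_access_begin(st, sizeof(*st)))
                  return;
          
              asm volatile("1: xchgb %0, %2\n"
                           "   xor %1, %1\n"
                           "2:\n"
                           _ASM_EXTABLE_UA(1b, 2b)
                           : "+q" (preempted), "+&r" (err), "+m" (st->preempted));
          
              user_access_end();
          }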
      
      Cc: stable@vger.kernel.org
      Fixes: b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
      [I didn't entirely agree with David's assessment of the
       usefulness of the gfn_to_pfn cache, and integrated the outcome
       of the discussion in the above commit message. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Add support for SEV intra host migration · b5663931
      Authored by Peter Gonda
      For SEV to work with intra-host migration, the contents of the SEV info
      struct, such as the ASID (used to index the encryption key in the AMD SP)
      and the list of memory regions, need to be transferred to the target VM.
      This change adds a command for a target VMM to get a source SEV VM's SEV
      info.
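      
      A hedged userspace sketch of the flow, assuming the capability name this
      series introduces (KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM) with the source VM
      fd passed as args[0]:
      
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
          
          /* Target VMM adopts the source SEV VM's encryption context
           * (ASID, memory region list). */
          static int sev_migrate_from(int target_vm_fd, int source_vm_fd)
          {
              struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM,
                  .args[0] = (unsigned long)source_vm_fd,
              };
          
              return ioctl(target_vm_fd, KVM_ENABLE_CAP, &cap);
          }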
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20211021174303.385706-3-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 25 Oct 2021 (3 commits)
    • KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock · 8228c77d
      Authored by David Woodhouse
      On the preemption path when updating a Xen guest's runstate times, this
      lock is taken inside the scheduler rq->lock, which is a raw spinlock.
      This was shown in a lockdep warning:
      
      [   89.138354] =============================
      [   89.138356] [ BUG: Invalid wait context ]
      [   89.138358] 5.15.0-rc5+ #834 Tainted: G S        I E
      [   89.138360] -----------------------------
      [   89.138361] xen_shinfo_test/2575 is trying to lock:
      [   89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138442] other info that might help us debug this:
      [   89.138444] context-{5:5}
      [   89.138445] 4 locks held by xen_shinfo_test/2575:
      [   89.138447]  #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
      [   89.138483]  #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
      [   89.138526]  #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
      [   89.138534]  #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
      ...
      [   89.138695]  get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138734]  kvm_xen_update_runstate+0x14/0x90 [kvm]
      [   89.138783]  kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
      [   89.138830]  kvm_arch_vcpu_put+0xe6/0x170 [kvm]
      [   89.138870]  kvm_sched_out+0x2f/0x40 [kvm]
      [   89.138900]  __schedule+0x5de/0xbd0
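      
      The shape of the fix follows from the splat: a raw_spinlock_t may nest
      under rq->lock, while a spinlock_t (which sleeps under PREEMPT_RT) may
      not. A sketch, with the helper name illustrative:
      
          /* In struct kvm_arch: was 'spinlock_t pvclock_gtod_sync_lock;' */
          raw_spinlock_t pvclock_gtod_sync_lock;
          
          static void update_master_clock(struct kvm_arch *ka)
          {
              unsigned long flags;
          
              raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
              /* ... snapshot/update the kvmclock reference point ... */
              raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
          }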
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: On emulation failure, convey the exit reason, etc. to userspace · e615e355
      Authored by David Edmondson
      Should instruction emulation fail, include the VM exit reason, etc. in
      the emulation_failure data passed to userspace, in order that the VMM
      can report it as a debugging aid when describing the failure.
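      
      A hedged sketch of how a VMM might consume this (the emulation_failure
      exit layout is from this series; treat the field usage as illustrative):
      
          #include <stdio.h>
          #include <linux/kvm.h>
          
          static void report_emulation_failure(struct kvm_run *run)
          {
              if (run->exit_reason != KVM_EXIT_INTERNAL_ERROR ||
                  run->internal.suberror != KVM_INTERNAL_ERROR_EMULATION)
                  return;
          
              /* ndata/flags describe what KVM filled in (exit reason, etc.). */
              fprintf(stderr, "emulation failed: ndata=%u flags=0x%llx\n",
                      run->emulation_failure.ndata,
                      (unsigned long long)run->emulation_failure.flags);
          }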
      Suggested-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: David Edmondson <david.edmondson@oracle.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info · 0a62a031
      Authored by David Edmondson
      Extend the get_exit_info static call to provide the reason for the VM
      exit. Modify relevant trace points to use this rather than extracting
      the reason in the caller.
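      
      A hedged sketch of a call site after the change (kvm_x86_ops hooks are
      invoked via static_call() by this point; the parameter list follows the
      description above):
      
          static void log_exit(struct kvm_vcpu *vcpu)
          {
              u32 reason, intr_info, error_code;
              u64 info1, info2;
          
              /* The vendor code (VMX/SVM) now supplies the exit reason itself. */
              static_call(kvm_x86_get_exit_info)(vcpu, &reason, &info1, &info2,
                                                 &intr_info, &error_code);
          
              pr_debug("exit reason %u info %llx/%llx\n", reason,
                       (unsigned long long)info1, (unsigned long long)info2);
          }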
      Signed-off-by: David Edmondson <david.edmondson@oracle.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 23 Oct 2021 (1 commit)
  5. 22 Oct 2021 (6 commits)
  6. 19 Oct 2021 (3 commits)
    • KVM: x86: Expose TSC offset controls to userspace · 828ca896
      Authored by Oliver Upton
      To date, VMM-directed TSC synchronization and migration has been a bit
      messy. KVM has some baked-in heuristics around TSC writes to infer if
      the VMM is attempting to synchronize. This is problematic, as it depends
      on host userspace writing to the guest's TSC within 1 second of the last
      write.
      
      A much cleaner approach to configuring the guest's views of the TSC is to
      simply migrate the TSC offset for every vCPU. Offsets are idempotent,
      and thus not subject to change depending on when the VMM actually
      reads/writes values from/to KVM. The VMM can then read the TSC once with
      KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
      the guest is paused.
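      
      A hedged sketch of the userspace side, using the vCPU device attributes
      this series adds (KVM_VCPU_TSC_CTRL group, KVM_VCPU_TSC_OFFSET
      attribute):
      
          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
          
          static int read_tsc_offset(int vcpu_fd, uint64_t *offset)
          {
              struct kvm_device_attr attr = {
                  .group = KVM_VCPU_TSC_CTRL,
                  .attr  = KVM_VCPU_TSC_OFFSET,
                  .addr  = (uint64_t)(unsigned long)offset,
              };
          
              /* KVM_SET_DEVICE_ATTR with the same attr restores it on the
               * target after migration. */
              return ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr);
          }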
      
      Cc: David Matlack <dmatlack@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-8-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: protect masterclock with a seqcount · 869b4421
      Authored by Paolo Bonzini
      Protect the reference point for kvmclock with a seqcount, so that
      kvmclock updates for all vCPUs can proceed in parallel.  Xen runstate
      updates will also run in parallel and not bounce the kvmclock cacheline.
      
      Of the variables that were protected by pvclock_gtod_sync_lock,
      nr_vcpus_matched_tsc is different because it is updated outside
      pvclock_update_vm_gtod_copy and read inside it.  Therefore, we
      need to keep it protected by a spinlock.  In fact it must now
      be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
      write-side of a seqcount, is non-preemptible.  Since we already
      have tsc_write_lock which is a raw spinlock, we can just use
      tsc_write_lock as the lock that protects the write-side of the
      seqcount.
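      
      For reference, the read side follows the standard kernel seqcount
      pattern; a minimal sketch (the seqcount field name is illustrative):
      
          static u64 get_clock_snapshot(struct kvm_arch *ka)
          {
              unsigned int seq;
              u64 ns;
          
              do {
                  seq = read_seqcount_begin(&ka->pvclock_sc);
                  ns = ka->master_kernel_ns;  /* snapshot the reference point */
              } while (read_seqcount_retry(&ka->pvclock_sc, seq));
          
              return ns;
          }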
      Co-developed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20210916181538.968978-6-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK · c68dc1b5
      Authored by Oliver Upton
      Handling the migration of TSCs correctly is difficult, in part because
      Linux does not provide userspace with the ability to retrieve a (TSC,
      realtime) clock pair for a single instant in time. In lieu of a more
      convenient facility, KVM can report similar information in the kvm_clock
      structure.
      
      Provide userspace with a host TSC & realtime pair iff the realtime clock
      is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
      realtime value, advance the KVM clock by the amount of elapsed time. Do
      not step the KVM clock backwards, though, as it is a monotonic
      oscillator.
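      
      A hedged sketch of the consumer side: kvm_clock_data gains realtime and
      host_tsc fields, with validity signalled through the flags word:
      
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
          
          static int capture_clock_pair(int vm_fd, struct kvm_clock_data *data)
          {
              int ret = ioctl(vm_fd, KVM_GET_CLOCK, data);
          
              if (ret)
                  return ret;
              /* Only trust the pair if the host clocksource is TSC-based. */
              if (!(data->flags & KVM_CLOCK_REALTIME) ||
                  !(data->flags & KVM_CLOCK_HOST_TSC))
                  return -1;
              /* data->clock, data->realtime and data->host_tsc now describe
               * one instant in time. */
              return 0;
          }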
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-5-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 01 Oct 2021 (4 commits)
  8. 30 Sep 2021 (3 commits)
  9. 06 Sep 2021 (3 commits)
    • kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 · 1dbaf04c
      Authored by Eduardo Habkost
      Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4,
      so increase KVM_SOFT_MAX_VCPUS to 710.
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-4-ehabkost@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Increase MAX_VCPUS to 1024 · 074c82c8
      Authored by Eduardo Habkost
      Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs.
      
      I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it
      might involve complicated questions around the meaning of
      "supported" and "recommended" in the upstream tree.
      KVM_SOFT_MAX_VCPUS will be changed in a separate patch.
      
      For reference, visible effects of this change are:
      - KVM_CAP_MAX_VCPUS will now return 1024 (of course)
      - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (0x40000005)].EAX
        will now be 1024
      - KVM_MAX_VCPU_ID will change from 1151 to 4096
      - Size of struct kvm will increase from 19328 to 22272 bytes
        (in x86_64)
      - Size of struct kvm_ioapic will increase from 1780 to 5084 bytes
        (in x86_64)
      - Bitmap stack variables that will grow (see the sketch after this list):
        - At kvm_hv_flush_tlb() and kvm_hv_send_ipi(),
          vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long
        - vcpu_bitmap at ioapic_write_indirect() will be 128 bytes long
          once patch "KVM: x86: Fix stack-out-of-bounds memory access
          from ioapic_write_indirect()" is applied
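      
      The 128-byte figure is just bitmap arithmetic; a worked sketch:
      
          #include <linux/types.h>
          
          #define KVM_MAX_VCPUS 1024
          
          /* One bit per vCPU: 1024 bits / 8 = 128 bytes,
           * i.e. BITS_TO_LONGS(1024) longs on the stack. */
          DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);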
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-3-ehabkost@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS · 4ddacd52
      Authored by Eduardo Habkost
      Instead of requiring KVM_MAX_VCPU_ID to be manually increased
      every time we increase KVM_MAX_VCPUS, set it to 4*KVM_MAX_VCPUS.
      This should be enough for CPU topologies where Cores-per-Package
      and Packages-per-Socket are not powers of 2.
      
      In practice, this increases KVM_MAX_VCPU_ID from 1023 to 1152.
      The only side effect of this change is making some fields in
      struct kvm_ioapic larger, increasing the struct size from 1628 to
      1780 bytes (in x86_64).
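      
      The change itself is a one-line derivation; with KVM_MAX_VCPUS at 288 at
      the time, it yields the 1152 quoted above:
      
          #define KVM_MAX_VCPUS   288
          #define KVM_MAX_VCPU_ID (KVM_MAX_VCPUS * 4)   /* 4 * 288 = 1152 */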
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-2-ehabkost@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 21 Aug 2021 (8 commits)
    • KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host · cb0f722a
      Authored by Wei Huang
      When the 5-level page table CPU flag is set in the host, but the guest
      has CR4.LA57=0 (including the case of a 32-bit guest), the top level of
      the shadow NPT page tables will be fixed, consisting of one pointer to
      a lower-level table and 511 non-present entries.  Extend the existing
      code that creates the fixed PML4 or PDP table to provide a fixed PML5
      table if needed.
      
      This is not needed on EPT because the number of layers in the tables
      is specified in the EPTP instead of depending on the host CR4.
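      
      A hedged sketch of the fixed top level (mirroring the existing PML4/PDP
      special roots; names are illustrative and the real code also handles
      the PAE cases):
      
          static u64 *alloc_fixed_pml5(u64 *pml4_root, u64 pm_mask)
          {
              u64 *pml5_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
          
              if (pml5_root)
                  /* Entry 0 is present; entries 1..511 stay zero, i.e.
                   * non-present, since the page is allocated zeroed. */
                  pml5_root[0] = __pa(pml4_root) | pm_mask;
          
              return pml5_root;
          }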
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Wei Huang <wei.huang2@amd.com>
      Message-Id: <20210818165549.3771014-3-wei.huang2@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Allow CPU to force vendor-specific TDP level · 746700d2
      Authored by Wei Huang
      Future AMD CPUs will require a 5-level NPT if host CR4.LA57 is set.
      To prevent kvm_mmu_get_tdp_level() from incorrectly changing the NPT
      level on behalf of such CPUs, add a new parameter to kvm_configure_mmu()
      to force a fixed TDP level.
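      
      A hedged sketch of the extended configuration call (parameter name per
      this series; zero keeps the old behavior of deriving the level from
      MAXPHYADDR):
      
          /* A non-zero tdp_forced_root_level pins the TDP root level, as
           * 5-level NPT under host CR4.LA57 requires. */
          void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
                                 int tdp_max_root_level, int tdp_huge_page_level);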
      Signed-off-by: Wei Huang <wei.huang2@amd.com>
      Message-Id: <20210818165549.3771014-2-wei.huang2@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ · 61e5f69e
      Authored by Maxim Levitsky
      KVM_GUESTDBG_BLOCKIRQ allows KVM to block all interrupt injection
      while the guest is running.
      
      This change is mostly intended for more robust single stepping
      of the guest, and it has the following benefits when enabled
      (see the usage sketch after this list):
      
      * Resuming from a breakpoint is much more reliable.
        When resuming execution from a breakpoint, with interrupts enabled,
        more often than not, KVM would inject an interrupt and make the CPU
        jump immediately to the interrupt handler and eventually return to
        the breakpoint, to trigger it again.
      
        From the user point of view it looks like the CPU never executed a
        single instruction and in some cases that can even prevent forward
        progress, for example, when the breakpoint is placed by an automated
        script (e.g. lx-symbols), which does something in response to the
        breakpoint and then continues the guest automatically.
        If the script execution takes enough time for another interrupt to
        arrive, the guest will be stuck on the same breakpoint RIP forever.
      
      * Normal single stepping is much more predictable, since it won't
        land the debugger into an interrupt handler.
      
      * RFLAGS.TF has less chance of being leaked to the guest:
      
        We set that flag behind the guest's back to do single stepping,
        but if a single step lands us in an interrupt/exception handler,
        it will be leaked to the guest in the form of being pushed
        to the stack.
        This doesn't completely eliminate the problem, as exceptions
        can still happen, but at least it reduces the chance
        of that happening.
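      
      As referenced above, a minimal userspace sketch of enabling the flag
      alongside single-stepping (standard KVM_SET_GUEST_DEBUG usage):
      
          #include <string.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
          
          static int single_step_without_irqs(int vcpu_fd)
          {
              struct kvm_guest_debug dbg;
          
              memset(&dbg, 0, sizeof(dbg));
              dbg.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP |
                            KVM_GUESTDBG_BLOCKIRQ; /* no IRQs while stepping */
          
              return ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg);
          }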
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210811122927.900604-6-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Add detailed page size stats · 71f51d2c
      Authored by Mingwei Zhang
      Existing KVM code tracks the number of large pages regardless of their
      sizes. Therefore, when 1GB (or larger) pages are in use, the
      information becomes less useful because lpages counts a mix of 1G and 2M
      pages.
      
      So remove lpages, since it is easy for user space to aggregate the info.
      Instead, provide comprehensive page stats for all sizes from 4K to 512G.
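      
      A hedged sketch of the new stats layout, modeled on this series (a union
      so the counters can also be indexed by page level):
      
          /* In struct kvm_vm_stat: */
          union {
              struct {
                  atomic64_t pages_4k;
                  atomic64_t pages_2m;
                  atomic64_t pages_1g;
                  atomic64_t pages_512g;
              };
              atomic64_t pages[4];   /* indexed by (level - 1) */
          };
          
          /* Update helper, per this series: */
          static inline void kvm_update_page_stats(struct kvm *kvm, int level,
                                                   int count)
          {
              atomic64_add(count, &kvm->stat.pages[level - 1]);
          }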
      Suggested-by: Ben Gardon <bgardon@google.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Mingwei Zhang <mizhang@google.com>
      Cc: Jing Zhang <jingzhangos@google.com>
      Cc: David Matlack <dmatlack@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210803044607.599629-4-mizhang@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use · 0f250a64
      Authored by Vitaly Kuznetsov
      APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon
      SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is,
      however, possible to track whether the feature was actually used by the
      guest and only inhibit APICv/AVIC when needed.
      
      TLFS suggests a dedicated 'HV_DEPRECATING_AEOI_RECOMMENDED' flag to let
      Windows know that the AutoEOI feature should be avoided. While it's up to
      KVM userspace to set the flag, KVM can help a bit by exposing global
      APICv/AVIC enablement.
      
      Maxim:
         - always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid,
           since this feature can be used regardless of AVIC
      
      Paolo:
         - use arch.apicv_update_lock to protect the hv->synic_auto_eoi_used
           instead of atomic ops
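      
      A hedged sketch of the tracking described above (synic_auto_eoi_used is
      the field named in the notes; the arch.apicv_update_lock locking around
      the counter is elided):
      
          /* Called when a SynIC SINT's AutoEOI bit flips. */
          static void synic_toggle_auto_eoi(struct kvm_vcpu *vcpu, bool enabled)
          {
              struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
          
              if (enabled)
                  hv->synic_auto_eoi_used++;
              else
                  hv->synic_auto_eoi_used--;
          
              /* Inhibit APICv/AVIC only while an AutoEOI SINT is in use. */
              kvm_request_apicv_update(vcpu->kvm, !hv->synic_auto_eoi_used,
                                       APICV_INHIBIT_REASON_HYPERV);
          }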
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210810205251.424103-12-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM · b0a1637f
      Authored by Maxim Levitsky
      Currently on SVM, kvm_request_apicv_update toggles the APICv
      memslot without doing any synchronization.
      
      If there is a mismatch between that memslot state and the AVIC state
      on one of the vCPUs, an APIC MMIO access can be lost:
      
      For example:
      
      VCPU0: enable the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
      VCPU1: access an APIC MMIO register.
      
      Since AVIC is still disabled on VCPU1, the access will not be intercepted
      by it, nor will it cause an MMIO fault; rather, it will just be
      read/written from/to the dummy page mapped into the
      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.
      
      Fix that by adding a lock guarding the AVIC state changes, and carefully
      order the operations of kvm_request_apicv_update to avoid this race:
      
      1. Take the lock
      2. Send KVM_REQ_APICV_UPDATE
      3. Update the apic inhibit reason
      4. Release the lock
      
      This ensures that at (2) all vCPUs are kicked out of the guest mode,
      but don't yet see the new avic state.
      Then only after (4) all other vCPUs can update their AVIC state and resume.
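      
      A hedged sketch of the resulting ordering (lock and request names per
      this series; the real code also skips the work when nothing changes):
      
          void kvm_request_apicv_update(struct kvm *kvm, bool activate,
                                        ulong bit)
          {
              mutex_lock(&kvm->arch.apicv_update_lock);              /* (1) */
          
              /* (2) kick all vCPUs out of guest mode */
              kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
          
              if (activate)                                          /* (3) */
                  kvm->arch.apicv_inhibit_reasons &= ~BIT(bit);
              else
                  kvm->arch.apicv_inhibit_reasons |= BIT(bit);
          
              mutex_unlock(&kvm->arch.apicv_update_lock);            /* (4) */
          }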
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210810205251.424103-10-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: don't disable APICv memslot when inhibited · 36222b11
      Authored by Maxim Levitsky
      Thanks to the preceding patches, it is now possible to keep the APICv
      memslot always enabled; it will simply be invisible to the guest
      when APICv is inhibited.
      
      This code is based on a suggestion from Sean Christopherson:
      https://lkml.org/lkml/2021/7/19/2970
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210810205251.424103-9-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Introduce kvm_mmu_slot_lpages() helpers · 4139b197
      Authored by Peter Xu
      Introduce kvm_mmu_slot_lpages() to calculate lpage_info and rmap array
      sizes.  A second helper, __kvm_mmu_slot_lpages(), takes an extra npages
      parameter rather than fetching it from the memslot pointer.  Start using
      the latter in kvm_alloc_memslot_metadata().
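      
      The helpers are small; a sketch close to the description (gfn_to_index()
      is the existing helper that aligns a GFN range to a page level):
      
          static inline unsigned long
          __kvm_mmu_slot_lpages(struct kvm_memory_slot *slot,
                                unsigned long npages, int level)
          {
              return gfn_to_index(slot->base_gfn + npages - 1,
                                  slot->base_gfn, level) + 1;
          }
          
          static inline unsigned long
          kvm_mmu_slot_lpages(struct kvm_memory_slot *slot, int level)
          {
              return __kvm_mmu_slot_lpages(slot, slot->npages, level);
          }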
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20210730220455.26054-4-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  11. 13 Aug 2021 (2 commits)