1. 14 April 2022, 11 commits
  2. 12 April 2022, 2 commits
  3. 09 April 2022, 2 commits
    • RISC-V: KVM: include missing hwcap.h into vcpu_fp · 4054eee9
      Committed by Heiko Stuebner
      vcpu_fp uses the riscv_isa_extension mechanism, which is defined
      in hwcap.h, but doesn't include that header file.
      
      While it seems to work in most cases, in certain conditions
      this can lead to build failures like
      
      ../arch/riscv/kvm/vcpu_fp.c: In function ‘kvm_riscv_vcpu_fp_reset’:
      ../arch/riscv/kvm/vcpu_fp.c:22:13: error: implicit declaration of function ‘riscv_isa_extension_available’ [-Werror=implicit-function-declaration]
         22 |         if (riscv_isa_extension_available(&isa, f) ||
            |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ../arch/riscv/kvm/vcpu_fp.c:22:49: error: ‘f’ undeclared (first use in this function)
         22 |         if (riscv_isa_extension_available(&isa, f) ||
      
      Fix this by simply including the necessary header.
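
      A minimal sketch of the fix, assuming the new include simply joins the
      existing ones at the top of arch/riscv/kvm/vcpu_fp.c (the surrounding
      include list is illustrative):

        #include <linux/errno.h>
        #include <linux/err.h>
        #include <linux/kvm_host.h>
        #include <linux/uaccess.h>
        #include <asm/hwcap.h>   /* declares riscv_isa_extension_available() */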
      
      Fixes: 0a86512d ("RISC-V: KVM: Factor-out FP virtualization into separate sources")
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Anup Patel <anup@brainfault.org>
    • RISC-V: KVM: Don't clear hgatp CSR in kvm_arch_vcpu_put() · 8c3ce496
      Committed by Anup Patel
      We might have RISC-V systems (such as QEMU) where VMID is not part
      of the TLB entry tag, so these systems will have to flush all TLB
      entries upon any change in hgatp.VMID.
      
      Currently, we zero out the hgatp CSR in kvm_arch_vcpu_put() and
      re-program it in kvm_arch_vcpu_load(). For the systems described
      above, this flushes all TLB entries whenever a VCPU exits to
      user space, hence reducing performance.
      
      This patch fixes the above performance issue by not clearing the
      hgatp CSR in kvm_arch_vcpu_put().
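
      A minimal sketch of the resulting load/put pair, with heavily simplified
      bodies (vcpu_hgatp() is an illustrative helper; the real functions do
      considerably more work):

        void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
        {
                /* Re-program the guest stage-2 translation on entry, as
                 * before (vcpu_hgatp() is an illustrative helper). */
                csr_write(CSR_HGATP, vcpu_hgatp(vcpu));
        }

        void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
        {
                /*
                 * No csr_write(CSR_HGATP, 0) here any more: on systems where
                 * the VMID is not part of the TLB entry tag, rewriting
                 * hgatp.VMID on the next load would flush every TLB entry
                 * on each exit to user space.
                 */
        }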
      
      Fixes: 34bde9d8 ("RISC-V: KVM: Implement VCPU world-switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: Anup Patel <apatel@ventanamicro.com>
      Signed-off-by: Anup Patel <anup@brainfault.org>
  4. 06 April 2022, 6 commits
  5. 05 April 2022, 3 commits
    • KVM: x86/mmu: remove unnecessary flush_workqueue() · 3203a56a
      Committed by Lv Ruyi
      All pending work will be completed first by the call to destroy_workqueue(),
      so there is no need to flush it explicitly.
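
      A minimal sketch of the pattern (the teardown helper is illustrative;
      the actual call site is in KVM's x86 MMU code, per the subject line):

        static void example_teardown(struct workqueue_struct *wq)
        {
                /* flush_workqueue(wq); -- redundant: destroy_workqueue()
                 * already drains all pending work before freeing the queue. */
                destroy_workqueue(wq);
        }
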
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220401083530.2407703-1-lv.ruyi@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded · 1d0e8480
      Committed by Sean Christopherson
      Resolve nx_huge_pages to true/false when kvm.ko is loaded; leaving it as
      -1 is technically undefined behavior when its value is read out by
      param_get_bool(), as boolean values are supposed to be '0' or '1'.
      
      Alternatively, KVM could define a custom getter for the param, but the
      auto value doesn't depend on the vendor module in any way, and printing
      "auto" would be unnecessarily unfriendly to the user.
      
      In addition to fixing the undefined behavior, resolving the auto value
      also fixes the scenario where the auto value resolves to N and no vendor
      module is loaded.  Previously, -1 would result in Y being printed even
      though KVM would ultimately disable the mitigation.
      
      Rename the existing MMU module init/exit helpers to clarify that they're
      invoked with respect to the vendor module, and add comments to document
      why KVM has two separate "module init" flows.
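
      A minimal sketch of the idea, with simplified names (the real patch also
      splits and renames the vendor-module init/exit helpers):

        static int __read_mostly nx_huge_pages = -1;  /* -1 means "auto" */

        void __init kvm_mmu_x86_module_init(void)     /* runs when kvm.ko loads */
        {
                /* Resolve "auto" before anything can read the param, so
                 * param_get_bool() only ever sees 0 or 1. */
                if (nx_huge_pages == -1)
                        nx_huge_pages = get_nx_auto_mode();
        }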
      
        =========================================================================
        UBSAN: invalid-load in kernel/params.c:320:33
        load of value 255 is not a valid value for type '_Bool'
        CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        Call Trace:
         <TASK>
         dump_stack_lvl+0x34/0x44
         ubsan_epilogue+0x5/0x40
         __ubsan_handle_load_invalid_value.cold+0x43/0x48
         param_get_bool.cold+0xf/0x14
         param_attr_show+0x55/0x80
         module_attr_show+0x1c/0x30
         sysfs_kf_seq_show+0x93/0xc0
         seq_read_iter+0x11c/0x450
         new_sync_read+0x11b/0x1a0
         vfs_read+0xf0/0x190
         ksys_read+0x5f/0xe0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        =========================================================================
      
      Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation")
      Cc: stable@vger.kernel.org
      Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220331221359.3912754-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Add cond_resched() to loop in sev_clflush_pages() · 00c22013
      Committed by Peter Gonda
      Add a resched point to avoid warnings from sev_clflush_pages() when
      flushing a large number of pages.
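
      A minimal sketch of the loop shape (simplified from the description
      above; the real function also validates its arguments):

        static void sev_clflush_pages(struct page *pages[], unsigned long npages)
        {
                unsigned long i;

                for (i = 0; i < npages; i++) {
                        clflush_cache_range(page_address(pages[i]), PAGE_SIZE);
                        cond_resched();  /* yield periodically on large ranges */
                }
        }
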
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      
      Message-Id: <20220330164306.2376085-1-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 02 April 2022, 16 commits
    • KVM: x86/mmu: Don't rebuild page when the page is synced and no tlb flushing is required · 8d5678a7
      Committed by Hou Wenlong
      Before commit c3e5e415 ("KVM: X86: Change kvm_sync_page()
      to return true when remote flush is needed"), the return value
      of kvm_sync_page() indicated whether the page was synced, and
      kvm_mmu_get_page() would rebuild the page when the sync failed.
      But now, kvm_sync_page() returns false when the page is
      synced and no TLB flushing is required, which causes
      kvm_mmu_get_page() to rebuild the page needlessly. So return the
      value of mmu->sync_page() directly and check it in
      kvm_mmu_get_page(). If the sync fails, the page will be
      zapped and invalid_list will be non-empty, so setting flush to
      true is acceptable in mmu_sync_children().
      
      Cc: stable@vger.kernel.org
      Fixes: c3e5e415 ("KVM: X86: Change kvm_sync_page() to return true when remote flush is needed")
      Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
      Acked-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Message-Id: <0dabeeb789f57b0d793f85d073893063e692032d.1647336064.git.houwenlong.hwl@antgroup.com>
      [mmu_sync_children should not flush if the page is zapped. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: optimize PKU branching in kvm_load_{guest|host}_xsave_state · 945024d7
      Committed by Jon Kohler
      kvm_load_{guest|host}_xsave_state handles xsave on VM entry and exit,
      part of which is managing memory protection key state. The latest
      arch.pkru is updated with a rdpkru, and if that doesn't match the base
      host_pkru (which is the case about 70% of the time), we issue a __write_pkru.
      
      To improve performance, implement the following optimizations:
       1. Reorder if conditions prior to wrpkru in both
          kvm_load_{guest|host}_xsave_state.
      
          Flip the ordering of the || condition so that XFEATURE_MASK_PKRU is
          checked first, which, when instrumented in our environment, appeared
          to always be true and is less overall work than kvm_read_cr4_bits.

          For kvm_load_guest_xsave_state, hoist arch.pkru != host_pkru ahead
          one position. When instrumented, I saw this be true roughly ~70% of
          the time vs the other conditions, which were almost always true.
          With this change, we avoid the 3rd condition check ~30% of the time.
      
       2. Wrap PKU sections with CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS,
          so that if the user compiles out this feature, these branches are
          not present at all (see the sketch below).
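
      A minimal sketch of the reordered guest-side check (simplified; the exact
      code may differ, but the shape follows the two points above):

        #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
                if (cpu_feature_enabled(X86_FEATURE_PKU) &&
                    vcpu->arch.pkru != vcpu->arch.host_pkru &&
                    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
                     kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))
                        write_pkru(vcpu->arch.pkru);
        #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
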
      Signed-off-by: Jon Kohler <jon@nutanix.com>
      Message-Id: <20220324004439.6709-1-jon@nutanix.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: SVM: allow AVIC to co-exist with a nested guest running · f44509f8
      Committed by Maxim Levitsky
      Inhibit the AVIC of the vCPU that is running nested for the duration of the
      nested run, so that all interrupts arriving from both its vCPU siblings
      and from KVM are delivered using normal IPIs and cause that vCPU to vmexit.
      
      Note that unlike normal AVIC inhibition, there is no need to
      update the AVIC mmio memslot, because the nested guest uses its
      own set of paging tables.
      That also means that AVIC doesn't need to be inhibited VM wide.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220322174050.241850-7-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: allow per cpu apicv inhibit reasons · d5fa597e
      Committed by Maxim Levitsky
      Add an optional callback, .vcpu_get_apicv_inhibit_reasons, returning
      extra inhibit reasons that prevent APICv from working on this vCPU.
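
      A minimal sketch of the hook, assuming it slots into struct kvm_x86_ops
      alongside the other optional callbacks (exact placement is illustrative):

        /* Optional: report extra, per-vCPU APICv inhibit reasons on top of
         * the VM-wide ones. */
        unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
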
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220322174050.241850-6-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: nSVM: implement nested vGIF · 0b349662
      Committed by Maxim Levitsky
      In case L1 enables vGIF for L2, L2 cannot affect L1's GIF, regardless
      of STGI/CLGI intercepts, and since VM entry enables GIF, this means
      that L1's GIF is always 1 while L2 is running.
      
      Thus, in this case, keep L1's vGIF in vmcb01 while letting L2
      control the vGIF, thereby implementing nested vGIF.
      
      Also allow KVM to toggle L1's GIF during nested entry/exit
      by always using vmcb01.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220322174050.241850-5-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE · 74fd41ed
      Committed by Maxim Levitsky
      Expose the pause filtering and threshold in the guest CPUID
      and support PAUSE filtering when possible:
      
      - If L0 doesn't intercept PAUSE (cpu_pm=on), then allow L1 to
        have full control over PAUSE filtering.
      
      - If L1 doesn't intercept PAUSE, use host values and update
        the adaptive count/threshold even when running nested.
      
      - Otherwise always exit to L1; it is not really possible to merge
        the fields correctly.  It is expected that in this case, userspace
        will not enable this feature in the guest CPUID, to avoid having the
        guest update both fields pointlessly.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220322174050.241850-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: nSVM: implement nested LBR virtualization · d20c796c
      Committed by Maxim Levitsky
      This was tested with a kvm-unit-test that was developed
      for this purpose.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220322174050.241850-3-mlevitsk@redhat.com>
      [Copy all of DEBUGCTL except for reserved bits. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running · 1d5a1b58
      Committed by Maxim Levitsky
      When L2 is running without LBR virtualization, we should ensure
      that L1's LBR MSRs continue to be updated as usual.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220322174050.241850-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: SVM: remove vgif_enabled() · ea91559b
      Committed by Maxim Levitsky
      KVM always uses vGIF when allowed, so there is
      no need to query the current vmcb for it.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220322172449.235575-9-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense · db663af4
      Committed by Maxim Levitsky
      This makes the code a bit shorter and cleaner.
      
      No functional change intended.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220322172449.235575-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: SVM: use vmcb01 in init_vmcb · 1ee73a33
      Committed by Maxim Levitsky
      Clarify that this function is not used to initialize any part of
      the vmcb02.  No functional change intended.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Support the vCPU preemption check with nopvspin and realtime hint · d063de55
      Committed by Li RongQing
      If the guest kernel is configured with nopvspin, or CONFIG_PARAVIRT_SPINLOCKS
      is disabled, or the guest finds it has dedicated pCPUs from the realtime hint
      feature, then pvspinlock is disabled, and the vCPU preemption check
      is disabled too.
      
      However, KVM can still be emulating HLT for the vCPU in both cases.  Checking
      whether a vCPU is preempted can still boost performance in IPI-heavy scenarios
      such as the unixbench file copy and pipe-based context switching tests.  Here
      the vCPU is running with a dedicated pCPU, so the guest kernel uses nopvspin,
      but KVM is still emulating HLT for the vCPU:
      
      Testcase                                  Base    with patch
      System Benchmarks Index Values            INDEX     INDEX
      Dhrystone 2 using register variables     3278.4    3277.7
      Double-Precision Whetstone                822.8     825.8
      Execl Throughput                         1296.5     941.1
      File Copy 1024 bufsize 2000 maxblocks    2124.2    2142.7
      File Copy 256 bufsize 500 maxblocks      1335.9    1353.6
      File Copy 4096 bufsize 8000 maxblocks    4256.3    4760.3
      Pipe Throughput                          1050.1    1054.0
      Pipe-based Context Switching              243.3     352.0
      Process Creation                          820.1     814.4
      Shell Scripts (1 concurrent)             2169.0    2086.0
      Shell Scripts (8 concurrent)             7710.3    7576.3
      System Call Overhead                      672.4     673.9
                                            ========    =======
      System Benchmarks Index Score             1467.2   1483.0
      
      Move the setting of pv_ops.lock.vcpu_is_preempted to kvm_guest_init, so
      that it does not depend on pvspinlock.
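
      A minimal sketch of the relevant hunk in kvm_guest_init(), simplified and
      with config guards omitted: register the preemption check whenever steal
      time is available, independent of the pvspinlock setup:

        if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
                has_steal_clock = 1;
                static_call_update(pv_steal_clock, kvm_steal_clock);
                /* Moved here from the pvspinlock init path. */
                pv_ops.lock.vcpu_is_preempted =
                        PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);
        }
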
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Message-Id: <1646815610-43315-1-git-send-email-lirongqing@baidu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Don't snapshot "max" TSC if host TSC is constant · 741e511b
      Committed by Sean Christopherson
      Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the
      host TSC is constant, in which case the actual TSC frequency will never
      change and thus capturing the "max" TSC during initialization is
      unnecessary; KVM can simply use tsc_khz during VM creation.
      
      On CPUs with constant TSC, but not a hardware-specified TSC frequency,
      snapshotting max_tsc_khz and using that to set a VM's default TSC
      frequency can lead to KVM thinking it needs to manually scale the guest's
      TSC if refining the TSC completes after KVM snapshots tsc_khz.  The
      actual frequency never changes, only the kernel's calculation of what
      that frequency is changes.  On systems without hardware TSC scaling, this
      either puts KVM into "always catchup" mode (extremely inefficient), or
      prevents creating VMs altogether.
      
      Ideally, KVM would not be able to race with TSC refinement, or would have
      a hook into tsc_refine_calibration_work() to get an alert when refinement
      is complete.  Avoiding the race altogether isn't practical as refinement
      takes a relative eternity; it's deliberately put on a work queue outside
      of the normal boot sequence to avoid unnecessarily delaying boot.
      
      Adding a hook is doable, but somewhat gross due to KVM's ability to be
      built as a module.  And if the TSC is constant, which is likely the case
      for every VMX/SVM-capable CPU produced in the last decade, the race can
      be hit if and only if userspace is able to create a VM before TSC
      refinement completes; refinement is slow, but not that slow.
      
      For now, punt on a proper fix, as not taking a snapshot can help some
      use cases and not taking a snapshot is arguably correct irrespective of
      the race with refinement.
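
      A minimal sketch of the idea, using an illustrative helper name:

        static unsigned long kvm_get_default_tsc_khz(void)
        {
                /* A constant TSC never changes frequency, so read tsc_khz
                 * fresh at VM-creation time instead of trusting a snapshot
                 * taken before refinement finished. */
                if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
                        return tsc_khz;
                return max_tsc_khz;   /* snapshot taken during module init */
        }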
      
      [ dwmw2: Rebase on top of KVM-wide default_tsc_khz to ensure that all
               vCPUs get the same frequency even if we hit the race. ]
      
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Anton Romanov <romanton@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20220225145304.36166-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Accept KVM_[GS]ET_TSC_KHZ as a VM ioctl. · ffbb61d0
      Committed by David Woodhouse
      This sets the default TSC frequency for subsequently created vCPUs.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20220225145304.36166-2-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/i8259: Remove a dead store of irq in a conditional block · fe3787a0
      Committed by Like Xu
      The [clang-analyzer-deadcode.DeadStores] helper reports
      that the value stored to 'irq' is never read.
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20220301120217.38092-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Prepare VMCS setting for posted interrupt enabling when APICv is available · 1421211a
      Committed by Zeng Guang
      Currently, KVM sets up the posted-interrupt VMCS fields based only on
      the per-vCPU APICv activation status at vCPU creation time.
      However, this status can be toggled dynamically under some
      circumstances, so enabling posted interrupts later may be
      problematic if the VMCS was never prepared for it.
      
      To fix this, always settle the VMCS setting for posted interrupts
      as long as APICv is available and the LAPIC is emulated in the kernel.
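
      A minimal sketch of the condition change in the VMCS setup path
      (simplified; the helper name is illustrative):

        if (enable_apicv && lapic_in_kernel(&vmx->vcpu))
                /* Program the posted-interrupt notification vector and
                 * descriptor address even if APICv is currently inhibited
                 * for this vCPU, so it can be enabled later. */
                vmx_setup_posted_interrupts(vmx);
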
      Signed-off-by: Zeng Guang <guang.zeng@intel.com>
      Message-Id: <20220315145836.9910-1-guang.zeng@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>