1. 11 Jun, 2022 15 commits
  2. 10 Jun, 2022 8 commits
  3. 09 Jun, 2022 17 commits
    • P
      Merge branch 'kvm-5.20-early' · e15f5e6f
      Authored by Paolo Bonzini
      s390:
      
      * add an interface to provide a hypervisor dump for secure guests
      
      * improve selftests to show tests
      
      x86:
      
      * Intel IPI virtualization
      
      * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
      
      * PEBS virtualization
      
      * Simplify PMU emulation by just using PERF_TYPE_RAW events
      
      * More accurate event reinjection on SVM (avoid retrying instructions)
      
      * Allow getting/setting the state of the speaker port data bit
      
      * Rewrite gfn-pfn cache refresh
      
      * Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent
      
      * "Notify" VM exit
      e15f5e6f
    • D
      KVM: selftests: Restrict test region to 48-bit physical addresses when using nested · e0f3f46e
      Authored by David Matlack
      The selftests nested code only supports 4-level paging at the moment.
      This means it cannot map nested guest physical addresses with more than
      48 bits. Allow perf_test_util nested mode to work on hosts with more
      than 48 physical address bits by restricting the guest test region to
      48 bits.
      
      While here, opportunistically fix an off-by-one error when dealing with
      vm_get_max_gfn(). perf_test_util.c was treating this as the maximum
      number of GFNs, rather than the maximum allowed GFN. This didn't result
      in any correctness issues, but it did end up shifting the test region
      down slightly when using huge pages.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-12-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e0f3f46e
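The off-by-one and the 48-bit restriction described in commit e0f3f46e can be sketched in plain C. This is a minimal illustration, not the selftest code: the helper names `max_gfn_for_pa_bits()` and `test_region_gpa()`, the macros, and the choice to place the region at the top of guest memory are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GUEST_PAGE_SHIFT   12  /* assumption: 4 KiB guest pages */
#define NESTED_MAX_PA_BITS 48  /* limit implied by 4-level paging */

/* Hypothetical helper: highest valid GFN for a given physical address width. */
static uint64_t max_gfn_for_pa_bits(unsigned int pa_bits)
{
        assert(pa_bits > GUEST_PAGE_SHIFT && pa_bits < 64);
        return ((UINT64_C(1) << pa_bits) >> GUEST_PAGE_SHIFT) - 1;
}

/*
 * Hypothetical helper: pick the base GPA of a test region of region_pages
 * pages at the top of guest memory, aligned down to alignment bytes
 * (alignment must be a power of two).
 */
static uint64_t test_region_gpa(uint64_t max_gfn, uint64_t region_pages,
                                uint64_t alignment, bool nested)
{
        /* max_gfn is the last valid GFN, so the number of GFNs is one more. */
        uint64_t nr_gfns = max_gfn + 1;
        uint64_t gpa;

        if (nested) {
                /* Clamp to what 4-level nested page tables can map. */
                uint64_t limit = UINT64_C(1) <<
                                 (NESTED_MAX_PA_BITS - GUEST_PAGE_SHIFT);

                if (nr_gfns > limit)
                        nr_gfns = limit;
        }

        assert(region_pages <= nr_gfns);
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

        gpa = (nr_gfns - region_pages) << GUEST_PAGE_SHIFT;
        return gpa & ~(alignment - 1);
}
```

With 52 physical address bits, a 512-page region, and 2 MiB alignment, using `max_gfn` itself as the count would leave the unaligned base one page lower, and aligning down would then move the region a full 2 MiB lower than necessary. That is consistent with the "shifting the test region down slightly when using huge pages" effect the commit message mentions.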
    • D
      KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 · 71d48966
      Authored by David Matlack
      Add an option to dirty_log_perf_test that configures the vCPUs to run in
      L2 instead of L1. This makes it possible to benchmark the dirty logging
      performance of nested virtualization, which is particularly interesting
      because KVM must shadow L1's EPT/NPT tables.
      
      For now this support only works on x86_64 CPUs with VMX. Otherwise
      passing -n results in the test being skipped.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-11-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      71d48966
    • D
      KVM: selftests: Clean up LIBKVM files in Makefile · cf97d5e9
      Authored by David Matlack
      Break up the long lines for LIBKVM and alphabetize each architecture.
      This makes reading the Makefile easier, and will make reading diffs to
      LIBKVM easier.
      
      No functional change intended.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-10-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cf97d5e9
    • D
      KVM: selftests: Link selftests directly with lib object files · cdc979da
      Authored by David Matlack
      The linker does not obey strong/weak symbols when linking static libraries;
      it simply resolves an undefined symbol to the first-encountered symbol.
      This means that defining __weak arch-generic functions and then defining
      arch-specific strong functions to override them in libkvm will not
      always work.
      
      More specifically, if we have:
      
      lib/generic.c:
      
        void __weak foo(void)
        {
                pr_info("weak\n");
        }
      
        void bar(void)
        {
                foo();
        }
      
      lib/x86_64/arch.c:
      
        void foo(void)
        {
                pr_info("strong\n");
        }
      
      And a selftest that calls bar(), it will print "weak". Now if you make
      generic.o explicitly depend on arch.o (e.g. add a function to arch.c that
      is called directly from generic.c) it will print "strong". In other
      words, it seems that the linker is free to throw out arch.o when linking
      because generic.o does not explicitly depend on it, which causes the
      linker to lose the strong symbol.
      
      One solution is to link libkvm.a with --whole-archive so that the linker
      doesn't throw away object files it thinks are unnecessary. However that
      is a bit difficult to plumb since we are using the common selftests
      makefile rules. An easier solution is to drop libkvm.a and just link
      selftests with all the .o files that were originally in libkvm.a.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-9-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cdc979da
    • D
      KVM: selftests: Drop unnecessary rule for STATIC_LIBS · acf57736
      Authored by David Matlack
      Drop the "all: $(STATIC_LIBS)" rule. The KVM selftests already depend
      on $(STATIC_LIBS), so there is no reason to have an extra "all" rule.
      Suggested-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-8-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      acf57736
    • D
      KVM: selftests: Add a helper to check EPT/VPID capabilities · c363d959
      Authored by David Matlack
      Create a small helper function to check if a given EPT/VPID capability
      is supported. This will be re-used in a follow-up commit to check for 1G
      page support.
      
      No functional change intended.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-7-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c363d959
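The kind of helper commit c363d959 describes is essentially a bitmask test against the EPT/VPID capability MSR value. The sketch below is an assumption-laden illustration: the function name `ept_vpid_cap_supported()` and the `EXAMPLE_*` macro names are hypothetical, and the MSR value is passed in as a parameter because userspace code cannot read the MSR directly. The bit positions follow the commonly documented layout of the capability MSR but should be checked against the Intel SDM before being relied on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative capability bits (hypothetical macro names). */
#define EXAMPLE_EPT_CAP_2M_PAGE (UINT64_C(1) << 16)
#define EXAMPLE_EPT_CAP_1G_PAGE (UINT64_C(1) << 17)
#define EXAMPLE_EPT_CAP_AD_BITS (UINT64_C(1) << 21)

/*
 * Hypothetical helper: return true only if every bit in mask is set in the
 * capability MSR value supplied by the caller.
 */
static bool ept_vpid_cap_supported(uint64_t cap_msr, uint64_t mask)
{
        assert(mask != 0);
        return (cap_msr & mask) == mask;
}
```

A caller that wants 1G mappings could then test `ept_vpid_cap_supported(cap, EXAMPLE_EPT_CAP_1G_PAGE)` and fall back to smaller pages when it returns false.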
    • D
      KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h · b6c086d0
      Authored by David Matlack
      This is a VMX-related macro so move it to vmx.h. While here, open code
      the mask like the rest of the VMX bitmask macros.
      
      No functional change intended.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-6-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b6c086d0
    • D
      KVM: selftests: Refactor nested_map() to specify target level · ce690e9c
      Authored by David Matlack
      Refactor nested_map() to specify that it explicitly wants 4K mappings
      (the existing behavior) and push the implementation down into
      __nested_map(), which can be used in subsequent commits to create huge
      page mappings.
      
      No functional change intended.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-5-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ce690e9c
    • D
      KVM: selftests: Drop stale function parameter comment for nested_map() · b8ca01ea
      Authored by David Matlack
      nested_map() does not take a parameter named eptp_memslot. Drop the
      comment referring to it.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-4-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b8ca01ea
    • D
      KVM: selftests: Add option to create 2M and 1G EPT mappings · c5a0ccec
      Authored by David Matlack
      The current EPT mapping code in the selftests only supports mapping 4K
      pages. This commit extends that support with an option to map at 2M or
      1G. This will be used in a future commit to create large page mappings
      to test eager page splitting.
      
      No functional change intended.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-3-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c5a0ccec
    • D
      KVM: selftests: Replace x86_page_size with PG_LEVEL_XX · 4ee602e7
      Authored by David Matlack
      x86_page_size is an enum used to communicate the desired page size with
      which to map a range of memory. Under the hood its values just encode the
      desired level at which to map the page. This ends up being clunky in a
      few ways:
      
       - The name suggests it encodes the size of the page rather than the
         level.
       - In other places in x86_64/processor.c we just use a raw int to encode
         the level.
      
      Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass
      around raw ints when referring to the level. This makes the code easier
      to understand since these macros are very common in KVM MMU code.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220520233249.3776001-2-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4ee602e7
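To make the "level rather than size" point of commit 4ee602e7 concrete, the sketch below derives a mapping size from a PG_LEVEL_XX value. The enum mirrors the kernel-style names the commit refers to; the helper names `pg_level_shift()` and `pg_level_size()` and the two macros are assumptions for this example, and it assumes x86-64 paging with 4 KiB base pages and 9 address bits per page-table level.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-style page-table levels; level 1 maps 4 KiB pages. */
enum pg_level {
        PG_LEVEL_NONE,
        PG_LEVEL_4K,
        PG_LEVEL_2M,
        PG_LEVEL_1G,
};

#define BASE_PAGE_SHIFT   12  /* assumption: 4 KiB base pages */
#define PT_BITS_PER_LEVEL 9   /* 512 entries per page table */

/* Hypothetical helper: number of address bits covered by one entry at level. */
static unsigned int pg_level_shift(int level)
{
        assert(level >= PG_LEVEL_4K && level <= PG_LEVEL_1G);
        return BASE_PAGE_SHIFT + (unsigned int)(level - 1) * PT_BITS_PER_LEVEL;
}

/* Hypothetical helper: size in bytes of a mapping created at level. */
static uint64_t pg_level_size(int level)
{
        return UINT64_C(1) << pg_level_shift(level);
}
```

Passing the level as a raw int and computing the size on demand keeps a single source of truth, which is the simplification the commit message argues for.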
    • P
      KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE · e3cdaab5
      Authored by Paolo Bonzini
      Commit 74fd41ed ("KVM: x86: nSVM: support PAUSE filtering when L0
      doesn't intercept PAUSE") introduced passthrough support for nested PAUSE
      filtering when the host doesn't intercept PAUSE (interception disabled
      either with a kvm module param or with '-overcommit cpu-pm=on').

      Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards,
      the feature was exposed as supported by KVM cpuid unconditionally, so
      L1 could try to use it even when the L0 KVM can't really support it.
      
      In this case the fallback caused KVM to intercept each PAUSE instruction;
      in some cases, such intercept can slow down the nested guest so much
      that it can fail to boot.  Instead, before the problematic commit KVM
      was already setting both thresholds to 0 in vmcb02, but after the first
      userspace VM exit shrink_ple_window was called and would reset the
      pause_filter_count to the default value.
      
      To fix this, change the fallback strategy - ignore the guest threshold
      values, but use/update the host threshold values unless the guest
      specifically requests disabling PAUSE filtering (either simple or
      advanced).
      
      Also fix a minor bug: on nested VM exit, when the PAUSE filter counters
      were copied back to vmcb01, a dirty bit was not set.
      
      Thanks a lot to Suravee Suthikulpanit for debugging this!
      
      Fixes: 74fd41ed ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE")
      Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e3cdaab5
    • M
      KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put · ba8ec273
      Authored by Maxim Levitsky
      Now that these functions are always called with preemption disabled,
      remove the preempt_disable()/preempt_enable() pair inside them.
      
      No functional change intended.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ba8ec273
    • M
      KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking · 18869f26
      Authored by Maxim Levitsky
      On SVM/AVIC, if preemption happens right after the call to
      finish_rcuwait but before the call to kvm_arch_vcpu_unblocking, the
      preemption itself will re-enable AVIC, and then we will try to
      re-enable it again in kvm_arch_vcpu_unblocking, which will lead to a
      warning in __avic_vcpu_load.

      The same problem can happen if the vCPU is preempted right after the call
      to kvm_arch_vcpu_blocking but before the call to prepare_to_rcuwait;
      in this case, we will end up with AVIC enabled during sleep -
      oops.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-7-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      18869f26
    • M
      KVM: x86: disable preemption while updating apicv inhibition · 66c768d3
      Authored by Maxim Levitsky
      Currently nothing prevents preemption in kvm_vcpu_update_apicv.

      On SVM, if preemption happens after we update
      vcpu->arch.apicv_active, the preemption itself will
      'update' the inhibition, since AVIC will first be disabled
      on vCPU unload and then enabled when the current task
      is loaded again.
      
      Then we will try to update it again, which will lead to a warning
      in __avic_vcpu_load, that the AVIC is already enabled.
      
      Fix this by disabling preemption in this code.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      66c768d3
    • M
      KVM: x86: SVM: fix avic_kick_target_vcpus_fast · 603ccef4
      Authored by Maxim Levitsky
      There are two issues in avic_kick_target_vcpus_fast:
      
      1. It is legal to issue an IPI request with APIC_DEST_NOSHORT
         and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic),
         which must be treated as a broadcast destination.
      
         Fix this by explicitly checking for it.
         Also don’t use ‘index’ in this case as it gives no new information.
      
      2. It is legal to issue a logical IPI request to more than one target.
         Index field only provides index in physical id table of first
         such target and therefore can't be used before we are sure
         that only a single target was addressed.
      
         Instead, parse the ICRL/ICRH, double check that a unicast interrupt
         was requested, and use that info to figure out the physical id
         of the target vCPU.
         At that point there is no need to use the index field as well.
      
      In addition to fixing the above issues, also skip the call to
      kvm_apic_match_dest.

      It is possible to do this now because, as long as AVIC is not
      inhibited, it is guaranteed that none of the vCPUs changed their
      APIC ID from its default value.

      This fixes boot of Windows guests with AVIC enabled, because they use
      an IPI with a 0xFF destination and no destination shorthand.
      
      Fixes: 7223fd2d ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible")
      Cc: stable@vger.kernel.org
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      603ccef4
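The destination handling described in commit 603ccef4 can be sketched as a small classifier over the ICR low and high words. This is a simplified illustration rather than the kernel code: the function `classify_ipi()`, the enum, and the `EXAMPLE_*` macro names are hypothetical, and, unlike the real fix, this sketch sends every logical-mode IPI to the slow path instead of trying to prove that only one target was addressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative ICR low-word fields (hypothetical macro names). */
#define EXAMPLE_ICR_SHORTHAND_MASK UINT32_C(0xC0000) /* bits 18-19 */
#define EXAMPLE_ICR_DEST_NOSHORT   UINT32_C(0x00000)
#define EXAMPLE_ICR_DEST_LOGICAL   UINT32_C(0x00800) /* bit 11 */

enum ipi_target_kind {
        IPI_UNICAST_PHYSICAL, /* exactly one target, *dest_out is valid */
        IPI_BROADCAST,        /* physical destination 0xFF / 0xFFFFFFFF */
        IPI_SLOW_PATH,        /* shorthand or logical mode: not handled here */
};

static enum ipi_target_kind classify_ipi(uint32_t icrl, uint32_t icrh,
                                         bool x2apic, uint32_t *dest_out)
{
        uint32_t dest, broadcast;

        assert(dest_out != NULL);

        if ((icrl & EXAMPLE_ICR_SHORTHAND_MASK) != EXAMPLE_ICR_DEST_NOSHORT)
                return IPI_SLOW_PATH;

        /* A logical IPI may address several targets; do not guess. */
        if (icrl & EXAMPLE_ICR_DEST_LOGICAL)
                return IPI_SLOW_PATH;

        /* xAPIC keeps the destination in ICRH bits 24-31; x2APIC uses all 32. */
        dest = x2apic ? icrh : (icrh >> 24) & 0xFF;
        broadcast = x2apic ? UINT32_C(0xFFFFFFFF) : UINT32_C(0xFF);

        if (dest == broadcast)
                return IPI_BROADCAST;

        *dest_out = dest;
        return IPI_UNICAST_PHYSICAL;
}
```

Checking for the broadcast value before treating the destination as a single physical ID is the step whose absence broke guests that send an IPI to 0xFF with no destination shorthand.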