1. 05 June 2019, 5 commits
    • KVM: LAPIC: Delay trace_kvm_wait_lapic_expire tracepoint to after vmexit · ec0671d5
      Wanpeng Li authored
      The wait_lapic_expire() call was moved above guest_enter_irqoff() because
      of its tracepoint, which violated the RCU extended quiescent state entered
      by guest_enter_irqoff() [1][2]. This patch simply moves the tracepoint
      below guest_exit_irqoff() in vcpu_enter_guest(): snapshot the delta before
      VM-Enter, but trace it after VM-Exit. This also lets a later patch move
      wait_lapic_expire() to just before VM-Entry.
      
      [1] Commit 8b89fe1f ("kvm: x86: move tracepoints outside extended quiescent state")
      [2] https://patchwork.kernel.org/patch/7821111/
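
      A rough sketch of the resulting ordering (the wait_lapic_expire()
      signature and the "waited" flag are illustrative, not the actual diff;
      the flag reflects Paolo's note below about tracking whether
      wait_lapic_expire() was called):

        bool waited = false;
        u32 delay_ns = 0;

        waited = wait_lapic_expire(vcpu, &delay_ns);  /* snapshot the delta */
        guest_enter_irqoff();       /* RCU extended quiescent state begins */
        /* ... VM-Enter, guest runs, VM-Exit ... */
        guest_exit_irqoff();        /* EQS over: tracepoints are safe again */
        if (waited)
                trace_kvm_wait_lapic_expire(vcpu->vcpu_id, delay_ns);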
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      [Track whether wait_lapic_expire was called, and do not invoke the tracepoint
       if not. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: LAPIC: Extract adaptive tune timer advancement logic · 84ea3aca
      Wanpeng Li authored
      Extract the adaptive timer advancement tuning logic into a single function.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      [Rename new function. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM/nSVM: properly map nested VMCB · 8f38302c
      Vitaly Kuznetsov authored
      Commit 8c5fbf1a ("KVM/nSVM: Use the new mapping API for mapping guest
      memory") broke nested SVM completely: kvm_vcpu_map()'s second parameter is
      a GFN, so vmcb_gpa needs to be converted with gpa_to_gfn(), not the other
      way around.
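
      For reference, the two conversions differ only by the 4KiB page shift; a
      standalone illustration (plain userspace C, not the kernel helpers) of why
      passing a GPA where a GFN is expected lands off by a factor of 4096:

        #include <stdio.h>
        #include <inttypes.h>
        #include <stdint.h>

        #define PAGE_SHIFT 12   /* 4KiB pages */

        static uint64_t gpa_to_gfn(uint64_t gpa) { return gpa >> PAGE_SHIFT; }
        static uint64_t gfn_to_gpa(uint64_t gfn) { return gfn << PAGE_SHIFT; }

        int main(void)
        {
                uint64_t vmcb_gpa = 0x12345000;  /* example guest-physical address */

                /* correct: kvm_vcpu_map() wants a frame number */
                printf("gpa_to_gfn: 0x%" PRIx64 "\n", gpa_to_gfn(vmcb_gpa));
                /* the bug: converting the wrong way scales by the page size */
                printf("gfn_to_gpa: 0x%" PRIx64 "\n", gfn_to_gpa(vmcb_gpa));
                return 0;
        }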
      
      Fixes: 8c5fbf1a ("KVM/nSVM: Use the new mapping API for mapping guest memory")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Fix reserved bits related calculation errors caused by MKTME · f3ecb59d
      Kai Huang authored
      Intel MKTME repurposes several high bits of the physical address as a
      'keyID' for memory encryption, effectively reducing the platform's maximum
      physical address bits. Exactly how many bits are reduced is configured by
      the BIOS. To honor this hardware behavior, the repurposed bits are
      subtracted from cpuinfo_x86->x86_phys_bits when MKTME is detected during
      CPU detection. Similarly, AMD SME/SEV also reduces physical address bits
      for memory encryption, and cpuinfo->x86_phys_bits is reduced as well when
      SME/SEV is detected. So for both MKTME and SME/SEV,
      boot_cpu_data.x86_phys_bits no longer holds the physical address bits
      reported by CPUID.
      
      Currently KVM treats the bits from boot_cpu_data.x86_phys_bits up to 51 as
      reserved bits, but this is no longer true for MKTME, which treats the
      reduced bits as 'keyID' bits, not reserved bits. Therefore
      boot_cpu_data.x86_phys_bits cannot be used to calculate reserved bits
      anymore, although we can still use it for AMD SME/SEV, since SME/SEV
      treats the reduced bits differently -- they are treated as reserved bits,
      the same as the other reserved bits in a page table entry [1].
      
      Fix this by introducing a new 'shadow_phys_bits' variable in the KVM x86
      MMU code to store the effective physical address bits (excluding reserved
      bits): for MKTME it equals the physical address bits reported by CPUID,
      and for SME/SEV it is boot_cpu_data.x86_phys_bits.
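
      A standalone sketch of the reserved-bits computation this enables, using a
      rsvd_bits()-style helper like the one in KVM's MMU code (the value 46 for
      shadow_phys_bits is just an example):

        #include <stdio.h>
        #include <inttypes.h>
        #include <stdint.h>

        /* set bits [hi:lo], like KVM's rsvd_bits() helper */
        static uint64_t rsvd_bits(int lo, int hi)
        {
                return ((1ULL << (hi - lo + 1)) - 1) << lo;
        }

        int main(void)
        {
                /* assumed example: CPUID-reported bits on MKTME systems,
                   boot_cpu_data.x86_phys_bits on SME/SEV */
                int shadow_phys_bits = 46;

                printf("reserved PA bits 51:%d -> 0x%016" PRIx64 "\n",
                       shadow_phys_bits, rsvd_bits(shadow_phys_bits, 51));
                return 0;
        }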
      
      Note that the physical address bits reported to the guest should remain
      unchanged -- KVM should report the physical address bits from CPUID to the
      guest, not boot_cpu_data.x86_phys_bits. For Intel MKTME, there is no harm
      if the guest sets up 'keyID' bits in its page tables (since MKTME only
      works at the physical address level), and KVM doesn't even expose MKTME to
      the guest. Arguably, for AMD SME/SEV, the guest is aware of SEV and should
      itself adjust boot_cpu_data.x86_phys_bits when it detects SEV, so KVM
      should still report the physical address bits from CPUID to the guest.
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Move kvm_set_mmio_spte_mask() from x86.c to mmu.c · 7b6f8a06
      Kai Huang authored
      This is a prerequisite for fixing several SPTE reserved-bits calculation
      errors caused by MKTME; that fix requires kvm_set_mmio_spte_mask() to use
      a local static variable defined in mmu.c.
      
      Also move call site of kvm_set_mmio_spte_mask() from kvm_arch_init() to
      kvm_mmu_module_init() so that kvm_set_mmio_spte_mask() can be static.
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 28 May 2019, 1 commit
    • KVM: s390: Do not report unusable IDs via KVM_CAP_MAX_VCPU_ID · a86cb413
      Thomas Huth authored
      KVM_CAP_MAX_VCPU_ID currently always reports KVM_MAX_VCPU_ID on all
      architectures. However, on s390x the number of usable CPUs is determined
      at runtime - it depends on the features of the machine the code is running
      on. Since we use the vcpu_id as an index into the SCA structures that are
      defined by the hardware (see e.g. the sca_add_vcpu() function), it is not
      only the number of CPUs that is limited by the hardware, but also the
      range of IDs that we can use.
      Thus KVM_CAP_MAX_VCPU_ID must be determined at runtime on s390x, too.
      So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
      code into the architecture-specific code, and on s390x we have to return
      the same value here as for KVM_CAP_MAX_VCPUS.
      This problem has been discovered with the kvm_create_max_vcpus selftest.
      With this change applied, the selftest now passes on s390x, too.
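
      A minimal userspace probe for the two capabilities (after this change they
      can differ by architecture; on s390x they report the same value):

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/ioctl.h>
        #include <linux/kvm.h>

        int main(void)
        {
                int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);

                if (kvm < 0) {
                        perror("/dev/kvm");
                        return 1;
                }
                printf("KVM_CAP_MAX_VCPUS:   %d\n",
                       ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS));
                printf("KVM_CAP_MAX_VCPU_ID: %d\n",
                       ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPU_ID));
                return 0;
        }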
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Message-Id: <20190523164309.13345-9-thuth@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
  3. 25 May 2019, 15 commits
    • KVM: x86: fix return value for reserved EFER · 66f61c92
      Paolo Bonzini authored
      Commit 11988499 ("KVM: x86: Skip EFER vs. guest CPUID checks for
      host-initiated writes", 2019-04-02) introduced a "return false" in a
      function returning int; in any case set_efer has a "nonzero on error"
      convention, so it should be returning 1.
      Reported-by: Pavel Machek <pavel@denx.de>
      Fixes: 11988499 ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: do not mask the value that is written to fixed PMUs · 2924b521
      Paolo Bonzini authored
      According to the SDM, for MSR_IA32_PERFCTR0/1 "the lower-order 32 bits of
      each MSR may be written with any value, and the high-order 8 bits are
      sign-extended according to the value of bit 31", but the fixed counters
      in real hardware are limited to the width of the fixed counters ("bits
      beyond the width of the fixed-function counter are reserved and must be
      written as zeros").  Fix KVM to do the same.
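
      A standalone sketch of the two write behaviors (the 48-bit counter width
      is assumed for illustration; real widths are model-specific):

        #include <stdio.h>
        #include <inttypes.h>
        #include <stdint.h>

        int main(void)
        {
                uint64_t val = 0x1ffffffffULL;       /* attempted 33-bit write */
                uint64_t wmask = (1ULL << 48) - 1;   /* assumed counter width */

                /* general-purpose PERFCTRn: low 32 bits, bit 31 sign-extended */
                uint64_t low32 = val & 0xffffffffULL;
                uint64_t gp = (low32 & 0x80000000ULL)
                            ? (low32 | ~0xffffffffULL) & wmask
                            : low32;
                /* fixed counter: value is simply limited to the counter width */
                uint64_t fixed = val & wmask;

                printf("gp:    0x%016" PRIx64 "\n", gp);     /* 0x0000ffffffffffff */
                printf("fixed: 0x%016" PRIx64 "\n", fixed);  /* 0x00000001ffffffff */
                return 0;
        }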
      Reported-by: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: mask the result of rdpmc according to the width of the counters · 0e6f467e
      Paolo Bonzini authored
      This patch simplifies the changes in the next one by enforcing the masking
      of the counters for RDPMC and RDMSR.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm/pmu: Set AMD's virt PMU version to 1 · a80c4ec1
      Borislav Petkov authored
      After commit:
      
        672ff6cf ("KVM: x86: Raise #GP when guest vCPU do not support PMU")
      
      my AMD guests started #GPing like this:
      
        general protection fault: 0000 [#1] PREEMPT SMP
        CPU: 1 PID: 4355 Comm: bash Not tainted 5.1.0-rc6+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        RIP: 0010:x86_perf_event_update+0x3b/0xa0
      
      with Code: pointing to RDPMC. It is RDPMC because the guest has the
      hardware watchdog CONFIG_HARDLOCKUP_DETECTOR_PERF enabled, which uses
      perf. Instrumenting kvm_pmu_rdpmc() a bit showed that it fails due to:
      
        if (!pmu->version)
        	return 1;
      
      which the above commit added. Since AMD's PMU leaves the version at 0,
      that causes the #GP injection into the guest.
      
      Set pmu->version arbitrarily to 1 and move it above the non-applicable
      struct kvm_pmu members.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Mihai Carabas <mihai.carabas@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 672ff6cf ("KVM: x86: Raise #GP when guest vCPU do not support PMU")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: do not spam dmesg with VMCS/VMCB dumps · 6f2f8453
      Paolo Bonzini authored
      Userspace can easily set up invalid processor state in such a way that
      dmesg will be filled with VMCS or VMCB dumps.  Disable this by default
      using a module parameter.
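
      A sketch of such a gate on the VMX side, assuming the parameter is the
      kvm_intel.dump_invalid_vmcs module parameter this introduces:

        static bool dump_invalid_vmcs;
        module_param(dump_invalid_vmcs, bool, 0644);

        static void dump_vmcs(void)
        {
                if (!dump_invalid_vmcs) {
                        pr_warn_ratelimited("set kvm_intel.dump_invalid_vmcs=1 "
                                            "to dump internal KVM state.\n");
                        return;
                }
                /* ... existing register-by-register VMCS dump ... */
        }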
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: Check irqchip mode before assign irqfd · 654f1f13
      Peter Xu authored
      When assigning a kvm irqfd we didn't check the irqchip mode, so KVM_IRQFD
      would succeed with all irqchip modes.  However, it does not make much
      sense to create an irqfd without an in-kernel irqchip.  Let's provide an
      arch-dependent helper to check whether a specific irqfd is allowed by the
      arch.  At least for x86, it makes sense to check:
      
      - when the irqchip mode is NONE, all irqfds should be disallowed, and,

      - when the irqchip mode is SPLIT, irqfds with a resamplefd should
        be disallowed.
      
      In either case, we would previously silently ignore the irq or the
      irq-ack event if the irqchip mode was incorrect.  However, that can cause
      mysterious guest behavior and can be hard to triage.  Let's fail
      KVM_IRQFD even earlier to detect these incorrect configurations.
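
      A sketch of the x86 helper implied by the two rules above (the predicate
      names are approximate):

        bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args)
        {
                bool resample = args->flags & KVM_IRQFD_FLAG_RESAMPLE;

                /* resample (irq-ack) delivery needs the fully in-kernel
                   irqchip; plain irqfds work with kernel or split mode */
                return resample ? irqchip_kernel(kvm) : irqchip_in_kernel(kvm);
        }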
      
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Radim Krčmář <rkrcmar@redhat.com>
      CC: Alex Williamson <alex.williamson@redhat.com>
      CC: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: svm/avic: fix off-by-one in checking host APIC ID · c9bcd3e3
      Suravee Suthikulpanit authored
      The current logic does not allow a VCPU to be loaded onto a CPU with
      APIC ID 255. This should be allowed, since the host physical APIC ID
      field in the AVIC physical APIC table entry is an 8-bit value,
      and APIC ID 255 is valid in a system with x2APIC enabled.
      Instead, do not allow the VCPU load if the host APIC ID cannot be
      represented by an 8-bit value.
      
      Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK
      instead of AVIC_MAX_PHYSICAL_ID_COUNT.
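
      Illustrating the off-by-one with the two bounds (a sketch, with the mask
      spelled out as the 8-bit host physical APIC ID field):

        #define AVIC_MAX_PHYSICAL_ID_COUNT                    255
        #define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK  0xFFULL

        /* old check: rejects APIC ID 255 even though it fits in 8 bits
         *   if (h_physical_id >= AVIC_MAX_PHYSICAL_ID_COUNT)
         *           return;
         */

        /* new check: reject only IDs that cannot be encoded in the field */
        if (WARN_ON(h_physical_id & ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK))
                return;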
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace · 16ba3ab4
      Wanpeng Li authored
      Expose the per-vCPU timer_advance_ns to userspace, so that it is able to
      query the auto-adjusted value.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow · 0e6edceb
      Wanpeng Li authored
      After commit c3941d9e (KVM: lapic: Allow user to disable adaptive tuning of
      timer advancement), '-1' enables adaptive tuning starting from the default
      advancement of 1000ns. However, the module parameter should be exposed as
      an int instead of a uint, which overflows:
      
      Before patch:
      
      /sys/module/kvm/parameters/lapic_timer_advance_ns:4294967295
      
      After patch:
      
      /sys/module/kvm/parameters/lapic_timer_advance_ns:-1
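
      The type mismatch is easy to reproduce in isolation:

        #include <stdio.h>

        int main(void)
        {
                int as_int = -1;            /* the "enable adaptive tuning" sentinel */
                unsigned int as_uint = -1;  /* same bit pattern, wrong type */

                printf("int:  %d\n", as_int);    /* prints -1 */
                printf("uint: %u\n", as_uint);   /* prints 4294967295 */
                return 0;
        }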
      
      Fixes: c3941d9e (KVM: lapic: Allow user to disable adaptive tuning of timer advancement)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: vmx: Fix -Wmissing-prototypes warnings · 4d259965
      Yi Wang authored
      We get a warning when building the kernel with W=1:
      arch/x86/kvm/vmx/vmx.c:6365:6: warning: no previous prototype for ‘vmx_update_host_rsp’ [-Wmissing-prototypes]
       void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
      
      Add the missing declaration to fix this.
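
      That is, a declaration along these lines in the shared header (vmx.h is
      assumed here), so the definition in vmx.c has a previous prototype:

        void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);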
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Fix using __this_cpu_read() in preemptible context · 541e886f
      Wanpeng Li authored
       BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/4590
        caller is nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
        CPU: 4 PID: 4590 Comm: qemu-system-x86 Tainted: G           OE     5.1.0-rc4+ #1
        Call Trace:
         dump_stack+0x67/0x95
         __this_cpu_preempt_check+0xd2/0xe0
         nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
         nested_vmx_run+0xda/0x2b0 [kvm_intel]
         handle_vmlaunch+0x13/0x20 [kvm_intel]
         vmx_handle_exit+0xbd/0x660 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0xa2c/0x1e50 [kvm]
         kvm_vcpu_ioctl+0x3ad/0x6d0 [kvm]
         do_vfs_ioctl+0xa5/0x6e0
         ksys_ioctl+0x6d/0x80
         __x64_sys_ioctl+0x1a/0x20
         do_syscall_64+0x6f/0x6c0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Accessing a per-cpu variable requires preemption to be disabled; this
      patch extends the preemption-disabled region to cover the
      __this_cpu_read().
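
      The general shape of the fix (variable names illustrative):

        preempt_disable();
        val = __this_cpu_read(my_percpu_var);  /* safe: cannot migrate here */
        /* ... everything that relies on staying on this CPU ... */
        preempt_enable();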
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Fixes: 52017608 ("KVM: nVMX: add option to perform early consistency checks via H/W")
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Include CPUID leaf 0x8000001e in kvm's supported CPUID · 382409b4
      Jim Mattson authored
      KVM now supports extended CPUID functions through 0x8000001f.  CPUID
      leaf 0x8000001e is AMD's Processor Topology Information leaf. This
      contains similar information to CPUID leaf 0xb (Intel's Extended
      Topology Enumeration leaf), and should be included in the output of
      KVM_GET_SUPPORTED_CPUID, even though userspace is likely to override
      some of this information based upon the configuration of the
      particular VM.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Borislav Petkov <bp@suse.de>
      Fixes: 8765d753 ("KVM: X86: Extend CPUID range to include new leaf")
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Include multiple indices with CPUID leaf 0x8000001d · 32a243df
      Jim Mattson authored
      Per the APM, "CPUID Fn8000_001D_E[D,C,B,A]X reports cache topology
      information for the cache enumerated by the value passed to the
      instruction in ECX, referred to as Cache n in the following
      description. To gather information for all cache levels, software must
      repeatedly execute CPUID with 8000_001Dh in EAX and ECX set to
      increasing values beginning with 0 until a value of 00h is returned in
      the field CacheType (EAX[4:0]) indicating no more cache descriptions
      are available for this processor."
      
      The termination condition is the same as leaf 4, so we can reuse that
      code block for leaf 0x8000001d.
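
      The enumeration loop is easy to reproduce in userspace (GCC's cpuid.h; an
      AMD host is assumed, since this is an AMD leaf):

        #include <stdio.h>
        #include <cpuid.h>

        int main(void)
        {
                unsigned int eax, ebx, ecx, edx, i;

                for (i = 0; ; i++) {
                        if (!__get_cpuid_count(0x8000001d, i, &eax, &ebx, &ecx, &edx))
                                break;                 /* leaf not supported */
                        if ((eax & 0x1f) == 0)         /* CacheType == 0: done */
                                break;
                        printf("cache %u: type %u, level %u\n",
                               i, eax & 0x1f, (eax >> 5) & 0x7);
                }
                return 0;
        }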
      
      Fixes: 8765d753 ("KVM: X86: Extend CPUID range to include new leaf")
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Borislav Petkov <bp@suse.de>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Clear nested_run_pending if setting nested state fails · 21be4ca1
      Sean Christopherson authored
      VMX's nested_run_pending flag is subtly consumed when stuffing state to
      enter guest mode, i.e. it needs to be set accordingly before KVM knows
      whether setting guest state will succeed.  If setting guest state fails,
      clear the flag, as a nested run is obviously not pending.
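
      A simplified sketch of the resulting flow when restoring nested state:

        vmx->nested.nested_run_pending =
                !!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);

        ret = nested_vmx_enter_non_root_mode(vcpu, false);
        if (ret) {
                /* no nested run is pending if entry itself failed */
                vmx->nested.nested_run_pending = 0;
                return ret;
        }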
      Reported-by: Aaron Lewis <aaronlewis@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: really fix the size checks on KVM_SET_NESTED_STATE · db80927e
      Paolo Bonzini authored
      The offset for reading the shadow VMCS is sizeof(*kvm_state)+VMCS12_SIZE,
      so the correct size must be that plus sizeof(*vmcs12).  This could lead
      to KVM reading garbage data from userspace and not reporting an error,
      but is otherwise not sensitive.
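
      A sketch of the corrected bound:

        /* the shadow VMCS is read from offset sizeof(*kvm_state) + VMCS12_SIZE,
           so the user buffer must extend a full vmcs12 beyond that offset */
        if (kvm_state->size < sizeof(*kvm_state) + VMCS12_SIZE + sizeof(*vmcs12))
                return -EINVAL;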
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 24 May 2019, 7 commits
  5. 21 May 2019, 5 commits
  6. 18 May 2019, 1 commit
  7. 17 May 2019, 1 commit
  8. 16 May 2019, 4 commits
    • x86/speculation/mds: Revert CPU buffer clear on double fault exit · 88640e1d
      Andy Lutomirski authored
      The double fault ESPFIX path doesn't return to user mode at all --
      it returns back to the kernel by simulating a #GP fault.
      prepare_exit_to_usermode() will run on the way out of
      general_protection before running user code.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 04dcbdb8 ("x86/speculation/mds: Clear CPU buffers on exit to user")
      Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU" · f93f7ede
      Sean Christopherson authored
      The RDPMC-exiting control is dependent on the existence of the RDPMC
      instruction itself, i.e. is not tied to the "Architectural Performance
      Monitoring" feature.  For all intents and purposes, the control exists
      on all CPUs with VMX support, since RDPMC also exists on all CPUs with
      VMX support.  Per Intel's SDM:
      
        The RDPMC instruction was introduced into the IA-32 Architecture in
        the Pentium Pro processor and the Pentium processor with MMX technology.
        The earlier Pentium processors have performance-monitoring counters, but
        they must be read with the RDMSR instruction.
      
      Because RDPMC-exiting always exists, KVM requires the control and refuses
      to load if it's not available.  As a result, hiding the PMU from a guest
      breaks nested virtualization if the guest attempts to use KVM.
      
      While it's not explicitly stated in the RDPMC pseudocode, the VM-Exit
      check for RDPMC-exiting follows standard fault vs. VM-Exit prioritization
      for privileged instructions, e.g. occurs after the CPL/CR0.PE/CR4.PCE
      checks, but before the counter referenced in ECX is checked for validity.
      
      In other words, the original KVM behavior of injecting a #GP was correct,
      and the KVM unit test needs to be adjusted accordingly, e.g. eat the #GP
      when the unit test guest (L3 in this case) executes RDPMC without
      RDPMC-exiting set in the unit test host (L2).
      
      This reverts commit e51bfdb6.
      
      Fixes: e51bfdb6 ("KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU")
      Reported-by: David Hill <hilld@binarystorm.net>
      Cc: Saar Amar <saaramar@microsoft.com>
      Cc: Mihai Carabas <mihai.carabas@oracle.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Fix L1TF mitigation for shadow MMU · 61455bf2
      Kai Huang authored
      Currently KVM sets the 5 most significant bits of the physical address
      bits reported by CPUID (boot_cpu_data.x86_phys_bits) in nonpresent or
      reserved-bit SPTEs to mitigate L1TF attacks from the guest when using the
      shadow MMU. However, for some particular Intel CPUs the physical address
      bits of the internal cache are greater than the physical address bits
      reported by CPUID.
      
      Use the kernel's existing boot_cpu_data.x86_cache_bits to determine the
      five most significant bits. Doing so improves KVM's L1TF mitigation in
      the unlikely scenario that system RAM overlaps the high order bits of
      the "real" physical address space as reported by CPUID. This aligns with
      the kernel's warnings regarding L1TF mitigation, e.g. in the above
      scenario the kernel won't warn the user about lack of L1TF mitigation
      if x86_cache_bits is greater than x86_phys_bits.
      
      Also initialize shadow_nonpresent_or_rsvd_mask explicitly to make it
      consistent with other 'shadow_{xxx}_mask', and opportunistically add a
      WARN once if KVM's L1TF mitigation cannot be applied on a system that
      is marked as being susceptible to L1TF.
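
      A sketch of the resulting mask derivation (names per KVM's mmu.c; the
      mask length is the 5 bits mentioned above):

        if (boot_cpu_data.x86_cache_bits <
            52 - shadow_nonpresent_or_rsvd_mask_len) {
                low_phys_bits = boot_cpu_data.x86_cache_bits
                        - shadow_nonpresent_or_rsvd_mask_len;
                shadow_nonpresent_or_rsvd_mask =
                        rsvd_bits(low_phys_bits,
                                  boot_cpu_data.x86_cache_bits - 1);
        } else
                WARN_ON_ONCE(boot_cpu_has_bug(X86_BUG_L1TF));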
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible · d69129b4
      Sean Christopherson authored
      If L1 is using an MSR bitmap, unconditionally merge the MSR bitmaps from
      L0 and L1 for MSR_{KERNEL,}_{FS,GS}_BASE.  KVM unconditionally exposes the
      MSRs to L1.  If KVM is also running in L1, then it is highly likely L1 is
      also exposing the MSRs to L2, i.e. KVM doesn't need to intercept L2
      accesses.
      
      Based on code from Jintack Lim.
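
      A sketch of the merge, using the existing per-MSR helper in nested VMX
      code (MSR_TYPE_RW covers both reads and writes):

        nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
                                             MSR_FS_BASE, MSR_TYPE_RW);
        nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
                                             MSR_GS_BASE, MSR_TYPE_RW);
        nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
                                             MSR_KERNEL_GS_BASE, MSR_TYPE_RW);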
      
      Cc: Jintack Lim <jintack@xxxxxxxxxxxxxxx>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 15 May 2019, 1 commit