1. 05 Feb 2020, 12 commits
  2. 28 Jan 2020, 1 commit
    • kvm/svm: PKU not currently supported · a47970ed
      Authored by John Allen
      The current SVM implementation does not support handling PKU. Guests
      running on a host with future AMD CPUs that support the feature will read
      garbage from the PKRU register and hit segmentation faults on boot, because
      memory that should not be protected gets marked as protected. Ensure that
      the CPUID reported by SVM does not advertise the feature.
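      A minimal sketch of the fix's shape, assuming the masking happens in SVM's
      CPUID-reporting hook and using KVM's F() feature macro (the hook name and
      case labels below are illustrative, not copied from the patch):

          /* Hide PKU from SVM guests until save/restore of PKRU is supported. */
          static void svm_set_supported_cpuid(struct kvm_cpuid_entry2 *entry)
          {
                  switch (entry->function) {
                  case 0x7:
                          /* CPUID.(EAX=7,ECX=0):ECX bit 3 is PKU */
                          if (entry->index == 0)
                                  entry->ecx &= ~F(PKU);
                          break;
                  }
          }
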
      Signed-off-by: John Allen <john.allen@amd.com>
      Cc: stable@vger.kernel.org
      Fixes: 0556cbdc ("x86/pkeys: Don't check if PKRU is zero before writing it")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 24 Jan 2020, 5 commits
  4. 21 Jan 2020, 4 commits
    • KVM: SVM: Override default MMIO mask if memory encryption is enabled · 52918ed5
      Authored by Tom Lendacky
      The KVM MMIO support uses bit 51 as the reserved bit to cause nested page
      faults when a guest performs MMIO. The AMD memory encryption support uses
      a CPUID function to define the encryption bit position. Given this, it is
      possible that these bits can conflict.
      
      Use svm_hardware_setup() to override the MMIO mask if memory encryption
      support is enabled. Various checks are performed to ensure that the mask
      is properly defined and rsvd_bits() is used to generate the new mask (as
      was done prior to the change that necessitated this patch).
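      A condensed sketch of the override, assuming it runs from svm_hardware_setup()
      and that the SEV encryption-bit position comes from CPUID 0x8000001F[EBX]
      bits 5:0 (the helper name and exact MMU call below are illustrative):

          static void svm_adjust_mmio_mask(void)
          {
                  unsigned int enc_bit, mask_bit;
                  u64 mask;

                  /* Nothing to do if memory encryption is not reported. */
                  if (cpuid_eax(0x80000000) < 0x8000001f)
                          return;

                  enc_bit  = cpuid_ebx(0x8000001f) & 0x3f;
                  mask_bit = boot_cpu_data.x86_phys_bits;

                  /* Don't let the MMIO reserved bit collide with the C-bit. */
                  if (enc_bit == mask_bit)
                          mask_bit++;

                  /* Bits above the usable width stay reserved; reuse rsvd_bits(). */
                  mask = (mask_bit < 52) ? rsvd_bits(mask_bit, 51) | PT_PRESENT_MASK
                                         : 0;

                  kvm_mmu_set_mmio_spte_mask(mask, mask,
                                             PT_WRITABLE_MASK | PT_USER_MASK);
          }
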
      
      Fixes: 28a1f3ac ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Refactor and rename bit() to feature_bit() macro · 87382003
      Authored by Sean Christopherson
      Rename bit() to __feature_bit() to give it a more descriptive name, and
      add a macro, feature_bit(), that prepends the X86_FEATURE_ prefix, to keep
      line lengths manageable for code that hardcodes the bit to be retrieved.
      
      No functional change intended.
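      For reference, a sketch of what the renamed helpers look like (approximated
      from the description above, not quoted from the patch):

          /* Raw form: callers pass a full X86_FEATURE_* value. */
          static __always_inline u32 __feature_bit(int x)
          {
                  return 1U << (x & 31);  /* bit within the feature's 32-bit CPUID word */
          }

          /* Prefixed form: callers hardcode the short feature name. */
          #define feature_bit(name)  __feature_bit(X86_FEATURE_##name)

          /* e.g. entry->ecx &= ~feature_bit(PKU); */
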
      
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Drop special XSAVE handling from guest_cpuid_has() · 96be4e06
      Authored by Sean Christopherson
      Now that KVM prevents setting host-reserved CR4 bits, drop the dedicated
      XSAVE check in guest_cpuid_has() in favor of open coding similar checks
      in the SVM/VMX XSAVES enabling flows.
      
      Note, checking boot_cpu_has(X86_FEATURE_XSAVE) in the XSAVES flows is
      technically redundant with respect to the CR4 reserved bit checks, e.g.
      XSAVES #UDs if CR4.OSXSAVE=0 and arch.xsaves_enabled is consumed if and
      only if CR4.OSXSAVE=1 in the guest.  Keep the explicit boot_cpu_has()
      checks anyway, to help document KVM's usage of arch.xsaves_enabled.
      
      No functional change intended.
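      A sketch of the open-coded check as it might appear in the SVM/VMX
      CPUID-update flows (the placement and exact conditions are assumptions
      based on the description above):

          /* Cache whether XSAVES can legally be exposed to and used by the guest. */
          vcpu->arch.xsaves_enabled = boot_cpu_has(X86_FEATURE_XSAVE) &&
                                      boot_cpu_has(X86_FEATURE_XSAVES) &&
                                      guest_cpuid_has(vcpu, X86_FEATURE_XSAVE);
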
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath · 1e9e2622
      Authored by Wanpeng Li
      In our production observations, ICR and TSCDEADLINE MSR writes cause the
      bulk of MSR-write vmexits; multicast IPIs are not as common as unicast
      IPIs such as RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR.

      This patch introduces a mechanism to handle certain performance-critical
      WRMSRs at a very early stage of the KVM VM-exit handler.

      The mechanism is specifically used to accelerate writes to the x2APIC ICR
      that attempt to send a virtual IPI with physical destination mode, fixed
      delivery mode and a single target, which was found to be one of the main
      causes of VM-exits for Linux workloads.

      The mechanism significantly reduces the latency of such virtual IPIs
      because the physical IPI is sent to the target vCPU at a very early stage
      of the KVM VM-exit handler, before host interrupts are enabled and before
      expensive operations such as reacquiring KVM's SRCU lock.
      Latency is reduced even further when KVM can use the APICv posted-interrupt
      mechanism, which delivers the virtual IPI directly to the target vCPU
      without needing to kick it on the host.

      Testing on a Xeon Skylake server:

      The virtual IPI latency, measured from the sender's write to the receiver's
      delivery, is reduced by more than 200 CPU cycles.
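      A sketch of the fastpath filter, with the ICR bit positions spelled out
      numerically for clarity (the series itself uses the APIC_* macros; the
      helper name below is illustrative):

          static bool is_single_target_fixed_phys_ipi(u32 msr, u64 icr)
          {
                  if (msr != 0x830)                          /* x2APIC ICR MSR */
                          return false;

                  return (icr & (7u << 8)) == 0 &&           /* fixed delivery mode  */
                         !(icr & (1u << 11)) &&              /* physical destination */
                         !(icr & (3u << 18)) &&              /* no dest. shorthand   */
                         (u32)(icr >> 32) != 0xffffffff;     /* single target, not broadcast */
          }
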
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 09 Jan 2020, 1 commit
  6. 27 Nov 2019, 1 commit
  7. 15 Nov 2019, 2 commits
    • KVM: SVM: Remove check if APICv enabled in SVM update_cr8_intercept() handler · 49d654d8
      Authored by Liran Alon
      This check is unnecessary, as the x86 update_cr8_intercept(), which calls
      this VMX/SVM-specific callback, already performs it.
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: retpolines: x86: eliminate retpoline from svm.c exit handlers · 3dcb2a3f
      Authored by Andrea Arcangeli
      It's enough to check the exit value and issue a direct call to avoid the
      retpoline for all the common vmexit reasons; a sketch of the resulting
      direct-call dispatch follows the trace below.

      After this commit is applied, here are the most common retpolines executed
      under a high-resolution timer workload in the guest on an SVM host:
      
      [..]
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get_update_offsets_now+70
          hrtimer_interrupt+131
          smp_apic_timer_interrupt+106
          apic_timer_interrupt+15
          start_sw_timer+359
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 1940
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_r12+33
          force_qs_rnp+217
          rcu_gp_kthread+1270
          kthread+268
          ret_from_fork+34
      ]: 4644
      @[]: 25095
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          lapic_next_event+28
          clockevents_program_event+148
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41474
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          clockevents_program_event+148
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41474
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          clockevents_program_event+84
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41887
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          lapic_next_event+28
          clockevents_program_event+148
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42723
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          clockevents_program_event+148
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42766
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          clockevents_program_event+84
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42848
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          start_sw_timer+279
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 499845
      
      @total: 1780243
      
      SVM has no TSC-based programmable preemption timer, so it invokes
      ktime_get() frequently.
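      A sketch of the direct-call dispatch referenced above (the handler names
      follow svm.c of this era, but treat the exact list of fast-pathed exit
      codes as illustrative):

          static int handle_exit_direct(struct vcpu_svm *svm, u32 exit_code)
          {
                  /* The common exits get a direct call, so no retpoline is taken. */
                  if (exit_code == SVM_EXIT_MSR)
                          return msr_interception(svm);
                  if (exit_code == SVM_EXIT_VINTR)
                          return interrupt_window_interception(svm);
                  if (exit_code == SVM_EXIT_INTR)
                          return intr_interception(svm);
                  if (exit_code == SVM_EXIT_HLT)
                          return halt_interception(svm);
                  if (exit_code == SVM_EXIT_NPF)
                          return npf_interception(svm);

                  /* Everything else still goes through the (retpolined) table. */
                  return svm_exit_handlers[exit_code](svm);
          }
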
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 31 Oct 2019, 1 commit
    • KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active · 9167ab79
      Authored by Paolo Bonzini
      VMX already does so if the host has SMEP, in order to support the combination of
      CR0.WP=1 and CR4.SMEP=1.  However, it is perfectly safe to always do so, and in
      fact VMX already ends up running with EFER.NXE=1 on old processors that lack the
      "load EFER" controls, because it may help avoid a slow MSR write.  Removing
      all the conditionals simplifies the code.
      
      SVM does not have similar code, but it should since recent AMD processors do
      support SMEP.  So this patch also makes the code for the two vendors more similar
      while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors.
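      On the SVM side, the change can be sketched as forcing NX on in svm_set_efer()
      whenever NPT is disabled, i.e. whenever shadow paging is in use (the exact
      placement and surrounding details are assumptions based on the description):

          static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
          {
                  vcpu->arch.efer = efer;

                  if (!npt_enabled) {
                          /* Shadow paging assumes NX to be available. */
                          efer |= EFER_NX;

                          if (!(efer & EFER_LMA))
                                  efer &= ~EFER_LME;
                  }

                  to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
                  mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
          }
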
      
      Cc: stable@vger.kernel.org
      Cc: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 23 Oct 2019, 1 commit
  10. 22 Oct 2019, 10 commits
  11. 24 Sep 2019, 2 commits
    • kvm: svm: Intercept RDPRU · 0cb8410b
      Authored by Jim Mattson
      The RDPRU instruction gives the guest read access to the IA32_APERF
      MSR and the IA32_MPERF MSR. According to volume 3 of the APM, "When
      virtualization is enabled, this instruction can be intercepted by the
      Hypervisor. The intercept bit is at VMCB byte offset 10h, bit 14."
      Since we don't enumerate the instruction in KVM_SUPPORTED_CPUID,
      intercept it and synthesize #UD.
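      A sketch of the intercept wiring (the INTERCEPT_RDPRU bit name and the
      handler hookup below are assumptions consistent with the description):

          /* RDPRU is not advertised to the guest, so raise #UD on intercept. */
          static int rdpru_interception(struct vcpu_svm *svm)
          {
                  kvm_queue_exception(&svm->vcpu, UD_VECTOR);
                  return 1;
          }

          /* During VMCB setup: enable the intercept (VMCB byte offset 10h, bit 14). */
          set_intercept(svm, INTERCEPT_RDPRU);
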
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Drew Schmitt <dasch@google.com>
      Reviewed-by: Jacob Xu <jacobhxu@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Handle single-step #DB for EMULTYPE_SKIP on EPT misconfig · 1957aa63
      Authored by Sean Christopherson
      VMX's EPT misconfig flow, which handles the fast-MMIO path, falls back to
      decoding the instruction to determine the instruction length when running
      as a guest (Hyper-V doesn't fill VMCS.VM_EXIT_INSTRUCTION_LEN because it's
      technically not defined for EPT misconfigs).  Rather than implement the
      slow skip in VMX's generic skip_emulated_instruction(),
      handle_ept_misconfig() directly calls kvm_emulate_instruction() with
      EMULTYPE_SKIP, which intentionally doesn't do single-step detection, and
      so handle_ept_misconfig() misses a single-step #DB.
      
      Rework the EPT misconfig fallback case to route it through
      kvm_skip_emulated_instruction() so that single-step #DBs and interrupt
      shadow updates are handled automatically.  I.e. make VMX's slow skip
      logic match SVM's and have the SVM flow not intentionally avoid the
      shadow update.
      
      Alternatively, handle_ept_misconfig() could manually handle single-step
      detection, but that would result in EMULTYPE_SKIP having split logic for
      the interrupt shadow vs. single-step #DBs, and split emulator logic is
      largely what led to this mess in the first place.

      Modifying SVM to mirror the VMX flow isn't really an option, as SVM's case
      isn't limited to a specific exit reason, i.e. handling the slow skip in
      skip_emulated_instruction() is mandatory for all intents and purposes.
      
      Drop VMX's skip_emulated_instruction() wrapper since it can now fail,
      and instead WARN if it fails unexpectedly, e.g. if exit_reason somehow
      becomes corrupted.
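      A condensed sketch of the reworked fast-MMIO path (the real change is split
      between handle_ept_misconfig() and the VMX skip helper; the snippet below is
      illustrative, not the patch itself):

          static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
          {
                  gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);

                  /*
                   * Fast MMIO: a zero-length write, then a full skip so that
                   * single-step #DB and the interrupt shadow are handled.
                   */
                  if (!is_guest_mode(vcpu) &&
                      !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL))
                          return kvm_skip_emulated_instruction(vcpu);

                  return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
          }
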
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Fixes: d391f120 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>