1. 27 Jan, 2022 22 commits
    • L
      KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time · 05a9e065
      Like Xu committed
      XCR0 is reset to 1 by RESET but not INIT and IA32_XSS is zeroed by
      both RESET and INIT. The kvm_set_msr_common()'s handling of MSR_IA32_XSS
      also needs to call kvm_update_cpuid_runtime(). In the above cases, the
      size in bytes of the XSAVE area containing all states enabled by XCR0 or
      (XCR0 | IA32_XSS) needs to be updated.
      
      For simplicity and consistency, existing helpers are used to write values
      and call kvm_update_cpuid_runtime(), and it's not exactly a fast path.
      
      Fixes: a554d207 ("KVM: X86: Processor States following Reset or INIT")
      Cc: stable@vger.kernel.org
      Signed-off-by: Like Xu <likexu@tencent.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220126172226.2298529-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      05a9e065
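To illustrate why the reported size has to follow XCR0 and IA32_XSS, here is a minimal sketch of a compacted-format size calculation. It is not KVM code: the function name and the per-component size table are hypothetical (real sizes come from the CPUID.0xD sub-leaves), and 64-byte alignment of components is ignored.

```c
#include <stdint.h>

#define XSAVE_LEGACY_SIZE 512u  /* legacy x87/SSE region */
#define XSAVE_HEADER_SIZE 64u   /* XSAVE header */

/* Hypothetical per-component sizes, indexed by state-component bit. */
static const uint32_t component_size[64] = {
    [2]  = 256,  /* AVX, enabled via XCR0 */
    [9]  = 8,    /* PKRU, enabled via XCR0 */
    [11] = 16,   /* CET user state, enabled via IA32_XSS */
};

/* Size of a compacted XSAVE area for the states enabled in XCR0 | XSS. */
static uint32_t xsave_compacted_size(uint64_t xcr0, uint64_t xss)
{
    uint64_t enabled = xcr0 | xss;
    uint32_t size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

    /* Bits 0 and 1 (x87, SSE) live in the legacy region. */
    for (int i = 2; i < 64; i++)
        if (enabled & (1ULL << i))
            size += component_size[i];
    return size;
}
```

Any write that changes either register changes the result, which is why a cached size would go stale without a runtime CPUID update.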
    • L
      KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS · 4c282e51
      Like Xu committed
      Do a runtime CPUID update for a vCPU if MSR_IA32_XSS is written, as the
      size in bytes of the XSAVE area is affected by the states enabled in XSS.
      
      Fixes: 20300099 ("kvm: vmx: add MSR logic for XSAVES")
      Cc: stable@vger.kernel.org
      Signed-off-by: Like Xu <likexu@tencent.com>
      [sean: split out as a separate patch, adjust Fixes tag]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220126172226.2298529-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4c282e51
    • X
      KVM: x86: Keep MSR_IA32_XSS unchanged for INIT · be4f3b3f
      Xiaoyao Li committed
      As of SDM version 075, the documentation has been corrected: MSR_IA32_XSS
      is reset to zero on power-up and RESET but is left unchanged on INIT.
      
      Fixes: a554d207 ("KVM: X86: Processor States following Reset or INIT")
      Cc: stable@vger.kernel.org
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220126172226.2298529-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      be4f3b3f
    • S
      KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2} · 811f95ff
      Sean Christopherson committed
      Free the "struct kvm_cpuid_entry2" array on successful post-KVM_RUN
      KVM_SET_CPUID{,2} to fix a memory leak; the callers of kvm_set_cpuid()
      free the array only on failure.
      
       BUG: memory leak
       unreferenced object 0xffff88810963a800 (size 2048):
        comm "syz-executor025", pid 3610, jiffies 4294944928 (age 8.080s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 0d 00 00 00  ................
          47 65 6e 75 6e 74 65 6c 69 6e 65 49 00 00 00 00  GenuntelineI....
        backtrace:
          [<ffffffff814948ee>] kmalloc_node include/linux/slab.h:604 [inline]
          [<ffffffff814948ee>] kvmalloc_node+0x3e/0x100 mm/util.c:580
          [<ffffffff814950f2>] kvmalloc include/linux/slab.h:732 [inline]
          [<ffffffff814950f2>] vmemdup_user+0x22/0x100 mm/util.c:199
          [<ffffffff8109f5ff>] kvm_vcpu_ioctl_set_cpuid2+0x8f/0xf0 arch/x86/kvm/cpuid.c:423
          [<ffffffff810711b9>] kvm_arch_vcpu_ioctl+0xb99/0x1e60 arch/x86/kvm/x86.c:5251
          [<ffffffff8103e92d>] kvm_vcpu_ioctl+0x4ad/0x950 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4066
          [<ffffffff815afacc>] vfs_ioctl fs/ioctl.c:51 [inline]
          [<ffffffff815afacc>] __do_sys_ioctl fs/ioctl.c:874 [inline]
          [<ffffffff815afacc>] __se_sys_ioctl fs/ioctl.c:860 [inline]
          [<ffffffff815afacc>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
          [<ffffffff844a3335>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff844a3335>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: c6617c61 ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+be576ad7655690586eec@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220125210445.2053429-1-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      811f95ff
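The ownership rule behind this leak can be sketched in userspace C. This is a simplified model, not the KVM implementation: the struct, its fields, and set_cpuid() are hypothetical stand-ins. The caller frees the array only on failure, so every success path in the callee must either install or free it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-vCPU CPUID state. */
struct vcpu_cpuid {
    unsigned int *entries;  /* installed array, owned by the vCPU */
    int nent;
    int has_run;            /* nonzero once the vCPU has been run */
};

/*
 * Returns 0 on success, -1 on failure. On success the callee owns e2;
 * on failure the caller is expected to free it.
 */
static int set_cpuid(struct vcpu_cpuid *v, unsigned int *e2, int nent)
{
    if (v->has_run) {
        /* After the first run, only identical data is accepted. */
        if (nent != v->nent ||
            memcmp(e2, v->entries, (size_t)nent * sizeof(*e2)) != 0)
            return -1;
        free(e2);  /* success without installing: free here or it leaks */
        return 0;
    }

    free(v->entries);
    v->entries = e2;
    v->nent = nent;
    return 0;
}
```

The free() on the "identical data" path is the step the commit message describes as missing.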
    • S
      KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 · d6e656cd
      Sean Christopherson committed
      WARN if KVM attempts to allocate a shadow VMCS for vmcs02.  KVM emulates
      VMCS shadowing but doesn't virtualize it, i.e. KVM should never allocate
      a "real" shadow VMCS for L2.
      
      The previous code WARNed but continued anyway with the allocation,
      presumably in an attempt to avoid NULL pointer dereference.
      However, alloc_vmcs (and hence alloc_shadow_vmcs) can fail, and
      indeed the sole caller does:
      
      	if (enable_shadow_vmcs && !alloc_shadow_vmcs(vcpu))
      		goto out_shadow_vmcs;
      
      which makes it not a useful attempt.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220125220527.2093146-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d6e656cd
    • V
      KVM: x86: Check .flags in kvm_cpuid_check_equal() too · 033a3ea5
      Vitaly Kuznetsov committed
      kvm_cpuid_check_equal() checks for the (full) equality of the supplied
      CPUID data, so .flags needs to be checked too.
      Reported-by: Sean Christopherson <seanjc@google.com>
      Fixes: c6617c61 ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220126131804.2839410-1-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      033a3ea5
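A full-equality comparison of this kind might look like the sketch below. The struct is a simplified stand-in for struct kvm_cpuid_entry2 (padding omitted) and the function name is hypothetical; the point is only that .flags takes part in the comparison alongside the index fields and registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
    uint32_t function, index, flags;
    uint32_t eax, ebx, ecx, edx;
};

/* True only if both arrays have the same length and identical entries. */
static bool cpuid_entries_equal(const struct cpuid_entry *a, int na,
                                const struct cpuid_entry *b, int nb)
{
    if (na != nb)
        return false;

    for (int i = 0; i < na; i++) {
        if (a[i].function != b[i].function ||
            a[i].index != b[i].index ||
            a[i].flags != b[i].flags ||   /* the field the fix adds */
            a[i].eax != b[i].eax || a[i].ebx != b[i].ebx ||
            a[i].ecx != b[i].ecx || a[i].edx != b[i].edx)
            return false;
    }
    return true;
}
```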
    • S
      KVM: x86: Forcibly leave nested virt when SMM state is toggled · f7e57078
      Sean Christopherson committed
      Forcibly leave nested virtualization operation if userspace toggles SMM
      state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
      forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
      vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
      vmxon=false and smm.vmxon=false, but all other nVMX state allocated.
      
      Don't attempt to gracefully handle the transition as (a) most transitions
      are nonsensical, e.g. forcing SMM while L2 is running, (b) there isn't
      sufficient information to handle all transitions, e.g. SVM wants access
      to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
      KVM_SET_NESTED_STATE during state restore as the latter disallows putting
      the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
      being post-VMXON in SMM if SMM is not active.
      
      Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
      due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
      just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
      in an architecturally impossible state.
      
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Modules linked in:
        CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
        Call Trace:
         <TASK>
         kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
         kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
         kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
         kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
         kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
         kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
         kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
         kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
         __fput+0x286/0x9f0 fs/file_table.c:311
         task_work_run+0xdd/0x1a0 kernel/task_work.c:164
         exit_task_work include/linux/task_work.h:32 [inline]
         do_exit+0xb29/0x2a30 kernel/exit.c:806
         do_group_exit+0xd2/0x2f0 kernel/exit.c:935
         get_signal+0x4b0/0x28c0 kernel/signal.c:2862
         arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
         handle_signal_work kernel/entry/common.c:148 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
         exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
         __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
         syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
         do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220125220358.2091737-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f7e57078
    • V
      KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments() · aa3b39f3
      Vitaly Kuznetsov committed
      Commit 3fa5e8fd ("KVM: SVM: delay svm_vcpu_init_msrpm after
      svm->vmcb is initialized") re-arranged svm_vcpu_init_msrpm() call in
      svm_create_vcpu(), thus making the comment about vmcb being NULL
      obsolete. Drop it.
      
      While at it, drop the superfluous vmcb_is_clean() check: vmcb_mark_dirty()
      is a bit flip, an extra check is unlikely to bring any performance gain.
      Drop now-unneeded vmcb_is_clean() helper as well.
      
      Fixes: 3fa5e8fd ("KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211220152139.418372-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      aa3b39f3
    • V
      KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real · 38dfa830
      Vitaly Kuznetsov committed
      Commit c4327f15 ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support")
      introduced enlightened MSR-Bitmap support for KVM-on-Hyper-V but it didn't
      actually enable the support. Similar to enlightened NPT TLB flush and
      direct TLB flush features, the guest (KVM) has to tell L0 (Hyper-V) that
      it's using the feature by setting the appropriate feature bit in the VMCB
      control area (sw reserved fields).
      
      Fixes: c4327f15 ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211220152139.418372-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      38dfa830
    • S
      KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode · cdf85e0c
      Sean Christopherson committed
      Inject a #GP instead of synthesizing triple fault to try to avoid killing
      the guest if emulation of an SEV guest fails due to encountering the SMAP
      erratum.  The injected #GP may still be fatal to the guest, e.g. if the
      userspace process is providing critical functionality, but KVM should
      make every attempt to keep the guest alive.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-10-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cdf85e0c
    • S
      KVM: SVM: Don't apply SEV+SMAP workaround on code fetch or PT access · 3280cc22
      Sean Christopherson committed
      Resume the guest instead of synthesizing a triple fault shutdown if the
      instruction bytes buffer is empty due to the #NPF being on the code fetch
      itself or on a page table access.  The SMAP errata applies if and only if
      the code fetch was successful and ucode's subsequent data read from the
      code page encountered a SMAP violation.  In practice, the guest is likely
      hosed either way, but crashing the guest on a code fetch to emulated MMIO
      is technically wrong according to the behavior described in the APM.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3280cc22
    • S
      KVM: SVM: Inject #UD on attempted emulation for SEV guest w/o insn buffer · 04c40f34
      Sean Christopherson committed
      Inject #UD if KVM attempts emulation for an SEV guest without an insn
      buffer and instruction decoding is required.  The previous behavior of
      allowing emulation if there is no insn buffer is undesirable as doing so
      means KVM is reading guest private memory and thus decoding cyphertext,
      i.e. is emulating garbage.  The check was previously necessary as the
      emulation type was not provided, i.e. SVM needed to allow emulation to
      handle completion of emulation after exiting to userspace to handle I/O.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      04c40f34
    • S
      KVM: SVM: WARN if KVM attempts emulation on #UD or #GP for SEV guests · 132627c6
      Sean Christopherson committed
      WARN if KVM attempts to emulate in response to #UD or #GP for SEV guests,
      i.e. if KVM intercepts #UD or #GP, as emulation on any fault except #NPF
      is impossible since KVM cannot read guest private memory to get the code
      stream, and the CPU's DecodeAssists feature only provides the instruction
      bytes on #NPF.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-7-seanjc@google.com>
      [Warn on EMULTYPE_TRAP_UD_FORCED according to Liam Merwick's review. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      132627c6
    • S
      KVM: x86: Pass emulation type to can_emulate_instruction() · 4d31d9ef
      Sean Christopherson committed
      Pass the emulation type to kvm_x86_ops.can_emulate_instruction() so that
      a future commit can harden KVM's SEV support to WARN on emulation
      scenarios that should never happen.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4d31d9ef
    • S
      KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support · c532f290
      Sean Christopherson committed
      Add a sanity check on DECODEASSISTS being supported if SEV is supported, as
      KVM cannot read guest private memory and thus relies on the CPU to
      provide the instruction byte stream on #NPF for emulation.  The intent of
      the check is to document the dependency, it should never fail in practice
      as producing hardware that supports SEV but not DECODEASSISTS would be
      nonsensical.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c532f290
    • S
      KVM: SVM: Don't intercept #GP for SEV guests · 0b0be065
      Sean Christopherson committed
      Never intercept #GP for SEV guests as reading SEV guest private memory
      will return cyphertext, i.e. emulating on #GP can't work as intended.
      
      Cc: stable@vger.kernel.org
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0b0be065
    • S
      Revert "KVM: SVM: avoid infinite loop on NPF from bad address" · 31c25585
      Sean Christopherson committed
      Revert a completely broken check on an "invalid" RIP in SVM's workaround
      for the DecodeAssists SMAP errata.  kvm_vcpu_gfn_to_memslot() obviously
      expects a gfn, i.e. operates in the guest physical address space, whereas
      RIP is a virtual (not even linear) address.  The "fix" worked for the
      problematic KVM selftest because the test identity mapped RIP.
      
      Fully revert the hack instead of trying to translate RIP to a GPA, as the
      non-SEV case is now handled earlier, and KVM cannot access guest page
      tables to translate RIP.
      
      This reverts commit e72436bc.
      
      Fixes: e72436bc ("KVM: SVM: avoid infinite loop on NPF from bad address")
      Reported-by: Liam Merwick <liam.merwick@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      31c25585
    • S
      KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests · 55467fcd
      Sean Christopherson committed
      Always signal that emulation is possible for !SEV guests regardless of
      whether or not the CPU provided a valid instruction byte stream.  KVM can
      read all guest state (memory and registers) for !SEV guests, i.e. can
      fetch the code stream from memory even if the CPU failed to do so because
      of the SMAP errata.
      
      Fixes: 05d5a486 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
      Cc: stable@vger.kernel.org
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      55467fcd
    • D
      KVM: x86: nSVM: skip eax alignment check for non-SVM instructions · 47c28d43
      Denis Valeev committed
      The bug occurs on a #GP triggered by the VMware backdoor when the eax value
      is unaligned. The eax alignment check should not be applied to non-SVM
      instructions because it causes emulation of those instructions to be
      incorrectly skipped.
      Apply the alignment check only to SVM instructions to fix this.
      
      Fixes: d1cba6c9 ("KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround")
      Signed-off-by: Denis Valeev <lemniscattaden@gmail.com>
      Message-Id: <Yexlhaoe1Fscm59u@q>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      47c28d43
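The intended scoping of the check can be sketched as follows. This is a simplified model under stated assumptions, not the KVM code: the enum and function name are hypothetical, and the only point shown is that the page-alignment test applies to SVM instructions and is bypassed for everything else, such as VMware backdoor accesses.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET_MASK 0xfffULL  /* low 12 bits: offset within a 4 KiB page */

/* Hypothetical classification of the instruction that raised #GP. */
enum gp_insn { GP_INSN_OTHER, GP_INSN_VMRUN, GP_INSN_VMLOAD, GP_INSN_VMSAVE };

/* Returns true if the alignment check should prevent emulation. */
static bool rax_check_blocks_emulation(enum gp_insn insn, uint64_t rax)
{
    /* Non-SVM instructions do not take an address in rax. */
    if (insn == GP_INSN_OTHER)
        return false;

    /* SVM instructions expect a page-aligned address in rax. */
    return (rax & PAGE_OFFSET_MASK) != 0;
}
```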
    • L
      KVM: x86/cpuid: Exclude unpermitted xfeatures sizes at KVM_GET_SUPPORTED_CPUID · 1ffce092
      Like Xu committed
      With the help of xstate_get_guest_group_perm(), KVM can exclude unpermitted
      xfeatures in cpuid.0xd.0.eax, in which case the corresponding xfeatures
      sizes should also be matched to the permitted xfeatures.
      
      To fix this inconsistency, the permitted_xcr0 and permitted_xss are defined
      consistently, which implies 'supported' plus certain permissions for this
      task, and it also fixes cpuid.0xd.1.ebx and later leaf-by-leaf queries.
      
      Fixes: 445ecdf7 ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID")
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20220125115223.33707-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1ffce092
    • W
      KVM: LAPIC: Also cancel preemption timer during SET_LAPIC · 35fe7cfb
      Wanpeng Li committed
      The warning below is triggered during guest reboot.
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 1931 at arch/x86/kvm/x86.c:10322 kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm]
        CPU: 0 PID: 1931 Comm: qemu-system-x86 Tainted: G          I       5.17.0-rc1+ #5
        RIP: 0010:kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm]
        Call Trace:
         <TASK>
         kvm_vcpu_ioctl+0x279/0x710 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7fd39797350b
      
      This can be triggered by not exposing tsc-deadline mode and rebooting the
      guest. lapic_shutdown(), which is called in the sys_reboot path, does not
      disarm the in-flight timer; it only masks LVTT. lapic_shutdown() clears the
      APIC state with LVT_MASKED set and the timer-mode bits zeroed, which can
      trigger a timer-mode switch between tsc-deadline and oneshot/periodic and
      thereby cancel the preemption timer in apic_update_lvtt(). However, KVM
      can't depend on this when tsc-deadline mode is not exposed and the
      oneshot/periodic modes are emulated by the preemption timer. QEMU
      synchronises state around reset, so cancel the preemption timer under
      KVM_SET_LAPIC as well.
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1643102220-35667-1-git-send-email-wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      35fe7cfb
    • J
      KVM: VMX: Remove vmcs_config.order · 519669cc
      Jim Mattson committed
      The maximum size of a VMCS (or VMXON region) is 4096. By definition,
      these are order 0 allocations.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20220125004359.147600-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      519669cc
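The "order 0" claim follows from how page-allocation orders work: order n covers 2^n contiguous pages. A minimal sketch, assuming 4 KiB pages and using a hypothetical helper name (the kernel's own helper is not reproduced here):

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages, as on x86 */

/* Smallest order such that (PAGE_SIZE_BYTES << order) >= size. */
static unsigned int alloc_order(size_t size)
{
    unsigned int order = 0;
    size_t span = PAGE_SIZE_BYTES;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With a maximum region size of 4096 bytes, alloc_order() never returns anything but 0, so storing the order separately carries no information.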
  2. 25 Jan, 2022 3 commits
    • Q
      KVM/X86: Make kvm_vcpu_reload_apic_access_page() static · d081a343
      Quanfa Fu committed
      Make kvm_vcpu_reload_apic_access_page() static
      as it is no longer invoked directly by vmx
      and it is also no longer exported.
      
      No functional change intended.
      Signed-off-by: Quanfa Fu <quanfafu@gmail.com>
      Message-Id: <20211219091446.174584-1-quanfafu@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d081a343
    • S
      KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow · b9bed78e
      Sean Christopherson committed
      Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step
      breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or
      MOVSS blocking is active.  Setting the flag is necessary to make VM-Entry
      consistency checks happy, as VMX has an invariant that if RFLAGS.TF is
      set and STI/MOVSS blocking is true, then the previous instruction must
      have been STI or MOV/POP, and therefore a single-step #DB must be pending
      since the RFLAGS.TF cannot have been set by the previous instruction,
      i.e. the one instruction delay after setting RFLAGS.TF must have already
      expired.
      
      Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately
      when recording guest state as part of a VM-Exit, but #DB VM-Exits
      intentionally do not treat the #DB as "guest state" as interception of
      the #DB effectively makes the #DB host-owned, thus KVM needs to manually
      set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest.
      
      Note, although this bug can be triggered by guest userspace, doing so
      requires IOPL=3, and guest userspace running with IOPL=3 has full access
      to all I/O ports (from the guest's perspective) and can crash/reboot the
      guest any number of ways.  IOPL=3 is required because STI blocking kicks
      in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either
      takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF.
      
      MOVSS blocking can be initiated by userspace, but can be coincident with
      a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is
      executed in the MOVSS shadow.  MOV DR #GPs at CPL>0, thus MOVSS blocking
      is problematic only for CPL0 (and only if the guest is crazy enough to
      access a DR in a MOVSS shadow).  All other sources of #DBs are either
      suppressed by MOVSS blocking (single-step, code fetch, data, and I/O),
      are mutually exclusive with MOVSS blocking (T-bit task switch), or are
      already handled by KVM (ICEBP, a.k.a. INT1).
      
      This bug was originally found by running tests[1] created for XSA-308[2].
      Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is
      presumably why the Xen bug was deemed to be an exploitable DOS from guest
      userspace.  KVM already handles ICEBP by skipping the ICEBP instruction
      and thus clears MOVSS blocking as a side effect of its "emulation".
      
      [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html
      [2] https://xenbits.xen.org/xsa/advisory-308.html

      Reported-by: David Woodhouse <dwmw2@infradead.org>
      Reported-by: Alexander Graf <graf@amazon.de>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220120000624.655815-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b9bed78e
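The invariant described in this commit can be sketched as a pure function. This is an illustration, not the KVM implementation: the macro and function names are hypothetical, while the bit positions follow the architectural layout (RFLAGS.TF is bit 8, STI and MOV-SS blocking are bits 0 and 1 of the interruptibility state, and BS is bit 14 of the pending debug exceptions field).

```c
#include <stdint.h>

#define EFLAGS_TF          (1ULL << 8)   /* RFLAGS.TF: single-step trap flag */
#define INTR_STATE_STI     (1u << 0)     /* blocking by STI */
#define INTR_STATE_MOV_SS  (1u << 1)     /* blocking by MOV SS / POP SS */
#define PENDING_DBG_BS     (1ULL << 14)  /* pending single-step #DB */

/*
 * Returns the pending-debug-exceptions value to use when re-injecting a
 * #DB: BS is set if TF=1 and STI or MOV-SS blocking is active.
 */
static uint64_t fixup_pending_dbg(uint64_t pending_dbg, uint64_t rflags,
                                  uint32_t intr_state)
{
    if ((rflags & EFLAGS_TF) &&
        (intr_state & (INTR_STATE_STI | INTR_STATE_MOV_SS)))
        pending_dbg |= PENDING_DBG_BS;
    return pending_dbg;
}
```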
    • V
      KVM: x86: Move CPUID.(EAX=0x12,ECX=1) mangling to __kvm_update_cpuid_runtime() · 5c89be1d
      Vitaly Kuznetsov committed
      Full equality check of CPUID data on update (kvm_cpuid_check_equal()) may
      fail for SGX enabled CPUs as CPUID.(EAX=0x12,ECX=1) is currently being
      mangled in kvm_vcpu_after_set_cpuid(). Move it to
      __kvm_update_cpuid_runtime() and split off a cpuid_get_supported_xcr0()
      helper, as the 'vcpu->arch.guest_supported_xcr0' update needs (logically)
      to stay in kvm_vcpu_after_set_cpuid().
      
      Cc: stable@vger.kernel.org
      Fixes: feb627e8 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220124103606.2630588-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5c89be1d
  3. 24 Jan, 2022 2 commits
    • P
      x86,kvm/xen: Remove superfluous .fixup usage · adb759e5
      Peter Zijlstra committed
      Commit 14243b38 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and
      event channel delivery") adds superfluous .fixup usage after the whole
      .fixup section was removed in commit e5eefda5 ("x86: Remove .fixup
      section").
      
      Fixes: 14243b38 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery")
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Message-Id: <20220123124219.GH20638@worktop.programming.kicks-ass.net>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      adb759e5
    • S
      KVM: VMX: Zero host's SYSENTER_ESP iff SYSENTER is NOT used · 94fea1d8
      Sean Christopherson committed
      Zero vmcs.HOST_IA32_SYSENTER_ESP when initializing *constant* host state
      if and only if SYSENTER cannot be used, i.e. the kernel is a 64-bit
      kernel and is not emulating 32-bit syscalls.  As the name suggests,
      vmx_set_constant_host_state() is intended for state that is *constant*.
      When SYSENTER is used, SYSENTER_ESP isn't constant because stacks are
      per-CPU, and the VMCS must be updated whenever the vCPU is migrated to a
      new CPU.  The logic in vmx_vcpu_load_vmcs() doesn't differentiate between
      "never loaded" and "loaded on a different CPU", i.e. setting SYSENTER_ESP
      on VMCS load also handles setting correct host state when the VMCS is
      first loaded.
      
      Because a VMCS must be loaded before it is initialized during vCPU RESET,
      zeroing the field in vmx_set_constant_host_state() obliterates the value
      that was written when the VMCS was loaded.  If the vCPU is run before it
      is migrated, the subsequent VM-Exit will zero out MSR_IA32_SYSENTER_ESP,
      leading to a #DF on the next 32-bit syscall.
      
        double fault: 0000 [#1] SMP
        CPU: 0 PID: 990 Comm: stable Not tainted 5.16.0+ #97
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        EIP: entry_SYSENTER_32+0x0/0xe7
        Code: <9c> 50 eb 17 0f 20 d8 a9 00 10 00 00 74 0d 25 ff ef ff ff 0f 22 d8
        EAX: 000000a2 EBX: a8d1300c ECX: a8d13014 EDX: 00000000
        ESI: a8f87000 EDI: a8d13014 EBP: a8d12fc0 ESP: 00000000
        DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210093
        CR0: 80050033 CR2: fffffffc CR3: 02c3b000 CR4: 00152e90
      
      Fixes: 6ab8a405 ("KVM: VMX: Avoid to rdmsrl(MSR_IA32_SYSENTER_ESP)")
      Cc: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220122015211.1468758-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      94fea1d8
  4. 20 Jan, 2022 13 commits