1. 14 7月, 2022 1 次提交
    • V
      KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1 · 99482726
      Vitaly Kuznetsov 提交于
      Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to
      hang upon boot or shortly after when a non-default TSC frequency was
      set for L1. The issue is observed on a host where TSC scaling is
      supported. The problem appears to be that Windows doesn't use TSC
      frequency for its guests even when the feature is advertised and KVM
      filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from
      L1's. This leads to L2 running with the default frequency (matching
      host's) while L1 is running with an altered one.
      
      Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when
      it was set for L1. TSC_MULTIPLIER is already correctly computed and
      written by prepare_vmcs02().
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220712135009.952805-1-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      99482726
  2. 27 6月, 2022 1 次提交
  3. 30 4月, 2022 1 次提交
  4. 22 4月, 2022 1 次提交
    • S
      KVM: nVMX: Defer APICv updates while L2 is active until L1 is active · 7c69661e
      Sean Christopherson 提交于
      Defer APICv updates that occur while L2 is active until nested VM-Exit,
      i.e. until L1 regains control.  vmx_refresh_apicv_exec_ctrl() assumes L1
      is active and (a) stomps all over vmcs02 and (b) neglects to ever updated
      vmcs01.  E.g. if vmcs12 doesn't enable the TPR shadow for L2 (and thus no
      APICv controls), L1 performs nested VM-Enter APICv inhibited, and APICv
      becomes unhibited while L2 is active, KVM will set various APICv controls
      in vmcs02 and trigger a failed VM-Entry.  The kicker is that, unless
      running with nested_early_check=1, KVM blames L1 and chaos ensues.
      
      In all cases, ignoring vmcs02 and always deferring the inhibition change
      to vmcs01 is correct (or at least acceptable).  The ABSENT and DISABLE
      inhibitions cannot truly change while L2 is active (see below).
      
      IRQ_BLOCKING can change, but it is firmly a best effort debug feature.
      Furthermore, only L2's APIC is accelerated/virtualized to the full extent
      possible, e.g. even if L1 passes through its APIC to L2, normal MMIO/MSR
      interception will apply to the virtual APIC managed by KVM.
      The exception is the SELF_IPI register when x2APIC is enabled, but that's
      an acceptable hole.
      
      Lastly, Hyper-V's Auto EOI can technically be toggled if L1 exposes the
      MSRs to L2, but for that to work in any sane capacity, L1 would need to
      pass through IRQs to L2 as well, and IRQs must be intercepted to enable
      virtual interrupt delivery.  I.e. exposing Auto EOI to L2 and enabling
      VID for L2 are, for all intents and purposes, mutually exclusive.
      
      Lack of dynamic toggling is also why this scenario is all but impossible
      to encounter in KVM's current form.  But a future patch will pend an
      APICv update request _during_ vCPU creation to plug a race where a vCPU
      that's being created doesn't get included in the "all vCPUs request"
      because it's not yet visible to other vCPUs.  If userspaces restores L2
      after VM creation (hello, KVM selftests), the first KVM_RUN will occur
      while L2 is active and thus service the APICv update request made during
      VM creation.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220420013732.3308816-3-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7c69661e
  5. 14 4月, 2022 3 次提交
    • S
      KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault · 9bd1f0ef
      Sean Christopherson 提交于
      Clear the IDT vectoring field in vmcs12 on next VM-Exit due to a double
      or triple fault.  Per the SDM, a VM-Exit isn't considered to occur during
      event delivery if the exit is due to an intercepted double fault or a
      triple fault.  Opportunistically move the default clearing (no event
      "pending") into the helper so that it's more obvious that KVM does indeed
      handle this case.
      
      Note, the double fault case is worded rather wierdly in the SDM:
      
        The original event results in a double-fault exception that causes the
        VM exit directly.
      
      Temporarily ignoring injected events, double faults can _only_ occur if
      an exception occurs while attempting to deliver a different exception,
      i.e. there's _always_ an original event.  And for injected double fault,
      while there's no original event, injected events are never subject to
      interception.
      
      Presumably the SDM is calling out that a the vectoring info will be valid
      if a different exit occurs after a double fault, e.g. if a #PF occurs and
      is intercepted while vectoring #DF, then the vectoring info will show the
      double fault.  In other words, the clause can simply be read as:
      
        The VM exit is caused by a double-fault exception.
      
      Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1")
      Cc: Chenyi Qiang <chenyi.qiang@intel.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220407002315.78092-4-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9bd1f0ef
    • S
      KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry · c3634d25
      Sean Christopherson 提交于
      Don't modify vmcs12 exit fields except EXIT_REASON and EXIT_QUALIFICATION
      when performing a nested VM-Exit due to failed VM-Entry.  Per the SDM,
      only the two aformentioned fields are filled and "All other VM-exit
      information fields are unmodified".
      
      Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1")
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220407002315.78092-3-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c3634d25
    • S
      KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2 · 45846661
      Sean Christopherson 提交于
      Remove WARNs that sanity check that KVM never lets a triple fault for L2
      escape and incorrectly end up in L1.  In normal operation, the sanity
      check is perfectly valid, but it incorrectly assumes that it's impossible
      for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through
      KVM_RUN (which guarantees kvm_check_nested_state() will see and handle
      the triple fault).
      
      The WARN can currently be triggered if userspace injects a machine check
      while L2 is active and CR4.MCE=0.  And a future fix to allow save/restore
      of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't
      lost on migration, will make it trivially easy for userspace to trigger
      the WARN.
      
      Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is
      tempting, but wrong, especially if/when the request is saved/restored,
      e.g. if userspace restores events (including a triple fault) and then
      restores nested state (which may forcibly leave guest mode).  Ignoring
      the fact that KVM doesn't currently provide the necessary APIs, it's
      userspace's responsibility to manage pending events during save/restore.
      
        ------------[ cut here ]------------
        WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Modules linked in: kvm_intel kvm irqbypass
        CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Call Trace:
         <TASK>
         vmx_leave_nested+0x30/0x40 [kvm_intel]
         vmx_set_nested_state+0xca/0x3e0 [kvm_intel]
         kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm]
         kvm_vcpu_ioctl+0x4b9/0x660 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        ---[ end trace 0000000000000000 ]---
      
      Fixes: cb6a32c2 ("KVM: x86: Handle triple fault in L2 without killing L1")
      Cc: stable@vger.kernel.org
      Cc: Chenyi Qiang <chenyi.qiang@intel.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220407002315.78092-2-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      45846661
  6. 25 2月, 2022 5 次提交
    • P
      KVM: x86/mmu: load new PGD after the shadow MMU is initialized · 3cffc89d
      Paolo Bonzini 提交于
      Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and
      shadow_root_level anymore, pull the PGD load after the initialization of
      the shadow MMUs.
      
      Besides being more intuitive, this enables future simplifications
      and optimizations because it's not necessary anymore to compute the
      role outside kvm_init_mmu.  In particular, kvm_mmu_reset_context was not
      attempting to use a cached PGD to avoid having to figure out the new role.
      With this change, it could follow what nested_{vmx,svm}_load_cr3 are doing,
      and avoid unloading all the cached roots.
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3cffc89d
    • P
      KVM: x86/mmu: do not pass vcpu to root freeing functions · 0c1c92f1
      Paolo Bonzini 提交于
      These functions only operate on a given MMU, of which there is more
      than one in a vCPU (we care about two, because the third does not have
      any roots and is only used to walk guest page tables).  They do need a
      struct kvm in order to lock the mmu_lock, but they do not needed anything
      else in the struct kvm_vcpu.  So, pass the vcpu->kvm directly to them.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0c1c92f1
    • P
      KVM: x86: use struct kvm_mmu_root_info for mmu->root · b9e5603c
      Paolo Bonzini 提交于
      The root_hpa and root_pgd fields form essentially a struct kvm_mmu_root_info.
      Use the struct to have more consistency between mmu->root and
      mmu->prev_roots.
      
      The patch is entirely search and replace except for cached_root_available,
      which does not need a temporary struct kvm_mmu_root_info anymore.
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b9e5603c
    • S
      Revert "KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()" · 1a715810
      Sean Christopherson 提交于
      Revert back to refreshing vmcs.HOST_CR3 immediately prior to VM-Enter.
      The PCID (ASID) part of CR3 can be bumped without KVM being scheduled
      out, as the kernel will switch CR3 during __text_poke(), e.g. in response
      to a static key toggling.  If switch_mm_irqs_off() chooses a new ASID for
      the mm associate with KVM, KVM will do VM-Enter => VM-Exit with a stale
      vmcs.HOST_CR3.
      
      Add a comment to explain why KVM must wait until VM-Enter is imminent to
      refresh vmcs.HOST_CR3.
      
      The following splat was captured by stashing vmcs.HOST_CR3 in kvm_vcpu
      and adding a WARN in load_new_mm_cr3() to fire if a new ASID is being
      loaded for the KVM-associated mm while KVM has a "running" vCPU:
      
        static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
        {
      	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
      
      	...
      
      	WARN(vcpu && (vcpu->cr3 & GENMASK(11, 0)) != (new_mm_cr3 & GENMASK(11, 0)) &&
      	     (vcpu->cr3 & PHYSICAL_PAGE_MASK) == (new_mm_cr3 & PHYSICAL_PAGE_MASK),
      	     "KVM is hosed, loading CR3 = %lx, vmcs.HOST_CR3 = %lx", new_mm_cr3, vcpu->cr3);
        }
      
        ------------[ cut here ]------------
        KVM is hosed, loading CR3 = 8000000105393004, vmcs.HOST_CR3 = 105393003
        WARNING: CPU: 4 PID: 20717 at arch/x86/mm/tlb.c:291 load_new_mm_cr3+0x82/0xe0
        Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel
        CPU: 4 PID: 20717 Comm: stable Tainted: G        W         5.17.0-rc3+ #747
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:load_new_mm_cr3+0x82/0xe0
        RSP: 0018:ffffc9000489fa98 EFLAGS: 00010082
        RAX: 0000000000000000 RBX: 8000000105393004 RCX: 0000000000000027
        RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff888277d1b788
        RBP: 0000000000000004 R08: ffff888277d1b780 R09: ffffc9000489f8b8
        R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
        R13: ffff88810678a800 R14: 0000000000000004 R15: 0000000000000c33
        FS:  00007fa9f0e72700(0000) GS:ffff888277d00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 00000001001b5003 CR4: 0000000000172ea0
        Call Trace:
         <TASK>
         switch_mm_irqs_off+0x1cb/0x460
         __text_poke+0x308/0x3e0
         text_poke_bp_batch+0x168/0x220
         text_poke_finish+0x1b/0x30
         arch_jump_label_transform_apply+0x18/0x30
         static_key_slow_inc_cpuslocked+0x7c/0x90
         static_key_slow_inc+0x16/0x20
         kvm_lapic_set_base+0x116/0x190
         kvm_set_apic_base+0xa5/0xe0
         kvm_set_msr_common+0x2f4/0xf60
         vmx_set_msr+0x355/0xe70 [kvm_intel]
         kvm_set_msr_ignored_check+0x91/0x230
         kvm_emulate_wrmsr+0x36/0x120
         vmx_handle_exit+0x609/0x6c0 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0x146f/0x1b80
         kvm_vcpu_ioctl+0x279/0x690
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        ---[ end trace 0000000000000000 ]---
      
      This reverts commit 15ad9762.
      
      Fixes: 15ad9762 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
      Reported-by: NWanpeng Li <kernellwp@gmail.com>
      Cc: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Acked-by: NLai Jiangshan <jiangshanlai@gmail.com>
      Message-Id: <20220224191917.3508476-3-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1a715810
    • S
      Revert "KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()" · bca06b85
      Sean Christopherson 提交于
      Undo a nested VMX fix as a step toward reverting the commit it fixed,
      15ad9762 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()"),
      as the underlying premise that "host CR3 in the vcpu thread can only be
      changed when scheduling" is wrong.
      
      This reverts commit a9f2705e.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220224191917.3508476-2-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      bca06b85
  7. 11 2月, 2022 1 次提交
    • S
      KVM: nVMX: Refactor PMU refresh to avoid referencing kvm_x86_ops.pmu_ops · 0bcd556e
      Sean Christopherson 提交于
      Refactor the nested VMX PMU refresh helper to pass it a flag stating
      whether or not the vCPU has PERF_GLOBAL_CTRL instead of having the nVMX
      helper query the information by bouncing through kvm_x86_ops.pmu_ops.
      This will allow a future patch to use static_call() for the PMU ops
      without having to export any static call definitions from common x86, and
      it is also a step toward unexported kvm_x86_ops.
      
      Alternatively, nVMX could call kvm_pmu_is_valid_msr() to indirectly use
      kvm_x86_ops.pmu_ops, but that would incur an extra layer of indirection
      and would require exporting kvm_pmu_is_valid_msr().
      
      Opportunistically rename the helper to keep line lengths somewhat
      reasonable, and to better capture its high-level role.
      
      No functional change intended.
      
      Cc: Like Xu <like.xu.linux@gmail.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220128005208.4008533-9-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0bcd556e
  8. 28 1月, 2022 2 次提交
    • V
      KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use · 6cbbaab6
      Vitaly Kuznetsov 提交于
      Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when
      Enlightened VMCS interface is in use:
      
      "Any VMREAD or VMWRITE instructions while an enlightened VMCS is
      active is unsupported and can result in unexpected behavior.""
      
      Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field
      0x4404 ("VM-exit interruption information") are observed. Failing
      these attempts with nested_vmx_failInvalid() makes such guests
      unbootable.
      
      Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed
      eventually but for the time being we need a workaround. (Temporary) allow
      VMREAD to get data from the currently loaded Enlightened VMCS.
      
      Note: VMWRITE instructions remain forbidden, it is not clear how to
      handle them properly and hopefully won't ever be needed.
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220112170134.1904308-6-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6cbbaab6
    • V
      KVM: nVMX: Rename vmcs_to_field_offset{,_table} · 2423a4c0
      Vitaly Kuznetsov 提交于
      vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque
      blob which is not supposed to be accessed directly. In fact,
      vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure.
      
      Rename vmcs_field_to_offset() to get_vmcs12_field_offset() for clarity.
      
      No functional change intended.
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220112170134.1904308-4-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2423a4c0
  9. 27 1月, 2022 2 次提交
    • S
      KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 · d6e656cd
      Sean Christopherson 提交于
      WARN if KVM attempts to allocate a shadow VMCS for vmcs02.  KVM emulates
      VMCS shadowing but doesn't virtualize it, i.e. KVM should never allocate
      a "real" shadow VMCS for L2.
      
      The previous code WARNed but continued anyway with the allocation,
      presumably in an attempt to avoid NULL pointer dereference.
      However, alloc_vmcs (and hence alloc_shadow_vmcs) can fail, and
      indeed the sole caller does:
      
      	if (enable_shadow_vmcs && !alloc_shadow_vmcs(vcpu))
      		goto out_shadow_vmcs;
      
      which makes it not a useful attempt.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220125220527.2093146-1-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d6e656cd
    • S
      KVM: x86: Forcibly leave nested virt when SMM state is toggled · f7e57078
      Sean Christopherson 提交于
      Forcibly leave nested virtualization operation if userspace toggles SMM
      state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
      forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
      vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
      vmxon=false and smm.vmxon=false, but all other nVMX state allocated.
      
      Don't attempt to gracefully handle the transition as (a) most transitions
      are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't
      sufficient information to handle all transitions, e.g. SVM wants access
      to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
      KVM_SET_NESTED_STATE during state restore as the latter disallows putting
      the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
      being post-VMXON in SMM if SMM is not active.
      
      Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
      due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
      just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
      in an architecturally impossible state.
      
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Modules linked in:
        CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
        Call Trace:
         <TASK>
         kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
         kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
         kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
         kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
         kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
         kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
         kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
         kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
         __fput+0x286/0x9f0 fs/file_table.c:311
         task_work_run+0xdd/0x1a0 kernel/task_work.c:164
         exit_task_work include/linux/task_work.h:32 [inline]
         do_exit+0xb29/0x2a30 kernel/exit.c:806
         do_group_exit+0xd2/0x2f0 kernel/exit.c:935
         get_signal+0x4b0/0x28c0 kernel/signal.c:2862
         arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
         handle_signal_work kernel/entry/common.c:148 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
         exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
         __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
         syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
         do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220125220358.2091737-1-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f7e57078
  10. 07 1月, 2022 2 次提交
    • E
      KVM: x86: Update vPMCs when retiring branch instructions · 018d70ff
      Eric Hankland 提交于
      When KVM retires a guest branch instruction through emulation,
      increment any vPMCs that are configured to monitor "branch
      instructions retired," and update the sample period of those counters
      so that they will overflow at the right time.
      Signed-off-by: NEric Hankland <ehankland@google.com>
      [jmattson:
        - Split the code to increment "branch instructions retired" into a
          separate commit.
        - Moved/consolidated the calls to kvm_pmu_trigger_event() in the
          emulation of VMLAUNCH/VMRESUME to accommodate the evolution of
          that code.
      ]
      Fixes: f5132b01 ("KVM: Expose a version 2 architectural PMU to a guests")
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Message-Id: <20211130074221.93635-7-likexu@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      018d70ff
    • L
      KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs() · a9f2705e
      Lai Jiangshan 提交于
      The host CR3 in the vcpu thread can only be changed when scheduling,
      so commit 15ad9762 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
      changed vmx.c to only save it in vmx_prepare_switch_to_guest().
      
      However, it also has to be synced in vmx_sync_vmcs_host_state() when switching VMCS.
      vmx_set_host_fs_gs() is called in both places, so rename it to
      vmx_set_vmcs_host_state() and make it update HOST_CR3.
      
      Fixes: 15ad9762 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
      Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20211216021938.11752-2-jiangshanlai@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a9f2705e
  11. 08 12月, 2021 8 次提交
  12. 02 12月, 2021 1 次提交
  13. 26 11月, 2021 3 次提交
    • S
      KVM: nVMX: Emulate guest TLB flush on nested VM-Enter with new vpid12 · 712494de
      Sean Christopherson 提交于
      Fully emulate a guest TLB flush on nested VM-Enter which changes vpid12,
      i.e. L2's VPID, instead of simply doing INVVPID to flush real hardware's
      TLB entries for vpid02.  From L1's perspective, changing L2's VPID is
      effectively a TLB flush unless "hardware" has previously cached entries
      for the new vpid12.  Because KVM tracks only a single vpid12, KVM doesn't
      know if the new vpid12 has been used in the past and so must treat it as
      a brand new, never been used VPID, i.e. must assume that the new vpid12
      represents a TLB flush from L1's perspective.
      
      For example, if L1 and L2 share a CR3, the first VM-Enter to L2 (with a
      VPID) is effectively a TLB flush as hardware/KVM has never seen vpid12
      and thus can't have cached entries in the TLB for vpid12.
      Reported-by: NLai Jiangshan <jiangshanlai+lkml@gmail.com>
      Fixes: 5c614b35 ("KVM: nVMX: nested VPID emulation")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211125014944.536398-3-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      712494de
    • S
      KVM: nVMX: Abide to KVM_REQ_TLB_FLUSH_GUEST request on nested vmentry/vmexit · 40e5f908
      Sean Christopherson 提交于
      Like KVM_REQ_TLB_FLUSH_CURRENT, the GUEST variant needs to be serviced at
      nested transitions, as KVM doesn't track requests for L1 vs L2.  E.g. if
      there's a pending flush when a nested VM-Exit occurs, then the flush was
      requested in the context of L2 and needs to be handled before switching
      to L1, otherwise the flush for L2 would effectiely be lost.
      
      Opportunistically add a helper to handle CURRENT and GUEST as a pair, the
      logic for when they need to be serviced is identical as both requests are
      tied to L1 vs. L2, the only difference is the scope of the flush.
      Reported-by: NLai Jiangshan <jiangshanlai+lkml@gmail.com>
      Fixes: 07ffaf34 ("KVM: nVMX: Sync all PGDs on nested transition with shadow paging")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211125014944.536398-2-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      40e5f908
    • P
      KVM: VMX: do not use uninitialized gfn_to_hva_cache · 8503fea6
      Paolo Bonzini 提交于
      An uninitialized gfn_to_hva_cache has ghc->len == 0, which causes
      the accessors to croak very loudly.  While a BUG_ON is definitely
      _too_ loud and a bug on its own, there is indeed an issue of using
      the caches in such a way that they could not have been initialized,
      because ghc->gpa == 0 might match and thus kvm_gfn_to_hva_cache_init
      would not be called.
      
      For the vmcs12_cache, the solution is simply to invoke
      kvm_gfn_to_hva_cache_init unconditionally: we already know
      that the cache does not match the current VMCS pointer.
      For the shadow_vmcs12_cache, there is no similar condition
      that checks the VMCS link pointer, so invalidate the cache
      on VMXON.
      
      Fixes: cee66664 ("KVM: nVMX: Use a gfn_to_hva_cache for vmptrld")
      Acked-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Reported-by: syzbot+7b7db8bb4db6fd5e157b@syzkaller.appspotmail.com
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8503fea6
  14. 18 11月, 2021 4 次提交
  15. 11 11月, 2021 3 次提交
    • V
      KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT · 329bd56c
      Vipin Sharma 提交于
      handle_invept(), handle_invvpid(), handle_invpcid() read the same reg2
      field in vmcs.VMX_INSTRUCTION_INFO to get the index of the GPR that
      holds the invalidation type. Add a helper to retrieve reg2 from VMX
      instruction info to consolidate and document the shift+mask magic.
      Signed-off-by: NVipin Sharma <vipinsh@google.com>
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211109174426.2350547-2-vipinsh@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      329bd56c
    • S
      KVM: nVMX: Clean up x2APIC MSR handling for L2 · a5e0c252
      Sean Christopherson 提交于
      Clean up the x2APIC MSR bitmap intereption code for L2, which is the last
      holdout of open coded bitmap manipulations.  Freshen up the SDM/PRM
      comment, rename the function to make it abundantly clear the funky
      behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored
      (the previous comment was flat out wrong for x2APIC behavior).
      
      No functional change intended.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-5-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a5e0c252
    • S
      KVM: nVMX: Handle dynamic MSR intercept toggling · 67f4b996
      Sean Christopherson 提交于
      Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
      and always update the relevant bits in vmcs02.  This fixes two distinct,
      but intertwined bugs related to dynamic MSR bitmap modifications.
      
      The first issue is that KVM fails to enable MSR interception in vmcs02
      for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
      and later enables interception.
      
      The second issue is that KVM fails to honor userspace MSR filtering when
      preparing vmcs02.
      
      Fix both issues simultaneous as fixing only one of the issues (doesn't
      matter which) would create a mess that no one should have to bisect.
      Fixing only the first bug would exacerbate the MSR filtering issue as
      userspace would see inconsistent behavior depending on the whims of L1.
      Fixing only the second bug (MSR filtering) effectively requires fixing
      the first, as the nVMX code only knows how to transition vmcs02's
      bitmap from 1->0.
      
      Move the various accessor/mutators that are currently buried in vmx.c
      into vmx.h so that they can be shared by the nested code.
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Fixes: d69129b4 ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-3-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      67f4b996
  16. 25 10月, 2021 1 次提交
  17. 30 9月, 2021 1 次提交