1. 14 April 2022, 1 commit
    • KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2 · 45846661
      Authored by Sean Christopherson
      Remove WARNs that sanity check that KVM never lets a triple fault for L2
      escape and incorrectly end up in L1.  In normal operation, the sanity
      check is perfectly valid, but it incorrectly assumes that it's impossible
      for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through
      KVM_RUN (which guarantees kvm_check_nested_events() will see and handle
      the triple fault).
      
      The WARN can currently be triggered if userspace injects a machine check
      while L2 is active and CR4.MCE=0.  And a future fix to allow save/restore
      of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't
      lost on migration, will make it trivially easy for userspace to trigger
      the WARN.
      
      Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is
      tempting, but wrong, especially if/when the request is saved/restored,
      e.g. if userspace restores events (including a triple fault) and then
      restores nested state (which may forcibly leave guest mode).  Ignoring
      the fact that KVM doesn't currently provide the necessary APIs, it's
      userspace's responsibility to manage pending events during save/restore.
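
      For reference, the assertion being dropped is of roughly this shape
      (a sketch, not the verbatim diff; per the splat below it sat on the
      emulated VM-Exit path, e.g. nested_vmx_vmexit() in
      arch/x86/kvm/vmx/nested.c, with an SVM counterpart):

        /*
         * Removed: the WARN is unreachable only if every
         * KVM_REQ_TRIPLE_FAULT passes through KVM_RUN first, which
         * ioctl paths such as KVM_SET_NESTED_STATE (reaching here via
         * vmx_leave_nested(), see below) do not guarantee.
         */
        WARN_ON_ONCE(kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu));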
      
        ------------[ cut here ]------------
        WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Modules linked in: kvm_intel kvm irqbypass
        CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Call Trace:
         <TASK>
         vmx_leave_nested+0x30/0x40 [kvm_intel]
         vmx_set_nested_state+0xca/0x3e0 [kvm_intel]
         kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm]
         kvm_vcpu_ioctl+0x4b9/0x660 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        ---[ end trace 0000000000000000 ]---
      
      Fixes: cb6a32c2 ("KVM: x86: Handle triple fault in L2 without killing L1")
      Cc: stable@vger.kernel.org
      Cc: Chenyi Qiang <chenyi.qiang@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220407002315.78092-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 02 April 2022, 7 commits
  3. 25 February 2022, 1 commit
    • KVM: x86/mmu: load new PGD after the shadow MMU is initialized · 3cffc89d
      Authored by Paolo Bonzini
      Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and
      shadow_root_level anymore, pull the PGD load after the initialization of
      the shadow MMUs.
      
      Besides being more intuitive, this enables future simplifications
      and optimizations because it is no longer necessary to compute the
      role outside kvm_init_mmu.  In particular, kvm_mmu_reset_context was not
      attempting to use a cached PGD to avoid having to figure out the new role.
      With this change, it could follow what nested_{vmx,svm}_load_cr3 are doing,
      and avoid unloading all the cached roots.
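
      A sketch of the resulting ordering in a nested CR3 load (hedged;
      the helper names follow upstream KVM, but the fragment is
      illustrative rather than the exact diff):

        vcpu->arch.cr3 = cr3;
        kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);

        /* Initialize the shadow MMU first, so the new role is known... */
        kvm_init_mmu(vcpu);

        /* ...then load the PGD, which can now look for a cached root
         * matching the freshly computed role. */
        if (!nested_ept)
                kvm_mmu_new_pgd(vcpu, cr3);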
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 11 February 2022, 2 commits
    • KVM: nSVM: Implement Enlightened MSR-Bitmap feature · 66c03a92
      Authored by Vitaly Kuznetsov
      Similar to nVMX commit 502d2bf5 ("KVM: nVMX: Implement Enlightened MSR
      Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM).
      
      Notable differences from the nVMX implementation (a sketch of the
      resulting check follows the list):
      - As the feature uses SW reserved fields in VMCB control, KVM needs to
      make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()).
      
      - 'msrpm_base_pa' always needs to be overwritten in
      nested_svm_vmrun_msrpm(), even when the update is skipped. As an
      optimization, nested_vmcb02_prepare_control() copies it from VMCB01,
      so when the MSR-Bitmap feature for L2 is disabled nothing needs to be done.
      
      - 'struct vmcb_ctrl_area_cached' needs to be extended with the clean
      fields/SW reserved data, and __nested_copy_vmcb_control_to_cache() needs
      to copy them so that nested_svm_vmrun_msrpm() can use them later.
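
      A minimal sketch of the resulting fast path in
      nested_svm_vmrun_msrpm() (hedged; the field and flag names follow
      the upstream commit, but the fragment is abridged):

        struct hv_enlightenments *hve =
                (struct hv_enlightenments *)svm->nested.ctl.reserved_sw;

        /*
         * Skip rebuilding the merged bitmap when L0 made no changes
         * (force_msr_bitmap_recalc is clear), the guest is a Hyper-V
         * guest, and L1 opted in to the enlightenment and declared the
         * MSR bitmap clean for this VMRUN.
         */
        if (!svm->nested.force_msr_bitmap_recalc &&
            kvm_hv_hypercall_enabled(&svm->vcpu) &&
            hve->hv_enlightenments_control.msr_bitmap &&
            (svm->nested.ctl.clean & BIT(VMCB_HV_NESTED_ENLIGHTENMENTS)))
                goto set_msrpm_base_pa;

        /* ... otherwise merge L0's and L1's MSR bitmaps ... */

        set_msrpm_base_pa:
        /* Always overwritten, even when the merge above is skipped. */
        svm->vmcb->control.msrpm_base_pa = __sme_set(__pa(svm->nested.msrpm));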
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220202095100.129834-5-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt · 73c25546
      Authored by Vitaly Kuznetsov
      Similar to nVMX commit ed2a4800 ("KVM: nVMX: Track whether changes in
      L0 require MSR bitmap for L2 to be rebuilt"), introduce a flag to keep
      track of whether MSR bitmap for L2 needs to be rebuilt due to changes in
      MSR bitmap for L1 or switching to a different L2.
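
      A sketch of the flag (hedged): it lives in the nested state, is set
      wherever L0 changes L1's MSR bitmap or a different L2 is entered,
      and is consumed and cleared when the bitmap for L2 is rebuilt:

        struct svm_nested_state {
                /* ... existing fields ... */

                /*
                 * Indicates whether the MSR bitmap for L2 needs to be
                 * rebuilt, e.g. because L0 flipped an intercept in L1's
                 * bitmap or a different VMCB12 was loaded.
                 */
                bool force_msr_bitmap_recalc;
        };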
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220202095100.129834-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 09 February 2022, 1 commit
  6. 27 January 2022, 1 commit
    • KVM: x86: Forcibly leave nested virt when SMM state is toggled · f7e57078
      Authored by Sean Christopherson
      Forcibly leave nested virtualization operation if userspace toggles SMM
      state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
      forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
      vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
      vmxon=false and smm.vmxon=false, but all other nVMX state allocated.
      
      Don't attempt to gracefully handle the transition as (a) most transitions
      are nonsensical, e.g. forcing SMM while L2 is running, (b) there isn't
      sufficient information to handle all transitions, e.g. SVM wants access
      to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
      KVM_SET_NESTED_STATE during state restore as the latter disallows putting
      the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
      being post-VMXON in SMM if SMM is not active.
      
      Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
      due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
      just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
      in an architecturally impossible state.
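
      The fix adds a leave_nested hook to kvm_x86_nested_ops (backed by
      vmx_leave_nested()/svm_leave_nested()) and invokes it when the SMM
      flag flips; roughly (a sketch, not the verbatim diff):

        /* In kvm_vcpu_ioctl_x86_set_vcpu_events(): */
        if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
                /* Forcibly leave nested mode before touching SMM state. */
                if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm)
                        kvm_x86_ops.nested_ops->leave_nested(vcpu);
                /* ... existing SMM state plumbing ... */
        }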
      
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Modules linked in:
        CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
        Call Trace:
         <TASK>
         kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
         kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
         kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
         kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
         kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
         kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
         kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
         kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
         __fput+0x286/0x9f0 fs/file_table.c:311
         task_work_run+0xdd/0x1a0 kernel/task_work.c:164
         exit_task_work include/linux/task_work.h:32 [inline]
         do_exit+0xb29/0x2a30 kernel/exit.c:806
         do_group_exit+0xd2/0x2f0 kernel/exit.c:935
         get_signal+0x4b0/0x28c0 kernel/signal.c:2862
         arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
         handle_signal_work kernel/entry/common.c:148 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
         exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
         __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
         syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
         do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220125220358.2091737-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 08 December 2021, 9 commits
  8. 01 October 2021, 2 commits
  9. 30 September 2021, 1 commit
  10. 23 September 2021, 1 commit
  11. 22 September 2021, 1 commit
  12. 16 August 2021, 2 commits
  13. 02 August 2021, 1 commit
    • KVM: nSVM: remove useless kvm_clear_*_queue · db105fab
      Authored by Paolo Bonzini
      For an event to be in injected state when nested_svm_vmrun executes,
      it must have come from exitintinfo when svm_complete_interrupts ran:
      
        vcpu_enter_guest
         static_call(kvm_x86_run) -> svm_vcpu_run
          svm_complete_interrupts
           // now the event went from "exitintinfo" to "injected"
         static_call(kvm_x86_handle_exit) -> handle_exit
          svm_invoke_exit_handler
            vmrun_interception
             nested_svm_vmrun
      
      However, no event could have been in exitintinfo before a VMRUN
      vmexit.  The code in svm.c is a bit more permissive than the one
      in vmx.c:
      
              if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
                  exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
                  exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
                  exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
      
      but in any case, a VMRUN instruction would not even start to execute
      during an attempted event delivery.
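
      For reference, the removed calls are of this shape (a sketch; both
      helpers are the x86 queue-clearing helpers named in the subject):

        /* Removed from nested_svm_vmrun(): by the argument above, the
         * exception and interrupt queues are provably empty here. */
        kvm_clear_exception_queue(vcpu);
        kvm_clear_interrupt_queue(vcpu);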
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  14. 28 July 2021, 1 commit
  15. 26 July 2021, 2 commits
  16. 15 July 2021, 4 commits
  17. 25 June 2021, 3 commits