1. 20 9月, 2022 2 次提交
  2. 17 9月, 2022 1 次提交
    • C
      KVM: VMX: Enable bus lock VM exit · 8af4079e
      Chenyi Qiang 提交于
      mainline inclusion
      from mainline-v5.12-rc1
      commit fe6b6bc8
      category: feature
      feature: KVM Bus Lock VM Exit
      bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
      CVE: N/A
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
      commit/?id=fe6b6bc8
      
      Intel-SIG: commit fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")
      
      -------------------------------------
      
      KVM: VMX: Enable bus lock VM exit
      
      Virtual Machine can exploit bus locks to degrade the performance of
      system. Bus lock can be caused by split locked access to writeback(WB)
      memory or by using locks on uncacheable(UC) memory. The bus lock is
      typically >1000 cycles slower than an atomic operation within a cache
      line. It also disrupts performance on other cores (which must wait for
      the bus lock to be released before their memory operations can
      complete).
      
      To address the threat, bus lock VM exit is introduced to notify the VMM
      when a bus lock was acquired, allowing it to enforce throttling or other
      policy based mitigations.
      
      A VMM can enable VM exit due to bus locks by setting a new "Bus Lock
      Detection" VM-execution control(bit 30 of Secondary Processor-based VM
      execution controls). If delivery of this VM exit was preempted by a
      higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC
      access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of
      exit reason in vmcs field is set to 1.
      
      In current implementation, the KVM exposes this capability through
      KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap
      (i.e. off and exit) and enable it explicitly (disabled by default). If
      bus locks in guest are detected by KVM, exit to user space even when
      current exit reason is handled by KVM internally. Set a new field
      KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there
      is a bus lock detected in guest.
      
      Document for Bus Lock VM exit is now available at the latest "Intel
      Architecture Instruction Set Extensions Programming Reference".
      
      Document Link:
      https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.htmlCo-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAichun Shi <aichun.shi@intel.com>
      8af4079e
  3. 01 9月, 2022 1 次提交
  4. 31 8月, 2022 1 次提交
  5. 29 8月, 2022 2 次提交
  6. 04 8月, 2022 1 次提交
  7. 19 7月, 2022 1 次提交
    • S
      KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded · 85fc72e3
      Sean Christopherson 提交于
      stable inclusion
      from stable-v5.10.112
      commit 342454231ee5f2c2782f5510cab2e7a968486fef
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5HL0X
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=342454231ee5f2c2782f5510cab2e7a968486fef
      
      --------------------------------
      
      commit 1d0e8480 upstream.
      
      Resolve nx_huge_pages to true/false when kvm.ko is loaded, leaving it as
      -1 is technically undefined behavior when its value is read out by
      param_get_bool(), as boolean values are supposed to be '0' or '1'.
      
      Alternatively, KVM could define a custom getter for the param, but the
      auto value doesn't depend on the vendor module in any way, and printing
      "auto" would be unnecessarily unfriendly to the user.
      
      In addition to fixing the undefined behavior, resolving the auto value
      also fixes the scenario where the auto value resolves to N and no vendor
      module is loaded.  Previously, -1 would result in Y being printed even
      though KVM would ultimately disable the mitigation.
      
      Rename the existing MMU module init/exit helpers to clarify that they're
      invoked with respect to the vendor module, and add comments to document
      why KVM has two separate "module init" flows.
      
        =========================================================================
        UBSAN: invalid-load in kernel/params.c:320:33
        load of value 255 is not a valid value for type '_Bool'
        CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        Call Trace:
         <TASK>
         dump_stack_lvl+0x34/0x44
         ubsan_epilogue+0x5/0x40
         __ubsan_handle_load_invalid_value.cold+0x43/0x48
         param_get_bool.cold+0xf/0x14
         param_attr_show+0x55/0x80
         module_attr_show+0x1c/0x30
         sysfs_kf_seq_show+0x93/0xc0
         seq_read_iter+0x11c/0x450
         new_sync_read+0x11b/0x1a0
         vfs_read+0xf0/0x190
         ksys_read+0x5f/0xe0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        =========================================================================
      
      Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation")
      Cc: stable@vger.kernel.org
      Reported-by: NBruno Goncalves <bgoncalv@redhat.com>
      Reported-by: NJan Stancek <jstancek@redhat.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220331221359.3912754-1-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      85fc72e3
  8. 08 7月, 2022 2 次提交
  9. 07 6月, 2022 1 次提交
  10. 10 5月, 2022 1 次提交
    • S
      KVM: x86: Forcibly leave nested virt when SMM state is toggled · 0f01efa3
      Sean Christopherson 提交于
      stable inclusion
      from stable-v5.10.97
      commit 080dbe7e9b86a0392d8dffc00d9971792afc121f
      bugzilla: https://gitee.com/openeuler/kernel/issues/I55O0O
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=080dbe7e9b86a0392d8dffc00d9971792afc121f
      
      --------------------------------
      
      commit f7e57078 upstream.
      
      Forcibly leave nested virtualization operation if userspace toggles SMM
      state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
      forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
      vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
      vmxon=false and smm.vmxon=false, but all other nVMX state allocated.
      
      Don't attempt to gracefully handle the transition as (a) most transitions
      are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't
      sufficient information to handle all transitions, e.g. SVM wants access
      to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
      KVM_SET_NESTED_STATE during state restore as the latter disallows putting
      the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
      being post-VMXON in SMM if SMM is not active.
      
      Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
      due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
      just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
      in an architecturally impossible state.
      
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Modules linked in:
        CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
        Call Trace:
         <TASK>
         kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
         kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
         kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
         kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
         kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
         kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
         kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
         kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
         __fput+0x286/0x9f0 fs/file_table.c:311
         task_work_run+0xdd/0x1a0 kernel/task_work.c:164
         exit_task_work include/linux/task_work.h:32 [inline]
         do_exit+0xb29/0x2a30 kernel/exit.c:806
         do_group_exit+0xd2/0x2f0 kernel/exit.c:935
         get_signal+0x4b0/0x28c0 kernel/signal.c:2862
         arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
         handle_signal_work kernel/entry/common.c:148 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
         exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
         __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
         syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
         do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20220125220358.2091737-1-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Backported-by: NTadeusz Struk <tadeusz.struk@linaro.org>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYu Liao <liaoyu15@huawei.com>
      Reviewed-by: NWei Li <liwei391@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      0f01efa3
  11. 19 4月, 2022 1 次提交
  12. 14 1月, 2022 1 次提交
  13. 13 10月, 2021 1 次提交
  14. 12 10月, 2021 1 次提交
  15. 03 6月, 2021 2 次提交
  16. 19 4月, 2021 2 次提交
    • S
      KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish · 5b355504
      Sean Christopherson 提交于
      stable inclusion
      from stable-5.10.27
      commit 771dfb3c531d1ecce209c82161227d66b24d7784
      bugzilla: 51493
      
      --------------------------------
      
      [ Upstream commit b318e8de ]
      
      Fix a plethora of issues with MSR filtering by installing the resulting
      filter as an atomic bundle instead of updating the live filter one range
      at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
      the hardware MSR bitmaps won't be updated until the next VM-Enter, but
      the relevant software struct is atomically updated, which is what KVM
      really needs.
      
      Similar to the approach used for modifying memslots, make arch.msr_filter
      a SRCU-protected pointer, do all the work configuring the new filter
      outside of kvm->lock, and then acquire kvm->lock only when the new filter
      has been vetted and created.  That way vCPU readers either see the old
      filter or the new filter in their entirety, not some half-baked state.
      
      Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a
      TOCTOU bug, but that's just the tip of the iceberg...
      
        - Nothing is __rcu annotated, making it nigh impossible to audit the
          code for correctness.
        - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
          coding style aside, the lack of a smb_rmb() anywhere casts all code
          into doubt.
        - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
          count before taking the lock.
        - kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug.
      
      The entire approach of updating the live filter is also flawed.  While
      installing a new filter is inherently racy if vCPUs are running, fixing
      the above issues also makes it trivial to ensure certain behavior is
      deterministic, e.g. KVM can provide deterministic behavior for MSRs with
      identical settings in the old and new filters.  An atomic update of the
      filter also prevents KVM from getting into a half-baked state, e.g. if
      installing a filter fails, the existing approach would leave the filter
      in a half-baked state, having already committed whatever bits of the
      filter were already processed.
      
      [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Reported-by: NYuan Yao <yaoyuan0329os@gmail.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210316184436.2544875-2-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: N  Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      5b355504
    • C
      kvm: debugfs: add EXIT_REASON_PREEMPTION_TIMER to vcpu_stat · 3beedef8
      chenjiajun 提交于
      virt inclusion
      category: feature
      bugzilla: 46853
      CVE: NA
      
      Export EXIT_REASON_PREEMPTION_TIMER kvm exits to vcpu_stat debugfs.
      Add a new column to vcpu_stat, and provide preemption_timer status to
      virtualization detection tools.
      Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com>
      Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      3beedef8
  17. 12 1月, 2021 2 次提交
  18. 27 11月, 2020 1 次提交
    • P
      KVM: x86: Fix split-irqchip vs interrupt injection window request · 71cc849b
      Paolo Bonzini 提交于
      kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
      a hodge-podge of conditions, hacked together to get something that
      more or less works.  But what is actually needed is much simpler;
      in both cases the fundamental question is, do we have a place to stash
      an interrupt if userspace does KVM_INTERRUPT?
      
      In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
      Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
      unnecessarily restrictive.
      
      In split irqchip mode it's a bit more complicated, we need to check
      kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
      cycle and thus requires ExtINTs not to be masked) as well as
      !pending_userspace_extint(vcpu).  However, there is no need to
      check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
      pending ExtINT state separate from event injection state, and checking
      kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
      priority than APIC interrupts.  In fact the latter fixes a bug:
      when userspace requests an IRQ window vmexit, an interrupt in the
      local APIC can cause kvm_cpu_has_interrupt() to be true and thus
      kvm_vcpu_ready_for_interrupt_injection() to return false.  When this
      happens, vcpu_run does not exit to userspace but the interrupt window
      vmexits keep occurring.  The VM loops without any hope of making progress.
      
      Once we try to fix these with something like
      
           return kvm_arch_interrupt_allowed(vcpu) &&
      -        !kvm_cpu_has_interrupt(vcpu) &&
      -        !kvm_event_needs_reinjection(vcpu) &&
      -        kvm_cpu_accept_dm_intr(vcpu);
      +        (!lapic_in_kernel(vcpu)
      +         ? !vcpu->arch.interrupt.injected
      +         : (kvm_apic_accept_pic_intr(vcpu)
      +            && !pending_userspace_extint(v)));
      
      we realize two things.  First, thanks to the previous patch the complex
      conditional can reuse !kvm_cpu_has_extint(vcpu).  Second, the interrupt
      window request in vcpu_enter_guest()
      
              bool req_int_win =
                      dm_request_for_irq_injection(vcpu) &&
                      kvm_cpu_accept_dm_intr(vcpu);
      
      should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
      it is unnecessary to ask the processor for an interrupt window
      if we would not be able to return to userspace.  Therefore,
      kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu)
      ANDed with the existing check for masked ExtINT.  It all makes sense:
      
      - we can accept an interrupt from userspace if there is a place
        to stash it (and, for irqchip split, ExtINTs are not masked).
        Interrupts from userspace _can_ be accepted even if right now
        EFLAGS.IF=0.
      
      - in order to tell userspace we will inject its interrupt ("IRQ
        window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both
        KVM and the vCPU need to be ready to accept the interrupt.
      
      ... and this is what the patch implements.
      Reported-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Analyzed-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NNikos Tsironis <ntsironis@arrikto.com>
      Reviewed-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Tested-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      71cc849b
  19. 13 11月, 2020 1 次提交
    • B
      KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch · 0107973a
      Babu Moger 提交于
      SEV guests fail to boot on a system that supports the PCID feature.
      
      While emulating the RSM instruction, KVM reads the guest CR3
      and calls kvm_set_cr3(). If the vCPU is in the long mode,
      kvm_set_cr3() does a sanity check for the CR3 value. In this case,
      it validates whether the value has any reserved bits set. The
      reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory
      encryption is enabled, the memory encryption bit is set in the CR3
      value. The memory encryption bit may fall within the KVM reserved
      bit range, causing the KVM emulation failure.
      
      Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
      cache the reserved bits in the CR3 value. This will be initialized
      to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).
      
      If the architecture has any special bits(like AMD SEV encryption bit)
      that needs to be masked from the reserved bits, should be cleared
      in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler.
      
      Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
      Signed-off-by: NBabu Moger <babu.moger@amd.com>
      Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0107973a
  20. 23 10月, 2020 1 次提交
  21. 22 10月, 2020 6 次提交
  22. 28 9月, 2020 7 次提交
    • P
      KVM: x86: rename KVM_REQ_GET_VMCS12_PAGES · 729c15c2
      Paolo Bonzini 提交于
      We are going to use it for SVM too, so use a more generic name.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      729c15c2
    • A
      KVM: x86: Introduce MSR filtering · 1a155254
      Alexander Graf 提交于
      It's not desireable to have all MSRs always handled by KVM kernel space. Some
      MSRs would be useful to handle in user space to either emulate behavior (like
      uCode updates) or differentiate whether they are valid based on the CPU model.
      
      To allow user space to specify which MSRs it wants to see handled by KVM,
      this patch introduces a new ioctl to push filter rules with bitmaps into
      KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access.
      With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the
      denied MSR events to user space to operate on.
      
      If no filter is populated, MSR handling stays identical to before.
      Signed-off-by: NAlexander Graf <graf@amazon.com>
      
      Message-Id: <20200925143422.21718-8-graf@amazon.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1a155254
    • A
      KVM: x86: Add infrastructure for MSR filtering · 51de8151
      Alexander Graf 提交于
      In the following commits we will add pieces of MSR filtering.
      To ensure that code compiles even with the feature half-merged, let's add
      a few stubs and struct definitions before the real patches start.
      Signed-off-by: NAlexander Graf <graf@amazon.com>
      
      Message-Id: <20200925143422.21718-4-graf@amazon.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      51de8151
    • A
      KVM: x86: Allow deflecting unknown MSR accesses to user space · 1ae09954
      Alexander Graf 提交于
      MSRs are weird. Some of them are normal control registers, such as EFER.
      Some however are registers that really are model specific, not very
      interesting to virtualization workloads, and not performance critical.
      Others again are really just windows into package configuration.
      
      Out of these MSRs, only the first category is necessary to implement in
      kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
      certain CPU models and MSRs that contain information on the package level
      are much better suited for user space to process. However, over time we have
      accumulated a lot of MSRs that are not the first category, but still handled
      by in-kernel KVM code.
      
      This patch adds a generic interface to handle WRMSR and RDMSR from user
      space. With this, any future MSR that is part of the latter categories can
      be handled in user space.
      
      Furthermore, it allows us to replace the existing "ignore_msrs" logic with
      something that applies per-VM rather than on the full system. That way you
      can run productive VMs in parallel to experimental ones where you don't care
      about proper MSR handling.
      Signed-off-by: NAlexander Graf <graf@amazon.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      
      Message-Id: <20200925143422.21718-3-graf@amazon.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1ae09954
    • S
      KVM: x86: Rename "shared_msrs" to "user_return_msrs" · 7e34fbd0
      Sean Christopherson 提交于
      Rename the "shared_msrs" mechanism, which is used to defer restoring
      MSRs that are only consumed when running in userspace, to a more banal
      but less likely to be confusing "user_return_msrs".
      
      The "shared" nomenclature is confusing as it's not obvious who is
      sharing what, e.g. reasonable interpretations are that the guest value
      is shared by vCPUs in a VM, or that the MSR value is shared/common to
      guest and host, both of which are wrong.
      
      "shared" is also misleading as the MSR value (in hardware) is not
      guaranteed to be shared/reused between VMs (if that's indeed the correct
      interpretation of the name), as the ability to share values between VMs
      is simply a side effect (albiet a very nice side effect) of deferring
      restoration of the host value until returning from userspace.
      
      "user_return" avoids the above confusion by describing the mechanism
      itself instead of its effects.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200923180409.32255-2-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7e34fbd0
    • S
      KVM: x86: Add intr/vectoring info and error code to kvm_exit tracepoint · 235ba74f
      Sean Christopherson 提交于
      Extend the kvm_exit tracepoint to align it with kvm_nested_vmexit in
      terms of what information is captured.  On SVM, add interrupt info and
      error code, while on VMX it add IDT vectoring and error code.  This
      sets the stage for macrofying the kvm_exit tracepoint definition so that
      it can be reused for kvm_nested_vmexit without loss of information.
      
      Opportunistically stuff a zero for VM_EXIT_INTR_INFO if the VM-Enter
      failed, as the field is guaranteed to be invalid.  Note, it'd be
      possible to further filter the interrupt/exception fields based on the
      VM-Exit reason, but the helper is intended only for tracepoints, i.e.
      an extra VMREAD or two is a non-issue, the failed VM-Enter case is just
      low hanging fruit.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200923201349.16097-5-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      235ba74f
    • S
      KVM: x86: Add kvm_x86_ops hook to short circuit emulation · 09e3e2a1
      Sean Christopherson 提交于
      Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a
      more generic is_emulatable(), and unconditionally call the new function
      in x86_emulate_instruction().
      
      KVM will use the generic hook to support multiple security related
      technologies that prevent emulation in one way or another.  Similar to
      the existing AMD #NPF case where emulation of the current instruction is
      not possible due to lack of information, AMD's SEV-ES and Intel's SGX
      and TDX will introduce scenarios where emulation is impossible due to
      the guest's register state being inaccessible.  And again similar to the
      existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(),
      i.e. outside of the control of vendor-specific code.
      
      While the cause and architecturally visible behavior of the various
      cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean
      resume or complete shutdown, and SEV-ES and TDX "return" an error, the
      impact on the common emulation code is identical: KVM must stop
      emulation immediately and resume the guest.
      
      Query is_emulatable() in handle_ud() as well so that the
      force_emulation_prefix code doesn't incorrectly modify RIP before
      calling emulate_instruction() in the absurdly unlikely scenario that
      KVM encounters forced emulation in conjunction with "do not emulate".
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      09e3e2a1
  23. 22 8月, 2020 1 次提交
    • W
      KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd
      Will Deacon 提交于
      The 'flags' field of 'struct mmu_notifier_range' is used to indicate
      whether invalidate_range_{start,end}() are permitted to block. In the
      case of kvm_mmu_notifier_invalidate_range_start(), this field is not
      forwarded on to the architecture-specific implementation of
      kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
      whether or not to block.
      
      Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
      architectures are aware as to whether or not they are permitted to block.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-2-will@kernel.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fdfe7cbd