1. 02 8月, 2021 6 次提交
  2. 30 7月, 2021 1 次提交
    • P
      KVM: x86: accept userspace interrupt only if no event is injected · fa7a549d
      Paolo Bonzini 提交于
      Once an exception has been injected, any side effects related to
      the exception (such as setting CR2 or DR6) have been taked place.
      Therefore, once KVM sets the VM-entry interruption information
      field or the AMD EVENTINJ field, the next VM-entry must deliver that
      exception.
      
      Pending interrupts are processed after injected exceptions, so
      in theory it would not be a problem to use KVM_INTERRUPT when
      an injected exception is present.  However, DOSEMU is using
      run->ready_for_interrupt_injection to detect interrupt windows
      and then using KVM_SET_SREGS/KVM_SET_REGS to inject the
      interrupt manually.  For this to work, the interrupt window
      must be delayed after the completion of the previous event
      injection.
      
      Cc: stable@vger.kernel.org
      Reported-by: NStas Sergeev <stsp2@yandex.ru>
      Tested-by: NStas Sergeev <stsp2@yandex.ru>
      Fixes: 71cc849b ("KVM: x86: Fix split-irqchip vs interrupt injection window request")
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fa7a549d
  3. 26 7月, 2021 1 次提交
  4. 15 7月, 2021 2 次提交
    • L
      KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run() · f85d4016
      Lai Jiangshan 提交于
      When the host is using debug registers but the guest is not using them
      nor is the guest in guest-debug state, the kvm code does not reset
      the host debug registers before kvm_x86->run().  Rather, it relies on
      the hardware vmentry instruction to automatically reset the dr7 registers
      which ensures that the host breakpoints do not affect the guest.
      
      This however violates the non-instrumentable nature around VM entry
      and exit; for example, when a host breakpoint is set on vcpu->arch.cr2,
      
      Another issue is consistency.  When the guest debug registers are active,
      the host breakpoints are reset before kvm_x86->run(). But when the
      guest debug registers are inactive, the host breakpoints are delayed to
      be disabled.  The host tracing tools may see different results depending
      on what the guest is doing.
      
      To fix the problems, we clear %db7 unconditionally before kvm_x86->run()
      if the host has set any breakpoints, no matter if the guest is using
      them or not.
      Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210628172632.81029-1-jiangshanlai@gmail.com>
      Cc: stable@vger.kernel.org
      [Only clear %db7 instead of reloading all debug registers. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f85d4016
    • S
      Revert "KVM: x86: WARN and reject loading KVM if NX is supported but not enabled" · f0414b07
      Sean Christopherson 提交于
      Let KVM load if EFER.NX=0 even if NX is supported, the analysis and
      testing (or lack thereof) for the non-PAE host case was garbage.
      
      If the kernel won't be using PAE paging, .Ldefault_entry in head_32.S
      skips over the entire EFER sequence.  Hopefully that can be changed in
      the future to allow KVM to require EFER.NX, but the motivation behind
      KVM's requirement isn't yet merged.  Reverting and revisiting the mess
      at a later date is by far the safest approach.
      
      This reverts commit 8bbed95d.
      
      Fixes: 8bbed95d ("KVM: x86: WARN and reject loading KVM if NX is supported but not enabled")
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210625001853.318148-1-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f0414b07
  5. 25 6月, 2021 8 次提交
    • A
      kvm: x86: Allow userspace to handle emulation errors · 19238e75
      Aaron Lewis 提交于
      Add a fallback mechanism to the in-kernel instruction emulator that
      allows userspace the opportunity to process an instruction the emulator
      was unable to.  When the in-kernel instruction emulator fails to process
      an instruction it will either inject a #UD into the guest or exit to
      userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
      not know how to proceed in an appropriate manner.  This feature lets
      userspace get involved to see if it can figure out a better path
      forward.
      Signed-off-by: NAaron Lewis <aaronlewis@google.com>
      Reviewed-by: NDavid Edmondson <david.edmondson@oracle.com>
      Message-Id: <20210510144834.658457-2-aaronlewis@google.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      19238e75
    • S
      KVM: x86: Read and pass all CR0/CR4 role bits to shadow MMU helper · 20f632bd
      Sean Christopherson 提交于
      Grab all CR0/CR4 MMU role bits from current vCPU state when initializing
      a non-nested shadow MMU.  Extract the masks from kvm_post_set_cr{0,4}(),
      as the CR0/CR4 update masks must exactly match the mmu_role bits, with
      one exception (see below).  The "full" CR0/CR4 will be used by future
      commits to initialize the MMU and its role, as opposed to the current
      approach of pulling everything from vCPU, which is incorrect for certain
      flows, e.g. nested NPT.
      
      CR4.LA57 is an exception, as it can be toggled on VM-Exit (for L1's MMU)
      but can't be toggled via MOV CR4 while long mode is active.  I.e. LA57
      needs to be in the mmu_role, but technically doesn't need to be checked
      by kvm_post_set_cr4().  However, the extra check is completely benign as
      the hardware restrictions simply mean LA57 will never be _the_ cause of
      a MMU reset during MOV CR4.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-18-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      20f632bd
    • S
      KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER · dbc4739b
      Sean Christopherson 提交于
      When configuring KVM's MMU, pass CR0 and CR4 as unsigned longs, and EFER
      as a u64 in various flows (mostly MMU).  Passing the params as u32s is
      functionally ok since all of the affected registers reserve bits 63:32 to
      zero (enforced by KVM), but it's technically wrong.
      
      No functional change intended.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-15-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      dbc4739b
    • S
      KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken · 63f5a190
      Sean Christopherson 提交于
      Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest
      instability.  Initialize last_vmentry_cpu to -1 and use it to detect if
      the vCPU has been run at least once when its CPUID model is changed.
      
      KVM does not correctly handle changes to paging related settings in the
      guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc...  KVM
      could theoretically zap all shadow pages, but actually making that happen
      is a mess due to lock inversion (vcpu->mutex is held).  And even then,
      updating paging settings on the fly would only work if all vCPUs are
      stopped, updated in concert with identical settings, then restarted.
      
      To support running vCPUs with different vCPU models (that affect paging),
      KVM would need to track all relevant information in kvm_mmu_page_role.
      Note, that's the _page_ role, not the full mmu_role.  Updating mmu_role
      isn't sufficient as a vCPU can reuse a shadow page translation that was
      created by a vCPU with different settings and thus completely skip the
      reserved bit checks (that are tied to CPUID).
      
      Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as
      it would require doubling gfn_track from a u16 to a u32, i.e. would
      increase KVM's memory footprint by 2 bytes for every 4kb of guest memory.
      E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT
      would all need to be tracked.
      
      In practice, there is no remotely sane use case for changing any paging
      related CPUID entries on the fly, so just sweep it under the rug (after
      yelling at userspace).
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-8-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      63f5a190
    • S
      KVM: x86: Properly reset MMU context at vCPU RESET/INIT · 0aa18375
      Sean Christopherson 提交于
      Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG
      was set prior to INIT.  Simply re-initializing the current MMU is not
      sufficient as the current root HPA may not be usable in the new context.
      E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode,
      KVM will fail to switch to the 32-bit pae_root and bomb on the next
      VM-Enter due to running with a 64-bit CR3 in 32-bit mode.
      
      This bug was papered over in both VMX and SVM, but still managed to rear
      its head in the MMU role on VMX.  Because EFER.LMA=1 requires CR0.PG=1,
      kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first
      checking CR0.PG.  VMX's RESET/INIT flow writes CR0 before EFER, and so
      an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to
      generate the wrong MMU role.
      
      In VMX, the INIT issue is specific to running without unrestricted guest
      since unrestricted guest is available if and only if EPT is enabled.
      Commit 8668a3c4 ("KVM: VMX: Reset mmu context when entering real
      mode") resolved the issue by forcing a reset when entering emulated real
      mode.
      
      In SVM, commit ebae871a ("kvm: svm: reset mmu on VCPU reset") forced
      a MMU reset on every INIT to workaround the flaw in common x86.  Note, at
      the time the bug was fixed, the SVM problem was exacerbated by a complete
      lack of a CR4 update.
      
      The vendor resets will be reverted in future patches, primarily to aid
      bisection in case there are non-INIT flows that rely on the existing VMX
      logic.
      
      Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and
      all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that
      CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU
      context reset.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-4-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0aa18375
    • J
      KVM: debugfs: Reuse binary stats descriptors · bc9e9e67
      Jing Zhang 提交于
      To remove code duplication, use the binary stats descriptors in the
      implementation of the debugfs interface for statistics. This unifies
      the definition of statistics for the binary and debugfs interfaces.
      Signed-off-by: NJing Zhang <jingzhangos@google.com>
      Message-Id: <20210618222709.1858088-8-jingzhangos@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      bc9e9e67
    • J
      KVM: stats: Support binary stats retrieval for a VCPU · ce55c049
      Jing Zhang 提交于
      Add a VCPU ioctl to get a statistics file descriptor by which a read
      functionality is provided for userspace to read out VCPU stats header,
      descriptors and data.
      Define VCPU statistics descriptors and header for all architectures.
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Reviewed-by: NRicardo Koller <ricarkol@google.com>
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: NFuad Tabba <tabba@google.com>
      Tested-by: Fuad Tabba <tabba@google.com> #arm64
      Signed-off-by: NJing Zhang <jingzhangos@google.com>
      Message-Id: <20210618222709.1858088-5-jingzhangos@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ce55c049
    • J
      KVM: stats: Support binary stats retrieval for a VM · fcfe1bae
      Jing Zhang 提交于
      Add a VM ioctl to get a statistics file descriptor by which a read
      functionality is provided for userspace to read out VM stats header,
      descriptors and data.
      Define VM statistics descriptors and header for all architectures.
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Reviewed-by: NRicardo Koller <ricarkol@google.com>
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: NFuad Tabba <tabba@google.com>
      Tested-by: Fuad Tabba <tabba@google.com> #arm64
      Signed-off-by: NJing Zhang <jingzhangos@google.com>
      Message-Id: <20210618222709.1858088-4-jingzhangos@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fcfe1bae
  6. 24 6月, 2021 5 次提交
  7. 23 6月, 2021 1 次提交
  8. 18 6月, 2021 16 次提交