1. 26 May 2019, 1 commit
    • x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012 · a0a49d87
      Vitaly Kuznetsov authored
      [ Upstream commit da66761c2d93a46270d69001abb5692717495a68 ]
      
      It was reported that with some special Multi Processor Group configurations,
      e.g.:
       bcdedit.exe /set groupsize 1
       bcdedit.exe /set maxgroup on
       bcdedit.exe /set groupaware on
      a 16-vCPU WS2012 guest shows a BSOD on boot when the PV TLB flush mechanism
      is in use.
      
      Tracing kvm_hv_flush_tlb immediately reveals the issue:
      
       kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
      
      The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
      however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
      We don't flush anything and apparently it's not what Windows expects.
      
      TLFS doesn't say anything about such requests and newer Windows versions
      seem to be unaffected. This all feels like a WS2012 bug, which is, however,
      easy to work around in KVM: let's flush everything when we see an empty
      flush request; over-flushing doesn't hurt.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
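      A minimal standalone sketch of the workaround described above, using
      simplified stand-ins for KVM's Hyper-V hypercall types (the flag value and
      struct below are assumed for illustration; the real logic lives in
      kvm_hv_flush_tlb()):

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define HV_FLUSH_ALL_PROCESSORS (1ULL << 0)  /* assumed bit position, illustration only */

        /* Simplified stand-in for the flush hypercall input. */
        struct hv_flush_request {
            uint64_t processor_mask;
            uint64_t flags;
        };

        /*
         * WS2012 may send processor_mask == 0 without HV_FLUSH_ALL_PROCESSORS
         * while still expecting a flush; treat an empty mask as "flush all
         * vCPUs", since over-flushing is harmless.
         */
        static bool flush_all_cpus(const struct hv_flush_request *req)
        {
            return (req->flags & HV_FLUSH_ALL_PROCESSORS) || req->processor_mask == 0;
        }

        int main(void)
        {
            /* The buggy request from the trace: mask 0x0, flags 0x2. */
            struct hv_flush_request buggy = { .processor_mask = 0x0, .flags = 0x2 };
            printf("flush all vCPUs: %d\n", flush_all_cpus(&buggy));  /* prints 1 */
            return 0;
        }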
  2. 22 May 2019, 2 commits
  3. 17 May 2019, 2 commits
  4. 15 May 2019, 4 commits
  5. 08 May 2019, 1 commit
  6. 05 May 2019, 2 commits
  7. 27 Apr 2019, 3 commits
    • Revert "svm: Fix AVIC incomplete IPI emulation" · 3e1b3e4d
      Suthikulpanit, Suravee authored
      commit 4a58038b9e420276157785afa0a0bbb4b9bc2265 upstream.
      
      This reverts commit bb218fbcfaaa3b115d4cd7a43c0ca164f3a96e57.
      
      As Oren Twaig pointed out in the old discussion:
      
        https://patchwork.kernel.org/patch/8292231/
      
      the change could potentially cause an extra IPI to be sent to
      the destination vcpu because the AVIC hardware already set the IRR bit
      before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running).
      Since writing to ICR and ICR2 will also set the IRR, if something triggers
      the destination vcpu to get scheduled before the emulation finishes, then
      this could result in an additional IPI.
      
      Also, the issue mentioned in commit bb218fbcfaaa was misdiagnosed.
      
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: Oren Twaig <oren@scalemp.com>
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: svm: make sure NMI is injected after nmi_singlestep · c9e34935
      Vitaly Kuznetsov authored
      commit 99c221796a810055974b54c02e8f53297e48d146 upstream.
      
      I noticed that the apic test from kvm-unit-tests always hangs on my EPYC
      7401P; the hanging test, nmi-after-sti, is trying to deliver 30000 NMIs and
      tracing shows that we're sometimes able to deliver a few but never all.
      
      When we're trying to inject an NMI we may fail to do so immediately for
      various reasons; however, we still need to inject it, so enable_nmi_window()
      arms nmi_singlestep mode. #DB occurs as expected, but we're not checking
      for pending NMIs before entering the guest, and unless there's a different
      event to process, the NMI will never get delivered.
      
      Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure
      pending NMIs are checked and possibly injected.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
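      The shape of the fix, as a hedged standalone sketch (the types and helpers
      below are stand-ins, not KVM's real API; the actual change adds the request
      in db_interception()):

        #include <stdbool.h>
        #include <stdio.h>

        #define KVM_REQ_EVENT 1  /* stand-in request number */

        struct vcpu {
            bool nmi_singlestep;     /* armed by enable_nmi_window() when injection is deferred */
            unsigned long requests;
        };

        static void kvm_make_request(int req, struct vcpu *vcpu)
        {
            vcpu->requests |= 1UL << req;  /* forces an event check before the next VM entry */
        }

        /* #DB intercept handler, reduced to the nmi_singlestep path. */
        static void db_interception(struct vcpu *vcpu)
        {
            if (vcpu->nmi_singlestep) {
                vcpu->nmi_singlestep = false;
                /* The fix: request an event check so the pending NMI gets injected. */
                kvm_make_request(KVM_REQ_EVENT, vcpu);
            }
        }

        int main(void)
        {
            struct vcpu v = { .nmi_singlestep = true, .requests = 0 };
            db_interception(&v);
            printf("KVM_REQ_EVENT pending: %d\n", !!(v.requests & (1UL << KVM_REQ_EVENT)));
            return 0;
        }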
    • KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU · 18cf09a8
      Sean Christopherson authored
      commit 8f4dc2e77cdfaf7e644ef29693fa229db29ee1de upstream.
      
      Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
      state area, i.e. don't save/restore EFER across SMM transitions.  KVM
      somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
      guest doesn't support long mode.  But during RSM, KVM unconditionally
      clears EFER so that it can get back to pure 32-bit mode in order to
      start loading CRs with their actual non-SMM values.
      
      Clear EFER only when it will be written when loading the non-SMM state
      so as to preserve bits that can theoretically be set on 32-bit vCPUs,
      e.g. KVM always emulates EFER_SCE.
      
      And because CR4.PAE is cleared only to play nice with EFER, wrap that
      code in the long mode check as well.  Note, this may result in a
      compiler warning about cr4 being consumed uninitialized.  Re-read CR4
      even though it's technically unnecessary, as doing so allows for more
      readable code and RSM emulation is not a performance critical path.
      
      Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
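      A hedged standalone sketch of the RSM ordering described above (the state
      struct and helper are simplified stand-ins for the emulator's state, not
      KVM's actual RSM code):

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define X86_CR4_PAE (1u << 5)
        #define EFER_SCE    (1u << 0)

        struct vcpu_state {
            bool     long_mode_capable;  /* guest supports long mode */
            uint32_t cr4;
            uint64_t efer;
        };

        /* Only 64-bit-capable guests need CR4.PAE and EFER cleared before the
         * non-SMM state is reloaded; 32-bit vCPUs keep EFER, preserving e.g.
         * the always-emulated EFER_SCE bit. */
        static void rsm_prepare(struct vcpu_state *v)
        {
            if (v->long_mode_capable) {
                v->cr4 &= ~X86_CR4_PAE;
                v->efer = 0;  /* rewritten from the 64-bit SMRAM save state anyway */
            }
        }

        int main(void)
        {
            struct vcpu_state v32 = { .long_mode_capable = false,
                                      .cr4 = X86_CR4_PAE, .efer = EFER_SCE };
            rsm_prepare(&v32);
            printf("32-bit vCPU keeps EFER_SCE: %d\n", !!(v32.efer & EFER_SCE));  /* prints 1 */
            return 0;
        }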
  8. 20 Apr 2019, 1 commit
    • KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail · f35e2a68
      Sean Christopherson authored
      [ Upstream commit bd18bffca35397214ae68d85cf7203aca25c3c1d ]
      
      A VMEnter that VMFails (as opposed to VMExits) does not touch host
      state beyond registers that are explicitly noted in the VMFail path,
      e.g. EFLAGS.  Host state does not need to be loaded because VMFail
      is only signaled for consistency checks that occur before the CPU
      starts to load guest state, i.e. there is no need to restore any
      state as nothing has been modified.  But in the case where a VMFail
      is detected by hardware and not by KVM (due to deferring consistency
      checks to hardware), KVM has already loaded some amount of guest
      state.  Luckily, "loaded" only means loaded to KVM's software model,
      i.e. vmcs01 has not been modified.  So, unwind our software model to
      the pre-VMEntry host state.
      
      Not restoring host state in this VMFail path leads to a variety of
      failures because we end up with stale data in vcpu->arch, e.g. CR0,
      CR4, EFER, etc... will all be out of sync relative to vmcs01.  Any
      significant delta in the stale data is all but guaranteed to crash
      L1, e.g. emulation of SMEP, SMAP, UMIP, WP, etc... will be wrong.
      
      An alternative to this "soft" reload would be to load host state from
      vmcs12 as if we triggered a VMExit (as opposed to VMFail), but that is
      wildly inconsistent with respect to the VMX architecture, e.g. an L1
      VMM with separate VMExit and VMFail paths would explode.
      
      Note that this approach does not mean KVM is 100% accurate with
      respect to VMX hardware behavior, even at an architectural level
      (the exact order of consistency checks is microarchitecture specific).
      But 100% emulation accuracy isn't the goal (with this patch); rather,
      the goal is to be consistent in the information delivered to L1, e.g.
      a VMExit should not fall through to VMENTER, and a VMFail should not jump
      to HOST_RIP.
      
      This technically reverts commit "5af41573 (KVM: nVMX: Fix mmu
      context after VMLAUNCH/VMRESUME failure)", but retains the core
      aspects of that patch, just in an open coded form due to the need to
      pull state from vmcs01 instead of vmcs12.  Restoring host state
      resolves a variety of issues introduced by commit "4f350c6d
      (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)",
      which remedied the incorrect behavior of treating VMFail like VMExit
      but in doing so neglected to restore arch state that had been modified
      prior to attempting nested VMEnter.
      
      A sample failure that occurs due to stale vcpu.arch state is a fault
      of some form while emulating an LGDT (due to emulated UMIP) from L1
      after a failed VMEntry to L3, in this case when running the KVM unit
      test test_tpr_threshold_values in L1.  L0 also hits a WARN in this
      case due to a stale arch.cr4.UMIP.
      
      L1:
        BUG: unable to handle kernel paging request at ffffc90000663b9e
        PGD 276512067 P4D 276512067 PUD 276513067 PMD 274efa067 PTE 8000000271de2163
        Oops: 0009 [#1] SMP
        CPU: 5 PID: 12495 Comm: qemu-system-x86 Tainted: G        W         4.18.0-rc2+ #2
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:native_load_gdt+0x0/0x10
      
        ...
      
        Call Trace:
         load_fixmap_gdt+0x22/0x30
         __vmx_load_host_state+0x10e/0x1c0 [kvm_intel]
         vmx_switch_vmcs+0x2d/0x50 [kvm_intel]
         nested_vmx_vmexit+0x222/0x9c0 [kvm_intel]
         vmx_handle_exit+0x246/0x15a0 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0x850/0x1830 [kvm]
         kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
         do_vfs_ioctl+0x9f/0x600
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x4f/0x100
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      L0:
        WARNING: CPU: 2 PID: 3529 at arch/x86/kvm/vmx.c:6618 handle_desc+0x28/0x30 [kvm_intel]
        ...
        CPU: 2 PID: 3529 Comm: qemu-system-x86 Not tainted 4.17.2-coffee+ #76
        Hardware name: Intel Corporation Kabylake Client platform/KBL S
        RIP: 0010:handle_desc+0x28/0x30 [kvm_intel]
      
        ...
      
        Call Trace:
         kvm_arch_vcpu_ioctl_run+0x863/0x1840 [kvm]
         kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
         do_vfs_ioctl+0x9f/0x5e0
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x49/0xf0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 5af41573 (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure)
      Fixes: 4f350c6d (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  9. 17 Apr 2019, 4 commits
  10. 03 Apr 2019, 2 commits
    • KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts · 3a18eaba
      Sean Christopherson authored
      commit 0cf9135b773bf32fba9dd8e6699c1b331ee4b749 upstream.
      
      The CPUID flag ARCH_CAPABILITIES is unconditionally exposed to host
      userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
      regardless of hardware support under the pretense that KVM fully
      emulates MSR_IA32_ARCH_CAPABILITIES.  Unfortunately, only VMX hosts
      handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
      also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).
      
      Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
      that it's emulated on AMD hosts.
      
      Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
      Cc: stable@vger.kernel.org
      Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
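      A hedged standalone sketch of handling the MSR in vendor-neutral code (the
      struct and helper names are stand-ins; the real change moves the handling
      into KVM's common x86 MSR code):

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define MSR_IA32_ARCH_CAPABILITIES 0x10a

        struct vcpu {
            uint64_t arch_capabilities;    /* value exposed to the guest */
            bool     guest_has_arch_caps;  /* ARCH_CAPABILITIES CPUID bit as seen by the guest */
        };

        /* Common x86 MSR read path: behaves the same on Intel and AMD hosts. */
        static int get_msr_common(struct vcpu *vcpu, uint32_t index,
                                  uint64_t *data, bool host_initiated)
        {
            switch (index) {
            case MSR_IA32_ARCH_CAPABILITIES:
                if (!host_initiated && !vcpu->guest_has_arch_caps)
                    return 1;  /* #GP to the guest */
                *data = vcpu->arch_capabilities;
                return 0;
            default:
                return 1;
            }
        }

        int main(void)
        {
            struct vcpu v = { .arch_capabilities = 0x8, .guest_has_arch_caps = true };
            uint64_t val = 0;
            int rc = get_msr_common(&v, MSR_IA32_ARCH_CAPABILITIES, &val, false);
            printf("rc=%d val=0x%llx\n", rc, (unsigned long long)val);
            return 0;
        }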
    • KVM: x86: update %rip after emulating IO · b9733a74
      Sean Christopherson authored
      commit 45def77ebf79e2e8942b89ed79294d97ce914fa0 upstream.
      
      Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
      OUT 92h or CF9h.  Userspace may emulate said mechanism, i.e. reset a
      vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
      that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.
      
      To avoid corrupting %rip after such a reset, commit 0967b7bf ("KVM:
      Skip pio instruction when it is emulated, not executed") changed the
      behavior of PIO handlers, i.e. today's "fast" PIO handling, to skip the
      instruction prior to exiting to userspace.  Full emulation doesn't need
      such tricks because re-emulating the instruction will naturally handle
      %rip being changed to point at the reset vector.
      
      Updating %rip prior to exiting to userspace has several drawbacks:
      
        - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
          fails it will likely yell about the wrong address.
        - Single step exits to userspace are effectively dropped as
          KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
        - Behavior of PIO emulation is different depending on whether it
          goes down the fast path or the slow path.
      
      Rather than skip the PIO instruction before exiting to userspace,
      snapshot the linear %rip and cancel PIO completion if the current
      value does not match the snapshot.  For a 64-bit vCPU, i.e. the most
      common scenario, the snapshot and comparison has negligible overhead
      as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
      VMREAD in this case.
      
      All other alternatives to snapshotting the linear %rip that don't
      rely on an explicit reset announcement suffer from one corner case
      or another.  For example, canceling PIO completion on any write to
      %rip fails if userspace does a save/restore of %rip, and attempting to
      avoid that issue by canceling PIO only if %rip changed then fails if PIO
      collides with the reset %rip.  Attempting to zero in on the exact reset
      vector won't work for APs, which means adding more hooks such as the
      vCPU's MP_STATE, and so on and so forth.
      
      Checking for a linear %rip match technically suffers from corner cases,
      e.g. userspace could theoretically rewrite the underlying code page and
      expect a different instruction to execute, or the guest hardcodes a PIO
      reset at 0xfffffff0, but those are far, far outside of what can be
      considered normal operation.
      
      Fixes: 432baf60 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O")
      Cc: <stable@vger.kernel.org>
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
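      A hedged standalone sketch of the snapshot-and-compare idea (field and
      function names are stand-ins; in KVM the snapshot is taken on the fast PIO
      path and checked in the PIO completion callbacks):

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        struct vcpu {
            uint64_t linear_rip;      /* current linear %rip */
            uint64_t pio_linear_rip;  /* snapshot taken when the PIO exit was queued */
        };

        /* Before exiting to userspace for PIO: remember where the instruction lives. */
        static void fast_pio_start(struct vcpu *v)
        {
            v->pio_linear_rip = v->linear_rip;
        }

        /* On completion: only skip the PIO instruction if %rip still matches the
         * snapshot; a mismatch means userspace reset or rewrote vCPU state. */
        static bool complete_fast_pio(struct vcpu *v)
        {
            if (v->linear_rip != v->pio_linear_rip)
                return false;      /* cancel completion, leave %rip alone */
            v->linear_rip += 1;    /* stand-in for skipping the emulated instruction */
            return true;
        }

        int main(void)
        {
            struct vcpu v = { .linear_rip = 0x1000 };
            fast_pio_start(&v);
            v.linear_rip = 0xfffffff0;  /* userspace jammed reset state during the exit */
            printf("skipped: %d, rip=0x%llx\n", complete_fast_pio(&v),
                   (unsigned long long)v.linear_rip);  /* skipped: 0, %rip stays at the reset vector */
            return 0;
        }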
  11. 24 Mar 2019, 6 commits
    • KVM: nVMX: Ignore limit checks on VMX instructions using flat segments · 5ffb710b
      Sean Christopherson authored
      commit 34333cc6c2cb021662fd32e24e618d1b86de95bf upstream.
      
      Regarding segments with a limit==0xffffffff, the SDM officially states:
      
          When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
          or may not cause the indicated exceptions.  Behavior is
          implementation-specific and may vary from one execution to another.
      
      In practice, all CPUs that support VMX ignore limit checks for "flat
      segments", i.e. an expand-up data or code segment with base=0 and
      limit=0xffffffff.  This is subtly different than wrapping the effective
      address calculation based on the address size, as the flat segment
      behavior also applies to accesses that would wrap the 4g boundary, e.g.
      a 4-byte access starting at 0xffffffff will access linear addresses
      0xffffffff, 0x0, 0x1 and 0x2.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
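      A hedged standalone sketch of the "flat segment" test described above (the
      segment fields are simplified stand-ins for a cached descriptor):

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        struct segment {
            uint64_t base;
            uint32_t limit;
            bool     code;         /* code segment */
            bool     expand_down;  /* expand-down data segment */
        };

        /* Flat segment: expand-up data or code segment with base=0 and a 4 GiB
         * limit.  CPUs that support VMX skip limit checks on such segments,
         * even for accesses that wrap the 4 GiB boundary. */
        static bool is_flat_segment(const struct segment *s)
        {
            return s->base == 0 && s->limit == 0xffffffffu &&
                   (s->code || !s->expand_down);
        }

        /* Only enforce the limit when the segment is not flat. */
        static bool limit_violation(const struct segment *s, uint32_t ea, uint32_t size)
        {
            return !is_flat_segment(s) && (uint64_t)ea + size - 1 > s->limit;
        }

        int main(void)
        {
            struct segment flat = { .base = 0, .limit = 0xffffffffu };
            /* A 4-byte access at 0xffffffff wraps to 0x0..0x2 but raises no fault. */
            printf("violation: %d\n", limit_violation(&flat, 0xffffffffu, 4));  /* prints 0 */
            return 0;
        }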
    • KVM: nVMX: Apply addr size mask to effective address for VMX instructions · 29b515c2
      Sean Christopherson authored
      commit 8570f9e881e3fde98801bb3a47eef84dd934d405 upstream.
      
      The address size of an instruction affects the effective address, not
      the virtual/linear address.  The final address may still be truncated,
      e.g. to 32-bits outside of long mode, but that happens irrespective of
      the address size, e.g. a 32-bit address size can yield a 64-bit virtual
      address when using FS/GS with a non-zero base.
      
      Fixes: 064aea77 ("KVM: nVMX: Decoding memory operands of VMX instructions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
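      A hedged standalone sketch of the distinction (helper names are stand-ins):
      the address-size mask applies to the effective address, and only then is
      the segment base added to form the linear address:

        #include <stdint.h>
        #include <stdio.h>

        /* Mask for the instruction's address size: 2, 4 or 8 bytes. */
        static uint64_t addr_size_mask(int addr_bytes)
        {
            return addr_bytes == 8 ? ~0ULL : (1ULL << (addr_bytes * 8)) - 1;
        }

        /* The effective address is truncated to the address size; the linear
         * address is not, so a 32-bit address size plus a non-zero FS/GS base
         * can still yield a 64-bit linear address. */
        static uint64_t linear_address(uint64_t seg_base, uint64_t offset, int addr_bytes)
        {
            uint64_t ea = offset & addr_size_mask(addr_bytes);
            return seg_base + ea;
        }

        int main(void)
        {
            /* 32-bit address size with a GS base above 4 GiB. */
            printf("la=0x%llx\n",
                   (unsigned long long)linear_address(0x200000000ULL, 0x1000, 4));
            return 0;
        }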
    • KVM: nVMX: Sign extend displacements of VMX instr's mem operands · 9ce0ffeb
      Sean Christopherson authored
      commit 946c522b603f281195af1df91837a1d4d1eb3bc9 upstream.
      
      The VMCS.EXIT_QUALIFICATION field reports the displacements of memory
      operands for various instructions, including VMX instructions, as a
      naturally sized unsigned value, but masks the value by the addr size,
      e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
      reported as 0xffffffd8 for a 32-bit address size.  Despite some weird
      wording regarding sign extension, the SDM explicitly states that bits
      beyond the instruction's address size are undefined:
      
          In all cases, bits of this field beyond the instruction’s address
          size are undefined.
      
      Failure to sign extend the displacement results in KVM incorrectly
      treating a negative displacement as a large positive displacement when
      the address size of the VMX instruction is smaller than KVM's native
      size, e.g. a 32-bit address size on a 64-bit KVM.
      
      The very original decoding, added by commit 064aea77 ("KVM: nVMX:
      Decoding memory operands of VMX instructions"), sort of modeled sign
      extension by truncating the final virtual/linear address for a 32-bit
      address size.  I.e. it messed up the effective address but made it work
      by adjusting the final address.
      
      When segmentation checks were added, the truncation logic was kept
      as-is and no sign extension logic was introduced.  In other words, it
      kept calculating the wrong effective address while mostly generating
      the correct virtual/linear address.  As the effective address is what's
      used in the segment limit checks, this results in KVM incorrectly
      injecting #GP/#SS faults due to non-existent segment violations when
      a nested VMM uses negative displacements with an address size smaller
      than KVM's native address size.
      
      Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
      KVM using 0x100000fd8 as the effective address when checking for a
      segment limit violation.  This causes a 100% failure rate when running
      a 32-bit KVM build as L1 on top of a 64-bit KVM L0.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
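      A hedged standalone sketch of the sign extension (the exit-qualification
      handling is simplified; the point is that the raw displacement must be
      sign-extended to the instruction's address size before it is used):

        #include <stdint.h>
        #include <stdio.h>

        /* Sign-extend the displacement reported by hardware to the instruction's
         * address size (2, 4 or 8 bytes). */
        static int64_t sign_extend_disp(uint64_t raw, int addr_bytes)
        {
            switch (addr_bytes) {
            case 2:  return (int16_t)raw;
            case 4:  return (int32_t)raw;
            default: return (int64_t)raw;
            }
        }

        int main(void)
        {
            /* -0x28(%ebp) with a 32-bit address size is reported as 0xffffffd8. */
            uint64_t raw = 0xffffffd8;
            uint64_t ebp = 0x1000;

            /* Without sign extension the effective address is 0x100000fd8; with
             * it, we get the intended 0xfd8. */
            printf("wrong ea = 0x%llx\n", (unsigned long long)(ebp + raw));
            printf("right ea = 0x%llx\n",
                   (unsigned long long)(ebp + sign_extend_disp(raw, 4)));
            return 0;
        }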
    • KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux · c235af5a
      Sean Christopherson authored
      commit ddfd1730fd829743e41213e32ccc8b4aa6dc8325 upstream.
      
      When installing new memslots, KVM sets bit 0 of the generation number to
      indicate that an update is in-progress.  Until the update is complete,
      there are no guarantees as to whether a vCPU will see the old or the new
      memslots.  Explicitly prevent caching MMIO accesses so as to avoid using
      an access cached from the old memslots after the new memslots have been
      installed.
      
      Note that it is unclear whether or not disabling caching during the
      update window is strictly necessary as there is no definitive
      documentation as to what ordering guarantees KVM provides with respect
      to updating memslots.  That being said, the MMIO spte code does not
      allow reusing sptes created while an update is in-progress, and the
      associated documentation explicitly states:
      
          We do not want to use an MMIO sptes created with an odd generation
          number, ...  If KVM is unlucky and creates an MMIO spte while the
          low bit is 1, the next access to the spte will always be a cache miss.
      
      At the very least, disabling the per-vCPU MMIO cache during updates will
      make its behavior consistent with the MMIO spte behavior and
      documentation.
      
      Fixes: 56f17dd3 ("kvm: x86: fix stale mmio cache bug")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
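      A hedged standalone sketch of the idea (the generation encoding is reduced
      to "low bit set means an update is in progress", matching the description
      above; the struct is a stand-in for KVM's per-vCPU MMIO cache):

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        struct mmio_cache {
            bool     valid;
            uint64_t gva, gfn;
        };

        /* Bit 0 of the memslots generation marks an in-progress update. */
        static bool memslots_update_in_progress(uint64_t generation)
        {
            return generation & 1;
        }

        static void cache_mmio_info(struct mmio_cache *c, uint64_t generation,
                                    uint64_t gva, uint64_t gfn)
        {
            /* Don't cache while memslots are in flux: an entry created now could
             * keep pointing at the old memslots after the new ones are installed. */
            if (memslots_update_in_progress(generation))
                return;
            c->valid = true;
            c->gva = gva;
            c->gfn = gfn;
        }

        int main(void)
        {
            struct mmio_cache c = { 0 };
            cache_mmio_info(&c, /*generation=*/0x11, 0xfee00000, 0xfee00);
            printf("cached during update: %d\n", c.valid);  /* prints 0 */
            return 0;
        }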
    • KVM: x86/mmu: Detect MMIO generation wrap in any address space · 656e9e5d
      Sean Christopherson authored
      commit e1359e2beb8b0a1188abc997273acbaedc8ee791 upstream.
      
      The check to detect a wrap of the MMIO generation explicitly looks for a
      generation number of zero.  Now that unique memslots generation numbers
      are assigned to each address space, only address space 0 will get a
      generation number of exactly zero when wrapping.  E.g. when address
      space 1 goes from 0x7fffe to 0x80002, the MMIO generation number will
      wrap to 0x2.  Adjust the MMIO generation to strip the address space
      modifier prior to checking for a wrap.
      
      Fixes: 4bd518f1 ("KVM: use separate generations for each address space")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
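      A hedged standalone sketch of the wrap check; the bit layout below is an
      assumption purely for illustration (KVM's real generation encoding
      differs), the point being that the address-space modifier is stripped
      before testing for a wrap:

        #include <stdint.h>
        #include <stdio.h>

        /* Assumed layout for illustration: one high bit tags address space 1,
         * the low bits count generations within that space. */
        #define AS_ID_BIT (1ULL << 16)

        static uint64_t mmio_generation(uint64_t memslots_gen)
        {
            /* Strip the address-space modifier so a wrap in any address space
             * lands on zero instead of some small non-zero value. */
            return memslots_gen & ~AS_ID_BIT;
        }

        int main(void)
        {
            /* Address space 1 just wrapped: without stripping, the value is
             * non-zero and the wrap would be missed. */
            uint64_t wrapped = AS_ID_BIT | 0x0;
            printf("wrap detected: %d\n", mmio_generation(wrapped) == 0);  /* prints 1 */
            return 0;
        }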
    • KVM: Call kvm_arch_memslots_updated() before updating memslots · 23ad135a
      Sean Christopherson authored
      commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.
      
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  12. 06 Mar 2019, 2 commits
  13. 27 Feb 2019, 1 commit
  14. 20 Feb 2019, 3 commits
  15. 13 Feb 2019, 4 commits
    • cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM · 97a7fa90
      Josh Poimboeuf authored
      commit b284909abad48b07d3071a9fc9b5692b3e64914b upstream.
      
      With the following commit:
      
        73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      
      ... the hotplug code attempted to detect when SMT was disabled by BIOS,
      in which case it reported SMT as permanently disabled.  However, that
      code broke a virt hotplug scenario, where the guest is booted with only
      primary CPU threads, and a sibling is brought online later.
      
      The problem is that there doesn't seem to be a way to reliably
      distinguish between the HW "SMT disabled by BIOS" case and the virt
      "sibling not yet brought online" case.  So the above-mentioned commit
      was a bit misguided, as it permanently disabled SMT for both cases,
      preventing future virt sibling hotplugs.
      
      Going back and reviewing the original problems that commit attempted
      to solve when SMT was disabled in BIOS:
      
        1) /sys/devices/system/cpu/smt/control showed "on" instead of
           "notsupported"; and
      
        2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
      
      I'd propose that we instead consider #1 above to not actually be a
      problem.  Because, at least in the virt case, it's possible that SMT
      wasn't disabled by BIOS and a sibling thread could be brought online
      later.  So it makes sense to just always default the smt control to "on"
      to allow for that possibility (assuming cpuid indicates that the CPU
      supports SMT).
      
      The real problem is #2, which has a simple fix: change vmx_vm_init() to
      query the actual current SMT state -- i.e., whether any siblings are
      currently online -- instead of looking at the SMT "control" sysfs value.
      
      So fix it by:
      
        a) reverting the original "fix" and its followup fix:
      
           73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
           bc2d8d26 ("cpu/hotplug: Fix SMT supported evaluation")
      
           and
      
        b) changing vmx_vm_init() to query the actual current SMT state --
           instead of the sysfs control value -- to determine whether the L1TF
           warning is needed.  This also requires the 'sched_smt_present'
           variable to be exported, instead of 'cpu_smt_control'.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
    • KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) · 236fd677
      Peter Shier authored
      commit ecec76885bcfe3294685dc363fd1273df0d5d65f upstream.
      
      Bugzilla: 1671904
      
      There are multiple code paths where an hrtimer may have been started to
      emulate an L1 VMX preemption timer that can result in a call to free_nested
      without an intervening L2 exit where the hrtimer is normally
      cancelled. Unconditionally cancel in free_nested to cover all cases.
      
      Embargoed until Feb 7th 2019.
      Signed-off-by: Peter Shier <pshier@google.com>
      Reported-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reported-by: Felix Wilhelm <fwilhelm@google.com>
      Cc: stable@kernel.org
      Message-Id: <20181011184646.154065-1-pshier@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222) · 5a45d372
      Paolo Bonzini authored
      commit 353c0956a618a07ba4bbe7ad00ff29fe70e8412a upstream.
      
      Bugzilla: 1671930
      
      Emulation of certain instructions (VMXON, VMCLEAR, VMPTRLD, VMWRITE with
      memory operand, INVEPT, INVVPID) can incorrectly inject a page fault
      when passed an operand that points to an MMIO address.  The page fault
      will use uninitialized kernel stack memory as the CR2 and error code.
      
      The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
      exit to userspace; however, it is not an easy fix, so for now just
      ensure that the error code and CR2 are zero.
      
      Embargoed until Feb 7th 2019.
      Reported-by: Felix Wilhelm <fwilhelm@google.com>
      Cc: stable@kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
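      A hedged standalone sketch of the mitigation (struct and function names are
      stand-ins): the exception descriptor is zeroed up front, so even when the
      access hits MMIO and fails without filling it in, the page fault that gets
      injected carries a zero CR2 and error code rather than stale stack contents:

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        /* Stand-in for the exception info filled in by guest-memory accessors. */
        struct x86_exception {
            uint32_t vector;
            uint32_t error_code;
            uint64_t address;  /* becomes CR2 if a page fault is injected */
        };

        static int read_guest_virt(void *dst, uint64_t gva, size_t len,
                                   struct x86_exception *e)
        {
            /* Zero the exception first so callers never inject uninitialized data. */
            memset(e, 0, sizeof(*e));
            (void)dst; (void)gva; (void)len;
            return -1;  /* simulate a failure on an MMIO operand */
        }

        int main(void)
        {
            struct x86_exception e;
            char buf[8];
            if (read_guest_virt(buf, 0xfee00000, sizeof(buf), &e))
                printf("injected PF: cr2=0x%llx ec=0x%x\n",
                       (unsigned long long)e.address, e.error_code);  /* both zero */
            return 0;
        }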
    • KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupported · 84a4572b
      Vitaly Kuznetsov authored
      [ Upstream commit e87555e550cef4941579cd879759a7c0dee24e68 ]
      
      AMD doesn't seem to implement MSR_IA32_MCG_EXT_CTL and the svm code in kvm
      knows nothing about it; however, this MSR is among emulated_msrs and is
      thus returned by KVM_GET_MSR_INDEX_LIST. The subsequent KVM_GET_MSRS,
      of course, fails.
      
      Report the MSR as unsupported to not confuse userspace.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
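      A hedged standalone sketch of the general mechanism (names are stand-ins):
      the vendor code reports which of the commonly emulated MSRs it actually
      supports, and unsupported ones are filtered out of the list returned for
      KVM_GET_MSR_INDEX_LIST:

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define MSR_IA32_TSC         0x10
        #define MSR_IA32_MCG_EXT_CTL 0x4d0

        /* Vendor hook: the SVM side reports MCG_EXT_CTL as unsupported. */
        static bool svm_has_emulated_msr(uint32_t index)
        {
            return index != MSR_IA32_MCG_EXT_CTL;
        }

        int main(void)
        {
            uint32_t emulated[] = { MSR_IA32_TSC, MSR_IA32_MCG_EXT_CTL };

            /* Build the list handed back for KVM_GET_MSR_INDEX_LIST. */
            for (unsigned int i = 0; i < sizeof(emulated) / sizeof(emulated[0]); i++)
                if (svm_has_emulated_msr(emulated[i]))
                    printf("report MSR 0x%x\n", emulated[i]);
            return 0;
        }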
  16. 31 Jan 2019, 2 commits