1. 22 September 2021, 9 commits
    • kvm: fix wrong exception emulation in check_rdtsc · e9337c84
      Hou Wenlong committed
      According to Intel's SDM Vol. 2 and AMD's APM Vol. 3, when
      CR4.TSD is set, executing the rdtsc/rdtscp instructions at any
      privilege level above 0 should trigger a #GP.
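
      A minimal sketch of the corrected check, assuming the helpers and
      naming conventions of KVM's x86 instruction emulator (the exact
      upstream diff may differ):

        static int check_rdtsc(struct x86_emulate_ctxt *ctxt)
        {
                u64 cr4 = ctxt->ops->get_cr(ctxt, 4);

                /* CR4.TSD set and CPL > 0: raise #GP, not #UD. */
                if ((cr4 & X86_CR4_TSD) && ctxt->ops->cpl(ctxt))
                        return emulate_gp(ctxt, 0);

                return X86EMUL_CONTINUE;
        }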
      
      Fixes: d7eb8203 ("KVM: SVM: Add intercept checks for remaining group7 instructions")
      Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
      Message-Id: <1297c0dd3f1bb47a6d089f850b629c7aa0247040.1629257115.git.houwenlong93@linux.alibaba.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Pin guest memory for write for RECEIVE_UPDATE_DATA · 50c03801
      Sean Christopherson committed
      Require the target guest page to be writable when pinning memory for
      RECEIVE_UPDATE_DATA.  Per the SEV API, the PSP writes to guest memory:
      
        The result is then encrypted with GCTX.VEK and written to the memory
        pointed to by GUEST_PADDR field.
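
      A hedged sketch of the fix, assuming the sev_pin_memory() helper in
      svm/sev.c (its final argument selects read-only vs. writable pinning):

        /* Pin for write: the PSP writes the re-encrypted data
         * into the guest page. */
        guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
                                    PAGE_SIZE, &n, 1 /* write */);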
      
      Fixes: 15fb7de1 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
      Cc: stable@vger.kernel.org
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210914210951.2994260-2-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: Peter Gonda <pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: fix missing sev_decommission in sev_receive_start · f1815e0a
      Mingwei Zhang committed
      DECOMMISSION the current SEV context if binding an ASID fails after
      RECEIVE_START.  Per AMD's SEV API, RECEIVE_START generates a new guest
      context and thus needs to be paired with DECOMMISSION:
      
           The RECEIVE_START command is the only command other than the LAUNCH_START
           command that generates a new guest context and guest handle.
      
      The missing DECOMMISSION can result in subsequent SEV launch failures,
      as the firmware leaks memory and might not be able to allocate more
      SEV guest contexts in the future.
      
      Note, LAUNCH_START suffered the same bug, but was previously fixed by
      commit 934002cd ("KVM: SVM: Call SEV Guest Decommission if ASID
      binding fails").
      
      Cc: Alper Gun <alpergun@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: John Allen <john.allen@amd.com>
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vipin Sharma <vipinsh@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Marc Orr <marcorr@google.com>
      Acked-by: Brijesh Singh <brijesh.singh@amd.com>
      Fixes: af43cbbf ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command")
      Signed-off-by: Mingwei Zhang <mizhang@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210912181815.3899316-1-mizhang@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Acquire vcpu mutex when updating VMSA · bb18a677
      Peter Gonda committed
      The update-VMSA ioctl touches data stored in struct kvm_vcpu, and
      therefore should not be performed concurrently with any VCPU ioctl
      that might cause KVM or the processor to use the same data.
      
      Add a vcpu mutex guard to the VMSA updating code, and refactor the
      per-vCPU parts of sev_launch_update_vmsa() out into a new
      __sev_launch_update_vmsa() helper.
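
      A sketch of the reworked loop, assuming KVM's standard vCPU
      iteration and locking helpers:

        kvm_for_each_vcpu(i, vcpu, kvm) {
                /* Serialize against concurrent vCPU ioctls. */
                ret = mutex_lock_killable(&vcpu->mutex);
                if (ret)
                        return ret;

                ret = __sev_launch_update_vmsa(kvm, vcpu, &argp->error);

                mutex_unlock(&vcpu->mutex);
                if (ret)
                        return ret;
        }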
      
      Fixes: ad73109a ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20210915171755.3773766-1-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: fix comments of handle_vmon() · ed7023a1
      Yu Zhang committed
      "VMXON pointer" is saved in vmx->nested.vmxon_ptr since
      commit 3573e22c ("KVM: nVMX: additional checks on
      vmxon region"). Also, handle_vmptrld() & handle_vmclear()
      now have logic to check the VMCS pointer against the VMXON
      pointer.
      
      So just remove the obsolete comments from handle_vmon().
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Message-Id: <20210908171731.18885-1-yu.c.zhang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Handle SRCU initialization failure during page track init · eb7511bf
      Haimin Zhang committed
      Check the return of init_srcu_struct(), which can fail due to OOM, when
      initializing the page track mechanism.  Lack of checking leads to a NULL
      pointer deref found by a modified syzkaller.
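
      A sketch of the resulting init path (making the function return int
      is the substance of the fix; the exact code may differ):

        int kvm_page_track_init(struct kvm *kvm)
        {
                struct kvm_page_track_notifier_head *head;

                head = &kvm->arch.track_notifier_head;
                INIT_HLIST_HEAD(&head->track_notifier_list);
                /* Can fail under memory pressure; caller must check. */
                return init_srcu_struct(&head->track_srcu);
        }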
      Reported-by: TCS Robot <tcs_robot@tencent.com>
      Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
      Message-Id: <1630636626-12262-1-git-send-email-tcs_kernel@tencent.com>
      [Move the call towards the beginning of kvm_arch_init_vm. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Remove defunct "nr_active_uret_msrs" field · cd36ae87
      Sean Christopherson committed
      Remove vcpu_vmx.nr_active_uret_msrs and its associated comment, which are
      both defunct now that KVM keeps the list constant and instead explicitly
      tracks which entries need to be loaded into hardware.
      
      No functional change intended.
      
      Fixes: ee9d22e0 ("KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210908002401.1947049-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Clear KVM's cached guest CR3 at RESET/INIT · 03a6e840
      Sean Christopherson committed
      Explicitly zero the guest's CR3 and mark it available+dirty at RESET/INIT.
      Per Intel's SDM and AMD's APM, CR3 is zeroed at both RESET and INIT.  For
      RESET, this is a nop as the vCPU is zero-allocated.  For INIT, the bug has
      likely escaped notice because no firmware/kernel puts its page tables root
      at PA=0, let alone relies on INIT to get the desired CR3 for such page
      tables.
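
      A sketch of the fix in kvm_vcpu_reset(), assuming KVM's register
      cache helpers:

        /* CR3 is zero after both RESET and INIT; mark it cached so KVM
         * never reads a stale value out of hardware. */
        vcpu->arch.cr3 = 0;
        kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);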
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210921000303.400537-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Mark all registers as avail/dirty at vCPU creation · 7117003f
      Sean Christopherson committed
      Mark all registers as available and dirty at vCPU creation, as the vCPU has
      obviously not been loaded into hardware, let alone been given the chance to
      be modified in hardware.  On SVM, reading from "uninitialized" hardware is
      a non-issue as VMCBs are zero allocated (thus not truly uninitialized) and
      hardware does not allow for arbitrary field encoding schemes.
      
      On VMX, backing memory for VMCSes is also zero allocated, but true
      initialization of the VMCS _technically_ requires VMWRITEs, as the VMX
      architectural specification technically allows CPU implementations to
      encode fields with arbitrary schemes.  E.g. a CPU could theoretically store
      the inverted value of every field, in which case a VMREAD of a
      zero-allocated field would return all ones.
      
      In practice, only the AR_BYTES fields are known to be manipulated by
      hardware during VMREAD/VMWRITE; no known hardware or VMM (for nested VMX)
      does fancy encoding of cacheable field values (CR0, CR3, CR4, etc...).  In
      other words, this is technically a bug fix, but practically speaking it's
      a glorified nop.
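
      A sketch of the fix at vCPU creation, assuming the regs_avail and
      regs_dirty bitmasks in struct kvm_vcpu_arch:

        /* Nothing has been loaded into hardware yet, so every register
         * is both available (cached) and dirty. */
        vcpu->arch.regs_avail = ~0;
        vcpu->arch.regs_dirty = ~0;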
      
      Failure to mark registers as available has been a lurking bug for quite
      some time.  The original register caching supported only GPRs (+RIP, which
      is kinda sorta a GPR), with the masks initialized at ->vcpu_reset().  That
      worked because the two cacheable registers, RIP and RSP, are generally
      speaking not read as side effects in other flows.
      
      Arguably, commit aff48baa ("KVM: Fetch guest cr3 from hardware on
      demand") was the first instance of failure to mark regs available.  While
      _just_ marking CR3 available during vCPU creation wouldn't have fixed the
      VMREAD from an uninitialized VMCS bug because ept_update_paging_mode_cr0()
      unconditionally read vmcs.GUEST_CR3, marking CR3 _and_ intentionally not
      reading GUEST_CR3 when it's available would have avoided VMREAD to a
      technically-uninitialized VMCS.
      
      Fixes: aff48baa ("KVM: Fetch guest cr3 from hardware on demand")
      Fixes: 6de4f3ad ("KVM: Cache pdptrs")
      Fixes: 6de12732 ("KVM: VMX: Optimize vmx_get_rflags()")
      Fixes: 2fb92db1 ("KVM: VMX: Cache vmcs segment fields")
      Fixes: bd31fe49 ("KVM: VMX: Add proper cache tracking for CR0")
      Fixes: f98c1e77 ("KVM: VMX: Add proper cache tracking for CR4")
      Fixes: 5addc235 ("KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags")
      Fixes: 87915858 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210921000303.400537-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2. 06 September 2021, 8 commits
    • KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted · d9130a2d
      Zelin Deng committed
      When MSR_IA32_TSC_ADJUST is written by the guest via the TSC_ADJUST
      feature, especially when there is a big TSC warp (e.g. a new vCPU is
      hot-added into a VM which has been up for a long time), a large value
      is added to tsc_offset before returning to the guest.  This causes the
      system time to jump, as tsc_timestamp is not adjusted in the meantime
      and pvclock is expected to be monotonic.  To fix this, notify KVM to
      update the vCPU's guest time before going back to the guest.
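
      A sketch of the fix in the MSR_IA32_TSC_ADJUST write path ("adj" is
      a stand-in for the computed offset delta):

        adjust_tsc_offset_guest(vcpu, adj);
        /* Refresh the vCPU's hv_clock before re-entering the guest. */
        kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);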
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <1619576521-81399-2-git-send-email-zelin.deng@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: mark role_regs and role accessors as maybe unused · 4ac21457
      Paolo Bonzini committed
      It is reasonable for these functions to be used only in some configurations,
      for example only if the host is 64-bit (and therefore supports 64-bit
      guests).  It is also reasonable to keep the role_regs and role accessors
      in sync even though some of the accessors may be used only for one of the
      two sets (as is the case currently for CR4.LA57).

      Because clang reports warnings for unused inlines declared in a .c file,
      mark both sets of accessors as __maybe_unused.
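
      A sketch of one of the accessor-generating macros with the annotation
      applied (layout approximates arch/x86/kvm/mmu/mmu.c):

        #define BUILD_MMU_ROLE_REGS_ACCESSOR(reg, name, flag)           \
        static inline bool __maybe_unused                               \
        ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)           \
        {                                                               \
                return !!(regs->reg & flag);                            \
        }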
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page · 1148bfc4
      Sean Christopherson committed
      Move "lpage_disallowed_link" out of the first 64 bytes, i.e. out of the
      first cache line, of kvm_mmu_page so that "spt" and to a lesser extent
      "gfns" land in the first cache line.  "lpage_disallowed_link" is accessed
      relatively infrequently compared to "spt", which is accessed any time KVM
      is walking and/or manipulating the shadow page tables.
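
      An illustrative (not verbatim) layout, assuming 64-byte cache lines;
      the real struct has more fields:

        struct kvm_mmu_page {
                struct list_head link;                  /* 16 bytes */
                struct hlist_node hash_link;            /* 16 bytes */
                /* ... small hot fields ... */
                u64 *spt;                               /* hot: every walk */
                gfn_t *gfns;                            /* warm */
                /* cold fields pushed past the first cache line */
                struct list_head lpage_disallowed_link;
        };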
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210901221023.1303578-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality · ca41c34c
      Sean Christopherson committed
      Move "tdp_mmu_page" into the 1-byte void left by the recently removed
      "mmio_cached" so that it resides in the first 64 bytes of kvm_mmu_page,
      i.e. in the same cache line as the most commonly accessed fields.
      
      Don't bother wrapping tdp_mmu_page in CONFIG_X86_64: including the field
      in 32-bit builds doesn't affect the size of kvm_mmu_page, and a future
      patch can always wrap the field in the unlikely event KVM gains a 1-byte
      flag that is 32-bit specific.
      
      Note, the size of kvm_mmu_page is also unchanged on CONFIG_X86_64=y due
      to it previously sharing an 8-byte chunk with write_flooding_count.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210901221023.1303578-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" · e7177339
      Sean Christopherson committed
      Revert a misguided illegal GPA check when "translating" a non-nested GPA.
      The check is woefully incomplete as it does not fill in @exception as
      expected by all callers, which leads to KVM attempting to inject a bogus
      exception, potentially exposing kernel stack information in the process.
      
       WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
       CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0
       RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
       Call Trace:
        x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853
        kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199
        handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336
        __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline]
        vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038
        vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712
        vcpu_run arch/x86/kvm/x86.c:9779 [inline]
        kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010
        kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652
      
      The bug has escaped notice because practically speaking the GPA check is
      useless.  The GPA check in question only comes into play when KVM is
      walking guest page tables (or "translating" CR3), and KVM already handles
      illegal GPA checks by setting reserved bits in rsvd_bits_mask for each
      PxE, or in the case of CR3 for loading PDPTRs, manually checks for an
      illegal CR3.  This particular failure doesn't hit the existing reserved
      bits checks because syzbot sets guest.MAXPHYADDR=1, and the IA32
      architecture simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit
      paging doesn't define any reserved PA bits checks, which KVM emulates by only
      incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32.
      
      Simply remove the bogus check.  There is zero meaningful value and no
      architectural justification for supporting guest.MAXPHYADDR < 32, and
      properly filling the exception would introduce non-trivial complexity.
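
      After the revert, non-nested "translation" is once again the identity
      function (sketch):

        static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
                                   struct x86_exception *exception)
        {
                return gpa;
        }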
      
      This reverts commit ec7771ab.
      
      Fixes: ec7771ab ("KVM: x86: mmu: Add guest physical address check in translate_gpa()")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210831164224.1119728-2-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page · 678a305b
      Jia He committed
      After the fast TLB invalidation patch series was reverted and then
      restored, the mmio_cached field was not removed, leaving an unused
      field in kvm_mmu_page.
      
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Jia He <justin.he@arm.com>
      Message-Id: <20210830145336.27183-1-justin.he@arm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation · 81b4b56d
      Maxim Levitsky committed
      If we are emulating an invalid guest state, we don't have a correct
      exit reason, and thus we shouldn't do anything in this function.
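
      A sketch of the guard, assuming vcpu_vmx's emulation_required flag:

        static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
        {
                struct vcpu_vmx *vmx = to_vmx(vcpu);

                /* No valid exit reason while emulating invalid state. */
                if (vmx->emulation_required)
                        return;
                /* ... existing NMI/#MC/IRQ handling ... */
        }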
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 95b5a48c ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host · a717a780
      Sean Christopherson committed
      Include pml5_root in the set of special roots if and only if the host,
      and thus NPT, is using 5-level paging.  mmu_alloc_special_roots() expects
      special roots to be allocated as a bundle, i.e. they're either all valid
      or all NULL.  But for pml5_root, that expectation only holds true if the
      host uses 5-level paging, which causes KVM to WARN about pml5_root being
      NULL when the other special roots are valid.
      
      The silver lining of 4-level vs. 5-level NPT being tied to the host
      kernel's paging level is that KVM's shadow root level is constant; unlike
      VMX's EPT, KVM can't choose 4-level NPT based on guest.MAXPHYADDR.  That
      means KVM can still expect pml5_root to be bundled with the other special
      roots, it just needs to be conditioned on the shadow root level.
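
      A sketch of the adjusted expectation in mmu_alloc_special_roots(),
      following the names in arch/x86/kvm/mmu/mmu.c:

        bool need_pml5 = mmu->shadow_root_level > PT64_ROOT_4LEVEL;

        /* All special roots valid (pml5 only if 5-level): nothing to do. */
        if (mmu->pae_root && mmu->pml4_root && (!need_pml5 || mmu->pml5_root))
                return 0;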
      
      Fixes: cb0f722a ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host")
      Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210824005824.205536-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3. 21 August 2021, 23 commits