1. 23 Sep 2020 (1 commit)
  2. 12 Sep 2020 (1 commit)
  3. 21 Aug 2020 (1 commit)
  4. 18 Aug 2020 (3 commits)
    • kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode · cb957adb
      Jim Mattson authored
      See the SDM, volume 3, section 4.4.1:
      
      If PAE paging would be in use following an execution of MOV to CR0 or
      MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
      CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
      the PDPTEs are loaded from the address in CR3.
      
      Fixes: b9baba86 ("KVM, pkeys: expose CPUID/CR4 to guest")
      Cc: Huaitong Han <huaitong.han@intel.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20200817181655.3716509-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode · 427890af
      Jim Mattson authored
      See the SDM, volume 3, section 4.4.1:
      
      If PAE paging would be in use following an execution of MOV to CR0 or
      MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
      CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
      the PDPTEs are loaded from the address in CR3.
      
      Fixes: 0be0226f ("KVM: MMU: fix SMAP virtualization")
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20200817181655.3716509-2-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
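      For illustration, a minimal sketch (not the exact diff) of the idea behind both commits above: keep the PDPTE-reload trigger limited to the CR4 bits the SDM actually lists, so that toggling CR4.PKE or CR4.SMAP no longer takes the load_pdptrs() path. The pdptr_bits mask and the surrounding kvm_set_cr4()-style check are simplified assumptions, not the real patch.

          /*
           * Sketch only: the CR4 bits that, per SDM vol. 3 sec. 4.4.1, require a
           * PDPTE reload under PAE paging.  CR4.PKE and CR4.SMAP are deliberately
           * not part of this set.
           */
          unsigned long pdptr_bits = X86_CR4_PAE | X86_CR4_PGE | X86_CR4_PSE |
                                     X86_CR4_SMEP;

          /* Inside a kvm_set_cr4()-style handler (simplified): */
          if (!is_long_mode(vcpu) && is_paging(vcpu) && (cr4 & X86_CR4_PAE) &&
              ((cr4 ^ old_cr4) & pdptr_bits) &&
              !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)))
                  return 1;   /* PDPTEs could not be loaded; fail the MOV to CR4 */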
    • KVM: x86: fix access code passed to gva_to_gpa · 19cf4b7e
      Paolo Bonzini authored
      The PK bit of the error code is computed dynamically in permission_fault
      and therefore need not be passed to gva_to_gpa: only the access bits
      (fetch, user, write) need to be passed down.
      
      Not doing so causes a splat in the pku test:
      
         WARNING: CPU: 25 PID: 5465 at arch/x86/kvm/mmu.h:197 paging64_walk_addr_generic+0x594/0x750 [kvm]
         Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0014.D62.2001092233 01/09/2020
         RIP: 0010:paging64_walk_addr_generic+0x594/0x750 [kvm]
         Code: <0f> 0b e9 db fe ff ff 44 8b 43 04 4c 89 6c 24 30 8b 13 41 39 d0 89
         RSP: 0018:ff53778fc623fb60 EFLAGS: 00010202
         RAX: 0000000000000001 RBX: ff53778fc623fbf0 RCX: 0000000000000007
         RDX: 0000000000000001 RSI: 0000000000000002 RDI: ff4501efba818000
         RBP: 0000000000000020 R08: 0000000000000005 R09: 00000000004000e7
         R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007
         R13: ff4501efba818388 R14: 10000000004000e7 R15: 0000000000000000
         FS:  00007f2dcf31a700(0000) GS:ff4501f1c8040000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 0000000000000000 CR3: 0000001dea475005 CR4: 0000000000763ee0
         DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
         PKRU: 55555554
         Call Trace:
          paging64_gva_to_gpa+0x3f/0xb0 [kvm]
          kvm_fixup_and_inject_pf_error+0x48/0xa0 [kvm]
          handle_exception_nmi+0x4fc/0x5b0 [kvm_intel]
          kvm_arch_vcpu_ioctl_run+0x911/0x1c10 [kvm]
          kvm_vcpu_ioctl+0x23e/0x5d0 [kvm]
          ksys_ioctl+0x92/0xb0
          __x64_sys_ioctl+0x16/0x20
          do_syscall_64+0x3e/0xb0
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
         ---[ end trace d17eb998aee991da ]---
      Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Fixes: 89786147 ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
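      For context, a hedged sketch of the shape of the fix in kvm_fixup_and_inject_pf_error(): mask the error code down to the access bits before calling the MMU's gva_to_gpa callback, since PFERR_PK_MASK is recomputed inside permission_fault(). Variable names below are illustrative.

          /* Sketch: pass only fetch/user/write down; the PK bit is derived later. */
          u32 access = error_code & (PFERR_WRITE_MASK | PFERR_FETCH_MASK |
                                     PFERR_USER_MASK);
          struct x86_exception fault;

          gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, &fault);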
  5. 10 Aug 2020 (1 commit)
    • KVM: x86: Don't attempt to load PDPTRs when 64-bit mode is enabled · 05487215
      Sean Christopherson authored
      Don't attempt to load PDPTRs if EFER.LME=1, i.e. if 64-bit mode is
      enabled.  A recent change to reload the PDPTRs when CR0.CD or CR0.NW is
      toggled botched the EFER.LME handling and sends KVM down the PDPTR path
      when is_paging() is true, i.e. when the guest toggles CD/NW in 64-bit
      mode.
      
      Split the CR0 checks for 64-bit vs. 32-bit PAE into separate paths.  The
      64-bit path is specifically checking state when paging is toggled on,
      i.e. CR0.PG transitions from 0->1.  The PDPTR path now needs to run if
      the new CR0 state has paging enabled, irrespective of whether paging was
      already enabled.  Trying to shave a few cycles to make the PDPTR path an
      "else if" case is a mess.
      
      Fixes: d42e3fae ("kvm: x86: Read PDPTEs on CR0.CD and CR0.NW changes")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Peter Shier <pshier@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20200714015732.32426-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
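      A rough sketch of the split described above, inside a kvm_set_cr0()-style handler: the 64-bit path only cares about the CR0.PG 0->1 transition, while the PDPTR path runs only when EFER.LME is clear and the new CR0 value has paging enabled. This is condensed from the description; enter_long_mode() is an illustrative placeholder, not a real helper.

      #ifdef CONFIG_X86_64
          /* 64-bit: handle only the 0->1 CR0.PG transition (entering long mode). */
          if ((vcpu->arch.efer & EFER_LME) && !is_paging(vcpu) &&
              (cr0 & X86_CR0_PG))
                  enter_long_mode(vcpu);               /* illustrative placeholder */
      #endif
          /* 32-bit PAE: reload PDPTRs whenever the new CR0 has paging enabled. */
          if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) &&
              is_pae(vcpu) &&
              !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)))
                  return 1;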
  6. 05 Aug 2020 (1 commit)
  7. 31 Jul 2020 (2 commits)
  8. 30 Jul 2020 (1 commit)
    • x86/kvm: Use __xfer_to_guest_mode_work_pending() in kvm_run_vcpu() · f3020b88
      Thomas Gleixner authored
      The comments explicitly explain that the work flags check and handling in
      kvm_run_vcpu() is done with preemption and interrupts enabled as KVM
      invokes the check again right before entering guest mode with interrupts
      disabled which guarantees that the work flags are observed and handled
      before VMENTER.
      
      Nevertheless, the pending-work check in kvm_run_vcpu() uses the helper
      variant which requires interrupts to be disabled, triggering an instant
      lockdep splat. This was caught in testing earlier but was not fixed up
      before the patch was applied. :(
      
      Use the relaxed and intentionally racy __xfer_to_guest_mode_work_pending()
      instead.
      
      Fixes: 72c3c0fe ("x86/kvm: Use generic xfer to guest work function")
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/87bljxa2sa.fsf@nanos.tec.linutronix.de
      
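      A hedged sketch of the corrected outer-loop check in the vcpu run loop: the relaxed helper may be used with preemption and interrupts enabled, because the strict check is repeated with interrupts disabled right before VMENTER. The surrounding srcu handling is assumed context, not the exact diff.

          /* Outer loop, preemption and interrupts enabled (sketch). */
          if (__xfer_to_guest_mode_work_pending()) {
                  srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
                  r = xfer_to_guest_mode_handle_work(vcpu);
                  vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
                  if (r)
                          return r;
          }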
  9. 24 Jul 2020 (1 commit)
  10. 17 Jul 2020 (1 commit)
  11. 11 Jul 2020 (4 commits)
    • KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support · 3edd6839
      Mohammed Gamal authored
      This patch adds a new capability KVM_CAP_SMALLER_MAXPHYADDR which
      allows userspace to query whether the underlying architecture would
      support GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly
      (e.g. qemu can decide if it should warn for -cpu ..,phys-bits=X).
      
      The complications in this patch are due to unexpected (but documented)
      behaviour we see with NPF vmexit handling on AMD processors.  If
      SVM is modified to add guest physical address checks in the NPF
      and guest #PF paths, we see the following error multiple times in
      the 'access' test in kvm-unit-tests:
      
                  test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
                  Dump mapping: address: 0x123400000000
                  ------L4: 24c3027
                  ------L3: 24c4027
                  ------L2: 24c5021
                  ------L1: 1002000021
      
      This is because the PTE's accessed bit is set by the CPU hardware before
      the NPF vmexit. This is handled completely by hardware and cannot be fixed
      in software.
      
      Therefore, availability of the new capability depends on a boolean variable
      allow_smaller_maxphyaddr which is set individually by VMX and SVM init
      routines. On VMX it's always set to true, on SVM it's only set to true
      when NPT is not enabled.
      
      CC: Tom Lendacky <thomas.lendacky@amd.com>
      CC: Babu Moger <babu.moger@amd.com>
      Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
      Message-Id: <20200710154811.418214-10-mgamal@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
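      From userspace, the new capability is queried like any other KVM extension. A minimal example, assuming a uapi header new enough to define KVM_CAP_SMALLER_MAXPHYADDR (error handling omitted):

          #include <fcntl.h>
          #include <stdio.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          int main(void)
          {
                  int kvm = open("/dev/kvm", O_RDWR);
                  int r = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_SMALLER_MAXPHYADDR);

                  /* KVM_CHECK_EXTENSION returns > 0 when the capability is present. */
                  printf("GUEST_MAXPHYADDR < HOST_MAXPHYADDR: %s\n",
                         r > 0 ? "supported" : "not supported");
                  return 0;
          }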
    • KVM: x86: rename update_bp_intercept to update_exception_bitmap · 6986982f
      Paolo Bonzini authored
      We would like to introduce a callback to update the #PF intercept
      when CPUID changes.  Just reuse update_bp_intercept since VMX is
      already using update_exception_bitmap instead of a bespoke function.
      
      While at it, remove an unnecessary assignment in the SVM version,
      which is already done in the caller (kvm_arch_vcpu_ioctl_set_guest_debug)
      and has nothing to do with the exception bitmap.
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Add helper functions for illegal GPA checking and page fault injection · 89786147
      Mohammed Gamal authored
      This patch adds two helper functions that will be used to support virtualizing
      MAXPHYADDR in both kvm-intel.ko and kvm.ko.
      
      kvm_fixup_and_inject_pf_error() injects a page fault for a user-specified GVA,
      while kvm_mmu_is_illegal_gpa() checks whether a GPA exceeds vCPU address limits.
      Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20200710154811.418214-2-mgamal@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
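      One plausible shape of the GPA check, going only by the description above (a sketch, not the exact helper as merged): a GPA is illegal if it sets bits at or above the vCPU's reported MAXPHYADDR.

          /* Sketch: illegal if the GPA uses bits above the vCPU's MAXPHYADDR. */
          static inline bool kvm_mmu_is_illegal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
          {
                  return gpa >= BIT_ULL(cpuid_maxphyaddr(vcpu));
          }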
    • KVM: x86: move MSR_IA32_PERF_CAPABILITIES emulation to common x86 code · d574c539
      Vitaly Kuznetsov authored
      state_test/smm_test selftests are failing on AMD with:
      "Unexpected result from KVM_GET_MSRS, r: 51 (failed MSR was 0x345)"
      
      MSR_IA32_PERF_CAPABILITIES is an emulated MSR on Intel but it is not
      known to the AMD code; we can move the emulation to common x86 code. For
      AMD, we basically just allow the host to read and write zero to the MSR.
      
      Fixes: 27461da3 ("KVM: x86/pmu: Support full width counting")
      Suggested-by: Jim Mattson <jmattson@google.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200710152559.1645827-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
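      In rough terms, the common-code emulation amounts to an extra case in the MSR get/set paths. A simplified sketch follows; the field name vcpu->arch.perf_capabilities and the exact guards (host_initiated handling, the PDCM CPUID bit) are assumptions, not the verbatim patch.

          /* Read path, kvm_get_msr_common()-style (sketch): */
          case MSR_IA32_PERF_CAPABILITIES:
                  msr_info->data = vcpu->arch.perf_capabilities;   /* 0 on AMD */
                  return 0;

          /* Write path, kvm_set_msr_common()-style (sketch): */
          case MSR_IA32_PERF_CAPABILITIES:
                  if (data & ~vcpu->arch.perf_capabilities)        /* only 0 on AMD */
                          return 1;
                  return 0;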
  12. 09 Jul 2020 (12 commits)
  13. 04 Jul 2020 (1 commit)
  14. 29 Jun 2020 (1 commit)
    • KVM: X86: Fix async pf caused null-ptr-deref · 9d3c447c
      Wanpeng Li authored
      Syzbot reported that:
      
        CPU: 1 PID: 6780 Comm: syz-executor153 Not tainted 5.7.0-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:__apic_accept_irq+0x46/0xb80
        Call Trace:
         kvm_arch_async_page_present+0x7de/0x9e0
         kvm_check_async_pf_completion+0x18d/0x400
         kvm_arch_vcpu_ioctl_run+0x18bf/0x69f0
         kvm_vcpu_ioctl+0x46a/0xe20
         ksys_ioctl+0x11a/0x180
         __x64_sys_ioctl+0x6f/0xb0
         do_syscall_64+0xf6/0x7d0
         entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      The testcase enables the APF mechanism in MSR_KVM_ASYNC_PF_EN with
      ASYNC_PF_INT enabled, without setting MSR_KVM_ASYNC_PF_INT first.  What's
      worse, interrupt-based APF 'page ready' event delivery depends on an
      in-kernel lapic, but we did not bail out when the lapic is not in the
      kernel while the guest sets MSR_KVM_ASYNC_PF_EN, which causes the
      null-ptr-deref in the host later.  This patch fixes it.
      
      Reported-by: syzbot+1bf777dfdde86d64b89b@syzkaller.appspotmail.com
      Fixes: 2635b5c4 (KVM: x86: interrupt based APF 'page ready' event delivery)
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1593426391-8231-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
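      The shape of the guard described above, as a hedged sketch inside a kvm_pv_enable_async_pf()-style handler: refuse to enable APF when there is no in-kernel lapic to deliver the 'page ready' interrupt.

          /* Sketch: interrupt-based 'page ready' delivery needs an in-kernel lapic. */
          if (!lapic_in_kernel(vcpu))
                  return data ? 1 : 0;   /* only allow disabling APF in that case */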
  15. 23 Jun 2020 (2 commits)
    • KVM: x86: allow TSC to differ by NTP correction bounds without TSC scaling · 26769f96
      Marcelo Tosatti authored
      The Linux TSC calibration procedure is subject to small variations
      (it is common to see a +-1 kHz difference between reboots on a given CPU, for example).

      So migrating a guest between two hosts with identical processors can fail, in case
      of a small variation in the calibrated TSC between them.
      
      Without TSC scaling, the current kernel interface will either return an error
      (if user_tsc_khz <= tsc_khz) or enable TSC catchup mode.
      
      This change enables the following TSC tolerance check to
      accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250ppm by default).
      
              /*
               * Compute the variation in TSC rate which is acceptable
               * within the range of tolerance and decide if the
               * rate being applied is within that bounds of the hardware
               * rate.  If so, no scaling or compensation need be done.
               */
              thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
              thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
              if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) {
                      pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi);
                      use_scaling = 1;
              }
      
      The NTP daemon in the guest can correct this difference (NTP can correct up to 500 ppm).
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Message-Id: <20200616114741.GA298183@fuller.cnet>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
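      adjust_tsc_khz() in the snippet above scales a rate by a parts-per-million offset. A hedged sketch of what such a helper looks like is below (name and use of do_div assumed); as a worked number, with tsc_khz = 2,500,000 kHz, 250 ppm gives roughly +-625 kHz of slack, comfortably covering the +-1 kHz calibration jitter mentioned above.

          /* Sketch of a ppm-adjustment helper. */
          static u32 adjust_tsc_khz(u32 khz, s32 ppm)
          {
                  u64 v = (u64)khz * (1000000 + ppm);

                  do_div(v, 1000000);   /* scale back down to kHz */
                  return v;
          }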
    • KVM: X86: Fix MSR range of APIC registers in X2APIC mode · bf10bd0b
      Xiaoyao Li authored
      Only MSR address range 0x800 through 0x8ff is architecturally reserved
      and dedicated for accessing APIC registers in x2APIC mode.
      
      Fixes: 0105d1a5 ("KVM: x2apic interface to lapic")
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
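      The fix boils down to narrowing the MSR range that is routed to the x2APIC handlers. A hedged sketch of the intended check in the common MSR read path (APIC_BASE_MSR is 0x800; the surrounding switch is assumed context):

          /* Sketch: only MSRs 0x800-0x8ff are reserved for x2APIC registers. */
          case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
                  return kvm_x2apic_msr_read(vcpu, msr, &msr_info->data);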
  16. 12 Jun 2020 (3 commits)
  17. 09 Jun 2020 (1 commit)
  18. 08 Jun 2020 (2 commits)
    • KVM: x86: Fix APIC page invalidation race · e649b3f0
      Eiichi Tsukata authored
      Commit b1394e74 ("KVM: x86: fix APIC page invalidation") tried
      to fix inappropriate APIC page invalidation by re-introducing arch
      specific kvm_arch_mmu_notifier_invalidate_range() and calling it from
      kvm_mmu_notifier_invalidate_range_start. However, the patch left a
      possible race where the VMCS APIC address cache is updated *before*
      it is unmapped:
      
        (Invalidator) kvm_mmu_notifier_invalidate_range_start()
        (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
        (KVM VCPU) vcpu_enter_guest()
        (KVM VCPU) kvm_vcpu_reload_apic_access_page()
        (Invalidator) actually unmap page
      
      Because of the above race, there can be a mismatch between the
      host physical address stored in the APIC_ACCESS_PAGE VMCS field and
      the host physical address stored in the EPT entry for the APIC GPA
      (0xfee00000).  When this happens, the processor will not trap APIC
      accesses, and will instead show the raw contents of the APIC-access page.
      Because Windows OS periodically checks for unexpected modifications to
      the LAPIC register, this will show up as a BSOD crash with BugCheck
      CRITICAL_STRUCTURE_CORRUPTION (109), which we are currently seeing in
      https://bugzilla.redhat.com/show_bug.cgi?id=1751017.
      
      The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range()
      cannot guarantee that no additional references are taken to the pages in
      the range before kvm_mmu_notifier_invalidate_range_end().  Fortunately,
      this case is supported by the MMU notifier API, as documented in
      include/linux/mmu_notifier.h:
      
      	 * If the subsystem
               * can't guarantee that no additional references are taken to
               * the pages in the range, it has to implement the
               * invalidate_range() notifier to remove any references taken
               * after invalidate_range_start().
      
      The fix therefore is to reload the APIC-access page field in the VMCS
      from kvm_mmu_notifier_invalidate_range() instead of ..._range_start().
      
      Cc: stable@vger.kernel.org
      Fixes: b1394e74 ("KVM: x86: fix APIC page invalidation")
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951
      Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
      Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
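      A hedged sketch of where the reload request ends up, per the description above: the arch invalidate_range() hook checks whether the invalidated HVA range covers the APIC-access page and, if so, asks all vCPUs to reload the cached VMCS field. This is a sketch of the fix's shape, not the verbatim patch.

          void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
                                                      unsigned long start,
                                                      unsigned long end)
          {
                  unsigned long apic_address;

                  /* The APIC-access page's address is cached in the VMCS; request
                   * a reload once the backing page is actually invalidated. */
                  apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
                  if (start <= apic_address && apic_address < end)
                          kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
          }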
    • Revert "KVM: x86: work around leak of uninitialized stack contents" · 25597f64
      Vitaly Kuznetsov authored
      handle_vmptrst()/handle_vmread() stopped injecting #PF unconditionally
      and switched to nested_vmx_handle_memory_failure(), which just kills the
      guest with KVM_EXIT_INTERNAL_ERROR in case of MMIO access, so zeroing
      'exception' in kvm_write_guest_virt_system() is not needed anymore.
      
      This reverts commit 541ab2ae.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200605115906.532682-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  19. 05 Jun 2020 (1 commit)