1. 25 Oct 2019, 1 commit
  2. 23 Oct 2019, 2 commits
    • KVM: nVMX: Don't leak L1 MMIO regions to L2 · 671ddc70
      By Jim Mattson
      If the "virtualize APIC accesses" VM-execution control is set in the
      VMCS, the APIC virtualization hardware is triggered when a page walk
      in VMX non-root mode terminates at a PTE wherein the address of the 4k
      page frame matches the APIC-access address specified in the VMCS. On
      hardware, the APIC-access address may be any valid 4k-aligned physical
      address.
      
      KVM's nVMX implementation enforces the additional constraint that the
      APIC-access address specified in the vmcs12 must be backed by
      a "struct page" in L1. If not, L0 will simply clear the "virtualize
      APIC accesses" VM-execution control in the vmcs02.
      
      The problem with this approach is that the L1 guest has arranged the
      vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
      VM-execution control is clear in the vmcs12--so that the L2 guest
      physical address(es)--or L2 guest linear address(es)--that reference
      the L2 APIC map to the APIC-access address specified in the
      vmcs12. Without the "virtualize APIC accesses" VM-execution control in
      the vmcs02, the APIC accesses in the L2 guest will directly access the
      APIC-access page in L1.
      
      When there is no mapping whatsoever for the APIC-access address in L1,
      the L2 VM just loses the intended APIC virtualization. However, when
      the APIC-access address is mapped to an MMIO region in L1, the L2
      guest gets direct access to the L1 MMIO device. For example, if the
      APIC-access address specified in the vmcs12 is 0xfee00000, then L2
      gets direct access to L1's APIC.
      
      Since this vmcs12 configuration is something that KVM cannot
      faithfully emulate, the appropriate response is to exit to userspace
      with KVM_INTERNAL_ERROR_EMULATION.
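      The decision can be modeled in plain userspace C (a schematic sketch, not KVM's actual code; the enum and function names are hypothetical):

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Outcomes for the "virtualize APIC accesses" setup (schematic). */
      enum apic_access_action {
          APIC_ACCESS_VIRTUALIZE,        /* address backed by a struct page */
          APIC_ACCESS_EXIT_TO_USERSPACE, /* KVM_INTERNAL_ERROR_EMULATION */
      };

      /* Before the fix, an unbacked APIC-access address silently cleared the
       * control, letting L2 reach whatever L1 had mapped there (e.g. MMIO).
       * After the fix, KVM refuses to run the unemulatable configuration. */
      static enum apic_access_action
      handle_apic_access_addr(bool backed_by_struct_page)
      {
          if (backed_by_struct_page)
              return APIC_ACCESS_VIRTUALIZE;
          return APIC_ACCESS_EXIT_TO_USERSPACE;
      }

      int main(void)
      {
          assert(handle_apic_access_addr(true)  == APIC_ACCESS_VIRTUALIZE);
          assert(handle_apic_access_addr(false) == APIC_ACCESS_EXIT_TO_USERSPACE);
          return 0;
      }
      ```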
      
      Fixes: fe3ef05c ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
      Reported-by: Dan Cross <dcross@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update · 5c94ac5d
      By Miaohe Lin
      The guest's physical APIC ID may not equal vcpu->vcpu_id in some
      cases. We may set the wrong physical ID in avic_handle_ldr_update
      because we always use vcpu->vcpu_id. Get the physical APIC ID from
      the vAPIC page instead.
      Export and use kvm_xapic_id here and in avic_handle_apic_id_update,
      as suggested by Vitaly.
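      The helper boils down to reading the top byte of the APIC_ID register from the vAPIC page. A userspace model of that calculation (the register layout is architectural; the function mirrors KVM's kvm_xapic_id but is not the kernel source):

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* In xAPIC mode the 8-bit APIC ID lives in bits 31:24 of the APIC_ID
       * register (offset 0x20 in the vAPIC page). */
      static uint8_t xapic_id(uint32_t apic_id_reg)
      {
          return (uint8_t)(apic_id_reg >> 24);
      }

      int main(void)
      {
          /* A guest that remapped its APIC ID to 5: the register holds
           * 5 << 24, while vcpu->vcpu_id might still be 0. */
          assert(xapic_id(5u << 24) == 5);
          assert(xapic_id(0) == 0);
          return 0;
      }
      ```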
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 22 Oct 2019, 6 commits
  4. 20 Oct 2019, 1 commit
  5. 18 Oct 2019, 2 commits
  6. 15 Oct 2019, 2 commits
  7. 12 Oct 2019, 10 commits
  8. 09 Oct 2019, 2 commits
    • perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp · df4d2973
      By Tom Lendacky
      It turns out that the NMI latency workaround from commit:
      
        6d3edaae ("x86/perf/amd: Resolve NMI latency issues for active PMCs")
      
      ends up being too conservative and results in the perf NMI handler claiming
      NMIs too easily on AMD hardware when the NMI watchdog is active.
      
      This has an impact, for example, on the hpwdt (HPE watchdog timer) module.
      This module can produce an NMI that is used to reset the system. It
      registers an NMI handler for the NMI_UNKNOWN type and relies on the fact
      that nothing has claimed an NMI so that its handler will be invoked when
      the watchdog device produces an NMI. After the referenced commit, the
      hpwdt module is unable to process its generated NMI if the NMI watchdog is
      active, because the current NMI latency mitigation results in the NMI
      being claimed by the perf NMI handler.
      
      Update the AMD perf NMI latency mitigation workaround to, instead, use a
      window of time. Whenever a PMC is handled in the perf NMI handler, set a
      timestamp which will act as a perf NMI window. Any NMIs arriving within
      that window will be claimed by perf. Anything outside that window will
      not be claimed by perf. The value for the NMI window is set to 100 msecs.
      This is a conservative value that easily covers any NMI latency in the
      hardware. While this still results in a window in which the hpwdt module
      will not receive its NMI, the window is now much, much smaller.
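      The window logic described above can be sketched as a userspace model (the names and the millisecond interface are illustrative; the kernel's bookkeeping differs):

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Perf claims an otherwise-unhandled NMI only if it arrives within
       * PERF_NMI_WINDOW_MS of the last NMI in which a PMC was handled. */
      #define PERF_NMI_WINDOW_MS 100

      static uint64_t perf_nmi_window_end; /* end of current window, ms */

      static void perf_handled_pmc(uint64_t now_ms)
      {
          perf_nmi_window_end = now_ms + PERF_NMI_WINDOW_MS;
      }

      static bool perf_claims_nmi(uint64_t now_ms)
      {
          return now_ms < perf_nmi_window_end;
      }

      int main(void)
      {
          perf_handled_pmc(1000);        /* PMC handled at t = 1000 ms */
          assert(perf_claims_nmi(1050)); /* inside the 100 ms window */
          assert(!perf_claims_nmi(1200)); /* outside: hpwdt sees the NMI */
          return 0;
      }
      ```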
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6d3edaae ("x86/perf/amd: Resolve NMI latency issues for active PMCs")
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/cpu: Add Comet Lake to the Intel CPU models header · 8d7c6ac3
      By Kan Liang
      Comet Lake is the new 10th Gen Intel processor. Add two new CPU model
      numbers to the Intel family list.
      
      The CPU model numbers are not published in the SDM yet but they come
      from an authoritative internal source.
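      For reference, a minimal sketch of the two additions (the model numbers 0xA5/0xA6 are believed to match the merged patch; verify against arch/x86/include/asm/intel-family.h):

      ```c
      #include <assert.h>

      /* The two new entries for the Intel family list (sketch). */
      #define INTEL_FAM6_COMETLAKE    0xA5
      #define INTEL_FAM6_COMETLAKE_L  0xA6 /* mobile/low-power parts */

      int main(void)
      {
          assert(INTEL_FAM6_COMETLAKE   == 0xA5);
          assert(INTEL_FAM6_COMETLAKE_L == 0xA6);
          return 0;
      }
      ```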
      
       [ bp: Touch up commit message. ]
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: ak@linux.intel.com
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1570549810-25049-2-git-send-email-kan.liang@linux.intel.com
  9. 08 Oct 2019, 4 commits
    • x86/cpu/vmware: Use the full form of INL in VMWARE_PORT · fbcfb8f0
      By Sami Tolvanen
      LLVM's assembler doesn't accept the short form INL instruction:
      
        inl (%%dx)
      
      but instead insists on the output register to be explicitly specified:
      
        <inline asm>:1:7: error: invalid operand for instruction
                inl (%dx)
                   ^
        LLVM ERROR: Error parsing inline asm
      
      Use the full form of the instruction to fix the build.
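      A compile-only sketch of the full form (the function name and magic value are illustrative, not the VMWARE_PORT macro; the inl is assembled but never executed here, since port I/O faults in unprivileged userspace):

      ```c
      #include <assert.h>

      static unsigned int vmware_port_form(int really_do_io)
      {
          unsigned int eax = 0x564D5868; /* "VMXh" magic, as a stand-in */
      #if defined(__x86_64__) || defined(__i386__)
          if (really_do_io)
              /* Full form with explicit destination; the short form
               * "inl (%%dx)" is rejected by LLVM's integrated assembler. */
              asm volatile("inl (%%dx), %%eax"
                           : "=a"(eax) : "d"((unsigned short)0x5658));
      #else
          (void)really_do_io;
      #endif
          return eax;
      }

      int main(void)
      {
          /* Only checks that the full form assembles; never does the I/O. */
          assert(vmware_port_form(0) == 0x564D5868);
          return 0;
      }
      ```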
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: clang-built-linux@googlegroups.com
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: "VMware, Inc." <pv-drivers@vmware.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://github.com/ClangBuiltLinux/linux/issues/734
      Link: https://lkml.kernel.org/r/20191007192129.104336-1-samitolvanen@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm: Fix MWAITX C-state hint value · 454de1e7
      By Janakarajan Natarajan
      As per "AMD64 Architecture Programmer's Manual Volume 3: General-Purpose
      and System Instructions", MWAITX EAX[7:4]+1 specifies the optional hint
      of the optimized C-state. For C0 state, EAX[7:4] should be set to 0xf.
      
      Currently, a value of 0xf is set for EAX[3:0] instead of EAX[7:4]. Fix
      this by changing MWAITX_DISABLE_CSTATES from 0xf to 0xf0.
      
      This hasn't had any implications so far because setting reserved bits in
      EAX is simply ignored by the CPU.
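      The bit positions are easy to check in a standalone sketch (the OLD constant name is hypothetical, shown only for contrast with the fixed value):

      ```c
      #include <assert.h>

      /* MWAITX takes the C-state hint in EAX[7:4] (value + 1 selects the
       * C-state; 0xf requests C0, i.e. C-states disabled). The old
       * definition put 0xf in EAX[3:0], a reserved field the CPU ignored. */
      #define MWAITX_DISABLE_CSTATES_OLD 0x0f /* wrong: lands in EAX[3:0] */
      #define MWAITX_DISABLE_CSTATES     0xf0 /* fixed: lands in EAX[7:4] */

      int main(void)
      {
          /* The fixed constant actually occupies the hint field. */
          assert((MWAITX_DISABLE_CSTATES >> 4) == 0xf);
          assert((MWAITX_DISABLE_CSTATES & 0x0f) == 0);
          /* The old one never touched EAX[7:4] at all. */
          assert((MWAITX_DISABLE_CSTATES_OLD >> 4) == 0);
          return 0;
      }
      ```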
      
       [ bp: Fixup comment in delay_mwaitx() and massage. ]
      Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "x86@kernel.org" <x86@kernel.org>
      Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20191007190011.4859-1-Janakarajan.Natarajan@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/xen: Return from panic notifier · c6875f3a
      By Boris Ostrovsky
      Currently, execution of panic() continues until Xen's panic notifier
      (xen_panic_event()) is called, at which point we make a hypercall
      that never returns.
      
      This means that any notifier that is supposed to be called later, as
      well as a significant part of the panic() code (such as pstore writes
      from kmsg_dump()), is never executed.
      
      There is no reason for xen_panic_event() to be the last point in
      execution, since panic()'s emergency_restart() will call into
      xen_emergency_restart(), from where we can perform our hypercall.
      
      Nevertheless, we provide a xen_legacy_crash boot option that
      preserves the original behavior during a crash. This option could be
      used, for example, if running a kernel dumper (which happens after
      the panic notifiers) is undesirable.
      Reported-by: James Dingwall <james@dingwall.me.uk>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
    • uaccess: implement a proper unsafe_copy_to_user() and switch filldir over to it · c512c691
      By Linus Torvalds
      In commit 9f79b78e ("Convert filldir[64]() from __put_user() to
      unsafe_put_user()") I made filldir() use unsafe_put_user(), which
      improves code generation on x86 enormously.
      
      But because we didn't have a "unsafe_copy_to_user()", the dirent name
      copy was also done by hand with unsafe_put_user() in a loop, and it
      turns out that a lot of other architectures didn't like that, because
      unlike x86, they have various alignment issues.
      
      Most non-x86 architectures trap and fix it up, and some (like xtensa)
      will just fail unaligned put_user() accesses unconditionally.  Which
      makes that "copy using put_user() in a loop" not work for them at all.
      
      I could make that code do explicit alignment etc, but the architectures
      that don't like unaligned accesses also don't really use the fancy
      "user_access_begin/end()" model, so they might just use the regular old
      __copy_to_user() interface.
      
      So this commit takes that looping implementation, turns it into the x86
      version of "unsafe_copy_to_user()", and makes other architectures
      implement the unsafe copy version as __copy_to_user() (the same way they
      do for the other unsafe_xyz() accessor functions).
      
      Note that it only does this for copying _to_ user space; we still
      don't have an unsafe version of copy_from_user().
      
      That's partly because we have no current users of it, but also partly
      because the copy_from_user() case is slightly different and cannot
      efficiently be implemented in terms of an unsafe_get_user() loop
      (because gcc can't do asm goto with outputs).
      
      It would be trivial to do this using "rep movsb", which would work
      really nicely on newer x86 cores, but really badly on some older ones.
      
      Al Viro is looking at cleaning up all our user copy routines to make
      this all a non-issue, but for now we have this simple-but-stupid version
      for x86 that works fine for the dirent name copy case because those
      names are short strings and we simply don't need anything fancier.
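      The shape of that simple-but-stupid loop can be modeled in userspace (memcpy stands in for the unsafe_put_user accessors; the user_access_begin()/user_access_end() bracketing and fault-label handling are deliberately omitted):

      ```c
      #include <assert.h>
      #include <string.h>

      /* Copy in unsigned-long-sized chunks, then finish byte by byte,
       * mirroring the structure of the x86 unsafe_copy_to_user() loop. */
      static void copy_chunked(void *dst, const void *src, unsigned long len)
      {
          unsigned char *d = dst;
          const unsigned char *s = src;

          while (len >= sizeof(unsigned long)) {
              unsigned long tmp;
              memcpy(&tmp, s, sizeof(tmp)); /* stand-in for unsafe_get/put */
              memcpy(d, &tmp, sizeof(tmp));
              d += sizeof(tmp);
              s += sizeof(tmp);
              len -= sizeof(tmp);
          }
          while (len--)
              *d++ = *s++;
      }

      int main(void)
      {
          char out[16] = { 0 };

          /* Dirent names are short strings, the case this exists for. */
          copy_chunked(out, "dirent-name", sizeof("dirent-name"));
          assert(strcmp(out, "dirent-name") == 0);
          return 0;
      }
      ```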
      
      Fixes: 9f79b78e ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Reported-and-tested-by: Tony Luck <tony.luck@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 07 Oct 2019, 1 commit
    • efi/x86: Do not clean dummy variable in kexec path · 2ecb7402
      By Dave Young
      kexec reboot fails randomly in a UEFI-based KVM guest. The firmware
      simply resets while calling efi_delete_dummy_variable(). Unfortunately
      I don't know how to debug the firmware; it is possibly a problem on
      real hardware as well, although nobody has reproduced it there.
      
      The intention of efi_delete_dummy_variable() is to trigger garbage
      collection when entering virtual mode. But SetVirtualAddressMap() can
      only run once per physical reboot, so kexec_enter_virtual_mode() is
      not a good place to clean up the dummy object.
      
      Drop the efi_delete_dummy_variable() call so that kexec reboot works.
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Matthew Garrett <mjg59@google.com>
      Cc: Ben Dooks <ben.dooks@codethink.co.uk>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lukas Wunner <lukas@wunner.de>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Octavian Purdila <octavian.purdila@intel.com>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Talbert <swt@techie.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Cc: linux-integrity@vger.kernel.org
      Link: https://lkml.kernel.org/r/20191002165904.8819-8-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 04 Oct 2019, 1 commit
  12. 03 Oct 2019, 2 commits
  13. 02 Oct 2019, 2 commits
  14. 01 Oct 2019, 3 commits
  15. 28 Sep 2019, 1 commit
    • KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TF · 19a36d32
      By Waiman Long
      The l1tf_vmx_mitigation is only set to VMENTER_L1D_FLUSH_NOT_REQUIRED
      when the ARCH_CAPABILITIES MSR indicates that L1D flush is not required.
      However, if the CPU is not affected by L1TF, l1tf_vmx_mitigation will
      still be set to VMENTER_L1D_FLUSH_AUTO. This is certainly not the best
      option for a !X86_BUG_L1TF CPU.
      
      So force l1tf_vmx_mitigation to VMENTER_L1D_FLUSH_NOT_REQUIRED to make it
      more explicit in case users are checking the vmentry_l1d_flush parameter.
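      The change reduces to a small decision function (a schematic userspace model; the enum is abbreviated and the helper name is hypothetical):

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Abbreviated vmentry_l1d_flush states (sketch). */
      enum vmx_l1d_flush_state {
          VMENTER_L1D_FLUSH_AUTO,
          VMENTER_L1D_FLUSH_COND,
          VMENTER_L1D_FLUSH_NOT_REQUIRED,
      };

      /* On a CPU without the L1TF bug, report NOT_REQUIRED instead of
       * leaving the AUTO default visible in the module parameter. */
      static enum vmx_l1d_flush_state
      l1d_flush_setup(bool cpu_has_l1tf_bug, enum vmx_l1d_flush_state requested)
      {
          if (!cpu_has_l1tf_bug)
              return VMENTER_L1D_FLUSH_NOT_REQUIRED;
          return requested;
      }

      int main(void)
      {
          assert(l1d_flush_setup(false, VMENTER_L1D_FLUSH_AUTO) ==
                 VMENTER_L1D_FLUSH_NOT_REQUIRED);
          assert(l1d_flush_setup(true, VMENTER_L1D_FLUSH_COND) ==
                 VMENTER_L1D_FLUSH_COND);
          return 0;
      }
      ```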
      Signed-off-by: Waiman Long <longman@redhat.com>
      [Patch rewritten according to Borislav Petkov's suggestion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>