1. 22 Jul, 2019 3 commits
  2. 20 Jul, 2019 6 commits
    • E
      KVM: x86: Add fixed counters to PMU filter · 30cd8604
      Eric Hankland authored
      Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist
      fixed counters.
      Signed-off-by: Eric Hankland <ehankland@google.com>
      [No need to check padding fields for zero. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      30cd8604
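To illustrate the allow/deny semantics described in this commit, here is a minimal sketch of how a per-counter bitmap could be evaluated. The struct, enum, and function names are hypothetical stand-ins chosen for this example and are not the KVM uapi definitions; the only thing taken from the commit message is that fixed counters can be whitelisted or blacklisted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the filter actions; not the uapi constants. */
enum sketch_action { SKETCH_ALLOW, SKETCH_DENY };

/* Simplified filter (assumption): one bit per fixed counter index. */
struct sketch_pmu_filter {
	enum sketch_action action;
	uint32_t fixed_counter_bitmap;
};

/* Returns true if fixed counter `idx` may be programmed under filter `f`. */
static bool fixed_counter_allowed(const struct sketch_pmu_filter *f,
				  unsigned int idx)
{
	bool listed;

	assert(idx < 32);
	if (f == NULL)
		return true;	/* no filter installed: nothing is restricted */

	listed = (f->fixed_counter_bitmap >> idx) & 1u;

	/* Allow-list: only listed counters pass. Deny-list: listed ones fail. */
	return f->action == SKETCH_ALLOW ? listed : !listed;
}
```

With a bitmap of 0x5, an allow filter would permit counters 0 and 2 only, while a deny filter with the same bitmap would permit every counter except those two.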
    • P
      KVM: nVMX: do not use dangling shadow VMCS after guest reset · 88dddc11
      Paolo Bonzini authored
      If a KVM guest is reset while running a nested guest, free_nested will
      disable the shadow VMCS execution control in the vmcs01.  However,
      on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
      the VMCS12 to the shadow VMCS which has since been freed.
      
      This causes a vmptrld of a NULL pointer on my machine, but Jan reports
      the host to hang altogether.  Let's see how much this trivial patch fixes.
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      88dddc11
    • P
      KVM: VMX: dump VMCS on failed entry · 3b20e03a
      Paolo Bonzini authored
      This is useful for debugging, and is ratelimited nowadays.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3b20e03a
    • L
      KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed · 6fc3977c
      Like Xu authored
      If perf_event creation fails for any reason in the host perf
      subsystem, the host has no chance to record the corresponding event for
      the guest, which may cause abnormal sampling data in the guest result.
      In debug mode, this message helps to understand the state of the vPMC;
      the number of occurrences need not be capped, but the message must not
      be logged in a spamming style.
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6fc3977c
    • L
      KVM: SVM: Fix detection of AMD Errata 1096 · 118154bd
      Liran Alon authored
      When the CPU raises #NPF on a guest data access and guest CR4.SMAP=1, it is
      possible that the CPU microcode implementing DecodeAssist will fail
      to read the bytes of the instruction which caused #NPF. This is AMD erratum
      1096 and it happens because the CPU microcode reading instruction bytes
      incorrectly attempts to read code as implicit supervisor-mode data
      accesses (that is, just like it would read e.g. a TSS), which are
      susceptible to SMAP faults. The microcode reads CS:RIP and if it is
      a user-mode address according to the page tables, the processor
      gives up and returns no instruction bytes.  In this case, the
      GuestIntrBytes field of the VMCB on a VMEXIT will incorrectly
      return 0 instead of the correct guest instruction bytes.
      
      Current KVM code attempts to detect and work around this erratum, but it
      has multiple issues:
      
      1) It mistakenly checks if guest CR4.SMAP=0 instead of guest CR4.SMAP=1,
      which is required for encountering a SMAP fault.
      
      2) It assumes SMAP faults can only occur when guest CPL==3.
      However, in case guest CR4.SMEP=0, the guest can execute an instruction
      which resides in a user-accessible page with CPL<3 privilege. If this
      instruction raises a #NPF on its data access, then the CPU DecodeAssist
      microcode will still encounter a SMAP violation.  Even though no sane
      OS will do so (as it's an obvious privilege escalation vulnerability),
      we still need to handle this semantically correctly on the KVM side.
      
      Note that (2) *is* a useful optimization, because CR4.SMAP=1 is an easily
      triggerable condition and guests usually enable SMAP together with SMEP.
      If the vCPU has CR4.SMEP=1, the erratum could indeed be encountered only
      at guest CPL==3; otherwise, the CPU would raise a SMEP fault to the guest
      instead of #NPF.  We keep this condition to avoid false positives in
      the detection of the erratum.
      
      In addition, to avoid future confusion and improve code readability,
      include details of the erratum in the code and not just in the commit message.
      
      Fixes: 05d5a486 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
      Cc: Singh Brijesh <brijesh.singh@amd.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      118154bd
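The corrected condition described above (SMAP must be enabled, and the CPL==3 restriction applies only when SMEP is also enabled) can be sketched as a small predicate. This is an illustration of the logic in the commit message, not the kernel's actual function; the macro and function names are hypothetical. The CR4 bit positions (SMEP = bit 20, SMAP = bit 21) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CR4_SMEP (1ul << 20)	/* CR4.SMEP is bit 20 */
#define SKETCH_CR4_SMAP (1ul << 21)	/* CR4.SMAP is bit 21 */

/*
 * Returns true when a #NPF that reported zero instruction bytes could have
 * been caused by the erratum, given the guest's CR4 and current CPL.
 */
static bool may_be_erratum_1096(unsigned long cr4, unsigned int cpl)
{
	bool smep = (cr4 & SKETCH_CR4_SMEP) != 0;
	bool smap = (cr4 & SKETCH_CR4_SMAP) != 0;

	assert(cpl <= 3);

	/*
	 * Without SMAP there is no SMAP fault to hit. With SMEP enabled,
	 * code on user-accessible pages can only run at CPL 3, so lower
	 * privilege levels are excluded to avoid false positives.
	 */
	return smap && (!smep || cpl == 3);
}
```

For example, a guest with SMAP=1 and SMEP=0 running at CPL 0 would be flagged by this predicate, which is the case the old check missed.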
    • W
      KVM: LAPIC: Inject timer interrupt via posted interrupt · 0c5f81da
      Wanpeng Li authored
      Dedicated instances are currently disturbed by unnecessary jitter due
      to the emulated lapic timers firing on the same pCPUs where the
      vCPUs reside.  Unlike ARM, Intel has no hardware virtual timer for guests,
      so both programming the timer in the guest and the firing of the emulated
      timer incur vmexits.  This patch tries to avoid the vmexit when the emulated
      timer fires, at least in the dedicated instance scenario when nohz_full is enabled.
      
      In that case, the emulated timers can be offloaded to the nearest busy
      housekeeping cpus, since APICv has been available for several years in server
      processors. The guest timer interrupt can then be injected via posted interrupts,
      which are delivered by the housekeeping cpu once the emulated timer fires.
      
      The host should be tuned so that vCPUs are placed on isolated physical
      processors, with several surplus pCPUs for busy housekeeping.
      If mwait/hlt/pause vmexits are disabled to keep the vCPUs in non-root mode,
      a ~3% redis performance benefit can be observed on a Skylake server, and the
      number of external interrupt vmexits drops substantially.  Without the patch:
      
                 VM-EXIT  Samples  Samples%   Time%  Min Time  Max Time  Avg time
      EXTERNAL_INTERRUPT    42916    49.43%  39.30%    0.47us  106.09us  0.71us ( +-   1.09% )
      
      While with patch:
      
                 VM-EXIT  Samples  Samples%   Time%  Min Time  Max Time  Avg time
      EXTERNAL_INTERRUPT     6871     9.29%   2.96%    0.44us   57.88us  0.72us ( +-   4.02% )
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0c5f81da
  3. 18 Jul, 2019 1 commit
    • W
      KVM: LAPIC: Make lapic timer unpinned · 4d151bf3
      Wanpeng Li authored
      Commit 61abdbe0 ("kvm: x86: make lapic hrtimer pinned") pinned the
      lapic timer so that the guest does not have to wait until the next kvm
      exit to see KVM_REQ_PENDING_TIMER set. Another solution is to give a kick
      after setting the KVM_REQ_PENDING_TIMER bit; making the lapic timer
      unpinned will be used in follow-up patches.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4d151bf3
  4. 17 Jul, 2019 1 commit
  5. 16 Jul, 2019 2 commits
  6. 15 Jul, 2019 5 commits
    • Y
      kvm: x86: some tsc debug cleanup · 9a5611af
      Yi Wang authored
      There are some pr_debug calls in the TSC code that are probably
      no longer useful, so remove them as Paolo suggested.
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9a5611af
    • Y
      kvm: vmx: fix coccinelle warnings · 9481b7f1
      Yi Wang authored
      This fixes the following coccinelle warning:
      
      WARNING: return of 0/1 in function 'vmx_need_emulation_on_page_fault'
      with return type bool
      
      Return false instead of 0.
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9481b7f1
    • A
      x86: kvm: avoid constant-conversion warning · a6a6d3b1
      Arnd Bergmann authored
      clang finds a construct suspicious that converts an unsigned
      character to a signed integer and back, causing an overflow:
      
      arch/x86/kvm/mmu.c:4605:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -205 to 51 [-Werror,-Wconstant-conversion]
                      u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
                         ~~                               ^~
      arch/x86/kvm/mmu.c:4607:38: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -241 to 15 [-Werror,-Wconstant-conversion]
                      u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0;
                         ~~                              ^~
      arch/x86/kvm/mmu.c:4609:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -171 to 85 [-Werror,-Wconstant-conversion]
                      u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0;
                         ~~                               ^~
      
      Add an explicit cast to tell clang that everything works as
      intended here.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Link: https://github.com/ClangBuiltLinux/linux/issues/95
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a6a6d3b1
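The warning above comes from integer promotion: applying `~` to an 8-bit value yields a negative `int`, which is then narrowed back to 8 bits. A minimal sketch of the explicit-cast approach follows; the helper name is hypothetical, and the mask values 0xcc, 0xf0 and 0xaa are inferred from the diagnostics (they are the bytes whose complements are -205, -241 and -171 as `int`).

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/*
 * Returns the complement of `perm` as an 8-bit mask when the fault bit is
 * set, else 0. `~perm` is an int after integer promotion; the cast
 * truncates it back to 8 bits on purpose, which tells the compiler the
 * narrowing is intended.
 */
static u8 fault_mask(int fault_bit_set, u8 perm)
{
	return fault_bit_set ? (u8)~perm : 0;
}
```

The truncated results match the "changes value from ... to ..." figures in the diagnostics: 51, 15 and 85.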
    • A
      x86: kvm: avoid -Wsometimes-uninitialized warning · f4e4805e
      Arnd Bergmann authored
      Clang notices a code path in which some variables are never
      initialized, but fails to figure out that this can never happen
      on i386 because is_64_bit_mode() always returns false.
      
      arch/x86/kvm/hyperv.c:1610:6: error: variable 'ingpa' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
              if (!longmode) {
                  ^~~~~~~~~
      arch/x86/kvm/hyperv.c:1632:55: note: uninitialized use occurs here
              trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa);
                                                                   ^~~~~
      arch/x86/kvm/hyperv.c:1610:2: note: remove the 'if' if its condition is always true
              if (!longmode) {
              ^~~~~~~~~~~~~~~
      arch/x86/kvm/hyperv.c:1595:18: note: initialize the variable 'ingpa' to silence this warning
              u64 param, ingpa, outgpa, ret = HV_STATUS_SUCCESS;
                              ^
                               = 0
      arch/x86/kvm/hyperv.c:1610:6: error: variable 'outgpa' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
      arch/x86/kvm/hyperv.c:1610:6: error: variable 'param' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
      
      Flip the condition around to avoid the conditional execution on i386.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f4e4805e
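One way to read "flip the condition around" is to make the 32-bit path the unconditional fallback, so that when the 64-bit branch is compiled out every variable is still assigned on the only remaining path. The sketch below illustrates that pattern under stated assumptions: `SKETCH_X86_64` is a hypothetical stand-in for the kernel config option, the struct and function names are invented for the example, and the register choices are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build switch standing in for the 64-bit kernel config. */
#ifndef SKETCH_X86_64
#define SKETCH_X86_64 0
#endif

/* Simplified register file (assumption for this sketch). */
struct sketch_regs {
	uint64_t rax, rcx, rdx;
	int long_mode;
};

static uint64_t decode_param(const struct sketch_regs *r)
{
	uint64_t param;

#if SKETCH_X86_64
	if (r->long_mode) {
		/* 64-bit convention: the value arrives in one register. */
		param = r->rcx;
	} else
#endif
	{
		/* 32-bit convention: the value is split across two registers. */
		param = ((r->rdx & 0xffffffffu) << 32) | (r->rax & 0xffffffffu);
	}

	return param;
}
```

In a 32-bit build the braces after `#endif` form a plain block, so the compiler sees `param` initialized on every path without having to reason about a condition that is always false.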
    • J
      KVM: x86: expose AVX512_BF16 feature to guest · 0b774629
      Jing Liu authored
      AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
      format (BF16) for deep learning optimization.
      
      Intel adds the AVX512 BFLOAT16 feature in Cooper Lake; it is enumerated by CPUID.7.1.EAX[5].
      
      Detailed information of the CPUID bit can be found here,
      https://software.intel.com/sites/default/files/managed/c5/15/\
      architecture-instruction-set-extensions-programming-reference.pdf.
      Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
      [Fix type mismatch in min, changing constant "1" to "1u". - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0b774629
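As a small illustration of the enumeration stated above, the helper below tests bit 5 of the EAX value returned for CPUID leaf 7, subleaf 1. The macro and function names are hypothetical; the helper takes the register value as an argument so that it does not depend on any particular way of executing CPUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit position taken from the commit message: CPUID.7.1.EAX[5]. */
#define SKETCH_AVX512_BF16_BIT 5

/* Returns true if the given leaf 7 / subleaf 1 EAX value reports AVX512_BF16. */
static bool eax_has_avx512_bf16(uint32_t leaf7_sub1_eax)
{
	return ((leaf7_sub1_eax >> SKETCH_AVX512_BF16_BIT) & 1u) != 0;
}
```

On an x86 host built with GCC or clang, the EAX value could be obtained with `__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`, which returns 0 when the leaf is not supported.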
  7. 13 Jul, 2019 7 commits
  8. 11 Jul, 2019 4 commits
  9. 10 Jul, 2019 3 commits
    • A
      x86/pgtable/32: Fix LOWMEM_PAGES constant · 26515699
      Arnd Bergmann authored
      clang points out that the computation of LOWMEM_PAGES causes a signed
      integer overflow on 32-bit x86:
      
      arch/x86/kernel/head32.c:83:20: error: signed shift result (0x100000000) requires 34 bits to represent, but 'int' only has 32 bits [-Werror,-Wshift-overflow]
                      (PAGE_TABLE_SIZE(LOWMEM_PAGES) << PAGE_SHIFT);
                                       ^~~~~~~~~~~~
      arch/x86/include/asm/pgtable_32.h:109:27: note: expanded from macro 'LOWMEM_PAGES'
       #define LOWMEM_PAGES ((((2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT))
                               ~^ ~~
      arch/x86/include/asm/pgtable_32.h:98:34: note: expanded from macro 'PAGE_TABLE_SIZE'
       #define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
      
      Use the _ULL() macro to make it a 64-bit constant.
      
      Fixes: 1e620f9b ("x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190710130522.1802800-1-arnd@arndb.de
      26515699
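The overflow arises because `2<<31` is evaluated in 32-bit `int`. A minimal sketch of the 64-bit variant follows, using a plain `2ull` literal in place of the kernel's `_ULL()` macro. The `SKETCH_` names are hypothetical, and the page offset of 0xC0000000 (a 3G/1G split) and 4 KiB pages are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12			/* assumed 4 KiB pages */
#define SKETCH_PAGE_OFFSET 0xC0000000ull	/* assumed 3G/1G split */

/* Number of low-memory pages, computed with a 64-bit constant. */
static uint64_t lowmem_pages(void)
{
	/* 2ull << 31 is 0x100000000, which fits in 64 bits without overflow. */
	return ((2ull << 31) - SKETCH_PAGE_OFFSET) >> SKETCH_PAGE_SHIFT;
}
```

Under these assumptions the result is 0x40000000 >> 12, that is 262144 pages (1 GiB of low memory).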
    • Y
      kvm: x86: Fix -Wmissing-prototypes warnings · cdc238eb
      Yi Wang authored
      We get a warning when building the kernel with W=1:
      
      arch/x86/kvm/../../../virt/kvm/eventfd.c:48:1: warning: no previous prototype for ‘kvm_arch_irqfd_allowed’ [-Wmissing-prototypes]
       kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args)
       ^
      
      The reason is that kvm_arch_irqfd_allowed() is declared in arch/x86/kvm/irq.h,
      which is not included by eventfd.c. Considering that kvm_arch_irqfd_allowed()
      is a weakly defined function in eventfd.c, moving the declaration to
      kvm_host.h fixes this.
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cdc238eb
    • P
      x86/alternatives: Fix int3_emulate_call() selftest stack corruption · ecc60610
      Peter Zijlstra authored
      KASAN shows the following splat during boot:
      
        BUG: KASAN: unknown-crash in unwind_next_frame+0x3f6/0x490
        Read of size 8 at addr ffffffff84007db0 by task swapper/0
      
        CPU: 0 PID: 0 Comm: swapper Tainted: G                T 5.2.0-rc6-00013-g7457c0da #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         dump_stack+0x19/0x1b
         print_address_description+0x1b0/0x2b2
         __kasan_report+0x10f/0x171
         kasan_report+0x12/0x1c
         __asan_load8+0x54/0x81
         unwind_next_frame+0x3f6/0x490
         unwind_next_frame+0x1b/0x23
         arch_stack_walk+0x68/0xa5
         stack_trace_save+0x7b/0xa0
         save_trace+0x3c/0x93
         mark_lock+0x1ef/0x9b1
         lock_acquire+0x122/0x221
         __mutex_lock+0xb6/0x731
         mutex_lock_nested+0x16/0x18
         _vm_unmap_aliases+0x141/0x183
         vm_unmap_aliases+0x14/0x16
         change_page_attr_set_clr+0x15e/0x2f2
         set_memory_4k+0x2a/0x2c
         check_bugs+0x11fd/0x1298
         start_kernel+0x793/0x7eb
         x86_64_start_reservations+0x55/0x76
         x86_64_start_kernel+0x87/0xaa
         secondary_startup_64+0xa4/0xb0
      
        Memory state around the buggy address:
         ffffffff84007c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
         ffffffff84007d00: f1 00 00 00 00 00 00 00 00 00 f2 f2 f2 f3 f3 f3
        >ffffffff84007d80: f3 79 be 52 49 79 be 00 00 00 00 00 00 00 00 f1
      
      It turns out that int3_selftest() is corrupting the stack.  The problem is
      that the KASAN-ified version of int3_magic() is much less trivial than the
      C code appears.  It clobbers several unexpected registers.  So when the
      selftest's INT3 is converted to an emulated call to int3_magic(), the
      registers are clobbered and Bad Things happen when the function returns.
      
      Fix this by converting int3_magic() to the trivial ASM function it should
      be, avoiding all calling convention issues. Also add ASM_CALL_CONSTRAINT to
      the INT3 ASM, since it contains a 'CALL'.
      
      [peterz: cribbed changelog from josh]
      
      Fixes: 7457c0da ("x86/alternatives: Add int3_emulate_call() selftest")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Debugged-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Link: https://lkml.kernel.org/r/20190709125744.GB3402@hirez.programming.kicks-ass.net
      ecc60610
  10. 09 Jul, 2019 4 commits
  11. 07 Jul, 2019 2 commits
    • S
      x86/fpu: Inline fpu__xstate_clear_all_cpu_caps() · 7891bc0a
      Sebastian Andrzej Siewior authored
      All fpu__xstate_clear_all_cpu_caps() does is to invoke one simple
      function since commit
      
        73e3a7d2 ("x86/fpu: Remove the explicit clearing of XSAVE dependent features")
      
      so invoke that function directly and remove the wrapper.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190704060743.rvew4yrjd6n33uzx@linutronix.de
      7891bc0a
    • S
      x86/fpu: Make 'no387' and 'nofxsr' command line options useful · 9838e3bf
      Sebastian Andrzej Siewior authored
      The command line option `no387' is designed to disable the FPU
      entirely. This only 'works' with CONFIG_MATH_EMULATION enabled.
      
      But on 64bit this cannot work because user space expects SSE to work, which
      requires basic FPU support. MATH_EMULATION does not help because SSE is not
      emulated.
      
      The command line option `nofxsr' should also be limited to 32bit because
      FXSR is part of the required flags on 64bit so turning it off is not
      possible.
      
      Clearing X86_FEATURE_FPU without emulation enabled will not work anyway and
      hang in fpu__init_system_early_generic() before the console is enabled.
      
      Setting additional dependencies ensures that the kernel still boots on a
      modern CPU. Otherwise, dropping FPU will leave FXSR enabled, causing the
      kernel to crash early in fpu__init_system_mxcsr().
      
      With XSAVE support it will crash in fpu__init_cpu_xstate(). The problem is
      that xsetbv() with XMM set and SSE cleared is not allowed.  That means
      XSAVE has to be disabled. The XSAVE support is disabled in
      fpu__init_system_xstate_size_legacy() but it is too late. It can be
      removed, it has been added in commit
      
        1f999ab5 ("x86, xsave: Disable xsave in i387 emulation mode")
      
      to use `no387' on a CPU with XSAVE support.
      
      All this happens before console output.
      
      After that, the next possible crash is in the RAID6 detect code because MMX
      remained enabled. With a 3DNOW enabled config it will explode in memcpy()
      for instance due to kernel_fpu_begin() but this is unconditionally enabled.
      
      This is enough to boot a Debian Wheezy on a 32bit qemu "host" CPU which
      supports everything up to XSAVES, AVX2 without 3DNOW. Later, Debian
      increased the minimum requirements to i686, which means it does not boot
      userland, at least due to CMOV.
      
      After masking the additional features it still keeps SSE4A and 3DNOW*
      enabled (if present on the host) but those are unused in the kernel.
      
      Restrict the `no387' and `nofxsr' options to 32bit only. Add dependencies for
      FPU, FXSR to additionally mask CMOV, MMX, XSAVE if FXSR or FPU is cleared.
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190703083247.57kjrmlxkai3vpw3@linutronix.de
      9838e3bf
  12. 06 Jul, 2019 1 commit
  13. 05 Jul, 2019 1 commit