1. 12 10月, 2017 1 次提交
    • L
      KVM: x86: introduce ISA specific SMM entry/exit callbacks · 0234bf88
      Ladi Prosek 提交于
      Entering and exiting SMM may require ISA specific handling under certain
      circumstances. This commit adds two new callbacks with empty implementations.
      Actual functionality will be added in following commits.
      
      * pre_enter_smm() is to be called when injecting an SMM, before any
        SMM related vcpu state has been changed
      * pre_leave_smm() is to be called when emulating the RSM instruction,
        when the vcpu is in real mode and before any SMM related vcpu state
        has been restored
      Signed-off-by: NLadi Prosek <lprosek@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0234bf88
  2. 05 10月, 2017 1 次提交
  3. 23 9月, 2017 1 次提交
    • J
      x86/asm: Fix inline asm call constraints for Clang · f5caf621
      Josh Poimboeuf 提交于
      For inline asm statements which have a CALL instruction, we list the
      stack pointer as a constraint to convince GCC to ensure the frame
      pointer is set up first:
      
        static inline void foo()
        {
      	register void *__sp asm(_ASM_SP);
      	asm("call bar" : "+r" (__sp))
        }
      
      Unfortunately, that pattern causes Clang to corrupt the stack pointer.
      
      The fix is easy: convert the stack pointer register variable to a global
      variable.
      
      It should be noted that the end result is different based on the GCC
      version.  With GCC 6.4, this patch has exactly the same result as
      before:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9820389		9491555		8816046		8516940
       after	9820389		9491555		8816046		8516940
      
      With GCC 7.2, however, GCC's behavior has changed.  It now changes its
      behavior based on the conversion of the register variable to a global.
      That somehow convinces it to *always* set up the frame pointer before
      inserting *any* inline asm.  (Therefore, listing the variable as an
      output constraint is a no-op and is no longer necessary.)  It's a bit
      overkill, but the performance impact should be negligible.  And in fact,
      there's a nice improvement with frame pointers disabled:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9796316		9468236		9076191		8790305
       after	9796957		9464267		9076381		8785949
      
      So in summary, while listing the stack pointer as an output constraint
      is no longer necessary for newer versions of GCC, it's still needed for
      older versions.
      Suggested-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: NMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f5caf621
  4. 19 9月, 2017 1 次提交
  5. 25 8月, 2017 3 次提交
  6. 30 6月, 2017 1 次提交
  7. 22 6月, 2017 1 次提交
    • P
      KVM: x86: fix singlestepping over syscall · c8401dda
      Paolo Bonzini 提交于
      TF is handled a bit differently for syscall and sysret, compared
      to the other instructions: TF is checked after the instruction completes,
      so that the OS can disable #DB at a syscall by adding TF to FMASK.
      When the sysret is executed the #DB is taken "as if" the syscall insn
      just completed.
      
      KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
      Fix the behavior, otherwise you could get #DB on a user stack which is not
      nice.  This does not affect Linux guests, as they use an IST or task gate
      for #DB.
      
      This fixes CVE-2017-7518.
      
      Cc: stable@vger.kernel.org
      Reported-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      c8401dda
  8. 01 6月, 2017 1 次提交
    • N
      KVM: x86: avoid large stack allocations in em_fxrstor · 9d643f63
      Nick Desaulniers 提交于
      em_fxstor previously called fxstor_fixup.  Both created instances of
      struct fxregs_state on the stack, which triggered the warning:
      
      arch/x86/kvm/emulate.c:4018:12: warning: stack frame size of 1080 bytes
      in function
            'em_fxrstor' [-Wframe-larger-than=]
      static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
                 ^
      with CONFIG_FRAME_WARN set to 1024.
      
      This patch does the fixup in em_fxstor now, avoiding one additional
      struct fxregs_state, and now fxstor_fixup can be removed as it has no
      other call sites.
      
      Further, the calculation for offsets into xmm_space can be shared
      between em_fxstor and em_fxsave.
      Signed-off-by: NNick Desaulniers <nick.desaulniers@gmail.com>
      [Clean up calculation of offsets and fix it for 64-bit mode. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9d643f63
  9. 20 5月, 2017 1 次提交
  10. 27 4月, 2017 1 次提交
    • L
      KVM: x86: fix emulation of RSM and IRET instructions · 6ed071f0
      Ladi Prosek 提交于
      On AMD, the effect of set_nmi_mask called by emulate_iret_real and em_rsm
      on hflags is reverted later on in x86_emulate_instruction where hflags are
      overwritten with ctxt->emul_flags (the kvm_set_hflags call). This manifests
      as a hang when rebooting Windows VMs with QEMU, OVMF, and >1 vcpu.
      
      Instead of trying to merge ctxt->emul_flags into vcpu->arch.hflags after
      an instruction is emulated, this commit deletes emul_flags altogether and
      makes the emulator access vcpu->arch.hflags using two new accessors. This
      way all changes, on the emulator side as well as in functions called from
      the emulator and accessing vcpu state with emul_to_vcpu, are preserved.
      
      More details on the bug and its manifestation with Windows and OVMF:
      
        It's a KVM bug in the interaction between SMI/SMM and NMI, specific to AMD.
        I believe that the SMM part explains why we started seeing this only with
        OVMF.
      
        KVM masks and unmasks NMI when entering and leaving SMM. When KVM emulates
        the RSM instruction in em_rsm, the set_nmi_mask call doesn't stick because
        later on in x86_emulate_instruction we overwrite arch.hflags with
        ctxt->emul_flags, effectively reverting the effect of the set_nmi_mask call.
        The AMD-specific hflag of interest here is HF_NMI_MASK.
      
        When rebooting the system, Windows sends an NMI IPI to all but the current
        cpu to shut them down. Only after all of them are parked in HLT will the
        initiating cpu finish the restart. If NMI is masked, other cpus never get
        the memo and the initiating cpu spins forever, waiting for
        hal!HalpInterruptProcessorsStarted to drop. That's the symptom we observe.
      
      Fixes: a584539b ("KVM: x86: pass the whole hflags field to emulator and back")
      Signed-off-by: NLadi Prosek <lprosek@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6ed071f0
  11. 21 4月, 2017 1 次提交
  12. 12 1月, 2017 2 次提交
  13. 09 1月, 2017 1 次提交
  14. 25 11月, 2016 1 次提交
    • R
      KVM: x86: drop error recovery in em_jmp_far and em_ret_far · 2117d539
      Radim Krčmář 提交于
      em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
      bit mode, but syzkaller proved otherwise (and SDM agrees).
      Code segment was restored upon failure, but it was left uninitialized
      outside of long mode, which could lead to a leak of host kernel stack.
      We could have fixed that by always saving and restoring the CS, but we
      take a simpler approach and just break any guest that manages to fail
      as the error recovery is error-prone and modern CPUs don't need emulator
      for this.
      
      Found by syzkaller:
      
        WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
        Kernel panic - not syncing: panic_on_warn set ...
      
        CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __dump_stack lib/dump_stack.c:15
         [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
         [...] panic+0x1b7/0x3a3 kernel/panic.c:179
         [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
         [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
         [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
         [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
         [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
         [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
         [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
         [...] complete_emulated_io arch/x86/kvm/x86.c:6870
         [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
         [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
         [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
         [...] vfs_ioctl fs/ioctl.c:43
         [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
         [...] SYSC_ioctl fs/ioctl.c:694
         [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
         [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: d1442d85 ("KVM: x86: Handle errors when RIP is set during far jumps")
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      2117d539
  15. 17 11月, 2016 4 次提交
  16. 03 11月, 2016 1 次提交
  17. 14 7月, 2016 1 次提交
    • P
      x86/kvm: Audit and remove any unnecessary uses of module.h · 1767e931
      Paul Gortmaker 提交于
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  In the case of
      kvm where it is modular, we can extend that to also include files
      that are building basic support functionality but not related
      to loading or registering the final module; such files also have
      no need whatsoever for module.h
      
      The advantage in removing such instances is that module.h itself
      sources about 15 other headers; adding significantly to what we feed
      cpp, and it can obscure what headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each instance for the
      presence of either and replace as needed.
      
      Several instances got replaced with moduleparam.h since that was
      really all that was required for those particular files.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160714001901.31603-8-paul.gortmaker@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1767e931
  18. 11 5月, 2016 1 次提交
  19. 24 2月, 2016 3 次提交
    • P
      KVM: x86: fix conversion of addresses to linear in 32-bit protected mode · 0c1d77f4
      Paolo Bonzini 提交于
      Commit e8dd2d2d ("Silence compiler warning in arch/x86/kvm/emulate.c",
      2015-09-06) broke boot of the Hurd.  The bug is that the "default:"
      case actually could modify "la", but after the patch this change is
      not reflected in *linear.
      
      The bug is visible whenever a non-zero segment base causes the linear
      address to wrap around the 4GB mark.
      
      Fixes: e8dd2d2d
      Cc: stable@vger.kernel.org
      Reported-by: NAurelien Jarno <aurelien@aurel32.net>
      Tested-by: NAurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0c1d77f4
    • J
      x86/kvm: Make test_cc() always inline · cb7390fe
      Josh Poimboeuf 提交于
      With some configs (including allyesconfig), gcc doesn't inline
      test_cc().  When that happens, test_cc() doesn't create a stack frame
      before inserting the inline asm call instruction.  This breaks frame
      pointer convention if CONFIG_FRAME_POINTER is enabled and can result in
      a bad stack trace.
      
      Force it to always be inlined so that its containing function's stack
      frame can be used.
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160122161612.GE20502@treble.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb7390fe
    • J
      x86/kvm: Set ELF function type for fastop functions · 1482a082
      Josh Poimboeuf 提交于
      The callable functions created with the FOP* and FASTOP* macros are
      missing ELF function annotations, which confuses tools like stacktool.
      Properly annotate them.
      
      This adds some additional labels to the assembly, but the generated
      binary code is unchanged (with the exception of instructions which have
      embedded references to __LINE__).
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/e399651c89ace54906c203c0557f66ed6ea3ce8d.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1482a082
  20. 04 11月, 2015 2 次提交
  21. 14 10月, 2015 1 次提交
    • P
      KVM: x86: fix RSM into 64-bit protected mode · b10d92a5
      Paolo Bonzini 提交于
      In order to get into 64-bit protected mode, you need to enable
      paging while EFER.LMA=1.  For this to work, CS.L must be 0.
      Currently, we load the segments before CR0 and CR4, which means
      that if RSM returns into 64-bit protected mode CS.L is already 1
      and everything breaks.
      
      Luckily, CS.L=0 is always the case when executing RSM, because it
      is forbidden to execute RSM from 64-bit protected mode.  Hence it
      is enough to load CR0 and CR4 first, and only then the segments.
      
      Fixes: 660a5d51
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b10d92a5
  22. 06 9月, 2015 1 次提交
  23. 04 6月, 2015 3 次提交
  24. 20 5月, 2015 3 次提交
    • N
      KVM: x86: Fix zero iterations REP-string · 428e3d08
      Nadav Amit 提交于
      When a REP-string is executed in 64-bit mode with an address-size prefix,
      ECX/EDI/ESI are used as counter and pointers. When ECX is initially zero, Intel
      CPUs clear the high 32-bits of RCX, and recent Intel CPUs update the high bits
      of the pointers in MOVS/STOS. This behavior is specific to Intel according to
      few experiments.
      
      As one may guess, this is an undocumented behavior. Yet, it is observable in
      the guest, since at least VMX traps REP-INS/OUTS even when ECX=0. Note that
      VMware appears to get it right.  The behavior can be observed using the
      following code:
      
       #include <stdio.h>
      
       #define LOW_MASK	(0xffffffff00000000ull)
       #define ALL_MASK	(0xffffffffffffffffull)
       #define TEST(opcode)							\
      	do {								\
      	asm volatile(".byte 0xf2 \n\t .byte 0x67 \n\t .byte " opcode "\n\t" \
      			: "=S"(s), "=c"(c), "=D"(d) 			\
      			: "S"(ALL_MASK), "c"(LOW_MASK), "D"(ALL_MASK));	\
      	printf("opcode %s rcx=%llx rsi=%llx rdi=%llx\n",		\
      		opcode, c, s, d);					\
      	} while(0)
      
      void main()
      {
      	unsigned long long s, d, c;
      	iopl(3);
      	TEST("0x6c");
      	TEST("0x6d");
      	TEST("0x6e");
      	TEST("0x6f");
      	TEST("0xa4");
      	TEST("0xa5");
      	TEST("0xa6");
      	TEST("0xa7");
      	TEST("0xaa");
      	TEST("0xab");
      	TEST("0xae");
      	TEST("0xaf");
      }
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      428e3d08
    • N
      KVM: x86: Fix update RCX/RDI/RSI on REP-string · ee122a71
      Nadav Amit 提交于
      When REP-string instruction is preceded with an address-size prefix,
      ECX/EDI/ESI are used as the operation counter and pointers.  When they are
      updated, the high 32-bits of RCX/RDI/RSI are cleared, similarly to the way they
      are updated on every 32-bit register operation.  Fix it.
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ee122a71
    • N
      KVM: x86: Fix DR7 mask on task-switch while debugging · 3db176d5
      Nadav Amit 提交于
      If the host sets hardware breakpoints to debug the guest, and a task-switch
      occurs in the guest, the architectural DR7 will not be updated. The effective
      DR7 would be updated instead.
      
      This fix puts the DR7 update during task-switch emulation, so it now uses the
      standard DR setting mechanism instead of the one that was previously used. As a
      bonus, the update of DR7 will now be effective for AMD as well.
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3db176d5
  25. 08 5月, 2015 1 次提交
  26. 08 4月, 2015 1 次提交
    • W
      kvm: x86: fix x86 eflags fixed bit · 35fd68a3
      Wanpeng Li 提交于
      Guest can't be booted w/ ept=0, there is a message dumped as below:
      
      If you're running a guest on an Intel machine without unrestricted mode
      support, the failure can be most likely due to the guest entering an invalid
      state for Intel VT. For example, the guest maybe running in big real mode
      which is not supported on less recent Intel processors.
      
      EAX=00000011 EBX=f000d2f6 ECX=00006cac EDX=000f8956
      ESI=bffbdf62 EDI=00000000 EBP=00006c68 ESP=00006c68
      EIP=0000d187 EFL=00000004 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
      ES =e000 000e0000 ffffffff 00809300 DPL=0 DS16 [-WA]
      CS =f000 000f0000 ffffffff 00809b00 DPL=0 CS16 [-RA]
      SS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
      DS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
      FS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
      GS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
      LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
      TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
      GDT=     000f6a80 00000037
      IDT=     000f6abe 00000000
      CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
      DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
      DR6=00000000ffff0ff0 DR7=0000000000000400
      EFER=0000000000000000
      Code=01 1e b8 6a 2e 0f 01 16 74 6a 0f 20 c0 66 83 c8 01 0f 22 c0 <66> ea 8f d1 0f 00 08 00 b8 10 00 00 00 8e d8 8e c0 8e d0 8e e0 8e e8 89 c8 ff e2 89 c1 b8X
      
      X86 eflags bit 1 is fixed set, which means that 1 << 1 is set instead of 1,
      this patch fix it.
      Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
      Message-Id: <1428473294-6633-1-git-send-email-wanpeng.li@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      35fd68a3
  27. 30 3月, 2015 1 次提交