1. 04 6月, 2019 1 次提交
  2. 27 4月, 2019 1 次提交
    • S
      KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU · 18cf09a8
      Sean Christopherson 提交于
      commit 8f4dc2e77cdfaf7e644ef29693fa229db29ee1de upstream.
      
      Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
      state area, i.e. don't save/restore EFER across SMM transitions.  KVM
      somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
      guest doesn't support long mode.  But during RSM, KVM unconditionally
      clears EFER so that it can get back to pure 32-bit mode in order to
      start loading CRs with their actual non-SMM values.
      
      Clear EFER only when it will be written when loading the non-SMM state
      so as to preserve bits that can theoretically be set on 32-bit vCPUs,
      e.g. KVM always emulates EFER_SCE.
      
      And because CR4.PAE is cleared only to play nice with EFER, wrap that
      code in the long mode check as well.  Note, this may result in a
      compiler warning about cr4 being consumed uninitialized.  Re-read CR4
      even though it's technically unnecessary, as doing so allows for more
      readable code and RSM emulation is not a performance critical path.
      
      Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18cf09a8
  3. 06 8月, 2018 1 次提交
  4. 12 6月, 2018 2 次提交
  5. 15 5月, 2018 1 次提交
  6. 04 4月, 2018 1 次提交
  7. 17 3月, 2018 2 次提交
  8. 25 1月, 2018 1 次提交
  9. 21 12月, 2017 1 次提交
  10. 14 12月, 2017 3 次提交
  11. 06 12月, 2017 1 次提交
  12. 17 11月, 2017 2 次提交
    • D
      KVM: x86: fix em_fxstor() sleeping while in atomic · 4d772cb8
      David Hildenbrand 提交于
      Commit 9d643f63 ("KVM: x86: avoid large stack allocations in
      em_fxrstor") optimize the stack size, but introduced a guest memory access
      which might sleep while in atomic.
      
      Fix it by introducing, again, a second fxregs_state. Try to avoid
      large stacks by using noinline. Add some helpful comments.
      
      Reported by syzbot:
      
      in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109
      2 locks held by syzkaller879109/2909:
        #0:  (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>] vcpu_load+0x1c/0x70
      arch/x86/kvm/../../../virt/kvm/kvm_main.c:154
        #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest
      arch/x86/kvm/x86.c:6983 [inline]
        #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run
      arch/x86/kvm/x86.c:7061 [inline]
        #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>]
      kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222
      CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted 4.13.0-rc4-next-20170811
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:16 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:52
        ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014
        __might_sleep+0x95/0x190 kernel/sched/core.c:5967
        __might_fault+0xab/0x1d0 mm/memory.c:4383
        __copy_from_user include/linux/uaccess.h:71 [inline]
        __kvm_read_guest_page+0x58/0xa0
      arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771
        kvm_vcpu_read_guest_page+0x44/0x60
      arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791
        kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407
        kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466
        segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819
        em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022
        x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471
        x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698
        kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854
        handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400
        vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718
        vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline]
        vcpu_run arch/x86/kvm/x86.c:7061 [inline]
        kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222
        kvm_vcpu_ioctl+0x64c/0x1010 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591
        vfs_ioctl fs/ioctl.c:45 [inline]
        do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
        SYSC_ioctl fs/ioctl.c:700 [inline]
        SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x437fc9
      RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9
      RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000
      R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000
      R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000
      
      Fixes: 9d643f63 ("KVM: x86: avoid large stack allocations in em_fxrstor")
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      4d772cb8
    • W
      KVM: X86: Fix operand/address-size during instruction decoding · 3853be26
      Wanpeng Li 提交于
      Pedro reported:
        During tests that we conducted on KVM, we noticed that executing a "PUSH %ES"
        instruction under KVM produces different results on both memory and the SP
        register depending on whether EPT support is enabled. With EPT the SP is
        reduced by 4 bytes (and the written value is 0-padded) but without EPT support
        it is only reduced by 2 bytes. The difference can be observed when the CS.DB
        field is 1 (32-bit) but not when it's 0 (16-bit).
      
      The internal segment descriptor cache exist even in real/vm8096 mode. The CS.D
      also should be respected instead of just default operand/address-size/66H
      prefix/67H prefix during instruction decoding. This patch fixes it by also
      adjusting operand/address-size according to CS.D.
      Reported-by: NPedro Fonseca <pfonseca@cs.washington.edu>
      Tested-by: NPedro Fonseca <pfonseca@cs.washington.edu>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Pedro Fonseca <pfonseca@cs.washington.edu>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      3853be26
  13. 12 10月, 2017 1 次提交
    • L
      KVM: x86: introduce ISA specific SMM entry/exit callbacks · 0234bf88
      Ladi Prosek 提交于
      Entering and exiting SMM may require ISA specific handling under certain
      circumstances. This commit adds two new callbacks with empty implementations.
      Actual functionality will be added in following commits.
      
      * pre_enter_smm() is to be called when injecting an SMM, before any
        SMM related vcpu state has been changed
      * pre_leave_smm() is to be called when emulating the RSM instruction,
        when the vcpu is in real mode and before any SMM related vcpu state
        has been restored
      Signed-off-by: NLadi Prosek <lprosek@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0234bf88
  14. 05 10月, 2017 1 次提交
  15. 23 9月, 2017 1 次提交
    • J
      x86/asm: Fix inline asm call constraints for Clang · f5caf621
      Josh Poimboeuf 提交于
      For inline asm statements which have a CALL instruction, we list the
      stack pointer as a constraint to convince GCC to ensure the frame
      pointer is set up first:
      
        static inline void foo()
        {
      	register void *__sp asm(_ASM_SP);
      	asm("call bar" : "+r" (__sp))
        }
      
      Unfortunately, that pattern causes Clang to corrupt the stack pointer.
      
      The fix is easy: convert the stack pointer register variable to a global
      variable.
      
      It should be noted that the end result is different based on the GCC
      version.  With GCC 6.4, this patch has exactly the same result as
      before:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9820389		9491555		8816046		8516940
       after	9820389		9491555		8816046		8516940
      
      With GCC 7.2, however, GCC's behavior has changed.  It now changes its
      behavior based on the conversion of the register variable to a global.
      That somehow convinces it to *always* set up the frame pointer before
      inserting *any* inline asm.  (Therefore, listing the variable as an
      output constraint is a no-op and is no longer necessary.)  It's a bit
      overkill, but the performance impact should be negligible.  And in fact,
      there's a nice improvement with frame pointers disabled:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9796316		9468236		9076191		8790305
       after	9796957		9464267		9076381		8785949
      
      So in summary, while listing the stack pointer as an output constraint
      is no longer necessary for newer versions of GCC, it's still needed for
      older versions.
      Suggested-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: NMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f5caf621
  16. 19 9月, 2017 1 次提交
  17. 25 8月, 2017 3 次提交
  18. 30 6月, 2017 1 次提交
  19. 22 6月, 2017 1 次提交
    • P
      KVM: x86: fix singlestepping over syscall · c8401dda
      Paolo Bonzini 提交于
      TF is handled a bit differently for syscall and sysret, compared
      to the other instructions: TF is checked after the instruction completes,
      so that the OS can disable #DB at a syscall by adding TF to FMASK.
      When the sysret is executed the #DB is taken "as if" the syscall insn
      just completed.
      
      KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
      Fix the behavior, otherwise you could get #DB on a user stack which is not
      nice.  This does not affect Linux guests, as they use an IST or task gate
      for #DB.
      
      This fixes CVE-2017-7518.
      
      Cc: stable@vger.kernel.org
      Reported-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      c8401dda
  20. 01 6月, 2017 1 次提交
    • N
      KVM: x86: avoid large stack allocations in em_fxrstor · 9d643f63
      Nick Desaulniers 提交于
      em_fxstor previously called fxstor_fixup.  Both created instances of
      struct fxregs_state on the stack, which triggered the warning:
      
      arch/x86/kvm/emulate.c:4018:12: warning: stack frame size of 1080 bytes
      in function
            'em_fxrstor' [-Wframe-larger-than=]
      static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
                 ^
      with CONFIG_FRAME_WARN set to 1024.
      
      This patch does the fixup in em_fxstor now, avoiding one additional
      struct fxregs_state, and now fxstor_fixup can be removed as it has no
      other call sites.
      
      Further, the calculation for offsets into xmm_space can be shared
      between em_fxstor and em_fxsave.
      Signed-off-by: NNick Desaulniers <nick.desaulniers@gmail.com>
      [Clean up calculation of offsets and fix it for 64-bit mode. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9d643f63
  21. 20 5月, 2017 1 次提交
  22. 27 4月, 2017 1 次提交
    • L
      KVM: x86: fix emulation of RSM and IRET instructions · 6ed071f0
      Ladi Prosek 提交于
      On AMD, the effect of set_nmi_mask called by emulate_iret_real and em_rsm
      on hflags is reverted later on in x86_emulate_instruction where hflags are
      overwritten with ctxt->emul_flags (the kvm_set_hflags call). This manifests
      as a hang when rebooting Windows VMs with QEMU, OVMF, and >1 vcpu.
      
      Instead of trying to merge ctxt->emul_flags into vcpu->arch.hflags after
      an instruction is emulated, this commit deletes emul_flags altogether and
      makes the emulator access vcpu->arch.hflags using two new accessors. This
      way all changes, on the emulator side as well as in functions called from
      the emulator and accessing vcpu state with emul_to_vcpu, are preserved.
      
      More details on the bug and its manifestation with Windows and OVMF:
      
        It's a KVM bug in the interaction between SMI/SMM and NMI, specific to AMD.
        I believe that the SMM part explains why we started seeing this only with
        OVMF.
      
        KVM masks and unmasks NMI when entering and leaving SMM. When KVM emulates
        the RSM instruction in em_rsm, the set_nmi_mask call doesn't stick because
        later on in x86_emulate_instruction we overwrite arch.hflags with
        ctxt->emul_flags, effectively reverting the effect of the set_nmi_mask call.
        The AMD-specific hflag of interest here is HF_NMI_MASK.
      
        When rebooting the system, Windows sends an NMI IPI to all but the current
        cpu to shut them down. Only after all of them are parked in HLT will the
        initiating cpu finish the restart. If NMI is masked, other cpus never get
        the memo and the initiating cpu spins forever, waiting for
        hal!HalpInterruptProcessorsStarted to drop. That's the symptom we observe.
      
      Fixes: a584539b ("KVM: x86: pass the whole hflags field to emulator and back")
      Signed-off-by: NLadi Prosek <lprosek@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6ed071f0
  23. 21 4月, 2017 1 次提交
  24. 12 1月, 2017 2 次提交
  25. 09 1月, 2017 1 次提交
  26. 25 11月, 2016 1 次提交
    • R
      KVM: x86: drop error recovery in em_jmp_far and em_ret_far · 2117d539
      Radim Krčmář 提交于
      em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
      bit mode, but syzkaller proved otherwise (and SDM agrees).
      Code segment was restored upon failure, but it was left uninitialized
      outside of long mode, which could lead to a leak of host kernel stack.
      We could have fixed that by always saving and restoring the CS, but we
      take a simpler approach and just break any guest that manages to fail
      as the error recovery is error-prone and modern CPUs don't need emulator
      for this.
      
      Found by syzkaller:
      
        WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
        Kernel panic - not syncing: panic_on_warn set ...
      
        CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __dump_stack lib/dump_stack.c:15
         [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
         [...] panic+0x1b7/0x3a3 kernel/panic.c:179
         [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
         [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
         [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
         [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
         [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
         [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
         [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
         [...] complete_emulated_io arch/x86/kvm/x86.c:6870
         [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
         [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
         [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
         [...] vfs_ioctl fs/ioctl.c:43
         [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
         [...] SYSC_ioctl fs/ioctl.c:694
         [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
         [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: d1442d85 ("KVM: x86: Handle errors when RIP is set during far jumps")
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      2117d539
  27. 17 11月, 2016 4 次提交
  28. 03 11月, 2016 1 次提交
  29. 14 7月, 2016 1 次提交
    • P
      x86/kvm: Audit and remove any unnecessary uses of module.h · 1767e931
      Paul Gortmaker 提交于
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  In the case of
      kvm where it is modular, we can extend that to also include files
      that are building basic support functionality but not related
      to loading or registering the final module; such files also have
      no need whatsoever for module.h
      
      The advantage in removing such instances is that module.h itself
      sources about 15 other headers; adding significantly to what we feed
      cpp, and it can obscure what headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each instance for the
      presence of either and replace as needed.
      
      Several instances got replaced with moduleparam.h since that was
      really all that was required for those particular files.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160714001901.31603-8-paul.gortmaker@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1767e931