1. 13 7月, 2017 3 次提交
  2. 10 7月, 2017 1 次提交
  3. 04 7月, 2017 1 次提交
  4. 03 7月, 2017 2 次提交
    • P
      x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 · 995f00a6
      Peter Feiner 提交于
      EPT A/D was enabled in the vmcs02 EPTP regardless of the vmcs12's EPTP
      value. The problem is that enabling A/D changes the behavior of L2's
      x86 page table walks as seen by L1. With A/D enabled, x86 page table
      walks are always treated as EPT writes.
      
      Commit ae1e2d10 ("kvm: nVMX: support EPT accessed/dirty bits",
      2017-03-30) tried to work around this problem by clearing the write
      bit in the exit qualification for EPT violations triggered by page
      walks.  However, that fixup introduced the opposite bug: page-table walks
      that actually set x86 A/D bits were *missing* the write bit in the exit
      qualification.
      
      This patch fixes the problem by disabling EPT A/D in the shadow MMU
      when EPT A/D is disabled in vmcs12's EPTP.
      Signed-off-by: NPeter Feiner <pfeiner@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      995f00a6
    • P
      x86: kvm: mmu: make spte mmio mask more explicit · dcdca5fe
      Peter Feiner 提交于
      Specify both a mask (i.e., bits to consider) and a value (i.e.,
      pattern of bits that indicates a special PTE) for mmio SPTEs. On
      Intel, this lets us pack even more information into the
      (SPTE_SPECIAL_MASK | EPT_VMX_RWX_MASK) mask we use for access
      tracking liberating all (SPTE_SPECIAL_MASK | (non-misconfigured-RWX))
      values.
      Signed-off-by: NPeter Feiner <pfeiner@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      dcdca5fe
  5. 30 6月, 2017 2 次提交
    • J
      objtool, x86: Add several functions and files to the objtool whitelist · c207aee4
      Josh Poimboeuf 提交于
      In preparation for an objtool rewrite which will have broader checks,
      whitelist functions and files which cause problems because they do
      unusual things with the stack.
      
      These whitelists serve as a TODO list for which functions and files
      don't yet have undwarf unwinder coverage.  Eventually most of the
      whitelists can be removed in favor of manual CFI hint annotations or
      objtool improvements.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c207aee4
    • W
      KVM: LAPIC: Fix lapic timer injection delay · c8533544
      Wanpeng Li 提交于
      If the TSC deadline timer is programmed really close to the deadline or
      even in the past, the computation in vmx_set_hv_timer will program the
      absolute target tsc value to vmcs preemption timer field w/ delta == 0,
      then plays a vmentry and an upcoming vmx preemption timer fire vmexit
      dance, the lapic timer injection is delayed due to this duration. Actually
      the lapic timer which is emulated by hrtimer can handle this correctly.
      
      This patch fixes it by firing the lapic timer and injecting a timer interrupt
      immediately during the next vmentry if the TSC deadline timer is programmed
      really close to the deadline or even in the past. This saves ~300 cycles on
      the tsc_deadline_timer test of apic.flat.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c8533544
  6. 29 6月, 2017 1 次提交
  7. 13 6月, 2017 1 次提交
  8. 07 6月, 2017 5 次提交
  9. 06 6月, 2017 1 次提交
    • W
      KVM: nVMX: Fix exception injection · d4912215
      Wanpeng Li 提交于
       WARNING: CPU: 3 PID: 2840 at arch/x86/kvm/vmx.c:10966 nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
       CPU: 3 PID: 2840 Comm: qemu-system-x86 Tainted: G           OE   4.12.0-rc3+ #23
       RIP: 0010:nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
       Call Trace:
        ? kvm_check_async_pf_completion+0xef/0x120 [kvm]
        ? rcu_read_lock_sched_held+0x79/0x80
        vmx_queue_exception+0x104/0x160 [kvm_intel]
        ? vmx_queue_exception+0x104/0x160 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0x1171/0x1ce0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x240 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x240 [kvm]
        kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? __fget+0xf3/0x210
        do_vfs_ioctl+0xa4/0x700
        ? __fget+0x114/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This is triggered occasionally by running both win7 and win2016 in L2, in
      addition, EPT is disabled on both L1 and L2. It can't be reproduced easily.
      
      Commit 0b6ac343 (KVM: nVMX: Correct handling of exception injection) mentioned
      that "KVM wants to inject page-faults which it got to the guest. This function
      assumes it is called with the exit reason in vmcs02 being a #PF exception".
      Commit e011c663 (KVM: nVMX: Check all exceptions for intercept during delivery to
      L2) allows to check all exceptions for intercept during delivery to L2. However,
      there is no guarantee the exit reason is exception currently, when there is an
      external interrupt occurred on host, maybe a time interrupt for host which should
      not be injected to guest, and somewhere queues an exception, then the function
      nested_vmx_check_exception() will be called and the vmexit emulation codes will
      try to emulate the "Acknowledge interrupt on exit" behavior, the warning is
      triggered.
      
      Reusing the exit reason from the L2->L0 vmexit is wrong in this case,
      the reason must always be EXCEPTION_NMI when injecting an exception into
      L1 as a nested vmexit.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Fixes: e011c663 ("KVM: nVMX: Check all exceptions for intercept during delivery to L2")
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      d4912215
  10. 05 6月, 2017 1 次提交
    • A
      x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant · d6e41f11
      Andy Lutomirski 提交于
      When PCID is enabled, CR3's PCID bits can change during context
      switches, so KVM won't be able to treat CR3 as a per-mm constant any
      more.
      
      I structured this like the existing CR4 handling.  Under ordinary
      circumstances (PCID disabled or if the current PCID and the value
      that's already in the VMCS match), then we won't do an extra VMCS
      write, and we'll never do an extra direct CR3 read.  The overhead
      should be minimal.
      
      I disallowed using the new helper in non-atomic context because
      PCID support will cause CR3 to stop being constant in non-atomic
      process context.
      
      (Frankly, it also scares me a bit that KVM ever treated CR3 as
      constant, but it looks like it was okay before.)
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d6e41f11
  11. 01 6月, 2017 1 次提交
  12. 30 5月, 2017 1 次提交
  13. 26 5月, 2017 1 次提交
    • J
      KVM: nVMX: Fix handling of lmsw instruction · e1d39b17
      Jan H. Schönherr 提交于
      The decision whether or not to exit from L2 to L1 on an lmsw instruction is
      based on bogus values: instead of using the information encoded within the
      exit qualification, it uses the data also used for the mov-to-cr
      instruction, which boils down to using whatever is in %eax at that point.
      
      Use the correct values instead.
      
      Without this fix, an L1 may not get notified when a 32-bit Linux L2
      switches its secondary CPUs to protected mode; the L1 is only notified on
      the next modification of CR0. This short time window poses a problem, when
      there is some other reason to exit to L1 in between. Then, L2 will be
      resumed in real mode and chaos ensues.
      Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
      Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e1d39b17
  14. 15 5月, 2017 2 次提交
  15. 09 5月, 2017 3 次提交
  16. 05 5月, 2017 1 次提交
  17. 27 4月, 2017 2 次提交
  18. 21 4月, 2017 2 次提交
    • M
      kvm: better MWAIT emulation for guests · 668fffa3
      Michael S. Tsirkin 提交于
      Guests that are heavy on futexes end up IPI'ing each other a lot. That
      can lead to significant slowdowns and latency increase for those guests
      when running within KVM.
      
      If only a single guest is needed on a host, we have a lot of spare host
      CPU time we can throw at the problem. Modern CPUs implement a feature
      called "MWAIT" which allows guests to wake up sleeping remote CPUs without
      an IPI - thus without an exit - at the expense of never going out of guest
      context.
      
      The decision whether this is something sensible to use should be up to the
      VM admin, so to user space. We can however allow MWAIT execution on systems
      that support it properly hardware wise.
      
      This patch adds a CAP to user space and a KVM cpuid leaf to indicate
      availability of native MWAIT execution. With that enabled, the worst a
      guest can do is waste as many cycles as a "jmp ." would do, so it's not
      a privilege problem.
      
      We consciously do *not* expose the feature in our CPUID bitmap, as most
      people will want to benefit from sleeping vCPUs to allow for over commit.
      Reported-by: N"Gabriel L. Somlo" <gsomlo@gmail.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      [agraf: fix amd, change commit message]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      668fffa3
    • D
      KVM: VMX: drop vmm_exclusive module parameter · fe0e80be
      David Hildenbrand 提交于
      vmm_exclusive=0 leads to KVM setting X86_CR4_VMXE always and calling
      VMXON only when the vcpu is loaded. X86_CR4_VMXE is used as an
      indication in cpu_emergency_vmxoff() (called on kdump) if VMXOFF has to be
      called. This is obviously not the case if both are used independtly.
      Calling VMXOFF without a previous VMXON will result in an exception.
      
      In addition, X86_CR4_VMXE is used as a mean to test if VMX is already in
      use by another VMM in hardware_enable(). So there can't really be
      co-existance. If the other VMM is prepared for co-existance and does a
      similar check, only one VMM can exist. If the other VMM is not prepared
      and blindly sets/clears X86_CR4_VMXE, we will get inconsistencies with
      X86_CR4_VMXE.
      
      As we also had bug reports related to clearing of vmcs with vmm_exclusive=0
      this seems to be pretty much untested. So let's better drop it.
      
      While at it, directly move setting/clearing X86_CR4_VMXE into
      kvm_cpu_vmxon/off.
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fe0e80be
  19. 14 4月, 2017 1 次提交
    • R
      KVM: nVMX: fix AD condition when handling EPT violation · 33251870
      Radim Krčmář 提交于
      I have introduced this bug when applying and simplifying Paolo's patch
      as we agreed on the list.  The original was "x &= ~y; if (z) x |= y;".
      
      Here is the story of a bad workflow:
      
        A maintainer was already testing with the intended change, but it was
        applied only to a testing repo on a different machine.  When the time
        to push tested patches to kvm/next came, he realized that this change
        was missing and quickly added it to the maintenance repo, didn't test
        again (because the change is trivial, right), and pushed the world to
        fire.
      
      Fixes: ae1e2d10 ("kvm: nVMX: support EPT accessed/dirty bits")
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      33251870
  20. 07 4月, 2017 7 次提交
  21. 04 4月, 2017 1 次提交