1. 24 8月, 2017 1 次提交
    • J
      KVM: SVM: Enable Virtual GIF feature · 640bd6e5
      Janakarajan Natarajan 提交于
      Enable the Virtual GIF feature. This is done by setting bit 25 at position
      60h in the vmcb.
      
      With this feature enabled, the processor uses bit 9 at position 60h as the
      virtual GIF when executing STGI/CLGI instructions.
      
      Since the execution of STGI by the L1 hypervisor does not cause a return to
      the outermost (L0) hypervisor, the enable_irq_window and enable_nmi_window
      are modified.
      
      The IRQ window will be opened even if GIF is not set, under the assumption
      that on resuming the L1 hypervisor the IRQ will be held pending until the
      processor executes the STGI instruction.
      
      For the NMI window, the STGI intercept is set. This will assist in opening
      the window only when GIF=1.
      Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      640bd6e5
  2. 18 8月, 2017 2 次提交
  3. 08 8月, 2017 2 次提交
  4. 07 8月, 2017 2 次提交
  5. 02 8月, 2017 1 次提交
    • P
      KVM: nVMX: fixes to nested virt interrupt injection · b96fb439
      Paolo Bonzini 提交于
      There are three issues in nested_vmx_check_exception:
      
      1) it is not taking PFEC_MATCH/PFEC_MASK into account, as reported
      by Wanpeng Li;
      
      2) it should rebuild the interruption info and exit qualification fields
      from scratch, as reported by Jim Mattson, because the values from the
      L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes
      a page fault, the EPT misconfig's exit qualification is incorrect).
      
      3) CR2 and DR6 should not be written for exception intercept vmexits
      (CR2 only for AMD).
      
      This patch fixes the first two and adds a comment about the last,
      outlining the fix.
      
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b96fb439
  6. 14 7月, 2017 3 次提交
  7. 13 7月, 2017 4 次提交
  8. 30 6月, 2017 1 次提交
  9. 27 6月, 2017 5 次提交
  10. 01 6月, 2017 2 次提交
  11. 30 5月, 2017 1 次提交
  12. 18 5月, 2017 1 次提交
  13. 21 4月, 2017 1 次提交
    • M
      kvm: better MWAIT emulation for guests · 668fffa3
      Michael S. Tsirkin 提交于
      Guests that are heavy on futexes end up IPI'ing each other a lot. That
      can lead to significant slowdowns and latency increase for those guests
      when running within KVM.
      
      If only a single guest is needed on a host, we have a lot of spare host
      CPU time we can throw at the problem. Modern CPUs implement a feature
      called "MWAIT" which allows guests to wake up sleeping remote CPUs without
      an IPI - thus without an exit - at the expense of never going out of guest
      context.
      
      The decision whether this is something sensible to use should be up to the
      VM admin, so to user space. We can however allow MWAIT execution on systems
      that support it properly hardware wise.
      
      This patch adds a CAP to user space and a KVM cpuid leaf to indicate
      availability of native MWAIT execution. With that enabled, the worst a
      guest can do is waste as many cycles as a "jmp ." would do, so it's not
      a privilege problem.
      
      We consciously do *not* expose the feature in our CPUID bitmap, as most
      people will want to benefit from sleeping vCPUs to allow for over commit.
      Reported-by: N"Gabriel L. Somlo" <gsomlo@gmail.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      [agraf: fix amd, change commit message]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      668fffa3
  14. 07 4月, 2017 1 次提交
    • B
      kvm/svm: Setup MCG_CAP on AMD properly · 74f16909
      Borislav Petkov 提交于
      MCG_CAP[63:9] bits are reserved on AMD. However, on an AMD guest, this
      MSR returns 0x100010a. More specifically, bit 24 is set, which is simply
      wrong. That bit is MCG_SER_P and is present only on Intel. Thus, clean
      up the reserved bits in order not to confuse guests.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      74f16909
  15. 20 3月, 2017 1 次提交
    • D
      kvm: fix usage of uninit spinlock in avic_vm_destroy() · 3863dff0
      Dmitry Vyukov 提交于
      If avic is not enabled, avic_vm_init() does nothing and returns early.
      However, avic_vm_destroy() still tries to destroy what hasn't been created.
      The only bad consequence of this now is that avic_vm_destroy() uses
      svm_vm_data_hash_lock that hasn't been initialized (and is not meant
      to be used at all if avic is not enabled).
      
      Return early from avic_vm_destroy() if avic is not enabled.
      It has nothing to destroy.
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: syzkaller@googlegroups.com
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      3863dff0
  16. 16 3月, 2017 1 次提交
    • T
      x86: Make the GDT remapping read-only on 64-bit · 45fc8757
      Thomas Garnier 提交于
      This patch makes the GDT remapped pages read-only, to prevent accidental
      (or intentional) corruption of this key data structure.
      
      This change is done only on 64-bit, because 32-bit needs it to be writable
      for TSS switches.
      
      The native_load_tr_desc function was adapted to correctly handle a
      read-only GDT. The LTR instruction always writes to the GDT TSS entry.
      This generates a page fault if the GDT is read-only. This change checks
      if the current GDT is a remap and swap GDTs as needed. This function was
      tested by booting multiple machines and checking hibernation works
      properly.
      
      KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the
      per-cpu variable was removed for functions to fetch the original GDT.
      Instead of reloading the previous GDT, VMX will reload the fixmap GDT as
      expected. For testing, VMs were started and restored on multiple
      configurations.
      Signed-off-by: NThomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R . Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-3-thgarnie@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      45fc8757
  17. 17 2月, 2017 1 次提交
  18. 15 2月, 2017 2 次提交
  19. 09 1月, 2017 1 次提交
  20. 08 12月, 2016 2 次提交
    • K
      KVM: x86: Add kvm_skip_emulated_instruction and use it. · 6affcbed
      Kyle Huey 提交于
      kvm_skip_emulated_instruction calls both
      kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
      skipping the emulated instruction and generating a trap if necessary.
      
      Replacing skip_emulated_instruction calls with
      kvm_skip_emulated_instruction is straightforward, except for:
      
      - ICEBP, which is already inside a trap, so avoid triggering another trap.
      - Instructions that can trigger exits to userspace, such as the IO insns,
        MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
        KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
        IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
        take precedence. The singlestep will be triggered again on the next
        instruction, which is the current behavior.
      - Task switch instructions which would require additional handling (e.g.
        the task switch bit) and are instead left alone.
      - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
        which do not trigger singlestep traps as mentioned previously.
      Signed-off-by: NKyle Huey <khuey@kylehuey.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      6affcbed
    • K
      KVM: x86: Add a return value to kvm_emulate_cpuid · 6a908b62
      Kyle Huey 提交于
      Once skipping the emulated instruction can potentially trigger an exit to
      userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to
      propagate a return value.
      Signed-off-by: NKyle Huey <khuey@kylehuey.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      6a908b62
  21. 25 11月, 2016 2 次提交
    • T
      kvm: svm: Add kvm_fast_pio_in support · 8370c3d0
      Tom Lendacky 提交于
      Update the I/O interception support to add the kvm_fast_pio_in function
      to speed up the in instruction similar to the out instruction.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      8370c3d0
    • T
      kvm: svm: Add support for additional SVM NPF error codes · 14727754
      Tom Lendacky 提交于
      AMD hardware adds two additional bits to aid in nested page fault handling.
      
      Bit 32 - NPF occurred while translating the guest's final physical address
      Bit 33 - NPF occurred while translating the guest page tables
      
      The guest page tables fault indicator can be used as an aid for nested
      virtualization. Using V0 for the host, V1 for the first level guest and
      V2 for the second level guest, when both V1 and V2 are using nested paging
      there are currently a number of unnecessary instruction emulations. When
      V2 is launched shadow paging is used in V1 for the nested tables of V2. As
      a result, KVM marks these pages as RO in the host nested page tables. When
      V2 exits and we resume V1, these pages are still marked RO.
      
      Every nested walk for a guest page table is treated as a user-level write
      access and this causes a lot of NPFs because the V1 page tables are marked
      RO in the V0 nested tables. While executing V1, when these NPFs occur KVM
      sees a write to a read-only page, emulates the V1 instruction and unprotects
      the page (marking it RW). This patch looks for cases where we get a NPF due
      to a guest page table walk where the page was marked RO. It immediately
      unprotects the page and resumes the guest, leading to far fewer instruction
      emulations when nested virtualization is used.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      14727754
  22. 03 11月, 2016 1 次提交
    • P
      KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK · ea26e4ec
      Paolo Bonzini 提交于
      Since commit a545ab6a ("kvm: x86: add tsc_offset field to struct
      kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is
      cached and need not be fished out of the VMCS or VMCB.  This means
      that we can implement adjust_tsc_offset_guest and read_l1_tsc
      entirely in generic code.  The simplification is particularly
      significant for VMX code, where vmx->nested.vmcs01_tsc_offset
      was duplicating what is now in vcpu->arch.tsc_offset.  Therefore
      the vmcs01_tsc_offset can be dropped completely.
      
      More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK
      which, after commit 108b249c ("KVM: x86: introduce get_kvmclock_ns",
      2016-09-01) called read_l1_tsc while the VMCS was not loaded.
      It thus returned bogus values on Intel CPUs.
      
      Fixes: 108b249cReported-by: NRoman Kagan <rkagan@virtuozzo.com>
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ea26e4ec
  23. 20 9月, 2016 1 次提交
  24. 16 9月, 2016 1 次提交