1. 17 Mar, 2018 (6 commits)
    • x86/kvm/hyper-v: add reenlightenment MSRs support · a2e164e7
      Vitaly Kuznetsov authored
      Nested Hyper-V/Windows guest running on top of KVM will use TSC page
      clocksource in two cases:
      - L0 exposes invariant TSC (CPUID.80000007H:EDX[8]).
      - L0 provides Hyper-V Reenlightenment support (CPUID.40000003H:EAX[13]).
      
      Exposing invariant TSC effectively blocks migration to hosts with different
      TSC frequencies; providing reenlightenment support will be needed when we
      start migrating nested workloads.
      
      Implement rudimentary support for reenlightenment MSRs. For now, these are
      just read/write MSRs with no effect.
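The "read/write with no effect" behaviour amounts to a plain store/load model. A minimal userspace sketch (the struct name and helpers are hypothetical; the MSR indices follow the Hyper-V TLFS):

```c
#include <stdint.h>

/* MSR indices per the Hyper-V TLFS (shown here for illustration). */
#define HV_X64_MSR_REENLIGHTENMENT_CONTROL 0x40000106u
#define HV_X64_MSR_TSC_EMULATION_CONTROL   0x40000107u
#define HV_X64_MSR_TSC_EMULATION_STATUS    0x40000108u

/* Hypothetical per-VM state mirroring the three new MSRs. */
struct hv_reenlightenment_state {
    uint64_t reenlightenment_control;
    uint64_t tsc_emulation_control;
    uint64_t tsc_emulation_status;
};

/* Rudimentary semantics: writes are stored, reads return the stored value. */
static int hv_set_msr(struct hv_reenlightenment_state *hv,
                      uint32_t msr, uint64_t data)
{
    switch (msr) {
    case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
        hv->reenlightenment_control = data;
        return 0;
    case HV_X64_MSR_TSC_EMULATION_CONTROL:
        hv->tsc_emulation_control = data;
        return 0;
    case HV_X64_MSR_TSC_EMULATION_STATUS:
        hv->tsc_emulation_status = data;
        return 0;
    }
    return 1;   /* unhandled MSR */
}

static uint64_t hv_get_msr(const struct hv_reenlightenment_state *hv,
                           uint32_t msr)
{
    switch (msr) {
    case HV_X64_MSR_REENLIGHTENMENT_CONTROL: return hv->reenlightenment_control;
    case HV_X64_MSR_TSC_EMULATION_CONTROL:   return hv->tsc_emulation_control;
    case HV_X64_MSR_TSC_EMULATION_STATUS:    return hv->tsc_emulation_status;
    }
    return 0;
}
```

Later changes can then attach real behaviour (reenlightenment notifications) to these stored values without changing the guest-visible MSR interface.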
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      a2e164e7
    • KVM: x86: Update the exit_qualification access bits while walking an address · ddd6f0e9
      KarimAllah Ahmed authored
      ... to avoid having a stale value when handling an EPT misconfig for MMIO
      regions.
      
      MMIO regions that are not passed-through to the guest are handled through
      EPT misconfigs. The first time a certain MMIO page is touched it causes an
      EPT violation, then KVM marks the EPT entry to cause an EPT misconfig
      instead. Any subsequent accesses to the entry will generate an EPT
      misconfig.
      
      Things get slightly complicated with nested guest handling for MMIO
      regions that are not passed through from L0 (i.e. emulated by L0
      user-space).
      
      An EPT violation for one of these MMIO regions from L2 exits to the L0
      hypervisor. L0 would then look at the EPT12 mapping for L1 hypervisor and
      realize it is not present (or not sufficient to serve the request). Then L0
      injects an EPT violation to L1. L1 would then update its EPT mappings. The
      EXIT_QUALIFICATION value for L1 would come from exit_qualification variable
      in "struct vcpu". The problem is that this variable is only updated on EPT
      violation and not on EPT misconfig. So if an EPT violation caused by a
      read happens first, followed by an EPT misconfig caused by a write, the
      L0 hypervisor will still hold the exit_qualification value from the
      previous read instead of the write, and it ends up injecting an EPT
      violation to the L1 hypervisor with an out-of-date EXIT_QUALIFICATION.
      
      The EPT violation that is injected from L0 to L1 needs to have the correct
      EXIT_QUALIFICATION, especially for the access bits, because the individual
      access bits for MMIO EPTs are updated only on an actual access of that
      specific type. So for the example above, the L1 hypervisor will keep
      updating only the read bit in the EPT then resume the L2 guest. The L2
      guest would end up causing another exit where the L0 *again* will inject
      another EPT violation to L1 hypervisor with *again* an out of date
      exit_qualification which indicates a read and not a write. Then this
      ping-pong just keeps happening without making any forward progress.
      
      The behavior of mapping MMIO regions changed in:
      
         commit a340b3e2 ("kvm: Map PFN-type memory regions as writable (if possible)")
      
      ... where an EPT violation for a read would also fix up the write bits to
      avoid another EPT violation, which by accident would fix the bug mentioned
      above.
      
      This commit fixes this situation and ensures that the access bits for the
      exit_qualification are up to date. That ensures that even an L1 hypervisor
      running with a KVM version before the commit mentioned above would still
      work.
      
      ( The description above assumes EPT to be available and used by L1
        hypervisor + the L1 hypervisor is passing through the MMIO region to the L2
        guest while this MMIO region is emulated by the L0 user-space ).
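A minimal model of the fix: rebuild the low access bits of the qualification from the fault currently being walked, instead of trusting whatever the last EPT violation left behind. The bit layout follows the Intel SDM's EPT-violation exit qualification; the helper name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* EPT-violation exit-qualification access bits (Intel SDM layout). */
#define EPT_VIOLATION_ACC_READ  (1ull << 0)
#define EPT_VIOLATION_ACC_WRITE (1ull << 1)
#define EPT_VIOLATION_ACC_INSTR (1ull << 2)

/* Clear the access bits left over from the previous EPT violation and
 * derive fresh ones from the current fault, so an EPT misconfig that
 * follows an earlier violation no longer injects stale access bits. */
static uint64_t refresh_exit_qualification(uint64_t stale_qual,
                                           bool read_fault,
                                           bool write_fault,
                                           bool fetch_fault)
{
    uint64_t qual = stale_qual & ~(EPT_VIOLATION_ACC_READ |
                                   EPT_VIOLATION_ACC_WRITE |
                                   EPT_VIOLATION_ACC_INSTR);

    if (read_fault)
        qual |= EPT_VIOLATION_ACC_READ;
    if (write_fault)
        qual |= EPT_VIOLATION_ACC_WRITE;
    if (fetch_fault)
        qual |= EPT_VIOLATION_ACC_INSTR;
    return qual;
}
```

In the read-then-write ping-pong described above, this makes the injected EXIT_QUALIFICATION report a write on the second fault, so L1 sets the write bit and the guest makes progress.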
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      ddd6f0e9
    • KVM: x86: Make enum conversion explicit in kvm_pdptr_read() · 1df372f4
      Matthias Kaehlcke authored
      The type 'enum kvm_reg_ex' is an extension of 'enum kvm_reg'; however,
      the extension is only semantic, and the compiler doesn't know about the
      relationship between the two types. In kvm_pdptr_read() a value of the
      extended type is passed to kvm_x86_ops->cache_reg(), which expects a
      value of the base type. Clang raises the following warning about the
      type mismatch:
      
      arch/x86/kvm/kvm_cache_regs.h:44:32: warning: implicit conversion from
        enumeration type 'enum kvm_reg_ex' to different enumeration type
        'enum kvm_reg' [-Wenum-conversion]
          kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
      
      Cast VCPU_EXREG_PDPTR to 'enum kvm_reg' to make the compiler happy.
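The pattern can be reproduced in miniature (all names here are hypothetical demo names, not the kernel's): the "extended" enum continues the base enum's numbering, and an explicit cast documents the intentional cross-enum conversion.

```c
/* An extension enum that continues the base enum's numbering. */
enum demo_reg { DEMO_REG_RAX = 0, DEMO_NR_REGS };
enum demo_reg_ex { DEMO_EXREG_PDPTR = DEMO_NR_REGS };

static int demo_cache_reg(enum demo_reg reg)   /* expects the base type */
{
    return (int)reg;
}

static int demo_read_pdptr(void)
{
    /* Without the cast, clang emits -Wenum-conversion here. */
    return demo_cache_reg((enum demo_reg)DEMO_EXREG_PDPTR);
}
```

The cast changes nothing at runtime; it only tells the compiler the cross-enum conversion is deliberate.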
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Reviewed-by: Guenter Roeck <groeck@chromium.org>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      1df372f4
    • KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use · 0bcc3fb9
      Vitaly Kuznetsov authored
      Devices which use level-triggered interrupts under Windows 2016 with
      Hyper-V role enabled don't work: Windows disables EOI broadcast in SPIV
      unconditionally. Our in-kernel IOAPIC implementation emulates an old IOAPIC
      version which has no EOI register so EOI never happens.
      
      The issue was discovered and discussed a while ago:
      https://www.spinics.net/lists/kvm/msg148098.html
      
      While this is a guest OS bug (it should check that the IOAPIC has the
      required capabilities before disabling EOI broadcast), we can work around
      it in KVM: advertising DIRECTED_EOI with the in-kernel IOAPIC makes little
      sense anyway.
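The resulting logic amounts to gating one bit of the Local APIC version register on where the IOAPIC is emulated. A sketch with an illustrative version value (helper name hypothetical; the real code lives in arch/x86/kvm/lapic.c):

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_LVR_VERSION      0x14u        /* illustrative version value */
#define APIC_LVR_DIRECTED_EOI (1u << 24)   /* EOI-broadcast suppression */

/* Only advertise directed EOI when the IOAPIC is emulated in userspace:
 * the in-kernel IOAPIC models an old 82093AA with no EOI register, so a
 * guest that disables EOI broadcast would never EOI level-triggered
 * interrupts. */
static uint32_t lapic_version_reg(bool ioapic_in_kernel)
{
    uint32_t v = APIC_LVR_VERSION;

    if (!ioapic_in_kernel)
        v |= APIC_LVR_DIRECTED_EOI;
    return v;
}
```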
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      0bcc3fb9
    • KVM: x86: Add support for AMD Core Perf Extension in guest · c51eb52b
      Janakarajan Natarajan authored
      Add support for AMD Core Performance counters in the guest. The base
      event select and counter MSRs are changed. In addition, with the core
      extension, there are 2 extra counters available for performance
      measurements for a total of 6.
      
      With the new MSRs, the logic to map them to the gp_counters[] is changed.
      New functions are added to check the validity of the get/set MSRs.
      
      If the guest has the X86_FEATURE_PERFCTR_CORE cpuid flag set, the number
      of counters available to the vcpu is set to 6. If the flag is not set,
      then it is 4.
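The counter-count selection reduces to a single branch on the cpuid flag (macro names as in the kernel's perf headers, used here illustratively):

```c
#include <stdbool.h>

#define AMD64_NUM_COUNTERS      4   /* legacy AMD PMU */
#define AMD64_NUM_COUNTERS_CORE 6   /* with Core Perf Extension */

/* Number of general-purpose counters exposed to the vcpu, keyed off the
 * guest's X86_FEATURE_PERFCTR_CORE cpuid flag. */
static int amd_pmu_nr_gp_counters(bool has_perfctr_core)
{
    return has_perfctr_core ? AMD64_NUM_COUNTERS_CORE : AMD64_NUM_COUNTERS;
}
```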
      Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      [Squashed "Expose AMD Core Perf Extension flag to guests" - Radim.]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      c51eb52b
    • x86/msr: Add AMD Core Perf Extension MSRs · e84b7119
      Janakarajan Natarajan authored
      Add the EventSelect and Counter MSRs for AMD Core Perf Extension.
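With the core extension the EventSelect and Counter MSRs interleave in pairs starting at 0xc0010200, so the index-to-MSR mapping can be sketched as follows (values per the AMD APM; the helper names are hypothetical):

```c
#include <stdint.h>

/* Base MSRs of the interleaved EventSelect/Counter pairs. */
#define MSR_F15H_PERF_CTL 0xc0010200u
#define MSR_F15H_PERF_CTR 0xc0010201u

/* EventSelect MSR for counter pair 'idx' (0..5 with the core extension). */
static uint32_t amd_core_perf_ctl(unsigned int idx)
{
    return MSR_F15H_PERF_CTL + 2 * idx;
}

/* Counter MSR for counter pair 'idx'. */
static uint32_t amd_core_perf_ctr(unsigned int idx)
{
    return MSR_F15H_PERF_CTR + 2 * idx;
}
```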
      Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      e84b7119
  2. 08 Mar, 2018 (1 commit)
  3. 07 Mar, 2018 (8 commits)
  4. 02 Mar, 2018 (11 commits)
    • parisc: Reduce irq overhead when run in qemu · 636a415b
      Helge Deller authored
      When run under QEMU, calling mfctl(16) creates some overhead because the
      qemu timer has to be scaled and moved into the register. This patch
      reduces the number of calls to mfctl(16) by moving the calls out of the
      loops.
      
      Additionally, increase the minimal time interval to 8000 cycles instead
      of 500 to compensate for possible QEMU delays when delivering interrupts.
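The shape of the change can be sketched as a pure helper that advances the timer target from one cached counter value instead of re-reading cr16 (mfctl(16)) inside the loop (hypothetical helper; the real code is in arch/parisc/kernel/time.c):

```c
#include <stdint.h>

/* Compute the next timer target from a single cached cycle-counter read
 * ('now'), and enforce a minimal distance (e.g. 8000 cycles) so that late
 * interrupt delivery under QEMU doesn't leave us programming a target
 * that is already too close or in the past. No counter re-read occurs
 * inside the loop. */
static uint64_t next_timer_target(uint64_t now, uint64_t period,
                                  uint64_t min_delta)
{
    uint64_t next = now + period;

    while (next - now < min_delta)
        next += period;
    return next;
}
```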
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # 4.14+
      636a415b
    • parisc: Use cr16 interval timers unconditionally on qemu · 5ffa8518
      Helge Deller authored
      When running on qemu we know that the (emulated) cr16 cpu-internal
      clocks are synchronized. So let's use them unconditionally on qemu.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # 4.14+
      5ffa8518
    • parisc: Check if secondary CPUs want own PDC calls · 0ed1fe4a
      Helge Deller authored
      The architecture specification says (for 64-bit systems): PDC is a per
      processor resource, and operating system software must be prepared to
      manage separate pointers to PDCE_PROC for each processor.  The address
      of PDCE_PROC for the monarch processor is stored in the Page Zero
      location MEM_PDC. The address of PDCE_PROC for each non-monarch
      processor is passed in gr26 when PDCE_RESET invokes OS_RENDEZ.
      
      Currently we still use one PDC for all CPUs, but in case we face a
      machine that follows the specification, let's warn about it.
      Signed-off-by: Helge Deller <deller@gmx.de>
      0ed1fe4a
    • parisc: Hide virtual kernel memory layout · fd8d0ca2
      Helge Deller authored
      For security reasons do not expose the virtual kernel memory layout to
      userspace.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Suggested-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org # 4.15
      Reviewed-by: Kees Cook <keescook@chromium.org>
      fd8d0ca2
    • parisc: Fix ordering of cache and TLB flushes · 0adb24e0
      John David Anglin authored
      The change to flush_kernel_vmap_range() wasn't sufficient to avoid the
      SMP stalls.  The problem is some drivers call these routines with
      interrupts disabled.  Interrupts need to be enabled for flush_tlb_all()
      and flush_cache_all() to work.  This version adds checks to ensure
      interrupts are not disabled before calling routines that need IPI
      interrupts.  When interrupts are disabled, we now drop into slower code.
      
      The attached change fixes the ordering of cache and TLB flushes in
      several cases.  When we flush the cache using the existing PTE/TLB
      entries, we need to flush the TLB after doing the cache flush.  We don't
      need to do this when we flush the entire instruction and data caches as
      these flushes don't use the existing TLB entries.  The same is true for
      tmpalias region flushes.
      
      The flush_kernel_vmap_range() and invalidate_kernel_vmap_range()
      routines have been updated.
      
      Secondly, we added a new purge_kernel_dcache_range_asm() routine to
      pacache.S and use it in invalidate_kernel_vmap_range().  Nominally,
      purges are faster than flushes as the cache lines don't have to be
      written back to memory.
      
      Hopefully, this is sufficient to resolve the remaining problems due to
      cache speculation.  So far, testing indicates that this is the case.  I
      did work up a patch using tmpalias flushes, but there is a performance
      hit because we need the physical address for each page, and we also need
      to sequence access to the tmpalias flush code.  This increases the
      probability of stalls.
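The ordering requirement can be stated as a tiny model (purely illustrative): when flushing through the existing PTE/TLB entries, the cache flush must complete before those translations are invalidated, because the cache flush relies on them.

```c
#include <string.h>

/* Record the required sequence into 'log': "C" = cache flush,
 * "T" = TLB flush. The cache flush goes first because it walks the
 * range via the translations the TLB flush is about to remove. */
static void flush_via_existing_mappings(char *log)
{
    strcat(log, "C");   /* flush cache using the current translations */
    strcat(log, "T");   /* only then invalidate the TLB entries */
}
```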
      
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: Helge Deller <deller@gmx.de>
      0adb24e0
    • sh: fix build error for empty CONFIG_BUILTIN_DTB_SOURCE · 1b1e4ee8
      Masahiro Yamada authored
      If CONFIG_USE_BUILTIN_DTB is enabled, but CONFIG_BUILTIN_DTB_SOURCE
      is empty (for example, allmodconfig), it fails to build, like this:
      
        make[2]: *** No rule to make target 'arch/sh/boot/dts/.dtb.o',
        needed by 'arch/sh/boot/dts/built-in.o'.  Stop.
      
      Surround obj-y with ifneq ... endif.
      
      I replaced $(CONFIG_USE_BUILTIN_DTB) with 'y' since this is always
      the case from the following code from arch/sh/Makefile:
      
        core-$(CONFIG_USE_BUILTIN_DTB)  += arch/sh/boot/dts/
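Given that substitution, the resulting Makefile fragment looks roughly like this (a sketch reconstructed from the commit text, not a verbatim diff):

```make
# arch/sh/boot/dts/Makefile (sketch): skip the rule entirely when the
# DTB source name is empty, e.g. under allmodconfig.
ifneq ($(CONFIG_BUILTIN_DTB_SOURCE),)
obj-y += $(patsubst "%",%,$(CONFIG_BUILTIN_DTB_SOURCE)).dtb.o
endif
```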
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      1b1e4ee8
    • KVM: x86: fix vcpu initialization with userspace lapic · b7e31be3
      Radim Krčmář authored
      Moving the code around broke this rare configuration.
      Use this opportunity to finally call lapic reset from vcpu reset.
      
      Reported-by: syzbot+fb7a33a4b6c35007a72b@syzkaller.appspotmail.com
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Fixes: 0b2e9904 ("KVM: x86: move LAPIC initialization after VMCS creation")
      Cc: stable@vger.kernel.org
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      b7e31be3
    • KVM: X86: Allow userspace to define the microcode version · 518e7b94
      Wanpeng Li authored
      Linux (among others) has checks to make sure that certain features
      aren't enabled on a certain family/model/stepping if the microcode version
      isn't greater than or equal to a known good version.
      
      By exposing the real microcode version, we're preventing buggy guests that
      don't check that they are running virtualized (i.e., they should trust the
      hypervisor) from disabling features that are effectively not buggy.
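The class of guest-side check this targets looks roughly like the following schematic model (not code from any particular guest):

```c
#include <stdbool.h>
#include <stdint.h>

/* Schematic guest-side quirk check: keep a feature enabled only when the
 * microcode revision reported via MSR_IA32_UCODE_REV is at least a known
 * good version for this family/model/stepping. If the hypervisor reports
 * a fake revision of 0, the guest needlessly disables the feature. */
static bool feature_usable(uint32_t ucode_rev, uint32_t known_good_rev)
{
    return ucode_rev >= known_good_rev;
}
```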
      Suggested-by: Filippo Sironi <sironi@amazon.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      518e7b94
    • KVM: X86: Introduce kvm_get_msr_feature() · 66421c1e
      Wanpeng Li authored
      Introduce kvm_get_msr_feature() to handle the MSRs which are supported
      by different vendors and share the same emulation logic.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      66421c1e
    • KVM: SVM: Add MSR-based feature support for serializing LFENCE · d1d93fa9
      Tom Lendacky authored
      In order to determine if LFENCE is a serializing instruction on AMD
      processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state
      of bit 1 checked.  This patch will add support to allow a guest to
      properly make this determination.
      
      Add the MSR feature callback operation to svm.c and add MSR 0xc0011029
      to the list of MSR-based features.  If LFENCE is serializing, then the
      feature is supported, allowing the hypervisor to set the value of the
      MSR that the guest will see.  Support is also added to write (hypervisor only)
      and read the MSR value for the guest.  A write by the guest will result in
      a #GP.  A read by the guest will return the value as set by the host.  In
      a #GP.  A read by the guest will return the value as set by the host.  In
      this way, the support to expose the feature to the guest is controlled by
      the hypervisor.
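A sketch of the reporting side (MSR index and bit position per the commit text; the function name is hypothetical):

```c
#include <stdint.h>

#define MSR_F10H_DECFG                  0xc0011029u
#define MSR_F10H_DECFG_LFENCE_SERIALIZE (1ull << 1)   /* bit 1 */

/* Report the DECFG feature MSR only when LFENCE is already serializing
 * on the host (bit 1 of the host's MSR 0xc0011029 is set); the exposed
 * value then carries the serializing bit for the guest to read. In the
 * real code a guest write raises #GP. */
static int decfg_msr_feature(uint64_t host_decfg, uint64_t *val)
{
    if (!(host_decfg & MSR_F10H_DECFG_LFENCE_SERIALIZE))
        return -1;                  /* feature not exposed */
    *val = MSR_F10H_DECFG_LFENCE_SERIALIZE;
    return 0;
}
```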
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      d1d93fa9
    • KVM: x86: Add a framework for supporting MSR-based features · 801e459a
      Tom Lendacky authored
      Provide a new KVM capability that allows bits within MSRs to be recognized
      as features.  Two new ioctls are added to the /dev/kvm ioctl routine to
      retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
      callback is used to determine support for the listed MSR-based features.
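The framework's core idea, a generic MSR list filtered through a per-vendor callback, can be modelled as a pure function (all names here are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Per-vendor callback, mirroring the shape of the kvm_x86_ops hook:
 * returns 0 and fills *data when the MSR is a supported feature MSR. */
typedef int (*get_msr_feature_fn)(uint32_t msr, uint64_t *data);

/* Keep only the MSRs the vendor callback recognizes: the list returned
 * to userspace contains just the feature MSRs the backend supports. */
static size_t filter_msr_features(const uint32_t *all, size_t n,
                                  get_msr_feature_fn cb, uint32_t *out)
{
    size_t k = 0;
    uint64_t tmp;

    for (size_t i = 0; i < n; i++)
        if (cb(all[i], &tmp) == 0)
            out[k++] = all[i];
    return k;
}

/* Hypothetical vendor callback recognizing one demo MSR index. */
static int demo_get_msr_feature(uint32_t msr, uint64_t *data)
{
    if (msr == 0x10au) {            /* demo value only */
        *data = 1;
        return 0;
    }
    return -1;
}
```

Userspace would then retrieve the filtered list with one ioctl and read each listed MSR's value with another, as the commit describes.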
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [Tweaked documentation. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      801e459a
  5. 01 Mar, 2018 (7 commits)
  6. 28 Feb, 2018 (7 commits)