1. 09 Feb, 2021 1 commit
  2. 04 Feb, 2021 2 commits
    • J
      KVM: x86/xen: Fix coexistence of Xen and Hyper-V hypercalls · 79033beb
      Joao Martins committed
      Disambiguate Xen vs. Hyper-V calls by adding 'orl $0x80000000, %eax'
      at the start of the Hyper-V hypercall page when Xen hypercalls are
      also enabled.
      
      That bit is reserved in the Hyper-V ABI, and those hypercall numbers
      will never be used by Xen (because it does precisely the same trick).
      
      Switch to using kvm_vcpu_write_guest() while we're at it, instead of
      open-coding it.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      79033beb
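As a rough illustration of the trick described above, the following sketch assembles the bytes of a hypercall stub with the optional `orl $0x80000000, %eax` prefix. The function name and its `xen_enabled` / `amd_cpu` parameters are hypothetical and chosen for this example only; it is not the KVM code, just the x86 encodings laid out in order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill buf with a hypercall stub and return its length in bytes.
 * build_hypercall_stub, xen_enabled and amd_cpu are illustrative names,
 * not identifiers taken from KVM. buf must hold at least 9 bytes.
 */
size_t build_hypercall_stub(uint8_t *buf, int xen_enabled, int amd_cpu)
{
    size_t i = 0;

    if (xen_enabled) {
        /* orl $0x80000000, %eax: opcode 0x0d plus little-endian imm32 */
        buf[i++] = 0x0d;
        buf[i++] = 0x00;
        buf[i++] = 0x00;
        buf[i++] = 0x00;
        buf[i++] = 0x80;
    }

    /* vmcall (0f 01 c1) on Intel, vmmcall (0f 01 d9) on AMD */
    buf[i++] = 0x0f;
    buf[i++] = 0x01;
    buf[i++] = amd_cpu ? 0xd9 : 0xc1;

    /* ret */
    buf[i++] = 0xc3;

    return i;
}
```

With the prefix in place, every call made through the page arrives with bit 31 of the hypercall number set, which is what lets the host tell the two ABIs apart.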
    • J
      KVM: x86: use static calls to reduce kvm_x86_ops overhead · b3646477
      Jason Baron committed
      Convert kvm_x86_ops to use static calls. Note that all kvm_x86_ops are
      covered here except for 'pmu_ops' and 'nested ops'.
      
      Here are some numbers running cpuid in a loop of 1 million calls averaged
      over 5 runs, measured in the vm (lower is better).
      
      Intel Xeon 3000MHz:
      
                 |default    |mitigations=off
      -------------------------------------
      vanilla    |.671s      |.486s
      static call|.573s(-15%)|.458s(-6%)
      
      AMD EPYC 2500MHz:
      
                 |default    |mitigations=off
      -------------------------------------
      vanilla    |.710s      |.609s
      static call|.664s(-6%) |.609s(0%)
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Message-Id: <e057bf1b8a7ad15652df6eeba3f907ae758d3399.1610680941.git.jbaron@akamai.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b3646477
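The percentages in the tables above appear to be the relative change of the static-call timing against the vanilla timing, rounded to whole percent. A minimal sketch of that arithmetic follows; the function name is made up for this example.

```c
/*
 * Relative change of a measurement against a baseline, in percent.
 * Negative values mean the measured time is lower (better) than baseline.
 * relative_change_percent is an illustrative helper, not kernel code.
 */
double relative_change_percent(double baseline, double measured)
{
    return (measured - baseline) / baseline * 100.0;
}
```

For the Intel default case, `relative_change_percent(0.671, 0.573)` gives roughly -14.6, which the table reports as -15%.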
  3. 15 Nov, 2020 1 commit
  4. 28 Sep, 2020 2 commits
  5. 27 Sep, 2020 1 commit
  6. 24 Aug, 2020 1 commit
  7. 11 Aug, 2020 1 commit
    • J
      x86/kvm/hyper-v: Synic default SCONTROL MSR needs to be enabled · 99b48ecc
      Jon Doron committed
      Based on an analysis of the HyperV firmwares (Gen1 and Gen2), it seems
      that SCONTROL is not being set to the ENABLED state as we had
      thought.
      
      Also, a test by Vitaly Kuznetsov running a nested HyperV showed that
      the first read of the SCONTROL MSR returns the value 0x1, aka
      HV_SYNIC_CONTROL_ENABLE.
      
      It's important to note that this diverges from the value stated in the
      HyperV TLFS, which is 0.
      Signed-off-by: Jon Doron <arilou@gmail.com>
      Message-Id: <20200717125238.1103096-2-arilou@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      99b48ecc
  8. 04 Jun, 2020 1 commit
  9. 01 Jun, 2020 3 commits
    • J
      x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls · b187038b
      Jon Doron committed
      There is another mode for the synthetic debugger which uses hypercalls
      to send/recv network data instead of the MSR interface.
      
      This interface is much slower and less recommended, since you might get
      a lot of VMExits while KDVM polls for new packets to recv, rather
      than simply checking the pending page to see if there is data available
      and then requesting it.
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Jon Doron <arilou@gmail.com>
      Message-Id: <20200529134543.1127440-6-arilou@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b187038b
    • J
      x86/kvm/hyper-v: enable hypercalls regardless of hypercall page · 45c38973
      Jon Doron committed
      Microsoft's kdvm.dll dbgtransport module does not respect the hypercall
      page; it simply identifies the CPU in use (AMD/Intel) and makes
      hypercalls with the corresponding instruction (vmmcall/vmcall
      respectively).
      
      The relevant function in kdvm is KdHvConnectHypervisor, which first checks
      whether the hypercall page has been enabled via HV_X64_MSR_HYPERCALL_ENABLE.
      If it has not, it simply sets HV_X64_MSR_GUEST_OS_ID to
      0x1000101010001, which means:
      build_number = 0x0001
      service_version = 0x01
      minor_version = 0x01
      major_version = 0x01
      os_id = 0x00 (Undefined)
      vendor_id = 1 (Microsoft)
      os_type = 0 (A value of 0 indicates a proprietary, closed source OS)
      
      and starts issuing the hypercall without setting the hypercall page.
      
      To resolve this issue, enable hypercalls whenever the guest_os_id is
      not 0, even if the hypercall page has not been enabled.
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Jon Doron <arilou@gmail.com>
      Message-Id: <20200529134543.1127440-5-arilou@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      45c38973
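The field breakdown of 0x1000101010001 given above can be checked with a small decoder. This is only a sketch: the struct and function names are hypothetical, and the bit positions are an assumption inferred from the breakdown in the message (build number in bits 15:0, then three one-byte version fields, the OS ID byte, a 15-bit vendor ID, and the OS type in bit 63).

```c
#include <stdint.h>

/* Illustrative container for the decoded fields; not a kernel type. */
struct guest_os_id_fields {
    uint16_t build_number;    /* bits 15:0  */
    uint8_t  service_version; /* bits 23:16 */
    uint8_t  minor_version;   /* bits 31:24 */
    uint8_t  major_version;   /* bits 39:32 */
    uint8_t  os_id;           /* bits 47:40 */
    uint16_t vendor_id;       /* bits 62:48 */
    uint8_t  os_type;         /* bit 63     */
};

/* Split a 64-bit guest OS ID value into its assumed fields. */
struct guest_os_id_fields decode_guest_os_id(uint64_t v)
{
    struct guest_os_id_fields f;

    f.build_number    = (uint16_t)(v & 0xffff);
    f.service_version = (uint8_t)((v >> 16) & 0xff);
    f.minor_version   = (uint8_t)((v >> 24) & 0xff);
    f.major_version   = (uint8_t)((v >> 32) & 0xff);
    f.os_id           = (uint8_t)((v >> 40) & 0xff);
    f.vendor_id       = (uint16_t)((v >> 48) & 0x7fff);
    f.os_type         = (uint8_t)(v >> 63);
    return f;
}
```

Decoding `0x1000101010001ULL` under this layout yields the values listed in the commit message: build 1, service 1, minor 1, major 1, OS ID 0, vendor 1, OS type 0.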
    • J
      x86/kvm/hyper-v: Add support for synthetic debugger interface · f97f5a56
      Jon Doron committed
      Add support for the Hyper-V synthetic debugger (syndbg) interface.
      The syndbg interface uses MSRs to emulate a way to send/recv packet
      data.
      
      The debug transport dll (kdvm/kdnet) will identify if Hyper-V is enabled
      and, if it supports the synthetic debugger interface, will attempt to
      use it instead of trying to initialize a network adapter.
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Jon Doron <arilou@gmail.com>
      Message-Id: <20200529134543.1127440-4-arilou@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f97f5a56
  10. 20 May, 2020 1 commit
  11. 08 May, 2020 1 commit
  12. 23 Apr, 2020 1 commit
    • P
      KVM: x86: move nested-related kvm_x86_ops to a separate struct · 33b22172
      Paolo Bonzini committed
      Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to
      nested virtualization into a separate struct.
      
      As a result, these ops will always be non-NULL on VMX.  This is not a problem:
      
      * check_nested_events is only called if is_guest_mode(vcpu) returns true
      
      * get_nested_state treats VMXOFF state the same as nested being disabled
      
      * set_nested_state fails if you attempt to set nested state while
        nesting is disabled
      
      * nested_enable_evmcs could already be called on a CPU without VMX enabled
        in CPUID.
      
      * nested_get_evmcs_version was fixed in the previous patch
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      33b22172
  13. 21 Apr, 2020 1 commit
  14. 31 Mar, 2020 1 commit
  15. 05 Feb, 2020 1 commit
  16. 28 Jan, 2020 1 commit
  17. 21 Jan, 2020 2 commits
  18. 09 Jan, 2020 1 commit
  19. 24 Sep, 2019 2 commits
    • V
      KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible · b2d8b167
      Vitaly Kuznetsov committed
      Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
      guarantee that two virtual processors won't end up running on sibling SMT
      threads without knowing about it. This is done as an optimization as in
      this case there is nothing the guest can do to protect itself against MDS
      and issuing additional flush requests is just pointless. On bare metal the
      topology is known, however, when Hyper-V is running nested (e.g. on top of
      KVM) it needs an additional piece of information: a confirmation that the
      exposed topology (wrt vCPU placement on different SMT threads) is
      trustworthy.
      
      NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
      TLFS as follows: "Indicates that a virtual processor will never share a
      physical core with another virtual processor, except for virtual processors
      that are reported as sibling SMT threads." From KVM we can give such
      guarantee in two cases:
      - SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
        as it can become re-enabled during the lifetime of the guest).
      - vCPUs are properly pinned so the scheduler won't put them on sibling
        SMT threads (when they're not reported as such).
      
      This patch reports the NoNonArchitecturalCoreSharing bit to userspace in the
      first case. The second case is outside of KVM's domain of responsibility
      (as vCPU pinning is actually done by whoever manages KVM's userspace,
      e.g. libvirt pinning QEMU threads).
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b2d8b167
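The first case described above boils down to setting EAX bit 18 of CPUID leaf 0x40000004 only when SMT can never become active. A minimal sketch of that decision follows; the macro and function names are illustrative, and the `smt_possible` flag stands in for whatever check the host uses to decide that SMT is unsupported or forcefully disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* EAX bit 18 of CPUID 0x40000004, as given in the commit message.
 * The macro name is chosen for this example. */
#define NO_NONARCH_CORESHARING (1u << 18)

/*
 * Return the EAX value to report, adding the NoNonArchitecturalCoreSharing
 * bit only when SMT cannot be (re-)enabled during the guest's lifetime.
 */
uint32_t report_core_sharing_bit(uint32_t eax, bool smt_possible)
{
    if (!smt_possible)
        eax |= NO_NONARCH_CORESHARING;
    return eax;
}
```

The pinned-vCPU case is deliberately absent here, since the message leaves it to whoever manages KVM's userspace.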
    • W
      KVM: hyperv: Fix Direct Synthetic timers assert an interrupt w/o lapic_in_kernel · a073d7e3
      Wanpeng Li committed
      Reported by syzkaller:
      
      	kasan: GPF could be caused by NULL-ptr deref or user memory access
      	general protection fault: 0000 [#1] PREEMPT SMP KASAN
      	RIP: 0010:__apic_accept_irq+0x46/0x740 arch/x86/kvm/lapic.c:1029
      	Call Trace:
      	kvm_apic_set_irq+0xb4/0x140 arch/x86/kvm/lapic.c:558
      	stimer_notify_direct arch/x86/kvm/hyperv.c:648 [inline]
      	stimer_expiration arch/x86/kvm/hyperv.c:659 [inline]
      	kvm_hv_process_stimers+0x594/0x1650 arch/x86/kvm/hyperv.c:686
      	vcpu_enter_guest+0x2b2a/0x54b0 arch/x86/kvm/x86.c:7896
      	vcpu_run+0x393/0xd40 arch/x86/kvm/x86.c:8152
      	kvm_arch_vcpu_ioctl_run+0x636/0x900 arch/x86/kvm/x86.c:8360
      	kvm_vcpu_ioctl+0x6cf/0xaf0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2765
      
      The testcase programs HV_X64_MSR_STIMERn_CONFIG/HV_X64_MSR_STIMERn_COUNT
      while there is no lapic in the kernel. The counter values are small
      enough that kvm_hv_process_stimers() injects the already-expired
      timer interrupt into the guest through the in-kernel lapic, which triggers
      the NULL dereference. This patch fixes it by not advertising direct mode
      synthetic timers and by discarding the injection when the lapic is not in
      the kernel.
      
      syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=1752fe0a600000
      
      Reported-by: syzbot+dff25ee91f0c7d5c1695@syzkaller.appspotmail.com
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a073d7e3
  20. 28 Aug, 2019 1 commit
  21. 15 Jul, 2019 1 commit
    • A
      x86: kvm: avoid -Wsometimes-uninitialized warning · f4e4805e
      Arnd Bergmann committed
      Clang notices a code path in which some variables are never
      initialized, but fails to figure out that this can never happen
      on i386 because is_64_bit_mode() always returns false.
      
      arch/x86/kvm/hyperv.c:1610:6: error: variable 'ingpa' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
              if (!longmode) {
                  ^~~~~~~~~
      arch/x86/kvm/hyperv.c:1632:55: note: uninitialized use occurs here
              trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa);
                                                                   ^~~~~
      arch/x86/kvm/hyperv.c:1610:2: note: remove the 'if' if its condition is always true
              if (!longmode) {
              ^~~~~~~~~~~~~~~
      arch/x86/kvm/hyperv.c:1595:18: note: initialize the variable 'ingpa' to silence this warning
              u64 param, ingpa, outgpa, ret = HV_STATUS_SUCCESS;
                              ^
                               = 0
      arch/x86/kvm/hyperv.c:1610:6: error: variable 'outgpa' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
      arch/x86/kvm/hyperv.c:1610:6: error: variable 'param' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
      
      Flip the condition around to avoid the conditional execution on i386.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f4e4805e
  22. 19 Jun, 2019 1 commit
  23. 01 May, 2019 1 commit
    • S
      KVM: x86: Omit caching logic for always-available GPRs · de3cd117
      Sean Christopherson committed
      Except for RSP and RIP, which are held in VMX's VMCS, GPRs are always
      treated as "available and dirty" on both VMX and SVM, i.e. are
      unconditionally loaded/saved immediately before/after VM-Enter/VM-Exit.
      
      Eliminating the unnecessary caching code reduces the size of KVM by a
      non-trivial amount, much of which comes from the most common code paths.
      E.g. on x86_64, kvm_emulate_cpuid() is reduced from 342 to 182 bytes and
      kvm_emulate_hypercall() from 1362 to 1143, with the total size of KVM
      dropping by ~1000 bytes.  With CONFIG_RETPOLINE=y, the numbers are even
      more pronounced, e.g.: 353->182, 1418->1172 and well over 2000 bytes.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      de3cd117
  24. 19 Apr, 2019 1 commit
    • V
      x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012 · da66761c
      Vitaly Kuznetsov committed
      It was reported that with some special Multi Processor Group configuration,
      e.g:
       bcdedit.exe /set groupsize 1
       bcdedit.exe /set maxgroup on
       bcdedit.exe /set groupaware on
      for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
      is in use.
      
      Tracing kvm_hv_flush_tlb immediately reveals the issue:
      
       kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
      
      The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
      however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
      We don't flush anything and apparently it's not what Windows expects.
      
      TLFS doesn't say anything about such requests and newer Windows versions
      seem to be unaffected. This all feels like a WS2012 bug, which is, however,
      easy to work around in KVM: let's flush everything when we see an empty
      flush request, over-flushing doesn't hurt.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      da66761c
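The workaround described above can be sketched as a single predicate: treat a request as "flush all vCPUs" either when the all-processors flag is set or when the processor mask is empty. The names below are made up for this example. The value 0x2 for the all-address-spaces flag matches the trace in the message; the value 0x1 for the all-processors flag is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values: 0x2 follows the trace above, 0x1 is assumed. */
#define FLUSH_ALL_PROCESSORS             (1u << 0)
#define FLUSH_ALL_VIRTUAL_ADDRESS_SPACES (1u << 1)

/*
 * Decide whether a flush request should target every vCPU.
 * An empty processor_mask without the all-processors flag is the
 * WS2012 case: rather than flushing nothing, flush everything.
 */
bool flush_targets_all_cpus(uint64_t flags, uint64_t processor_mask)
{
    return (flags & FLUSH_ALL_PROCESSORS) || processor_mask == 0;
}
```

The traced request (`processor_mask 0x0`, `flags 0x2`) therefore resolves to a full flush, while a request with a non-empty mask is still honoured as written.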
  25. 29 Mar, 2019 1 commit
    • V
      x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init · 013cc6eb
      Vitaly Kuznetsov committed
      When userspace initializes guest vCPUs it may want to zero all supported
      MSRs, including Hyper-V related ones such as HV_X64_MSR_STIMERn_CONFIG/
      HV_X64_MSR_STIMERn_COUNT. With commit f3b138c5 ("kvm/x86: Update SynIC
      timers on guest entry only") we began doing stimer_mark_pending()
      unconditionally on every config change.
      
      The issue I'm observing manifests itself as follows:
      - Qemu writes 0 to STIMERn_{CONFIG,COUNT} MSRs and marks all stimers as
        pending in stimer_pending_bitmap, arms KVM_REQ_HV_STIMER;
      - kvm_hv_has_stimer_pending() starts returning true;
      - kvm_vcpu_has_events() starts returning true;
      - kvm_arch_vcpu_runnable() starts returning true;
      - when kvm_arch_vcpu_ioctl_run() gets into
        (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) case:
        - kvm_vcpu_block() gets in 'kvm_vcpu_check_block(vcpu) < 0' and returns
          immediately, avoiding normal wait path;
        - -EAGAIN is returned from kvm_arch_vcpu_ioctl_run() immediately forcing
          userspace to retry.
      
      So instead of normal wait path we get a busy loop on all secondary vCPUs
      before they get INIT signal. This seems to be undesirable, especially given
      that this happens even when Hyper-V extensions are not used.
      
      Generally, it seems to be pointless to mark an stimer as pending in
      stimer_pending_bitmap and arm KVM_REQ_HV_STIMER as the only thing
      kvm_hv_process_stimers() will do is clear the corresponding bit. We may
      just not mark disabled timers as pending instead.
      
      Fixes: f3b138c5 ("kvm/x86: Update SynIC timers on guest entry only")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      013cc6eb
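The fix proposed above, not marking disabled timers as pending, can be illustrated with a toy model. Everything here is hypothetical: the struct, the function, and the assumption that bit 0 of the config value is the enable bit. It only shows the shape of the check, not the KVM data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a synthetic timer; not the kernel's structure. */
struct stimer_sketch {
    uint64_t config;
    bool pending;
};

/* Assumed for this sketch: bit 0 of the config value enables the timer. */
#define STIMER_SKETCH_ENABLE (1ull << 0)

/*
 * Record a config write. The timer is marked pending only when the new
 * config enables it, so zeroing the MSR during vCPU init leaves nothing
 * pending and the vCPU can take the normal wait path.
 */
void stimer_write_config(struct stimer_sketch *t, uint64_t config)
{
    t->config = config;
    if (config & STIMER_SKETCH_ENABLE)
        t->pending = true;
}
```

Writing 0, as userspace does when zeroing MSRs, leaves `pending` untouched, whereas an unconditional mark would set it on every write.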
  26. 21 Feb, 2019 1 commit
    • B
      kvm: x86: Add memcg accounting to KVM allocations · 254272ce
      Ben Gardon committed
      There are many KVM kernel memory allocations which are tied to the life of
      the VM process and should be charged to the VM process's cgroup. If the
      allocations aren't tied to the process, the OOM killer will not know
      that killing the process will free the associated kernel memory.
      Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
      charged to the VM process's cgroup.
      
      Tested:
      	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
      	introduced no new failures.
      	Ran a kernel memory accounting test which creates a VM to touch
      	memory and then checks that the kernel memory allocated for the
      	process is within certain bounds.
      	With this patch we account for much more of the vmalloc and slab memory
      	allocated for the VM.
      
      There remain a few allocations which should be charged to the VM's
      cgroup but are not. In x86, they include:
      	vcpu->arch.pio_data
      These allocations are unaccounted in this patch because they are mapped
      to userspace, and accounting them to a cgroup causes problems. This
      should be addressed in a future patch.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      254272ce
  27. 26 Jan, 2019 4 commits
  28. 15 Dec, 2018 4 commits