1. 05 March 2021, 1 commit
  2. 09 February 2021, 2 commits
    • KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context · f2bc14b6
      Committed by Vitaly Kuznetsov
      Currently, Hyper-V context is part of 'struct kvm_vcpu_arch' and is always
      available. As a preparation to allocating it dynamically, check that it is
      not NULL at call sites which can normally proceed without it i.e. the
      behavior is identical to the situation when Hyper-V emulation is not being
      used by the guest.
      
      When Hyper-V context for a particular vCPU is not allocated, we may still
      need to get 'vp_index' from there. E.g. in a hypothetical situation when
      Hyper-V emulation was enabled on one CPU and wasn't on another, Hyper-V
      style send-IPI hypercall may still be used. Luckily, vp_index is always
      initialized to kvm_vcpu_get_idx() and can only be changed when Hyper-V
      context is present. Introduce kvm_hv_get_vpindex() helper for
      simplification.
      
      No functional change intended.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210126134816.1880136-12-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
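
      A minimal sketch of the helper described above, assuming the existing
      to_hv_vcpu() accessor and a vp_index field in 'struct kvm_vcpu_hv'; not
      necessarily the verbatim patch:

        /* Fall back to the vCPU index when no Hyper-V context is allocated. */
        static inline u32 kvm_hv_get_vpindex(struct kvm_vcpu *vcpu)
        {
                struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

                return hv_vcpu ? hv_vcpu->vp_index : kvm_vcpu_get_idx(vcpu);
        }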
    • KVM: x86: hyper-v: Rename vcpu_to_synic()/synic_to_vcpu() · e0121fa2
      Committed by Vitaly Kuznetsov
      vcpu_to_synic()'s argument is almost always 'vcpu' so there's no need to
      have an additional prefix. Also, as this is used outside of hyper-v
      emulation code, add '_hv_' part to make it clear what this is. This makes
      the naming more consistent with to_hv_vcpu().
      
      Rename synic_to_vcpu() to hv_synic_to_vcpu() for consistency.
      
      No functional change intended.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210126134816.1880136-6-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
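
      A sketch of the accessors under the new naming, assuming the SynIC is
      embedded in 'struct kvm_vcpu_hv' and that the Hyper-V context keeps a
      back pointer to its vCPU (both are assumptions, not the exact code):

        static inline struct kvm_vcpu_hv_synic *to_hv_synic(struct kvm_vcpu *vcpu)
        {
                struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

                return &hv_vcpu->synic;
        }

        static inline struct kvm_vcpu *hv_synic_to_vcpu(struct kvm_vcpu_hv_synic *synic)
        {
                struct kvm_vcpu_hv *hv_vcpu =
                        container_of(synic, struct kvm_vcpu_hv, synic);

                return hv_vcpu->vcpu;    /* assumed back pointer */
        }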
  3. 04 February 2021, 2 commits
    • KVM: x86: use static calls to reduce kvm_x86_ops overhead · b3646477
      Committed by Jason Baron
      Convert kvm_x86_ops to use static calls. Note that all kvm_x86_ops are
      covered here except for 'pmu_ops' and 'nested_ops'.
      
      Here are some numbers running cpuid in a loop of 1 million calls averaged
      over 5 runs, measured in the vm (lower is better).
      
      Intel Xeon 3000MHz:
      
                 |default    |mitigations=off
      -------------------------------------
      vanilla    |.671s      |.486s
      static call|.573s(-15%)|.458s(-6%)
      
      AMD EPYC 2500MHz:
      
                 |default    |mitigations=off
      -------------------------------------
      vanilla    |.710s      |.609s
      static call|.664s(-6%) |.609s(0%)
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Message-Id: <e057bf1b8a7ad15652df6eeba3f907ae758d3399.1610680941.git.jbaron@akamai.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
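
      A simplified sketch of the pattern; the real patch generates these
      definitions with macros, and the hook table and names below are
      illustrative only:

        #include <linux/static_call.h>
        #include <linux/kvm_host.h>

        /* Illustrative hook table; the real one is struct kvm_x86_ops. */
        struct kvm_x86_ops_example {
                void (*flush_tlb)(struct kvm_vcpu *vcpu);
        };

        /* Declare a patchable call site for one hook. */
        DEFINE_STATIC_CALL_NULL(kvm_x86_flush_tlb,
                                *(((struct kvm_x86_ops_example *)NULL)->flush_tlb));

        /* When the vendor module (VMX/SVM) registers its ops, retarget the
         * call site so it becomes a direct, patched call. */
        static void kvm_ops_update_example(struct kvm_x86_ops_example *ops)
        {
                static_call_update(kvm_x86_flush_tlb, ops->flush_tlb);
        }

        /* Callers then avoid an indirect (retpolined) call, which is where
         * the gains in the table above come from. */
        static void flush_tlb_example(struct kvm_vcpu *vcpu)
        {
                static_call(kvm_x86_flush_tlb)(vcpu);
        }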
    • KVM: Stop using deprecated jump label APIs · 6e4e3b4d
      Committed by Cun Li
      The use of 'struct static_key' and 'static_key_false' is
      deprecated. Use the new API.
      Signed-off-by: Cun Li <cun.jia.li@gmail.com>
      Message-Id: <20210111152435.50275-1-cun.jia.li@gmail.com>
      [Make it compile.  While at it, rename kvm_no_apic_vcpu to
       kvm_has_noapic_vcpu; the former reads too much like "true if
       no vCPU has an APIC". - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
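
      For reference, a sketch of the old versus new jump-label style around
      the kvm_has_noapic_vcpu key mentioned in Paolo's note (usage below is
      illustrative, not the exact diff):

        #include <linux/jump_label.h>

        /* Deprecated style being removed:
         *   struct static_key kvm_no_apic_vcpu __read_mostly;
         *   if (static_key_false(&kvm_no_apic_vcpu)) ...
         *
         * New API: */
        DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);

        static void example_vcpu_without_apic_created(void)
        {
                static_branch_inc(&kvm_has_noapic_vcpu);
        }

        static bool example_fast_path(void)
        {
                /* Compiles to a patched branch that is false (and cheap)
                 * by default. */
                return static_branch_unlikely(&kvm_has_noapic_vcpu);
        }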
  4. 08 January 2021, 2 commits
    • KVM: SVM: Add support for booting APs in an SEV-ES guest · 647daca2
      Committed by Tom Lendacky
      Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
      where the guest vCPU register state is updated and then VMRUN is issued
      to begin execution of the AP. For an SEV-ES guest, this won't work because
      the guest register state is encrypted.
      
      Following the GHCB specification, the hypervisor must not alter the guest
      register state, so KVM must track an AP/vCPU boot. Should the guest want
      to park the AP, it must use the AP Reset Hold exit event in place of, for
      example, a HLT loop.
      
      First AP boot (first INIT-SIPI-SIPI sequence):
        Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
        support. It is up to the guest to transfer control of the AP to the
        proper location.
      
      Subsequent AP boot:
        KVM will expect to receive an AP Reset Hold exit event indicating that
        the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
        awaken it. When the AP Reset Hold exit event is received, KVM will place
        the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
        sequence, KVM will make the vCPU runnable. It is again up to the guest
        to then transfer control of the AP to the proper location.
      
        To differentiate between an actual HLT and an AP Reset Hold, a new MP
        state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
        placed in upon receiving the AP Reset Hold exit event. Additionally, to
        communicate the AP Reset Hold exit event up to userspace (if needed), a
        new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
      
      A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
      to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
      original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
      a new function that, for non-SEV-ES guests, invokes the original SIPI
      delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
      implements the logic above.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
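
      A rough sketch of the SVM-side dispatch described above; apart from
      kvm_vcpu_deliver_sipi_vector() and sev_es_guest(), the helper name is a
      hypothetical placeholder:

        static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
        {
                if (!sev_es_guest(vcpu->kvm)) {
                        /* Non-SEV-ES: behave exactly like the common code. */
                        kvm_vcpu_deliver_sipi_vector(vcpu, vector);
                        return;
                }

                /*
                 * SEV-ES: register state is encrypted, so CS:IP cannot be
                 * rewritten.  The guest parked the AP via the AP Reset Hold
                 * exit (KVM_MP_STATE_AP_RESET_HOLD); just make the vCPU
                 * runnable again and let the guest steer it.
                 */
                sev_handle_sipi_for_parked_ap(vcpu, vector);  /* hypothetical */
        }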
    • KVM: x86: change in pv_eoi_get_pending() to make code more readable · de7860c8
      Committed by Stephen Zhang
      Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com>
      Message-Id: <1608277897-1932-1-git-send-email-stephenzhangzsd@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 10 December 2020, 1 commit
  6. 27 November 2020, 1 commit
  7. 15 November 2020, 1 commit
    • KVM: x86: fix apic_accept_events vs check_nested_events · 1c96dcce
      Committed by Paolo Bonzini
      vmx_apic_init_signal_blocked is buggy in that it returns true
      even in VMX non-root mode.  In non-root mode, however, INITs
      are not latched; they just cause a vmexit.  Previously,
      KVM waited for them to be processed in kvm_apic_accept_events,
      and in the meantime it ate the SIPIs that the processor received.
      
      However, in order to implement the wait-for-SIPI activity state,
      KVM will have to process KVM_APIC_SIPI in vmx_check_nested_events,
      and it will not be possible anymore to disregard SIPIs in non-root
      mode as the code is currently doing.
      
      By calling kvm_x86_ops.nested_ops->check_events, we can force a vmexit
      (with the side-effect of latching INITs) before incorrectly injecting
      an INIT or SIPI in a guest, and therefore vmx_apic_init_signal_blocked
      can do the right thing.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
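
      Conceptually, the fix amounts to something like the following in
      kvm_apic_accept_events() (a sketch, not the literal diff):

        static void apic_accept_events_sketch(struct kvm_vcpu *vcpu)
        {
                /*
                 * Before acting on a latched INIT/SIPI, let nested code force
                 * a vmexit so the vCPU leaves non-root mode and INITs really
                 * are latched; only then is injecting them correct.
                 */
                if (is_guest_mode(vcpu))
                        kvm_x86_ops.nested_ops->check_events(vcpu);

                /* ... existing KVM_APIC_INIT / KVM_APIC_SIPI handling ... */
        }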
  8. 28 September 2020, 6 commits
  9. 24 August 2020, 1 commit
  10. 31 July 2020, 2 commits
  11. 27 July 2020, 1 commit
  12. 09 July 2020, 2 commits
  13. 23 June 2020, 2 commits
    • KVM: LAPIC: ensure APIC map is up to date on concurrent update requests · 44d52717
      Committed by Paolo Bonzini
      The following race can cause lost map update events:
      
               cpu1                            cpu2
      
                                      apic_map_dirty = true
        ------------------------------------------------------------
                                      kvm_recalculate_apic_map:
                                           pass check
                                               mutex_lock(&kvm->arch.apic_map_lock);
                                               if (!kvm->arch.apic_map_dirty)
                                           and in process of updating map
        -------------------------------------------------------------
          other calls to
             apic_map_dirty = true         might be too late for affected cpu
        -------------------------------------------------------------
                                           apic_map_dirty = false
        -------------------------------------------------------------
          kvm_recalculate_apic_map:
          bail out on
            if (!kvm->arch.apic_map_dirty)
      
      To fix it, record the beginning of an update of the APIC map in
      apic_map_dirty.  If another APIC map change switches apic_map_dirty
      back to DIRTY during the update, kvm_recalculate_apic_map should not
      make it CLEAN, and the other caller will go through the slow path.
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
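
      A simplified sketch of the resulting scheme; the three-state enum and
      the atomic_t type of apic_map_dirty are assumptions based on the
      description above:

        enum apic_map_state { APIC_MAP_CLEAN, APIC_MAP_UPDATE_IN_PROGRESS, APIC_MAP_DIRTY };

        void kvm_recalculate_apic_map(struct kvm *kvm)
        {
                if (atomic_read_acquire(&kvm->arch.apic_map_dirty) == APIC_MAP_CLEAN)
                        return;

                mutex_lock(&kvm->arch.apic_map_lock);
                /* Record that an update has started; a request arriving from
                 * now on sets the state back to DIRTY and is not lost. */
                if (atomic_cmpxchg_acquire(&kvm->arch.apic_map_dirty,
                                           APIC_MAP_DIRTY,
                                           APIC_MAP_UPDATE_IN_PROGRESS) == APIC_MAP_CLEAN)
                        goto out;

                /* ... rebuild and publish the map ... */

                /* Move to CLEAN only if nobody re-dirtied the map meanwhile;
                 * otherwise the next caller takes the slow path again. */
                atomic_cmpxchg_release(&kvm->arch.apic_map_dirty,
                                       APIC_MAP_UPDATE_IN_PROGRESS, APIC_MAP_CLEAN);
        out:
                mutex_unlock(&kvm->arch.apic_map_lock);
        }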
    • kvm: lapic: fix broken vcpu hotplug · af28dfac
      Committed by Igor Mammedov
      Guest fails to online hotplugged CPU with error
        smpboot: do_boot_cpu failed(-1) to wakeup CPU#4
      
      It's caused by the fact that kvm_apic_set_state(), which used to call
      recalculate_apic_map() unconditionally and pull the hotplugged CPU into
      the apic map, now updates the map conditionally on state changes.  In
      this case the APIC map is not considered dirty and is not updated.
      
      Fix the issue by forcing unconditional update from kvm_apic_set_state(),
      like it used to be.
      
      Fixes: 4abaffce ("KVM: LAPIC: Recalculate apic map in batch")
      Cc: stable@vger.kernel.org
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Message-Id: <20200622160830.426022-1-imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
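
      In effect, userspace-driven state loads now mark the map dirty
      unconditionally; a sketch, reusing the assumed states from the sketch
      above:

        int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
        {
                /* A hotplugged vCPU must be folded into the APIC map even if
                 * the incoming register state looks unchanged. */
                atomic_set_release(&vcpu->kvm->arch.apic_map_dirty, APIC_MAP_DIRTY);

                /* ... copy the register state into the vCPU's APIC page ... */

                kvm_recalculate_apic_map(vcpu->kvm);
                return 0;
        }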
  14. 28 May 2020, 1 commit
  15. 16 May 2020, 2 commits
  16. 14 May 2020, 1 commit
    • kvm: Replace vcpu->swait with rcuwait · da4ad88c
      Committed by Davidlohr Bueso
      The use of any sort of waitqueue (simple or regular) for
      waiting and waking vcpus has always been overkill and semantically
      wrong. Because this is per-vcpu (which is blocked) there is
      only ever a single waiting vcpu, thus no need for any sort of
      queue.
      
      As such, make use of the rcuwait primitive, with the following
      considerations:
      
        - rcuwait already provides the proper barriers that serialize
        concurrent waiter and waker.
      
        - Task wakeup is done in rcu read critical region, with a
        stable task pointer.
      
        - Because there is no concurrency among waiters, we need
        not worry about rcuwait_wait_event() calls corrupting
        the wait->task. As a consequence, this saves the locking
        done in swait when modifying the queue. This also applies
        to per-vcore wait for powerpc kvm-hv.
      
      The x86 tscdeadline_latency test mentioned in 8577370f
      ("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg,
      latency is reduced by around 15-20% with this change.
      
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: kvmarm@lists.cs.columbia.edu
      Cc: linux-mips@vger.kernel.org
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Message-Id: <20200424054837.5138-6-dave@stgolabs.net>
      [Avoid extra logic changes. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
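
      A sketch of the resulting wait/wake pattern with rcuwait (the vcpu->wait
      field name and the blocking-condition helper are assumptions):

        #include <linux/rcuwait.h>

        /* Block the (single) vCPU task until it has something to do. */
        static void vcpu_block_sketch(struct kvm_vcpu *vcpu)
        {
                rcuwait_wait_event(&vcpu->wait,
                                   vcpu_has_pending_work(vcpu), /* assumed */
                                   TASK_INTERRUPTIBLE);
        }

        /* Wake it from another context; rcuwait resolves the waiter under an
         * RCU read-side critical section, so no locked queue is needed. */
        static void vcpu_kick_sketch(struct kvm_vcpu *vcpu)
        {
                rcuwait_wake_up(&vcpu->wait);
        }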
  17. 16 April 2020, 1 commit
    • KVM: x86: Return updated timer current count register from KVM_GET_LAPIC · 24647e0a
      Committed by Peter Shier
      kvm_vcpu_ioctl_get_lapic (implements KVM_GET_LAPIC ioctl) does a bulk copy
      of the LAPIC registers but must take into account that the one-shot and
      periodic timer current count register is computed upon reads and is not
      present in register state. When restoring LAPIC state (e.g. after
      migration), restart timers from their current count values at time of
      save.
      
      Note: When a one-shot timer expires, the code in arch/x86/kvm/lapic.c does
      not zero the value of the LAPIC initial count register (emulating HW
      behavior). If no other timer is run and pending prior to a subsequent
      KVM_GET_LAPIC call, the returned register set will include the expired
      one-shot initial count. On a subsequent KVM_SET_LAPIC call the code will
      see a non-zero initial count and start a new one-shot timer using the
      expired timer's count. This is a prior existing bug and will be addressed
      in a separate patch. Thanks to jmattson@google.com for this find.
      Signed-off-by: Peter Shier <pshier@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <20181010225653.238911-1-pshier@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
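
      A sketch of the read-side idea; apic_get_tmcct() is the internal helper
      assumed to compute the live count, and the raw register write is
      illustrative:

        /* When copying LAPIC state out for KVM_GET_LAPIC, substitute a freshly
         * computed current count: APIC_TMCCT is derived from the hrtimer on
         * real reads, so the backing register page holds a stale value. */
        static void apic_get_state_sketch(struct kvm_lapic *apic,
                                          struct kvm_lapic_state *s)
        {
                memcpy(s->regs, apic->regs, sizeof(*s));
                *((u32 *)(s->regs + APIC_TMCCT)) = apic_get_tmcct(apic);
        }

      On the restore side, KVM_SET_LAPIC can then restart the timer from this
      saved count instead of re-arming it from the initial count register.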
  18. 07 April 2020, 1 commit
  19. 31 March 2020, 1 commit
  20. 26 March 2020, 1 commit
  21. 24 March 2020, 1 commit
  22. 23 March 2020, 1 commit
    • KVM: LAPIC: Mark hrtimer for period or oneshot mode to expire in hard interrupt context · edec6e01
      Committed by He Zhe
      apic->lapic_timer.timer was initialized with HRTIMER_MODE_ABS_HARD but
      started later with HRTIMER_MODE_ABS, which may cause the following warning
      in PREEMPT_RT kernel.
      
      WARNING: CPU: 1 PID: 2957 at kernel/time/hrtimer.c:1129 hrtimer_start_range_ns+0x348/0x3f0
      CPU: 1 PID: 2957 Comm: qemu-system-x86 Not tainted 5.4.23-rt11 #1
      Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1a 09/18/2018
      RIP: 0010:hrtimer_start_range_ns+0x348/0x3f0
      Code: 4d b8 0f 94 c1 0f b6 c9 e8 35 f1 ff ff 4c 8b 45
            b0 e9 3b fd ff ff e8 d7 3f fa ff 48 98 4c 03 34
            c5 a0 26 bf 93 e9 a1 fd ff ff <0f> 0b e9 fd fc ff
            ff 65 8b 05 fa b7 90 6d 89 c0 48 0f a3 05 60 91
      RSP: 0018:ffffbc60026ffaf8 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: ffff9d81657d4110 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000006cc7987bcf RDI: ffff9d81657d4110
      RBP: ffffbc60026ffb58 R08: 0000000000000001 R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000006cc7987bcf
      R13: 0000000000000000 R14: 0000006cc7987bcf R15: ffffbc60026d6a00
      FS: 00007f401daed700(0000) GS:ffff9d81ffa40000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000ffffffff CR3: 0000000fa7574000 CR4: 00000000003426e0
      Call Trace:
      ? kvm_release_pfn_clean+0x22/0x60 [kvm]
      start_sw_timer+0x85/0x230 [kvm]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      kvm_lapic_switch_to_sw_timer+0x72/0x80 [kvm]
      vmx_pre_block+0x1cb/0x260 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_sync_pir_to_irr+0x9e/0x100 [kvm_intel]
      ? kvm_apic_has_interrupt+0x46/0x80 [kvm]
      kvm_arch_vcpu_ioctl_run+0x85b/0x1fa0 [kvm]
      ? _raw_spin_unlock_irqrestore+0x18/0x50
      ? _copy_to_user+0x2c/0x30
      kvm_vcpu_ioctl+0x235/0x660 [kvm]
      ? rt_spin_unlock+0x2c/0x50
      do_vfs_ioctl+0x3e4/0x650
      ? __fget+0x7a/0xa0
      ksys_ioctl+0x67/0x90
      __x64_sys_ioctl+0x1a/0x20
      do_syscall_64+0x4d/0x120
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7f4027cc54a7
      Code: 00 00 90 48 8b 05 e9 59 0c 00 64 c7 00 26 00 00
            00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00
            00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff
            73 01 c3 48 8b 0d b9 59 0c 00 f7 d8 64 89 01 48
      RSP: 002b:00007f401dae9858 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00005558bd029690 RCX: 00007f4027cc54a7
      RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000d
      RBP: 00007f4028b72000 R08: 00005558bc829ad0 R09: 00000000ffffffff
      R10: 00005558bcf90ca0 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 00005558bce1c840
      ---[ end trace 0000000000000002 ]---
      Signed-off-by: He Zhe <zhe.he@windriver.com>
      Message-Id: <1584687967-332859-1-git-send-email-zhe.he@windriver.com>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
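
      The fix boils down to arming the timer with the same mode it was
      initialized with; a sketch (the kvm_timer field layout is assumed):

        #include <linux/hrtimer.h>

        static void lapic_timer_mode_sketch(struct kvm_timer *ktimer, ktime_t expire)
        {
                /* The timer is created for hard-irq expiry... */
                hrtimer_init(&ktimer->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);

                /* ...so it must also be started with the matching _HARD mode;
                 * starting it with plain HRTIMER_MODE_ABS is what trips the
                 * PREEMPT_RT warning quoted above. */
                hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS_HARD);
        }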
  23. 17 March 2020, 2 commits
  24. 22 February 2020, 2 commits
  25. 13 February 2020, 1 commit
  26. 05 February 2020, 1 commit