1. 05 Feb 2020, 1 commit
  2. 28 Jan 2020, 3 commits
  3. 21 Jan 2020, 2 commits
  4. 09 Jan 2020, 3 commits
  5. 15 Nov 2019, 3 commits
  6. 23 Oct 2019, 1 commit
  7. 26 Sep 2019, 1 commit
  8. 24 Sep 2019, 1 commit
  9. 12 Sep 2019, 2 commits
    • KVM: x86: Fix INIT signal handling in various CPU states · 4b9852f4
      Authored by Liran Alon
      Commit cd7764fe ("KVM: x86: latch INITs while in system management mode")
      changed the code to latch INIT while the vCPU is in SMM and to process the
      latched INIT when leaving SMM. It left a subtle remark in the commit message
      that similar treatment should also be done while the vCPU is in VMX non-root mode.
      
      However, INIT signals should actually be latched in several vCPU states:
      (*) For both Intel and AMD, INIT signals should be latched while the vCPU
      is in SMM.
      (*) For Intel, INIT should also be latched while the vCPU is in VMX
      operation and processed later, when the vCPU leaves VMX operation by
      executing VMXOFF.
      (*) For AMD, INIT should also be latched while the vCPU runs with GIF=0
      or in guest mode with an intercept defined on the INIT signal.
      
      To fix this:
      1) Add kvm_x86_ops->apic_init_signal_blocked() so that each CPU vendor
      can define the vCPU states in which INIT signals should be blocked, and
      modify kvm_apic_accept_events() to use it (see the sketch below).
      2) Modify vmx_check_nested_events() to check for a pending INIT signal
      while the vCPU is in guest mode. If one is pending, emulate a vmexit on
      EXIT_REASON_INIT_SIGNAL. Note that nSVM should have similar behaviour,
      but this is currently left as a TODO comment to be implemented in the
      future because nSVM does not yet implement svm_check_nested_events().
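
      A minimal sketch of fix (1), kept deliberately close to the wording above;
      is_smm() exists in KVM, while the vendor-state helpers marked hypothetical
      are placeholders for the real checks:

      /* Sketch only (not the exact kernel diff): the new vendor callback and
       * how kvm_apic_accept_events() would consult it. */

      /* Added to struct kvm_x86_ops: */
      bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);

      /* Intel: INIT is blocked while the vCPU is in SMM or in VMX operation. */
      static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
      {
              return is_smm(vcpu) || vmx_in_vmx_operation(vcpu);   /* second helper is hypothetical */
      }

      /* AMD: INIT is blocked in SMM, while GIF=0, or when L1 intercepts INIT. */
      static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
      {
              return is_smm(vcpu) || svm_gif_cleared(vcpu) ||
                     svm_nested_intercepts_init(vcpu);             /* both helpers are hypothetical */
      }

      /* In kvm_apic_accept_events(): keep INIT latched instead of acting on it. */
      if (kvm_x86_ops->apic_init_signal_blocked(vcpu))
              return;        /* INIT stays pending; processed once the blocking state ends */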
      
      Note: The current KVM nVMX implementation does not support the VMX
      wait-for-SIPI activity state as specified in MSR_IA32_VMX_MISC bits 6:8
      exposed to the guest (see nested_vmx_setup_ctls_msrs()).
      If and when support for this activity state is implemented,
      kvm_check_nested_events() would need to avoid emulating a vmexit on an
      INIT signal when the activity state is wait-for-SIPI. In addition,
      kvm_apic_accept_events() would need to be modified to avoid discarding a
      SIPI when the VMX activity state is wait-for-SIPI, and instead delay SIPI
      processing to vmx_check_nested_events(), which would clear the pending
      APIC events and emulate a vmexit on SIPI.
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Co-developed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: LAPIC: Micro optimize IPI latency · 2b0911d1
      Authored by Wanpeng Li
      This patch optimizes the virtual IPI emulation sequence:
      
      write ICR2                     write ICR2
      write ICR                      read ICR2
      read ICR            ==>        send virtual IPI
      read ICR2                      write ICR
      send virtual IPI
      
      This reduces kvm-unit-tests/vmexit.flat IPI testing latency (measured from
      the sender sending the IPI until the sender receives the ACK) from 3319
      cycles to 3203 cycles on a Skylake server.
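
      A hedged sketch of the reordered ICR write emulation implied by the
      diagram; kvm_apic_send_ipi(), kvm_lapic_get_reg() and kvm_lapic_set_reg()
      are existing lapic helpers, and the surrounding switch statement is
      abbreviated:

      /* Sketch: the guest has already written the destination into ICR2, so
       * read it and send the virtual IPI as soon as the low ICR word arrives,
       * and only then store ICR, instead of storing ICR and re-reading it. */
      case APIC_ICR:
              val &= ~(1 << 12);     /* emulated IPIs never report "send pending" */
              kvm_apic_send_ipi(apic, val, kvm_lapic_get_reg(apic, APIC_ICR2));
              kvm_lapic_set_reg(apic, APIC_ICR, val);
              break;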
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 14 Aug 2019, 1 commit
    • kvm: x86: skip populating logical dest map if apic is not sw enabled · b14c876b
      Authored by Radim Krcmar
      recalculate_apic_map() does not sanitize the LDR, and it's possible that
      multiple bits are set. In that case, a previously valid entry can be
      overwritten by an invalid one.
      
      This condition is hit when booting a 32-bit, >8 CPU RHEL6 guest and then
      triggering a crash to boot a kdump kernel. This is the sequence of
      events:
      1. Linux boots in bigsmp mode and enables PhysFlat; however, it still
      writes to the LDR, which probably will never be used.
      2. When booting into kdump, the stale LDR values remain, as they are not
      cleared by the guest and there isn't an APIC reset.
      3. kdump boots with 1 cpu and uses Logical Destination Mode, but the
      logical map has been overwritten and points to an inactive vcpu.
      (A sketch of the fix follows below.)
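
      A hedged sketch of the fix named in the title, with the map-building loop
      abbreviated; kvm_apic_sw_enabled(), kvm_lapic_get_reg() and APIC_LDR are
      existing identifiers:

      /* Inside recalculate_apic_map(), after the physical map entry is set up: */
      if (!kvm_apic_sw_enabled(apic))
              continue;      /* APIC not software enabled: its LDR may be stale,
                              * so do not let it populate (and overwrite) the
                              * logical destination map */

      ldr = kvm_lapic_get_reg(apic, APIC_LDR);
      /* ... derive the cluster/logical ids from ldr and fill the logical map ... */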
      Signed-off-by: Radim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  11. 05 Aug 2019, 1 commit
  12. 02 Aug 2019, 1 commit
  13. 20 Jul 2019, 1 commit
    • KVM: LAPIC: Inject timer interrupt via posted interrupt · 0c5f81da
      Authored by Wanpeng Li
      Dedicated instances are currently disturbed by unnecessary jitter because
      the emulated lapic timers fire on the same pCPUs where the vCPUs reside.
      Unlike ARM, Intel has no hardware virtual timer for the guest, so both
      programming the timer in the guest and the emulated timer firing incur
      vmexits. This patch tries to avoid the vmexit when the emulated timer
      fires, at least in the dedicated-instance scenario when nohz_full is
      enabled.

      In that case, the emulated timers can be offloaded to the nearest busy
      housekeeping cpus, since APICv has been available in server processors
      for several years. The guest timer interrupt can then be injected via
      posted interrupts, which are delivered by the housekeeping cpu once the
      emulated timer fires.

      The host should be tuned so that vCPUs are placed on isolated physical
      processors, with several surplus pCPUs left for busy housekeeping. If
      mwait/hlt/pause vmexits are disabled so that the vCPUs stay in non-root
      mode, a ~3% redis performance benefit can be observed on a Skylake
      server, and the number of external interrupt vmexits drops substantially
      (a sketch of the delivery path follows after the numbers below). Without
      the patch:
      
                  VM-EXIT  Samples  Samples%  Time%  Min Time  Max Time  Avg time
      EXTERNAL_INTERRUPT     42916    49.43%  39.30%   0.47us  106.09us   0.71us ( +-  1.09% )

      With the patch:

                  VM-EXIT  Samples  Samples%  Time%  Min Time  Max Time  Avg time
      EXTERNAL_INTERRUPT      6871     9.29%   2.96%   0.44us   57.88us   0.72us ( +-  4.02% )
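
      A sketch of the delivery path described above; apic_timer_expired(),
      kvm_apic_inject_pending_timer_irqs() and kvm_set_pending_timer() exist in
      KVM, while the gating predicate is an assumption standing in for the real
      checks (nohz_full housekeeping, APICv usable, and so on):

      /* When the emulated timer (an hrtimer running on a housekeeping pCPU) fires: */
      static void apic_timer_expired(struct kvm_lapic *apic)
      {
              struct kvm_vcpu *vcpu = apic->vcpu;

              if (can_post_timer_interrupt(vcpu)) {        /* hypothetical predicate */
                      /* APICv path: queue the timer vector as a posted interrupt
                       * so the vCPU receives it without leaving non-root mode. */
                      kvm_apic_inject_pending_timer_irqs(apic);
                      return;
              }

              /* Fallback: mark the timer pending and kick the vCPU (a vmexit). */
              atomic_inc(&apic->lapic_timer.pending);
              kvm_set_pending_timer(vcpu);
      }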
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  14. 18 Jul 2019, 1 commit
    • KVM: LAPIC: Make lapic timer unpinned · 4d151bf3
      Authored by Wanpeng Li
      Commit 61abdbe0 ("kvm: x86: make lapic hrtimer pinned") pinned the
      lapic timer to avoid waiting until the next kvm exit for the guest to
      see KVM_REQ_PENDING_TIMER set. Another solution is to give the vCPU a
      kick after setting the KVM_REQ_PENDING_TIMER bit; making the lapic timer
      unpinned in this way will be used in follow-up patches (see the sketch
      below).
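
      A minimal sketch of the kick-after-request alternative; kvm_make_request(),
      KVM_REQ_PENDING_TIMER and kvm_vcpu_kick() are existing KVM primitives, and
      the wrapper is only illustrative:

      /* Raise the pending-timer request and wake the vCPU explicitly, so the
       * lapic hrtimer no longer has to be pinned to the vCPU's current pCPU. */
      static void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
      {
              kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
              kvm_vcpu_kick(vcpu);    /* IPI/wake the vCPU so it sees the request promptly */
      }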
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  15. 16 Jul 2019, 1 commit
  16. 06 Jul 2019, 1 commit
  17. 05 Jul 2019, 2 commits
  18. 03 Jul 2019, 1 commit
    • KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC · bb34e690
      Authored by Wanpeng Li
      Thomas reported that:
      
       | Background:
       |
       |    In preparation of supporting IPI shorthands I changed the CPU offline
       |    code to software disable the local APIC instead of just masking it.
       |    That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV
       |    register.
       |
       | Failure:
       |
       |    When the CPU comes back online the startup code occasionally triggers
       |    the warning in apic_pending_intr_clear(), which complains that the IRRs
       |    are not empty.
       |
       |    The offending vector is the local APIC timer vector whose IRR bit is set
       |    and stays set.
       |
       | It took me quite some time to reproduce the issue locally, but now I can
       | see what happens.
       |
       | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1
       | (and hardware support) it behaves correctly.
       |
       | Here is the series of events:
       |
       |     Guest CPU
       |
       |     goes down
       |
       |       native_cpu_disable()
       |
       | 			apic_soft_disable();
       |
       |     play_dead()
       |
       |     ....
       |
       |     startup()
       |
       |       if (apic_enabled())
       |         apic_pending_intr_clear()	<- Not taken
       |
       |      enable APIC
       |
       |         apic_pending_intr_clear()	<- Triggers warning because IRR is stale
       |
       | When this happens, the deadline timer or the regular APIC timer - it
       | happens with both - has fired shortly before the APIC is disabled, but
       | the interrupt was not serviced because the guest CPU was in an
       | interrupt-disabled region at that point.
       |
       | The state of the timer vector ISR/IRR bits:
       |
       |                                ISR     IRR
       | before apic_soft_disable()      0       1
       | after apic_soft_disable()       0       1
       |
       | On startup                      0       1
       |
       | Now one would assume that the IRR is cleared after the INIT reset, but this
       | happens only on CPU0.
       |
       | Why?
       |
       | Because our CPU0 hotplug is just for testing to make sure nothing breaks
       | and goes through an NMI wakeup vehicle because INIT would send it through
       | the bootstrap code which is not really working if that CPU was not
       | physically unplugged.
       |
       | Now looking at a real world APIC the situation in that case is:
       |
       |                                ISR     IRR
       | before apic_soft_disable()      0       1
       | after apic_soft_disable()       0       1
       |
       | On startup                      0       0
       |
       | Why?
       |
       | Once the dying CPU reenables interrupts the pending interrupt gets
       | delivered as a spurious interrupt and then the state is clear.
       |
       | While that CPU0 hotplug test case is surely an esoteric issue, the APIC
       | emulation is still wrong. Even if the play_dead() code did not enable
       | interrupts, the pending IRR bit would turn into an ISR .. interrupt
       | when the APIC is reenabled on startup.
      
      From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled
      * Pending interrupts in the IRR and ISR registers are held and require
        masking or handling by the CPU.
      
      In Thomas's testing, a hardware CPU does not respect a software-disabled
      LAPIC when the IRR has already been set or an APICv posted interrupt is
      in flight, so we can skip the software-disable check when clearing the
      IRR and setting the ISR, while continuing to respect the software-disable
      state when attempting to set the IRR (see the sketch below).
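
      A hedged sketch of the distinction being drawn; kvm_apic_hw_enabled(),
      kvm_apic_sw_enabled(), kvm_lapic_set_irr() and apic_find_highest_irr() are
      existing lapic helpers, while the two wrapper functions are simplified
      stand-ins:

      /* Delivery (setting an IRR bit): keep honouring both enable bits. */
      static int lapic_accept_irq_sketch(struct kvm_lapic *apic, int vector)
      {
              if (!kvm_apic_hw_enabled(apic) || !kvm_apic_sw_enabled(apic))
                      return 0;                  /* APIC disabled for delivery: drop */
              kvm_lapic_set_irr(vector, apic);
              return 1;
      }

      /* Servicing (IRR -> ISR): require only hardware enable, so an IRR bit set
       * before the software disable is still found and injected on re-enable. */
      static int lapic_pending_irq_sketch(struct kvm_lapic *apic)
      {
              if (!kvm_apic_hw_enabled(apic))
                      return -1;
              return apic_find_highest_irr(apic);
      }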
      Reported-by: Rong Chen <rong.a.chen@intel.com>
      Reported-by: Feng Tang <feng.tang@intel.com>
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rong Chen <rong.a.chen@intel.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  19. 20 Jun 2019, 1 commit
    • KVM: x86: Fix apic dangling pointer in vcpu · a251fb90
      Authored by Saar Amar
      The function kvm_create_lapic() attempts to allocate the apic structure
      and sets a pointer to it in the virtual processor structure. However, if
      get_zeroed_page() fails, the function frees the apic chunk but forgets to
      set the pointer in the vcpu back to NULL. It's not a security issue,
      since that pointer is not used when kvm_create_lapic() returns an error,
      but it is more accurate that way (see the sketch below).
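
      A minimal sketch of the corrected error path in kvm_create_lapic();
      get_zeroed_page() is the allocation named above, and the surrounding
      function is abbreviated:

      apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
      if (!apic->regs) {
              kfree(apic);               /* free the partially initialized apic ... */
              vcpu->arch.apic = NULL;    /* ... and clear the back-pointer (the fix) */
              return -ENOMEM;
      }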
      Signed-off-by: Saar Amar <saaramar@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  20. 19 Jun 2019, 1 commit
  21. 18 Jun 2019, 2 commits
  22. 05 Jun 2019, 3 commits
    • KVM: LAPIC: Optimize timer latency further · b6c4bc65
      Authored by Wanpeng Li
      Lapic timer advancement tries to hide the hypervisor overhead between the
      host's emulated timer firing and the guest becoming aware that the timer
      has fired. However, it only hides the time between
      apic_timer_fn/handle_preemption_timer -> wait_lapic_expire, not the time
      up to the real point of vmentry mentioned in the original commit
      d0659d94 ("KVM: x86: add option to advance tscdeadline hrtimer
      expiration"). There are 700+ cpu cycles between the end of
      wait_lapic_expire and the world switch on my Haswell desktop.

      This patch tries to narrow that last gap (wait_lapic_expire -> world
      switch): it takes the real overhead between
      apic_timer_fn/handle_preemption_timer and the world switch into account
      when adaptively tuning the timer advancement (see the sketch below). The
      patch reduces latency by 40% (from ~1600+ cycles to ~1000+ cycles on a
      Haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing busy
      waits.
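
      A hedged sketch of where the delta is now sampled, with field access and
      the adjustment helper abbreviated; kvm_read_l1_tsc(), rdtsc() and
      __delay() exist in the kernel, everything else is simplified for
      illustration:

      /* Called as late as possible, right before the world switch (sketch): */
      void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
      {
              u64 guest_tsc    = kvm_read_l1_tsc(vcpu, rdtsc());
              u64 tsc_deadline = vcpu->arch.apic->lapic_timer.tscdeadline;
              s64 delta        = guest_tsc - tsc_deadline;   /* < 0: early, > 0: late */

              if (delta < 0)
                      __delay(-delta);      /* busy-wait out the remaining guest cycles */

              /* Feed the error, measured this close to vmentry, back into the
               * adaptive tuning of the timer advancement. */
              adjust_lapic_timer_advance(vcpu, delta);
      }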
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: LAPIC: Delay trace_kvm_wait_lapic_expire tracepoint to after vmexit · ec0671d5
      Authored by Wanpeng Li
      The wait_lapic_expire() call was placed above guest_enter_irqoff() because
      of its tracepoint, which violated the RCU extended quiescent state entered
      by guest_enter_irqoff() [1][2]. This patch simply moves the tracepoint
      below guest_exit_irqoff() in vcpu_enter_guest(): snapshot the delta before
      VM-Enter, but trace it after VM-Exit (see the sketch below). This will let
      us move wait_lapic_expire() to just before vmentry in a later patch.
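
      A sketch of the reordering in vcpu_enter_guest(), with the predicate and
      the busy-wait's return value simplified; guest_exit_irqoff() and
      trace_kvm_wait_lapic_expire() are the existing functions named above:

      bool lapic_waited = false;
      s64 expire_delta = 0;

      if (lapic_timer_needs_wait(vcpu)) {                 /* hypothetical predicate */
              expire_delta = kvm_wait_lapic_expire(vcpu); /* snapshot the delta before VM-Enter */
              lapic_waited = true;
      }

      /* ... VM-Enter, run the guest, VM-Exit ... */
      guest_exit_irqoff();       /* leave the RCU extended quiescent state */

      if (lapic_waited)          /* only trace if the busy-wait actually ran */
              trace_kvm_wait_lapic_expire(vcpu->vcpu_id, expire_delta);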
      
      [1] Commit 8b89fe1f ("kvm: x86: move tracepoints outside extended quiescent state")
      [2] https://patchwork.kernel.org/patch/7821111/
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      [Track whether wait_lapic_expire was called, and do not invoke the tracepoint
       if not. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: LAPIC: Extract adaptive tune timer advancement logic · 84ea3aca
      Authored by Wanpeng Li
      Extract the adaptive timer-advancement tuning logic into a single function.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      [Rename new function. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  23. 01 May 2019, 5 commits
  24. 19 Apr 2019, 1 commit
    • KVM: lapic: Convert guest TSC to host time domain if necessary · b6aa57c6
      Authored by Sean Christopherson
      To minimize the latency of timer interrupts as observed by the guest,
      KVM adjusts the values it programs into the host timers to account for
      the host's overhead of programming and handling the timer event.  In
      the event that the adjustments are too aggressive, i.e. the timer fires
      earlier than the guest expects, KVM busy waits immediately prior to
      entering the guest.
      
      Currently, KVM manually converts the delay from nanoseconds to clock
      cycles.  But, the conversion is done in the guest's time domain, while
      the delay occurs in the host's time domain.  This is perfectly ok when
      the guest and host are using the same TSC ratio, but if the guest is
      using a different ratio then the delay may not be accurate and could
      wait too little or too long.
      
      When the guest is not using the host's ratio, convert the delay from
      guest clock cycles to host nanoseconds and use ndelay() instead of
      __delay() to provide more accurate timing.  Because converting to
      nanoseconds is relatively expensive, e.g. requires division and more
      multiplication ops, continue using __delay() directly when guest and
      host TSCs are running at the same ratio.
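
      A sketch of the conversion described above, wrapped in a hypothetical
      helper; the tsc_scaling_ratio fields, virtual_tsc_khz, do_div(), __delay()
      and ndelay() are existing kernel identifiers:

      /* Busy-wait for 'guest_cycles' guest TSC cycles in the host time domain. */
      static void delay_guest_cycles(struct kvm_vcpu *vcpu, u64 guest_cycles)
      {
              u64 ns;

              if (vcpu->arch.tsc_scaling_ratio == kvm_default_tsc_scaling_ratio) {
                      __delay(guest_cycles);   /* same ratio: guest cycles == host cycles */
                      return;
              }

              /* Different ratio: pay for the mul/div and convert to host nanoseconds. */
              ns = guest_cycles * 1000000ULL;
              do_div(ns, vcpu->arch.virtual_tsc_khz);   /* virtual_tsc_khz: guest TSC in kHz */
              ndelay(ns);
      }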
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Fixes: 3b8a5df6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>