Commits · be7b263ea925324e54e48c3558d4719be5374053 · openanolis / cloud-kernel

10 Nov, 2015 13 commits

KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC · be7b263e

Authored by Haozhong Zhang on Oct 20, 2015

This patch makes kvm-intel return a scaled host TSC plus the TSC
offset when handling guest reads of MSR_IA32_TSC.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

be7b263e

KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded · ff2c3a18

Authored by Haozhong Zhang on Oct 20, 2015

This patch makes the kvm-intel module load the TSC scaling ratio into the
TSC multiplier field of the VMCS when a vcpu is loaded, so that the TSC
scaling ratio can take effect if VMX TSC scaling is enabled.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ff2c3a18

KVM: VMX: Enable and initialize VMX TSC scaling · 64903d61

Authored by Haozhong Zhang on Oct 20, 2015

This patch enhances the kvm-intel module to enable VMX TSC scaling and
to collect information about the TSC scaling ratio during initialization.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

64903d61

KVM: x86: Use the correct vcpu's TSC rate to compute time scale · 27cca94e

Authored by Haozhong Zhang on Oct 20, 2015

This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
as the vcpu's TSC rate to compute the time scale if TSC scaling is enabled.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

27cca94e

KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc() · 4ba76538

Authored by Haozhong Zhang on Oct 20, 2015

Both VMX and SVM scale the host TSC in the same way in the call-back
read_l1_tsc(), so this patch moves the scaling logic from the call-back
read_l1_tsc() to a common function kvm_read_l1_tsc().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4ba76538

KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset() · 58ea6767

Authored by Haozhong Zhang on Oct 20, 2015

For both VMX and SVM, if the 2nd argument of the call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and renames the call-back adjust_tsc_offset() to
adjust_tsc_offset_guest().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

58ea6767

KVM: x86: Replace call-back compute_tsc_offset() with a common function · 07c1419a

Authored by Haozhong Zhang on Oct 20, 2015

Both VMX and SVM calculate the tsc-offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

07c1419a

KVM: x86: Replace call-back set_tsc_khz() with a common function · 381d585c

Authored by Haozhong Zhang on Oct 20, 2015

Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

381d585c

KVM: x86: Add a common TSC scaling function · 35181e86

Authored by Haozhong Zhang on Oct 20, 2015

VMX and SVM calculate the TSC scaling ratio with similar logic, so this
patch generalizes it to a common TSC scaling function.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
[Inline the multiplication and shift steps into mul_u64_u64_shr.  Remove
 BUG_ON.  - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

35181e86
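
The fixed-point arithmetic behind a common scaling function can be sketched as follows. This is a minimal userspace illustration, not the kernel code: the name `scale_tsc` is hypothetical, and `unsigned __int128` (a GCC/Clang extension on 64-bit targets) stands in for the kernel helper mul_u64_u64_shr mentioned in the commit note.

```c
#include <stdint.h>

/*
 * Scale a TSC value by a fixed-point ratio that has frac_bits fractional
 * bits. The 128-bit intermediate keeps tsc * ratio from overflowing before
 * the shift, which is the job mul_u64_u64_shr() does in the kernel.
 * scale_tsc is a hypothetical name used only for this sketch.
 */
static inline uint64_t scale_tsc(uint64_t tsc, uint64_t ratio,
                                 unsigned int frac_bits)
{
    return (uint64_t)(((unsigned __int128)tsc * ratio) >> frac_bits);
}
```

With a ratio of exactly 1.0 (that is, `1ULL << frac_bits`) the input is returned unchanged, and a ratio of 1.5 with 48 fractional bits (`3ULL << 47`) turns 1000 host ticks into 1500 guest ticks.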

KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch · ad721883

Authored by Haozhong Zhang on Oct 20, 2015

This patch moves the TSC scaling ratio field from the architecture
struct vcpu_svm to the common struct kvm_vcpu_arch.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ad721883

KVM: x86: Collect information for setting TSC scaling ratio · bc9b961b

Authored by Haozhong Zhang on Oct 20, 2015

The number of bits in the fractional part of the 64-bit TSC scaling
ratio differs between VMX and SVM. This patch makes the architecture
code collect the number of fractional bits and other related
information into variables that can be accessed in the common code.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bc9b961b
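
To see why common code needs the number of fractional bits, here is a minimal sketch of deriving a scaling ratio from two frequencies. The function name `tsc_ratio` and its parameters are hypothetical, and the widths used in the usage note (48 fractional bits for VMX, 32 for SVM) are my understanding of the hardware formats rather than something stated in this commit message.

```c
#include <stdint.h>

/*
 * Compute the fixed-point ratio guest_khz / host_khz with frac_bits
 * fractional bits. A 128-bit intermediate (GCC/Clang extension) avoids
 * overflow of (1 << frac_bits) * guest_khz. Returns 0 if host_khz is 0.
 * tsc_ratio is a hypothetical name used only for this sketch.
 */
static inline uint64_t tsc_ratio(uint32_t guest_khz, uint32_t host_khz,
                                 unsigned int frac_bits)
{
    if (host_khz == 0)
        return 0;
    return (uint64_t)((((unsigned __int128)1 << frac_bits) * guest_khz)
                      / host_khz);
}
```

The same pair of frequencies yields different raw values depending on the format: a guest at half the host rate gives `1ULL << 47` with 48 fractional bits but `1ULL << 31` with 32, which is why the width has to be visible to the shared code.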

KVM: x86: declare a few variables as __read_mostly · 893590c7

Authored by Paolo Bonzini on Nov 06, 2015

These include module parameters and variables that are set by
kvm_x86_ops->hardware_setup.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

893590c7

KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common · 450869d6

Authored by Paolo Bonzini on Nov 04, 2015

They are exactly the same, except that handle_mmio_page_fault
has an unused argument and a call to WARN_ON.  Remove the unused
argument from the callers, and move the warning to (the former)
handle_mmio_page_fault_common.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

450869d6

05 Nov, 2015 1 commit

KVM: VMX: Fix commit which broke PML · a3eaa864

Authored by Kai Huang on Nov 04, 2015

I found that PML has been broken since the commit below:

	commit feda805f
	Author: Xiao Guangrong <guangrong.xiao@linux.intel.com>
	Date:   Wed Sep 9 14:05:55 2015 +0800

	KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update

	Unify the update in vmx_cpuid_update()

	Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
	[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

The reason is that in the above commit vmx_cpuid_update calls
vmx_secondary_exec_control, in which the SECONDARY_EXEC_ENABLE_PML bit is
currently cleared unconditionally (as PML is enabled when creating the vcpu).
Therefore if vcpu_cpuid_update is called after the vcpu is created, PML will
be disabled unexpectedly while the log-dirty code still thinks PML is in use.

Fix this by clearing SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control
only when PML is not supported or not enabled (!enable_pml). This is more
reasonable as PML is currently either always enabled or always disabled. With
this change, explicitly updating SECONDARY_EXEC_ENABLE_PML in
vmx_enable{disable}_pml is no longer needed, so also rename
vmx_enable{disable}_pml to vmx_create{destroy}_pml_buffer.

Fixes: feda805f
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
[While at it, change a wrong ASSERT to an "if".  The condition can happen
 if creating the VCPU fails with ENOMEM. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a3eaa864
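
The shape of the fix described above can be sketched in isolation. This is an illustration of the conditional bit-clearing only, not the kernel function: `secondary_exec_control` here is a hypothetical stand-in that takes the control word and two flags as plain arguments, and the bit position (bit 17 of the secondary processor-based controls) is taken from my reading of the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 17 of the secondary processor-based VM-execution controls. */
#define SECONDARY_EXEC_ENABLE_PML 0x00020000u

/*
 * Hypothetical stand-in for the control computation: the PML bit is
 * cleared only when PML is unsupported or disabled, instead of being
 * cleared unconditionally on every recomputation.
 */
static inline uint32_t secondary_exec_control(uint32_t ctl,
                                              bool pml_supported,
                                              bool enable_pml)
{
    if (!pml_supported || !enable_pml)
        ctl &= ~SECONDARY_EXEC_ENABLE_PML;
    return ctl;
}
```

Because the result depends only on the two flags, recomputing the controls after vcpu creation (for example from a CPUID update) leaves the PML bit in the state the log-dirty code expects.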

04 Nov, 2015 10 commits

KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0() · 879ae188

Authored by Laszlo Ersek on Nov 04, 2015

Commit b18d5431 ("KVM: x86: fix CR0.CD virtualization") was
technically correct, but it broke OVMF guests by slowing down various
parts of the firmware.

Commit fb279950 ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED") quirked the
first function modified by b18d5431, vmx_get_mt_mask(), for OVMF's
sake. This restored the speed of the OVMF code that runs before
PlatformPei (including the memory intensive LZMA decompression in SEC).

This patch extends the quirk to the second function modified by
b18d5431, kvm_set_cr0(). It eliminates the intrusive slowdown that
hits the EFI_MP_SERVICES_PROTOCOL implementation of edk2's
UefiCpuPkg/CpuDxe -- which is built into OVMF --, when CpuDxe starts up
all APs at once for initialization, in order to count them.

We also carry over the kvm_arch_has_noncoherent_dma() sub-condition from
the other half of the original commit b18d5431.

Fixes: b18d5431
Cc: stable@vger.kernel.org
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Tested-by: Janusz Mocek <januszmk6@gmail.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

879ae188

KVM: x86: allow RSM from 64-bit mode · 89651a3d

Authored by Paolo Bonzini on Nov 03, 2015

The SDM says that exiting system management mode from 64-bit mode
is invalid, but that would be too good to be true.  But actually,
most of the code is already there to support exiting from compat
mode (EFER.LME=1, EFER.LMA=0).  Getting all the way from 64-bit
mode to real mode only requires clearing CS.L and CR4.PCIDE.

Cc: stable@vger.kernel.org
Fixes: 660a5d51
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

89651a3d

KVM: VMX: fix SMEP and SMAP without EPT · 656ec4a4

Authored by Radim Krčmář on Nov 02, 2015

The comment in the code had it mostly right, but we enable paging for
emulated real mode regardless of EPT.

Without EPT (which implies emulated real mode), secondary VCPUs won't
start unless we disable SM[AE]P when the guest doesn't use paging.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

656ec4a4

KVM: x86: move kvm_set_irq_inatomic to legacy device assignment · 8a22f234

Authored by Paolo Bonzini on Oct 28, 2015

The function is not used outside device assignment, and
kvm_arch_set_irq_inatomic has a different prototype.  Move it here and
make it static to avoid confusion.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8a22f234

KVM: device assignment: remove pointless #ifdefs · 76954056

Authored by Paolo Bonzini on Oct 28, 2015

The symbols are always defined.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

76954056

KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic · b97e6de9

Authored by Paolo Bonzini on Oct 28, 2015

We do not want to do too much work in atomic context, in particular
not walking all the VCPUs of the virtual machine.  So we want
to distinguish the architecture-specific injection function for irqfd
from kvm_set_msi.  Since it's still empty, reuse the newly added
kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b97e6de9

KVM: x86: zero apic_arb_prio on reset · 0669a510

Authored by Radim Krčmář on Oct 30, 2015

The BSP doesn't get INIT, so its apic_arb_prio isn't zeroed after reboot.
The BSP won't get lowest priority interrupts until other VCPUs get enough
interrupts to match their pre-reboot apic_arb_prio.

That behavior doesn't fit into KVM's round-robin-like interpretation of
lowest priority delivery ... userspace should KVM_SET_LAPIC on reset, so
just zero apic_arb_prio there.

Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0669a510

KVM: x86: handle SMBASE as physical address in RSM · f40606b1

Authored by Radim Krčmář on Oct 30, 2015

GET_SMSTATE depends on real mode to ensure that smbase+offset is treated
as a physical address, which has already caused a bug after shuffling
the code.  Enforce physical addressing.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f40606b1

KVM: x86: add read_phys to x86_emulate_ops · 7a036a6f

Authored by Radim Krčmář on Oct 30, 2015

We want to read the physical memory when emulating RSM.

X86EMUL_IO_NEEDED is returned on all errors for consistency with other
helpers.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7a036a6f

KVM: x86: removing unused variable · 2da29bcc

Authored by Saurabh Sengar on Oct 30, 2015

Remove unused variables found by coccinelle.

Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2da29bcc

19 Oct, 2015 2 commits

KVM: x86: MMU: Initialize force_pt_level before calling mapping_level() · 8c85ac1c

Authored by Takuya Yoshikawa on Oct 19, 2015

Commit fd136902 ("KVM: x86: MMU: Move mapping_level_dirty_bitmap()
call in mapping_level()") forgot to initialize force_pt_level to false
in FNAME(page_fault)() before calling mapping_level() like
nonpaging_map() does.  This can sometimes result in forcing page table
level mapping unnecessarily.

Fix this and move the first *force_pt_level check in mapping_level()
before kvm_vcpu_gfn_to_memslot() call to make it a bit clearer that
the variable must be initialized before mapping_level() gets called.

This change can also avoid calling kvm_vcpu_gfn_to_memslot() when
!check_hugepage_cache_consistency() check in tdp_page_fault() forces
page table level mapping.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8c85ac1c

kvm: x86: zero EFER on INIT · 5690891b

Authored by Paolo Bonzini on Oct 19, 2015

Not zeroing EFER means that a 32-bit firmware cannot enter paging mode
without clearing EFER.LME first (which it should not know about).
Yang Zhang from Intel confirmed that the manual is wrong and EFER is
cleared to zero on INIT.

Fixes: d28bc9dd
Cc: stable@vger.kernel.org
Cc: Yang Z Zhang <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5690891b

16 Oct, 2015 11 commits

KVM: x86: move steal time initialization to vcpu entry time · 7cae2bed

Authored by Marcelo Tosatti on Oct 14, 2015

As reported at https://bugs.launchpad.net/qemu/+bug/1494350,
it is possible to have vcpu->arch.st.last_steal initialized
from a thread other than the vcpu thread, say the iothread, via
KVM_SET_MSRS.

This can cause an overflow later (when subtracting from the vcpu thread's
sched_info.run_delay).

To avoid that, move steal time accumulation to vcpu entry time,
before copying steal time data to the guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7cae2bed
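
The overflow mentioned above is an unsigned wraparound, which a small sketch can make concrete. The helper `steal_delta` is hypothetical and only models the subtraction; it assumes both quantities are unsigned 64-bit counters, as run_delay is in the scheduler statistics.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the amount added to the accumulated steal time.
 * Both values are unsigned, so if last_steal was sampled from a different
 * thread whose run_delay is larger than the vcpu thread's, the result
 * wraps around to a huge value instead of going negative.
 */
static inline uint64_t steal_delta(uint64_t run_delay, uint64_t last_steal)
{
    return run_delay - last_steal;
}
```

When both samples come from the vcpu thread, `steal_delta(500, 200)` is the expected 300. If last_steal instead came from another thread and is ahead of the vcpu thread's counter, `steal_delta(200, 500)` wraps to a value just below 2^64, which is the kind of bogus steal time the move to vcpu entry time avoids.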

KVM: x86: MMU: Eliminate an extra memory slot search in mapping_level() · 5225fdf8

Authored by Takuya Yoshikawa on Oct 16, 2015

Calling kvm_vcpu_gfn_to_memslot() twice in mapping_level() should be
avoided since getting a slot by binary search may not be negligible,
especially for virtual machines with many memory slots.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5225fdf8

KVM: x86: MMU: Remove mapping_level_dirty_bitmap() · d8aacf5d

Authored by Takuya Yoshikawa on Oct 16, 2015

Now that it has only one caller, and its name is not so helpful for
readers, remove it.  The new memslot_valid_for_gpte() function
makes it possible to share the common code between
gfn_to_memslot_dirty_bitmap() and mapping_level().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d8aacf5d

KVM: x86: MMU: Move mapping_level_dirty_bitmap() call in mapping_level() · fd136902

Authored by Takuya Yoshikawa on Oct 16, 2015

This is necessary to eliminate an extra memory slot search later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fd136902

KVM: x86: MMU: Simplify force_pt_level calculation code in FNAME(page_fault)() · 5ed5c5c8

Authored by Takuya Yoshikawa on Oct 16, 2015

As a bonus, an extra memory slot search can be eliminated when
is_self_change_mapping is true.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5ed5c5c8

KVM: x86: MMU: Make force_pt_level bool · cd1872f0

Authored by Takuya Yoshikawa on Oct 16, 2015

This will be passed to a function later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cd1872f0

kvm: svm: Only propagate next_rip when guest supports it · 6092d3d3

Authored by Joerg Roedel on Oct 14, 2015

Currently we always write the next_rip of the shadow vmcb to
the guest's vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicated no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Cc: Bandan Das <bsd@redhat.com>
Cc: Dirk Mueller <dmueller@suse.com>
Tested-By: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6092d3d3

KVM: x86: manually unroll bad_mt_xwr loop · 951f9fd7

Authored by Paolo Bonzini on Sep 23, 2015

The loop computes one of two constants, so it is simpler to write
everything inline.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

951f9fd7

KVM: nVMX: expose VPID capability to L1 · 089d7b6e

Authored by Wanpeng Li on Oct 13, 2015

Expose VPID capability to L1. For nested guests, we don't do anything
specific for single context invalidation. Hence, only advertise support
for global context invalidation. The major benefit of nested VPID comes
from having separate vpids when switching between L1 and L2, and also
when L2's vCPUs are not scheduled in/out on L1.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

089d7b6e

KVM: nVMX: nested VPID emulation · 5c614b35

Authored by Wanpeng Li on Oct 13, 2015

VPID is used to tag address spaces and avoid a TLB flush. Currently L0 uses
the same VPID to run L1 and all its guests. KVM flushes the VPID when
switching between L1 and L2.

This patch advertises VPID to the L1 hypervisor, so that the address spaces
of L1 and L2 can be treated separately and a TLB flush avoided when switching
between L1 and L2. For each nested vmentry, if vpid12 has changed, reuse the
shadow vpid w/ an invvpid.

Performance:

run lmbench on L2 w/ 3.5 kernel.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel    Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300 5.60000 2.88000  nested VPID
kernel    Linux 3.5.0-1 1.2600 1.4300 1.5600   12.7   12.9 3.49000 7.46000  vanilla

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5c614b35

KVM: nVMX: emulate the INVVPID instruction · 99b83ac8

Authored by Wanpeng Li on Oct 13, 2015

Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

99b83ac8

14 Oct, 2015 3 commits

KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid · dd5f5341

Authored by Wanpeng Li on Sep 23, 2015

Introduce __vmx_flush_tlb() to handle a specific vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

dd5f5341

KVM: VMX: adjust interface to allocate/free_vpid · 991e7a0e

Authored by Wanpeng Li on Sep 16, 2015

Adjust allocate/free_vpid so that they can be reused for the nested vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

991e7a0e

KVM: x86: don't notify userspace IOAPIC on edge EOI · 13db7734

Authored by Radim Krčmář on Oct 08, 2015

On real hardware, edge-triggered interrupts don't set a bit in TMR,
which means that IOAPIC isn't notified on EOI.  Do the same here.

Staying in guest/kernel mode after edge EOI is what we want for most
devices.  If some bugs could be nicely worked around with edge EOI
notifications, we should invest in a better interface.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

13db7734

openanolis / cloud-kernel synced successfully more than 1 year ago
