提交 · 609e36d372ad9329269e4a1467bd35311893d1d6 · openeuler / raspberrypi-kernel

04 6月, 2015 3 次提交

KVM: x86: pass host_initiated to functions that read MSRs · 609e36d3

由 Paolo Bonzini 提交于 4月 08, 2015

SMBASE is only readable from SMM for the VCPU, but it must be always
accessible if userspace is accessing it.  Thus, all functions that
read MSRs are changed to accept a struct msr_data; the host_initiated
and index fields are pre-initialized, while the data field is filled
on return.
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

609e36d3

KVM: x86: introduce num_emulated_msrs · 62ef68bb

由 Paolo Bonzini 提交于 5月 05, 2015

We will want to filter away MSR_IA32_SMBASE from the emulated_msrs if
the host CPU does not support SMM virtualization.  Introduce the
logic to do that, and also move paravirt MSRs to emulated_msrs for
simplicity and to get rid of KVM_SAVE_MSRS_BEGIN.
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

62ef68bb

KVM: x86: clear hidden CPU state at reset time · e69fab5d

由 Paolo Bonzini 提交于 6月 04, 2015

This was noticed by Radim while reviewing the implementation of
system management mode.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e69fab5d

29 5月, 2015 1 次提交

KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR · b7e60c5a

由 Marcelo Tosatti 提交于 5月 28, 2015

Initialize kvmclock base, on kvmclock system MSR write time,
so that the guest sees kvmclock counting from zero.

This matches baremetal behaviour when kvmclock in guest
sets sched clock stable.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
[Remove unnecessary comment. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b7e60c5a

28 5月, 2015 2 次提交

KVM: x86: add module parameter to disable periodic kvmclock sync · 630994b3

由 Marcelo Tosatti 提交于 5月 12, 2015

The periodic kvmclock sync can be an undesired source of latencies.

When running cyclictest on a guest, a latency spike is visible.
With kvmclock periodic sync disabled, the spike is gone.

Guests should use ntp which means the propagations of ntp corrections
from the host clock are unnecessary.

v2:
-> Make parameter read-only (Radim)
-> Return early on kvmclock_sync_fn (Andrew)
Reported-and-tested-by: NLuiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

630994b3

KVM: add "new" argument to kvm_arch_commit_memory_region · f36f3f28

由 Paolo Bonzini 提交于 5月 18, 2015

This lets the function access the new memory slot without going through
kvm_memslots and id_to_memslot.  It will simplify the code when more
than one address space will be supported.

Unfortunately, the "const"ness of the new argument must be casted
away in two places.  Fixing KVM to accept const struct kvm_memory_slot
pointers would require modifications in pretty much all architectures,
and is left for later.
Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f36f3f28

26 5月, 2015 3 次提交

KVM: add memslots argument to kvm_arch_memslots_updated · 15f46015

由 Paolo Bonzini 提交于 5月 17, 2015

Prepare for the case of multiple address spaces.
Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

15f46015

KVM: const-ify uses of struct kvm_userspace_memory_region · 09170a49

由 Paolo Bonzini 提交于 5月 18, 2015

Architecture-specific helpers are not supposed to muck with
struct kvm_userspace_memory_region contents.  Add const to
enforce this.

In order to eliminate the only write in __kvm_set_memory_region,
the cleaning of deleted slots is pulled up from update_memslots
to __kvm_set_memory_region.
Reviewed-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

09170a49

KVM: use kvm_memslots whenever possible · 9f6b8029

由 Paolo Bonzini 提交于 5月 17, 2015

kvm_memslots provides lockdep checking.  Use it consistently instead of
explicit dereferencing of kvm->memslots.
Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9f6b8029

20 5月, 2015 5 次提交

kvm/fpu: Enable eager restore kvm FPU for MPX · c447e76b

由 Liang Li 提交于 5月 21, 2015

The MPX feature requires eager KVM FPU restore support. We have verified
that MPX cannot work correctly with the current lazy KVM FPU restore
mechanism. Eager KVM FPU restore should be enabled if the MPX feature is
exposed to VM.
Signed-off-by: NYang Zhang <yang.z.zhang@intel.com>
Signed-off-by: NLiang Li <liang.z.li@intel.com>
[Also activate the FPU on AMD processors. - Paolo]
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c447e76b

kvm: fix crash in kvm_vcpu_reload_apic_access_page · e8fd5e9e

由 Andrea Arcangeli 提交于 5月 08, 2015

memslot->userfault_addr is set by the kernel with a mmap executed
from the kernel but the userland can still munmap it and lead to the
below oops after memslot->userfault_addr points to a host virtual
address that has no vma or mapping.

[  327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe
[  327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50
[  327.538474] PGD 1a01067 PUD 1a03067 PMD 0
[  327.538529] Oops: 0000 [#1] SMP
[  327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me
[  327.539488]  mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod
[  327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1
[  327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014
[  327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000
[  327.540184] RIP: 0010:[<ffffffff811a7b55>]  [<ffffffff811a7b55>] put_page+0x5/0x50
[  327.540261] RSP: 0018:ffff880317c5bcf8  EFLAGS: 00010246
[  327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000
[  327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe
[  327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000
[  327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe
[  327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50
[  327.540643] FS:  00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000
[  327.540717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0
[  327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  327.540974] Stack:
[  327.541008]  ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0
[  327.541093]  ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d
[  327.541177]  00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000
[  327.541261] Call Trace:
[  327.541321]  [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm]
[  327.543615]  [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm]
[  327.545918]  [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm]
[  327.548211]  [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm]
[  327.550500]  [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm]
[  327.552768]  [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50
[  327.555069]  [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30
[  327.557373]  [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80
[  327.559663]  [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530
[  327.561917]  [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0
[  327.564185]  [<ffffffff816de829>] system_call_fastpath+0x16/0x1b
[  327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e8fd5e9e

KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed · d81135a5

由 Xiao Guangrong 提交于 5月 13, 2015

CR0.CD and CR0.NW are not used by shadow page table so that need
not adjust mmu if these two bit are changed
Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d81135a5

KVM: MMU: fix MTRR update · efdfe536

由 Xiao Guangrong 提交于 5月 13, 2015

Currently, whenever guest MTRR registers are changed
kvm_mmu_reset_context is called to switch to the new root shadow page
table, however, it's useless since:
1) the cache type is not cached into shadow page's attribute so that
   the original root shadow page will be reused

2) the cache type is set on the last spte, that means we should sync
   the last sptes when MTRR is changed

This patch fixs this issue by drop all the spte in the gfn range which
is being updated by MTRR
Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

efdfe536

KVM: MMU: fix SMAP virtualization · edc90b7d

由 Xiao Guangrong 提交于 5月 11, 2015

KVM may turn a user page to a kernel page when kernel writes a readonly
user page if CR0.WP = 1. This shadow page entry will be reused after
SMAP is enabled so that kernel is allowed to access this user page

Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu
once CR4.SMAP is updated
Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

edc90b7d

11 5月, 2015 1 次提交

KVM: MMU: fix SMAP virtualization · 0be0226f

由 Xiao Guangrong 提交于 5月 11, 2015

KVM may turn a user page to a kernel page when kernel writes a readonly
user page if CR0.WP = 1. This shadow page entry will be reused after
SMAP is enabled so that kernel is allowed to access this user page

Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu
once CR4.SMAP is updated
Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0be0226f

07 5月, 2015 6 次提交

KVM: x86: fix initial PAT value · 74545705

由 Radim Krčmář 提交于 4月 27, 2015

PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT.
VMX used a wrong value (host's PAT) and while SVM used the right one,
it never got to arch.pat.

This is not an issue with QEMU as it will force the correct value.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

74545705

kvm,x86: load guest FPU context more eagerly · 653f52c3

由 Rik van Riel 提交于 4月 23, 2015

Currently KVM will clear the FPU bits in CR0.TS in the VMCS, and trap to
re-load them every time the guest accesses the FPU after a switch back into
the guest from the host.

This patch copies the x86 task switch semantics for FPU loading, with the
FPU loaded eagerly after first use if the system uses eager fpu mode,
or if the guest uses the FPU frequently.

In the latter case, after loading the FPU for 255 times, the fpu_counter
will roll over, and we will revert to loading the FPU on demand, until
it has been established that the guest is still actively using the FPU.

This mirrors the x86 task switch policy, which seems to work.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

653f52c3

kvm: x86: Extended struct kvm_lapic_irq with msi_redir_hint for MSI delivery · 93bbf0b8

由 James Sullivan 提交于 3月 18, 2015

Extended struct kvm_lapic_irq with bool msi_redir_hint, which will
be used to determine if the delivery of the MSI should target only
the lowest priority CPU in the logical group specified for delivery.
(In physical dest mode, the RH bit is not relevant). Initialized the value
of msi_redir_hint to true when RH=1 in kvm_set_msi_irq(), and initialized
to false in all other cases.

Added value of msi_redir_hint to a debug message dump of an IRQ in
apic_send_ipi().
Signed-off-by: NJames Sullivan <sullivan.james.f@gmail.com>
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

93bbf0b8

KVM: x86: INIT and reset sequences are different · d28bc9dd

由 Nadav Amit 提交于 4月 13, 2015

x86 architecture defines differences between the reset and INIT sequences.
INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP.

References (from Intel SDM):

"If the MP protocol has completed and a BSP is chosen, subsequent INITs (either
to a specific processor or system wide) do not cause the MP protocol to be
repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions]

[Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]

"If the processor is reset by asserting the INIT# pin, the x87 FPU state is not
changed." [9.2: X87 FPU INITIALIZATION]

"The state of the local APIC following an INIT reset is the same as it is after
a power-up or hardware reset, except that the APIC ID and arbitration ID
registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset
("Wait-for-SIPI" State)]
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d28bc9dd

KVM: x86: Support for disabling quirks · 90de4a18

由 Nadav Amit 提交于 4月 13, 2015

Introducing KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were previous
created in order to overcome QEMU issues. Those issue were mostly result of
invalid VM BIOS.  Currently there are two quirks that can be disabled:

1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot
2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot

These two issues are already resolved in recent releases of QEMU, and would
therefore be disabled by QEMU.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Message-Id: <1428879221-29996-1-git-send-email-namit@cs.technion.ac.il>
[Report capability from KVM_CHECK_EXTENSION too. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

90de4a18

KVM: arm/mips/x86/power use __kvm_guest_{enter|exit} · ccf73aaf

由 Christian Borntraeger 提交于 4月 30, 2015

Use __kvm_guest_{enter|exit} instead of kvm_guest_{enter|exit}
where interrupts are disabled.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ccf73aaf

27 4月, 2015 1 次提交

kvm: x86: fix kvmclock update protocol · 5dca0d91

由 Radim Krčmář 提交于 3月 25, 2015

The kvmclock spec says that the host will increment a version field to
an odd number, then update stuff, then increment it to an even number.
The host is buggy and doesn't do this, and the result is observable
when one vcpu reads another vcpu's kvmclock data.

There's no good way for a guest kernel to keep its vdso from reading
a different vcpu's kvmclock data, but we don't need to care about
changing VCPUs as long as we read a consistent data from kvmclock.
(VCPU can change outside of this loop too, so it doesn't matter if we
return a value not fit for this VCPU.)

Based on a patch by Radim Krčmář.
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Acked-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5dca0d91

15 4月, 2015 1 次提交

KVM: x86: Fix MSR_IA32_BNDCFGS in msrs_to_save · 9e9c3fe4

由 Nadav Amit 提交于 4月 12, 2015

kvm_init_msr_list is currently called before hardware_setup. As a result,
vmx_mpx_supported always returns false when kvm_init_msr_list checks whether to
save MSR_IA32_BNDCFGS.

Move kvm_init_msr_list after vmx_hardware_setup is called to fix this issue.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>

Message-Id: <1428864435-4732-1-git-send-email-namit@cs.technion.ac.il>
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9e9c3fe4

08 4月, 2015 7 次提交

kvm: mmu: lazy collapse small sptes into large sptes · 3ea3b7fa

由 Wanpeng Li 提交于 4月 03, 2015

Dirty logging tracks sptes in 4k granularity, meaning that large sptes
have to be split.  If live migration is successful, the guest in the
source machine will be destroyed and large sptes will be created in the
destination. However, the guest continues to run in the source machine
(for example if live migration fails), small sptes will remain around
and cause bad performance.

This patch introduce lazy collapsing of small sptes into large sptes.
The rmap will be scanned in ioctl context when dirty logging is stopped,
dropping those sptes which can be collapsed into a single large-page spte.
Later page faults will create the large-page sptes.
Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Message-Id: <1428046825-6905-1-git-send-email-wanpeng.li@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3ea3b7fa

KVM: x86: Clear CR2 on VCPU reset · 1119022c

由 Nadav Amit 提交于 4月 02, 2015

CR2 is not cleared as it should after reset. See Intel SDM table named "IA-32
Processor States Following Power-up, Reset, or INIT".
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427933438-12782-5-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1119022c

KVM: x86: DR0-DR3 are not clear on reset · ae561ede

由 Nadav Amit 提交于 4月 02, 2015

DR0-DR3 are not cleared as they should during reset and when they are set from
userspace.  It appears to be caused by c77fb5fe ("KVM: x86: Allow the guest
to run with dirty debug registers").

Force their reload on these situations.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427933438-12782-4-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ae561ede

KVM: x86: BSP in MSR_IA32_APICBASE is writable · 58d269d8

由 Nadav Amit 提交于 4月 02, 2015

After reset, the CPU can change the BSP, which will be used upon INIT.  Reset
should return the BSP which QEMU asked for, and therefore handled accordingly.

To quote: "If the MP protocol has completed and a BSP is chosen, subsequent
INITs (either to a specific processor or system wide) do not cause the MP
protocol to be repeated."
[Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions]
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427933438-12782-3-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

58d269d8

KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu · 5a4f55cd

由 Eugene Korenevsky 提交于 3月 29, 2015

cpuid_maxphyaddr(), which performs lot of memory accesses is called
extensively across KVM, especially in nVMX code.

This patch adds a cached value of maxphyaddr to vcpu.arch to reduce the
pressure onto CPU cache and simplify the code of cpuid_maxphyaddr()
callers. The cached value is initialized in kvm_arch_vcpu_init() and
reloaded every time CPUID is updated by usermode. It is obvious that
these reloads occur infrequently.
Signed-off-by: NEugene Korenevsky <ekorenevsky@gmail.com>
Message-Id: <20150329205612.GA1223@gnote>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5a4f55cd

KVM: x86: optimize delivery of TSC deadline timer interrupt · 9c8fd1ba

由 Paolo Bonzini 提交于 2月 06, 2015

The newly-added tracepoint shows the following results on
the tscdeadline_latency test:

qemu-kvm-8387 [002] 6425.558974: kvm_vcpu_wakeup: poll time 10407 ns
qemu-kvm-8387 [002] 6425.558984: kvm_vcpu_wakeup: poll time 0 ns
qemu-kvm-8387 [002] 6425.561242: kvm_vcpu_wakeup: poll time 10477 ns
qemu-kvm-8387 [002] 6425.561251: kvm_vcpu_wakeup: poll time 0 ns

and so on. This is because we need to go through kvm_vcpu_block again
after the timer IRQ is injected. Avoid it by polling once before
entering kvm_vcpu_block.

On my machine (Xeon E5 Sandy Bridge) this removes about 500 cycles (7%)
from the latency of the TSC deadline timer.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9c8fd1ba

KVM: x86: extract blocking logic from __vcpu_run · 362c698f

由 Paolo Bonzini 提交于 2月 06, 2015

Rename the old __vcpu_run to vcpu_run, and extract part of it to a new
function vcpu_block.

The next patch will add a new condition in vcpu_block, avoid extra
indentation.
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

362c698f

27 3月, 2015 2 次提交

time: Rename timekeeper::tkr to timekeeper::tkr_mono · 876e7881

由 Peter Zijlstra 提交于 3月 19, 2015

In preparation of adding another tkr field, rename this one to
tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base,
since the structure is not specific to CLOCK_MONOTONIC and the mono
name got added to the tk_read_base instance.

Lots of trivial churn.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

876e7881

KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks. · e32edf4f

由 Nikolay Nikolaev 提交于 3月 26, 2015

This is needed in e.g. ARM vGIC emulation, where the MMIO handling
depends on the VCPU that does the access.
Signed-off-by: NNikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

e32edf4f

18 3月, 2015 2 次提交

KVM: x86: For the symbols used locally only should be static type · 52eb5a6d

由 Xiubo Li 提交于 3月 13, 2015

This patch fix the following sparse warnings:

for arch/x86/kvm/x86.c:
warning: symbol 'emulator_read_write' was not declared. Should it be static?
warning: symbol 'emulator_write_emulated' was not declared. Should it be static?
warning: symbol 'emulator_get_dr' was not declared. Should it be static?
warning: symbol 'emulator_set_dr' was not declared. Should it be static?

for arch/x86/kvm/pmu.c:
warning: symbol 'fixed_pmc_events' was not declared. Should it be static?
Signed-off-by: NXiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

52eb5a6d

KVM: x86: Avoid using plain integer as NULL pointer warning · 795a149e

由 Xiubo Li 提交于 3月 13, 2015

This patch fix the following sparse warning:

for file arch/x86/kvm/x86.c:
warning: Using plain integer as NULL pointer
Signed-off-by: NXiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

795a149e

11 3月, 2015 2 次提交

kvm: move advertising of KVM_CAP_IRQFD to common code · dc9be0fa

由 Paolo Bonzini 提交于 3月 05, 2015

POWER supports irqfds but forgot to advertise them.  Some userspace does
not check for the capability, but others check it---thus they work on
x86 and s390 but not POWER.

To avoid that other architectures in the future make the same mistake, let
common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.
Reported-and-tested-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 297e2105Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

dc9be0fa

kvm: x86: make kvm_emulate_* consistant · 5cb56059

由 Joel Schopp 提交于 3月 02, 2015

Currently kvm_emulate() skips the instruction but kvm_emulate_* sometimes
don't.  The end reult is the caller ends up doing the skip themselves.
Let's make them consistant.
Signed-off-by: NJoel Schopp <joel.schopp@amd.com>
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

5cb56059

10 3月, 2015 1 次提交

KVM: Get rid of kvm_kvfree() · 548ef284

由 Thomas Huth 提交于 2月 24, 2015

kvm_kvfree() provides exactly the same functionality as the
new common kvfree() function - so let's simply replace the
kvm function with the common function.
Signed-off-by: NThomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

548ef284

06 2月, 2015 1 次提交

kvm: add halt_poll_ns module parameter · f7819512

由 Paolo Bonzini 提交于 2月 04, 2015

This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.

This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.

With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.

Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.

The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.

While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.

The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f7819512

30 1月, 2015 1 次提交

KVM: VMX: Add PML support in VMX · 843e4330

由 Kai Huang 提交于 1月 28, 2015

This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
to allow user to enable/disable it manually.
Signed-off-by: NKai Huang <kai.huang@linux.intel.com>
Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

843e4330

29 1月, 2015 1 次提交

KVM: x86: Add new dirty logging kvm_x86_ops for PML · 88178fd4

由 Kai Huang 提交于 1月 28, 2015

This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty
logging for particular memory slot, and to flush potentially logged dirty GPAs
before reporting slot->dirty_bitmap to userspace.

kvm x86 common code calls these hooks when they are available so PML logic can
be hidden to VMX specific. SVM won't be impacted as these hooks remain NULL
there.
Signed-off-by: NKai Huang <kai.huang@linux.intel.com>
Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

88178fd4