提交 · 85c9aae9ac8b228f2134b56d4fc743afc446947a · gsplhtlxg / clone-Linux

23 11月, 2019 5 次提交

kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints · 85c9aae9

由 Jim Mattson 提交于 11月 22, 2019

Commit 37e4c997 ("KVM: VMX: validate individual bits of guest
MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting
new constraints on the data values that kvm would accept for the guest
MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been
opt-in via a new KVM capability, but they were applied
indiscriminately, breaking at least one existing hypervisor.

Relax the constraints to allow either or both of
FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and
FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is
enabled. This change is sufficient to fix the aforementioned breakage.

Fixes: 37e4c997 ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL")
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

85c9aae9

KVM: x86: Grab KVM's srcu lock when setting nested state · ad5996d9

由 Sean Christopherson 提交于 11月 22, 2019

Acquire kvm->srcu for the duration of ->set_nested_state() to fix a bug
where nVMX derefences ->memslots without holding ->srcu or ->slots_lock.

The other half of nested migration, ->get_nested_state(), does not need
to acquire ->srcu as it is a purely a dump of internal KVM (and CPU)
state to userspace.

Detected as an RCU lockdep splat that is 100% reproducible by running
KVM's state_test selftest with CONFIG_PROVE_LOCKING=y.  Note that the
failing function, kvm_is_visible_gfn(), is only checking the validity of
a gfn, it's not actually accessing guest memory (which is more or less
unsupported during vmx_set_nested_state() due to incorrect MMU state),
i.e. vmx_set_nested_state() itself isn't fundamentally broken.  In any
case, setting nested state isn't a fast path so there's no reason to go
out of our way to avoid taking ->srcu.

  =============================
  WARNING: suspicious RCU usage
  5.4.0-rc7+ #94 Not tainted
  -----------------------------
  include/linux/kvm_host.h:626 suspicious rcu_dereference_check() usage!

               other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by evmcs_test/10939:
   #0: ffff88826ffcb800 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x630 [kvm]

  stack backtrace:
  CPU: 1 PID: 10939 Comm: evmcs_test Not tainted 5.4.0-rc7+ #94
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack+0x68/0x9b
   kvm_is_visible_gfn+0x179/0x180 [kvm]
   mmu_check_root+0x11/0x30 [kvm]
   fast_cr3_switch+0x40/0x120 [kvm]
   kvm_mmu_new_cr3+0x34/0x60 [kvm]
   nested_vmx_load_cr3+0xbd/0x1f0 [kvm_intel]
   nested_vmx_enter_non_root_mode+0xab8/0x1d60 [kvm_intel]
   vmx_set_nested_state+0x256/0x340 [kvm_intel]
   kvm_arch_vcpu_ioctl+0x491/0x11a0 [kvm]
   kvm_vcpu_ioctl+0xde/0x630 [kvm]
   do_vfs_ioctl+0xa2/0x6c0
   ksys_ioctl+0x66/0x70
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x54/0x200
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7f59a2b95f47

Fixes: 8fcc4b59 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ad5996d9

KVM: x86: Open code shared_msr_update() in its only caller · 05c19c2f

由 Sean Christopherson 提交于 11月 22, 2019

Fold shared_msr_update() into its sole user to eliminate its pointless
bounds check, its godawful printk, its misleading comment (it's called
under a global lock), and its woefully inaccurate name.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

05c19c2f

KVM: Fix jump label out_free_* in kvm_init() · faf0be22

由 Miaohe Lin 提交于 11月 23, 2019

The jump label out_free_1 and out_free_2 deal with
the same stuff, so git rid of one and rename the
label out_free_0a to retain the label name order.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

faf0be22

KVM: x86: Remove a spurious export of a static function · 24885d1d

由 Sean Christopherson 提交于 11月 22, 2019

A recent change inadvertently exported a static function, which results
in modpost throwing a warning.  Fix it.

Fixes: cbbaa272 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

24885d1d

21 11月, 2019 11 次提交

KVM: x86: create mmu/ subdirectory · c50d8ae3

由 Paolo Bonzini 提交于 11月 21, 2019

Preparatory work for shattering mmu.c into multiple files. Besides making it easier
to follow, this will also make it possible to write unit tests for various parts.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c50d8ae3

KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page · 0155b2b9

由 Liran Alon 提交于 11月 20, 2019

According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use
of the INVVPID/INVEPT Instruction, the hypervisor needs to execute
INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and
either: "Virtualize APIC accesses" VM-execution control was changed
from 0 to 1, OR the value of apic_access_page was changed.

In the nested case, the burden falls on L1, unless L0 enables EPT in
vmcs02 but L1 enables neither EPT nor VPID in vmcs12.  For this reason
prepare_vmcs02() and load_vmcs12_host_state() have special code to
request a TLB flush in case L1 does not use EPT but it uses
"virtualize APIC accesses".

This special case however is not necessary. On a nested vmentry the
physical TLB will already be flushed except if all the following apply:

* L0 uses VPID

* L1 uses VPID

* L0 can guarantee TLB entries populated while running L1 are tagged
differently than TLB entries populated while running L2.

If the first condition is false, the processor will flush the TLB
on vmentry to L2.  If the second or third condition are false,
prepare_vmcs02() will request KVM_REQ_TLB_FLUSH.  However, even
if both are true, no extra TLB flush is needed to handle the APIC
access page:

* if L1 doesn't use VPID, the second condition doesn't hold and the
TLB will be flushed anyway.

* if L1 uses VPID, it has to flush the TLB itself with INVVPID and
section 28.3.3.3 doesn't apply to L0.

* even INVEPT is not needed because, if L0 uses EPT, it uses different
EPTP when running L2 than L1 (because guest_mode is part of mmu-role).
In this case SDM section 28.3.3.4 doesn't apply.

Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(),
one could note that L0 won't flush TLB only in cases where SDM sections
28.3.3.3 and 28.3.3.4 don't apply.  In particular, if L0 uses different
VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section
28.3.3.3 doesn't apply.

Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit().

Side-note: This patch can be viewed as removing parts of commit
fb6c8198 ("kvm: vmx: Flush TLB when the APIC-access address changes”)
that is not relevant anymore since commit
1313cc2b ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”).
i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT,
then L0 will use same EPTP for both L0 and L1. Which indeed required
L0 to execute INVEPT before entering L2 guest. This assumption is
not true anymore since when guest_mode was added to mmu-role.
Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0155b2b9

KVM: x86: remove set but not used variable 'called' · db5a95ec

由 Mao Wenan 提交于 11月 19, 2019

Fixes gcc '-Wunused-but-set-variable' warning:

arch/x86/kvm/x86.c: In function kvm_make_scan_ioapic_request_mask:
arch/x86/kvm/x86.c:7911:7: warning: variable called set but not
used [-Wunused-but-set-variable]

It is not used since commit 7ee30bc1 ("KVM: x86: deliver KVM
IOAPIC scan request to target vCPUs")
Signed-off-by: NMao Wenan <maowenan@huawei.com>
Fixes: 7ee30bc1 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db5a95ec

KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning · b11494bc

由 Liran Alon 提交于 11月 21, 2019

vmcs->apic_access_page is simply a token that the hypervisor puts into
the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers
APIC-access VMExit or APIC virtualization logic whenever a CPU running
in VMX non-root mode read/write from/to this PFN.

As every write either triggers an APIC-access VMExit or write is
performed on vmcs->virtual_apic_page, the PFN pointed to by
vmcs->apic_access_page should never actually be touched by CPU.

Therefore, there is no need to mark vmcs02->apic_access_page as dirty
after unpin it on L2->L1 emulated VMExit or when L1 exit VMX operation.
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b11494bc

P
Merge branch 'kvm-tsx-ctrl' into HEAD · 46f4f0aa
由 Paolo Bonzini 提交于 11月 21, 2019
```
Conflicts:
	arch/x86/kvm/vmx/vmx.c
```
46f4f0aa

KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it · b07a5c53

由 Paolo Bonzini 提交于 11月 18, 2019

If X86_FEATURE_RTM is disabled, the guest should not be able to access
MSR_IA32_TSX_CTRL.  We can therefore use it in KVM to force all
transactions from the guest to abort.
Tested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b07a5c53

KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality · c11f83e0

由 Paolo Bonzini 提交于 11月 18, 2019

The current guest mitigation of TAA is both too heavy and not really
sufficient.  It is too heavy because it will cause some affected CPUs
(those that have MDS_NO but lack TAA_NO) to fall back to VERW and
get the corresponding slowdown.  It is not really sufficient because
it will cause the MDS_NO bit to disappear upon microcode update, so
that VMs started before the microcode update will not be runnable
anymore afterwards, even with tsx=on.

Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
the guest and let it run without the VERW mitigation.  Even though
MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
it on every vmentry, we can use the shared MSR functionality because
the host kernel need not protect itself from TSX-based side-channels.
Tested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c11f83e0

KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID · edef5c36

由 Paolo Bonzini 提交于 11月 18, 2019

Because KVM always emulates CPUID, the CPUID clear bit
(bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually"
by the hypervisor when performing said emulation.

Right now neither kvm-intel.ko nor kvm-amd.ko implement
MSR_IA32_TSX_CTRL but this will change in the next patch.
Reviewed-by: NJim Mattson <jmattson@google.com>
Tested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

edef5c36

KVM: x86: do not modify masked bits of shared MSRs · de1fca5d

由 Paolo Bonzini 提交于 11月 18, 2019

"Shared MSRs" are guest MSRs that are written to the host MSRs but
keep their value until the next return to userspace.  They support
a mask, so that some bits keep the host value, but this mask is
only used to skip an unnecessary MSR write and the value written
to the MSR is always the guest MSR.

Fix this and, while at it, do not update smsr->values[slot].curr if
for whatever reason the wrmsr fails.  This should only happen due to
reserved bits, so the value written to smsr->values[slot].curr
will not match when the user-return notifier and the host value will
always be restored.  However, it is untidy and in rare cases this
can actually avoid spurious WRMSRs on return to userspace.

Cc: stable@vger.kernel.org
Reviewed-by: NJim Mattson <jmattson@google.com>
Tested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

de1fca5d

KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES · cbbaa272

由 Paolo Bonzini 提交于 11月 18, 2019

KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented
to the guests.  It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR &&
!RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not
hidden (it actually was), yet the value says that TSX is not vulnerable
to microarchitectural data sampling.  Fix both.

Cc: stable@vger.kernel.org
Tested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cbbaa272

Merge tag 'kvmarm-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 14edff88

由 Paolo Bonzini 提交于 11月 21, 2019

KVM/arm updates for Linux 5.5:

- Allow non-ISV data aborts to be reported to userspace
- Allow injection of data aborts from userspace
- Expose stolen time to guests
- GICv4 performance improvements
- vgic ITS emulation fixes
- Simplify FWB handling
- Enable halt pool counters
- Make the emulated timer PREEMPT_RT compliant

Conflicts:
	include/uapi/linux/kvm.h

14edff88

20 11月, 2019 5 次提交

KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT · 992edeae

由 Liran Alon 提交于 11月 20, 2019

Since commit 1313cc2b ("kvm: mmu: Add guest_mode to kvm_mmu_page_role"),
guest_mode was added to mmu-role and therefore if L0 use EPT, it will
always run L1 and L2 with different EPTP. i.e. EPTP01!=EPTP02.

Because TLB entries are tagged with EP4TA, KVM can assume
TLB entries populated while running L2 are tagged differently
than TLB entries populated while running L1.

Therefore, update nested_has_guest_tlb_tag() to consider if
L0 use EPT instead of if L1 use EPT.
Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

992edeae

KVM: x86: Unexport kvm_vcpu_reload_apic_access_page() · 5637f60b

由 Liran Alon 提交于 11月 18, 2019

The function is only used in kvm.ko module.
Reviewed-by: NMark Kanda <mark.kanda@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5637f60b

KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1 · c79eb775

由 Chenyi Qiang 提交于 11月 19, 2019

When L1 guest uses 5-level paging, it fails vm-entry to L2 due to
invalid host-state. It needs to add CR4_LA57 bit to nested CR4_FIXED1
MSR.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c79eb775

KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization · cc877670

由 Liran Alon 提交于 11月 18, 2019

Reviewed-by: NMark Kanda <mark.kanda@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cc877670

KVM: x86: Zero the IOAPIC scan request dest vCPUs bitmap · 9a2ae9f6

由 Nitesh Narayan Lal 提交于 11月 20, 2019

Not zeroing the bitmap used for identifying the destination vCPUs for an
IOAPIC scan request in fixed delivery mode could lead to waking up unwanted
vCPUs. This patch zeroes the vCPU bitmap before passing it to
kvm_bitmap_or_dest_vcpus(), which is responsible for setting the bitmap
with the bits corresponding to the destination vCPUs.

Fixes: 7ee30bc1("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: NNitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9a2ae9f6

18 11月, 2019 1 次提交

Merge tag 'kvm-s390-next-5.5-1' of... · fe289ebb

由 Paolo Bonzini 提交于 11月 18, 2019

Merge tag 'kvm-s390-next-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: small fixes and enhancements

- selftest improvements
- yield improvements
- cleanups

fe289ebb

15 11月, 2019 18 次提交

KVM: x86: deliver KVM IOAPIC scan request to target vCPUs · 7ee30bc1

由 Nitesh Narayan Lal 提交于 11月 07, 2019

In IOAPIC fixed delivery mode instead of flushing the scan
requests to all vCPUs, we should only send the requests to
vCPUs specified within the destination field.

This patch introduces kvm_get_dest_vcpus_mask() API which
retrieves an array of target vCPUs by using
kvm_apic_map_get_dest_lapic() and then based on the
vcpus_idx, it sets the bit in a bitmap. However, if the above
fails kvm_get_dest_vcpus_mask() finds the target vCPUs by
traversing all available vCPUs. Followed by setting the
bits in the bitmap.

If we had different vCPUs in the previous request for the
same redirection table entry then bits corresponding to
these vCPUs are also set. This to done to keep
ioapic_handled_vectors synchronized.

This bitmap is then eventually passed on to
kvm_make_vcpus_request_mask() to generate a masked request
only for the target vCPUs.

This would enable us to reduce the latency overhead on isolated
vCPUs caused by the IPI to process due to KVM_REQ_IOAPIC_SCAN.
Suggested-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NNitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7ee30bc1

KVM: remember position in kvm->vcpus array · 8750e72a

由 Radim Krčmář 提交于 11月 07, 2019

Fetching an index for any vcpu in kvm->vcpus array by traversing
the entire array everytime is costly.
This patch remembers the position of each vcpu in kvm->vcpus array
by storing it in vcpus_idx under kvm_vcpu structure.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NNitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8750e72a

KVM: nVMX: Add support for capturing highest observable L2 TSC · 662f1d1d

由 Aaron Lewis 提交于 11月 07, 2019

The L1 hypervisor may include the IA32_TIME_STAMP_COUNTER MSR in the
vmcs12 MSR VM-exit MSR-store area as a way of determining the highest
TSC value that might have been observed by L2 prior to VM-exit. The
current implementation does not capture a very tight bound on this
value.  To tighten the bound, add the IA32_TIME_STAMP_COUNTER MSR to the
vmcs02 VM-exit MSR-store area whenever it appears in the vmcs12 VM-exit
MSR-store area.  When L0 processes the vmcs12 VM-exit MSR-store area
during the emulation of an L2->L1 VM-exit, special-case the
IA32_TIME_STAMP_COUNTER MSR, using the value stored in the vmcs02
VM-exit MSR-store area to derive the value to be stored in the vmcs12
VM-exit MSR-store area.
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

662f1d1d

kvm: vmx: Rename function find_msr() to vmx_find_msr_index() · ef0fbcac

由 Aaron Lewis 提交于 11月 07, 2019

Rename function find_msr() to vmx_find_msr_index() in preparation for an
upcoming patch where we export it and use it in nested.c.
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ef0fbcac

kvm: vmx: Rename NR_AUTOLOAD_MSRS to NR_LOADSTORE_MSRS · 7cfe0526

由 Aaron Lewis 提交于 11月 07, 2019

Rename NR_AUTOLOAD_MSRS to NR_LOADSTORE_MSRS.  This needs to be done
due to the addition of the MSR-autostore area that will be added in a
future patch.  After that the name AUTOLOAD will no longer make sense.
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7cfe0526

kvm: nested: Introduce read_and_check_msr_entry() · 365d3d55

由 Aaron Lewis 提交于 11月 07, 2019

Add the function read_and_check_msr_entry() which just pulls some code
out of nested_vmx_store_msr().  This will be useful as reusable code in
upcoming patches.
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

365d3d55

KVM: nVMX: mark functions in the header as "static inline" · d4069dbe

由 Paolo Bonzini 提交于 11月 15, 2019

Correct a small inaccuracy in the shattering of vmx.c, which becomes
visible now that pmu_intel.c includes nested.h.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d4069dbe

KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control · 03a8871a

由 Oliver Upton 提交于 11月 13, 2019

The "load IA32_PERF_GLOBAL_CTRL" bit for VM-entry and VM-exit should
only be exposed to the guest if IA32_PERF_GLOBAL_CTRL is a valid MSR.
Create a new helper to allow pmu_refresh() to update the VM-Entry and
VM-Exit controls to ensure PMU values are initialized when performing
the is_valid_msr() check.
Suggested-by: NJim Mattson <jmattson@google.com>
Co-developed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

03a8871a

KVM: nVMX: Load GUEST_IA32_PERF_GLOBAL_CTRL MSR on VM-Entry · 71f73470

由 Oliver Upton 提交于 11月 13, 2019

Add condition to prepare_vmcs02 which loads IA32_PERF_GLOBAL_CTRL on
VM-entry if the "load IA32_PERF_GLOBAL_CTRL" bit on the VM-entry control
is set. Use SET_MSR_OR_WARN() rather than directly writing to the field
to avoid overwrite by atomic_switch_perf_msrs().
Suggested-by: NJim Mattson <jmattson@google.com>
Co-developed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

71f73470

KVM: nVMX: Use kvm_set_msr to load IA32_PERF_GLOBAL_CTRL on VM-Exit · 458151f6

由 Oliver Upton 提交于 11月 13, 2019

The existing implementation for loading the IA32_PERF_GLOBAL_CTRL MSR
on VM-exit was incorrect, as the next call to atomic_switch_perf_msrs()
could cause this value to be overwritten. Instead, call kvm_set_msr()
which will allow atomic_switch_perf_msrs() to correctly set the values.

Define a macro, SET_MSR_OR_WARN(), to set the MSR with kvm_set_msr()
and WARN on failure.
Suggested-by: NJim Mattson <jmattson@google.com>
Co-developed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

458151f6

KVM: nVMX: Check HOST_IA32_PERF_GLOBAL_CTRL on VM-Entry · c547cb6f

由 Oliver Upton 提交于 11月 13, 2019

Add a consistency check on nested vm-entry for host's
IA32_PERF_GLOBAL_CTRL from vmcs12. Per Intel's SDM Vol 3 26.2.2:

  If the "load IA32_PERF_GLOBAL_CTRL"
  VM-exit control is 1, bits reserved in the IA32_PERF_GLOBAL_CTRL
  MSR must be 0 in the field for that register"
Suggested-by: NJim Mattson <jmattson@google.com>
Co-developed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c547cb6f

KVM: nVMX: Check GUEST_IA32_PERF_GLOBAL_CTRL on VM-Entry · bfc6ad6a

由 Oliver Upton 提交于 11月 13, 2019

Add condition to nested_vmx_check_guest_state() to check the validity of
GUEST_IA32_PERF_GLOBAL_CTRL. Per Intel's SDM Vol 3 26.3.1.1:

  If the "load IA32_PERF_GLOBAL_CTRL" VM-entry control is 1, bits
  reserved in the IA32_PERF_GLOBAL_CTRL MSR must be 0 in the field for that
  register.
Suggested-by: NJim Mattson <jmattson@google.com>
Co-developed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bfc6ad6a

KVM: VMX: Add helper to check reserved bits in IA32_PERF_GLOBAL_CTRL · 9477f444

由 Oliver Upton 提交于 11月 13, 2019

Create a helper function to check the validity of a proposed value for
IA32_PERF_GLOBAL_CTRL from the existing check in intel_pmu_set_msr().

Per Intel's SDM, the reserved bits in IA32_PERF_GLOBAL_CTRL must be
cleared for the corresponding host/guest state fields.
Suggested-by: NJim Mattson <jmattson@google.com>
Co-developed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9477f444

selftests: kvm: Simplify loop in kvm_create_max_vcpus test · f245eead

由 Wainer dos Santos Moschetta 提交于 11月 12, 2019

On kvm_create_max_vcpus test remove unneeded local
variable in the loop that add vcpus to the VM.
Signed-off-by: NWainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f245eead

KVM: x86: Optimization: Requst TLB flush in fast_cr3_switch() instead of do it directly · 1924242b

由 Liran Alon 提交于 11月 12, 2019

When KVM emulates a nested VMEntry (L1->L2 VMEntry), it switches mmu root
page. If nEPT is used, this will happen from
kvm_init_shadow_ept_mmu()->__kvm_mmu_new_cr3() and otherwise it will
happpen from nested_vmx_load_cr3()->kvm_mmu_new_cr3(). Either case,
__kvm_mmu_new_cr3() will use fast_cr3_switch() in attempt to switch to a
previously cached root page.

In case fast_cr3_switch() finds a matching cached root page, it will
set it in mmu->root_hpa and request KVM_REQ_LOAD_CR3 such that on
next entry to guest, KVM will set root HPA in appropriate hardware
fields (e.g. vmcs->eptp). In addition, fast_cr3_switch() calls
kvm_x86_ops->tlb_flush() in order to flush TLB as MMU root page
was replaced.

This works as mmu->root_hpa, which vmx_flush_tlb() use, was
already replaced in cached_root_available(). However, this may
result in unnecessary INVEPT execution because a KVM_REQ_TLB_FLUSH
may have already been requested. For example, by prepare_vmcs02()
in case L1 don't use VPID.

Therefore, change fast_cr3_switch() to just request TLB flush on
next entry to guest.
Reviewed-by: NBhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1924242b

KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC · b35e5548

由 Like Xu 提交于 10月 27, 2019

Currently, a host perf_event is created for a vPMC functionality emulation.
It’s unpredictable to determine if a disabled perf_event will be reused.
If they are disabled and are not reused for a considerable period of time,
those obsolete perf_events would increase host context switch overhead that
could have been avoided.

If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu
sched time slice, and its independent enable bit of the vPMC isn't set,
we can predict that the guest has finished the use of this vPMC, and then
do request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events
in the first call of kvm_pmu_handle_event() after the vcpu is scheduled in.

This lazy mechanism delays the event release time to the beginning of the
next scheduled time slice if vPMC's MSRs aren't changed during this time
slice. If guest comes back to use this vPMC in next time slice, a new perf
event would be re-created via perf_event_create_kernel_counter() as usual.
Suggested-by: NWei Wang <wei.w.wang@intel.com>
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b35e5548

KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter · a6da0d77

由 Like Xu 提交于 10月 27, 2019

The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is
a heavyweight and high-frequency operation, especially when host disables
the watchdog (maximum 21000000 ns) which leads to an unacceptable latency
of the guest NMI handler. It limits the use of vPMUs in the guest.

When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop
and release its existing perf_event (if any) every time EVEN in most cases
almost the same requested perf_event will be created and configured again.

For each vPMC, if the reuqested config ('u64 eventsel' for gp and 'u8 ctrl'
for fixed) is the same as its current config AND a new sample period based
on pmc->counter is accepted by host perf interface, the current event could
be reused safely as a new created one does. Otherwise, do release the
undesirable perf_event and reprogram a new one as usual.

It's light-weight to call pmc_pause_counter (disable, read and reset event)
and pmc_resume_counter (recalibrate period and re-enable event) as guest
expects instead of release-and-create again on any condition. Compared to
use the filterable event->attr or hw.config, a new 'u64 current_config'
field is added to save the last original programed config for each vPMC.

Based on this implementation, the number of calls to pmc_reprogram_counter
is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event.
In the usage of multiplexing perf sampling mode, the average latency of the
guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speed up).
If host disables watchdog, the minimum latecy of guest NMI handler could be
speed up at ~3413x (from 20407603 to 5979 ns) and at ~786x in the average.
Suggested-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a6da0d77

KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback · c900c156

由 Like Xu 提交于 10月 27, 2019

Introduce a new callback msr_idx_to_pmc that returns a struct kvm_pmc*,
and change kvm_pmu_is_valid_msr to return ".msr_idx_to_pmc(vcpu, msr) ||
.is_valid_msr(vcpu, msr)" and AMD just returns false from .is_valid_msr.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Reported-by: Nkbuild test robot <lkp@intel.com>
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c900c156