Commits · 3c9fa24ca7c9c47605672916491f79e8ccacb9e6 · openeuler / Kernel

12 Jun, 2018 4 commits

kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access · 3c9fa24c

Authored by Paolo Bonzini on Jun 06, 2018

The functions that were used in the emulation of fxrstor, fxsave, sgdt and
sidt were originally meant for task switching, and as such they did not
check privilege levels. This is very bad when the same functions are used
in the emulation of unprivileged instructions. This is CVE-2018-10853.

The obvious fix is to add a new argument to ops->read_std and ops->write_std,
which decides whether the access is a "system" access or should use the
processor's CPL.

Fixes: 129a72a0 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3c9fa24c
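As a rough illustration of the fix described in this message, the sketch below shows how an extra `system` flag can choose between an implicit supervisor access and an access checked against the current privilege level. The function name `std_access_flags` and the `ACCESS_USER` constant are hypothetical stand-ins, not the kernel's actual identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag marking an access as a user-mode access. */
#define ACCESS_USER (1u << 2)

/*
 * Compute the access flags for an emulated memory access.
 *
 * system == true:  implicit supervisor access (for example task
 *                  switching), so the CPL is ignored.
 * system == false: the access is made on behalf of the emulated
 *                  instruction and is checked as a user access when
 *                  the guest runs at CPL 3.
 */
static uint32_t std_access_flags(unsigned int cpl, bool system)
{
	uint32_t access = 0;

	if (!system && cpl == 3)
		access |= ACCESS_USER;
	return access;
}
```

With this shape, an emulated sgdt executed at CPL 3 would pass `system = false` and be subject to user-mode page permission checks, while a task switch would pass `system = true`.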

KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system · ce14e868

Authored by Paolo Bonzini on Jun 06, 2018

In the next patch the emulator's .read_std and .write_std callbacks will
grow another argument, which is not needed in kvm_read_guest_virt and
kvm_write_guest_virt_system's callers. Since we have to make separate
functions, let's give the currently existing names a nicer interface, too.

Fixes: 129a72a0 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ce14e868

KVM: x86: introduce linear_{read,write}_system · 79367a65

Authored by Paolo Bonzini on Jun 06, 2018

Wrap the common invocation of ctxt->ops->read_std and ctxt->ops->write_std, so
as to have a smaller patch when the functions grow another argument.

Fixes: 129a72a0 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

79367a65

kvm: nVMX: Enforce cpl=0 for VMX instructions · 727ba748

Authored by Felix Wilhelm on Jun 11, 2018

VMX instructions executed inside an L1 VM will always trigger a VM exit
even when executed with cpl 3. This means we must perform the
privilege check in software.

Fixes: 70f3aac9 ("kvm: nVMX: Remove superfluous VMX instruction fault checks")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Wilhelm <fwilhelm@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

727ba748
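A minimal sketch of the software privilege check this message calls for. The names `vmx_insn_check` and `check_vmx_insn_cpl` are hypothetical, and the fault the hypervisor would inject is left abstract because the message does not state it.

```c
/* Hypothetical outcome of the software permission check. */
enum vmx_insn_check {
	VMX_INSN_ALLOWED,	/* continue emulating the VMX instruction */
	VMX_INSN_FAULT,		/* the caller would inject a fault into L1 */
};

/*
 * The VM exit happens regardless of CPL, so the handler has to reject
 * the instruction itself unless the guest was running at CPL 0.
 */
static enum vmx_insn_check check_vmx_insn_cpl(unsigned int cpl)
{
	return cpl == 0 ? VMX_INSN_ALLOWED : VMX_INSN_FAULT;
}
```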

04 Jun, 2018 3 commits

kvm: nVMX: Add support for "VMWRITE to any supported field" · f4160e45

Authored by Jim Mattson on May 29, 2018

Add support for "VMWRITE to any supported field in the VMCS" and
enable this feature by default in L1's IA32_VMX_MISC MSR. If userspace
clears the VMX capability bit, the old behavior will be restored.

Note that this feature is a prerequisite for kvm in L1 to use VMCS
shadowing, once that feature is available.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f4160e45

kvm: nVMX: Restrict VMX capability MSR changes · a943ac50

Authored by Jim Mattson on May 29, 2018

Disallow changes to the VMX capability MSRs while the vCPU is in VMX
operation. Although this does break the existing API, it helps to
avoid some potentially tricky situations for which there is no
architected behavior.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a943ac50

KVM: VMX: Optimize tscdeadline timer latency · c5ce8235

Authored by Wanpeng Li on May 29, 2018

Commit d0659d94 ("KVM: x86: add option to advance tscdeadline
hrtimer expiration") advances the tscdeadline expiration (the timer is
emulated by an hrtimer) so that the latency incurred by the hypervisor
(apic_timer_fn -> vmentry) can be avoided. This patch adds the same
advance-expiration support for the case where the tscdeadline timer is
emulated by the VMX preemption timer, to reduce the hypervisor latency
(handle_preemption_timer -> vmentry). The guest can also set an
expiration that is very small (for example, in Linux, if an hrtimer
feeds an expiration in the past); in that case we set delta_tsc to 0,
leading to an immediate vmexit when delta_tsc is not bigger than the
advance ns.

This patch can reduce latency by ~63% (~4450 cycles to ~1660 cycles on
a Haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing
busy waits.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c5ce8235
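The clamping rule in this message can be sketched as follows. This is an illustration only: `preemption_timer_delta` is a hypothetical helper, and it assumes the advance amount has already been converted from nanoseconds to TSC cycles.

```c
#include <stdint.h>

/*
 * Return how many TSC cycles to program into the timer so that it
 * fires "advance_cycles" before the guest's deadline.
 * A return value of 0 means "exit to the hypervisor immediately".
 */
static uint64_t preemption_timer_delta(uint64_t guest_deadline_tsc,
				       uint64_t now_tsc,
				       uint64_t advance_cycles)
{
	uint64_t delta_tsc;

	/* Deadline already in the past: fire immediately. */
	if (guest_deadline_tsc <= now_tsc)
		return 0;

	delta_tsc = guest_deadline_tsc - now_tsc;

	/* Not bigger than the advance window: also fire immediately. */
	if (delta_tsc <= advance_cycles)
		return 0;

	return delta_tsc - advance_cycles;
}
```

Checking both conditions before subtracting keeps the unsigned arithmetic from wrapping around to a huge delay.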

02 Jun, 2018 2 commits

kvm: Make VM ioctl do valloc for some archs · d1e5b0e9

Authored by Marc Orr on May 15, 2018

The kvm struct has been bloating. For example, it's tens of kilo-bytes
for x86, which turns out to be a large amount of memory to allocate
contiguously via kzalloc. Thus, this patch does the following:
1. Uses architecture-specific routines to allocate the kvm struct via
   vzalloc for x86.
2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc
   when has_vhe() is true.

Other architectures continue to default to kzalloc, as they have a
dependency on kzalloc or have a small-enough struct kvm.
Signed-off-by: Marc Orr <marcorr@google.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d1e5b0e9

kvm: Change return type to vm_fault_t · 1499fa80

Authored by Souptick Joarder on Apr 19, 2018

Use the new return type vm_fault_t for the fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.

commit 1c8f4220 ("mm: change return type to vm_fault_t")
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1499fa80

26 May, 2018 6 commits

KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability · c1aea919

Authored by Vitaly Kuznetsov on May 16, 2018

We need a new capability to indicate support for the newly added
HvFlushVirtualAddress{List,Space}{,Ex} hypercalls. Upon seeing this
capability, userspace is supposed to announce PV TLB flush features
by setting the appropriate CPUID bits (if needed).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

c1aea919

KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation · c7012676

Authored by Vitaly Kuznetsov on May 16, 2018

Implement HvFlushVirtualAddress{List,Space}Ex hypercalls in the same way
we've implemented non-EX counterparts.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Initialized valid_bank_mask to silence misguided GCC warnings. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

c7012676

KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation · e2f11f42

Authored by Vitaly Kuznetsov on May 16, 2018

Implement HvFlushVirtualAddress{List,Space} hypercalls in a simplistic way:
do a full TLB flush with KVM_REQ_TLB_FLUSH and kick vCPUs which are currently
IN_GUEST_MODE.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

e2f11f42

KVM: x86: hyperv: do rep check for each hypercall separately · 56b9ae78

Authored by Vitaly Kuznetsov on May 16, 2018

Prepare to support TLB flush hypercalls, some of which are REP hypercalls.
Also, return HV_STATUS_INVALID_HYPERCALL_INPUT as it seems more
appropriate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

56b9ae78

KVM: x86: hyperv: use defines when parsing hypercall parameters · 142c95da

Authored by Vitaly Kuznetsov on May 16, 2018

Avoid open-coding offsets for hypercall input parameters; we already
have defines for them.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

142c95da
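To show what parsing with named defines instead of open-coded offsets can look like, here is a hedged sketch. The `HC_*` macro names, the `hc_params` struct, and `hc_decode` are hypothetical. The bit positions (call code in bits 0-15, fast flag in bit 16, rep count in bits 32-43, rep start index in bits 48-59) are my recollection of the Hyper-V hypercall input layout and should be checked against the specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 64-bit hypercall input value. */
#define HC_CODE_MASK		0xffffull	/* bits 0-15: call code */
#define HC_FAST_BIT		(1ull << 16)	/* bit 16: fast call */
#define HC_REP_COUNT_SHIFT	32		/* bits 32-43: rep count */
#define HC_REP_START_SHIFT	48		/* bits 48-59: rep start index */
#define HC_REP_FIELD_MASK	0xfffull	/* both rep fields are 12 bits */

struct hc_params {
	uint16_t code;
	bool fast;
	uint16_t rep_cnt;
	uint16_t rep_idx;
};

/* Split the raw input value into its fields using the named defines. */
static struct hc_params hc_decode(uint64_t param)
{
	struct hc_params p;

	p.code    = (uint16_t)(param & HC_CODE_MASK);
	p.fast    = (param & HC_FAST_BIT) != 0;
	p.rep_cnt = (uint16_t)((param >> HC_REP_COUNT_SHIFT) & HC_REP_FIELD_MASK);
	p.rep_idx = (uint16_t)((param >> HC_REP_START_SHIFT) & HC_REP_FIELD_MASK);
	return p;
}
```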

x86/hyper-v: move struct hv_flush_pcpu{,ex} definitions to common header · c9c92bee

Authored by Vitaly Kuznetsov on May 16, 2018

Hyper-V TLB flush hypercall definitions will be required for KVM, so move
them to hyperv-tlfs.h. Structures also need to be renamed, as the '_pcpu'
suffix is irrelevant for a general-purpose definition.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

c9c92bee

25 May, 2018 6 commits

KVM: x86: Expose CLDEMOTE CPU feature to guest VM · 0ea3286e

Authored by Jingqi Liu on May 22, 2018

The CLDEMOTE instruction hints to hardware that the cache line that
contains the linear address should be moved ("demoted") from
the cache(s) closest to the processor core to a level more distant
from the processor core. This may accelerate subsequent accesses
to the line by other cores in the same coherence domain,
especially if the line was written by the core that demotes the line.

This patch exposes the cldemote feature to the guest.

For the release document, see the link below:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
This patch has a dependency on https://lkml.org/lkml/2018/4/23/928
Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

0ea3286e

KVM: nVMX: Emulate L1 individual-address invvpid by L0 individual-address invvpid · cd9a491f

Authored by Liran Alon on May 22, 2018

When vmcs12 uses VPID, all TLB entries populated by L2 are tagged with
vmx->nested.vpid02. Currently, INVVPID executed by L1 is emulated by L0
by using INVVPID single/global-context to flush all TLB entries
tagged with vmx->nested.vpid02 regardless of INVVPID type executed by
L1.

However, we can easily optimize the case of L1 INVVPID on an
individual-address: just execute INVVPID on the given individual-address,
tagged with vmx->nested.vpid02.
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
[Squashed with a preparatory patch that added the !operand.vpid line.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

cd9a491f

KVM: nVMX: Don't flush TLB when vmcs12 uses VPID · 6f1e03bc

Authored by Liran Alon on May 22, 2018

Since commit 5c614b35 ("KVM: nVMX: nested VPID emulation"),
vmcs01 and vmcs02 don't share the same VPID. vmcs01 uses vmx->vpid
while vmcs02 uses vmx->nested.vpid02. This was done such that TLB
flush could be avoided when switching between L1 and L2.

However, the above mentioned commit only changed L2 VMEntry logic to
not flush TLB when switching from L1 to L2. It forgot to also remove
the TLB flush which is done when simulating a VMExit from L2 to L1.

To fix this issue, on VMExit from L2 to L1 we flush TLB only in case
vmcs01 enables VPID and vmcs01->vpid==vmcs02->vpid. This happens when
vmcs01 enables VPID and vmcs12 does not.

Fixes: 5c614b35 ("KVM: nVMX: nested VPID emulation")
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

6f1e03bc
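The flush condition stated in this message reduces to a small predicate. The sketch below is only an illustration with a hypothetical function name, and it ignores any other state the real code may consult.

```c
#include <stdbool.h>

/*
 * Decide whether a TLB flush is needed when emulating a VM exit from
 * L2 to L1. L1 and L2 share a VPID only when vmcs01 enables VPID and
 * vmcs12 does not, so that is the only case that needs a flush.
 */
static bool need_tlb_flush_on_l2_exit(bool vmcs01_uses_vpid,
				      bool vmcs12_uses_vpid)
{
	return vmcs01_uses_vpid && !vmcs12_uses_vpid;
}
```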

KVM: nVMX: Use vmx local var for referencing vpid02 · 6bce30c7

Authored by Liran Alon on May 22, 2018

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

6bce30c7

KVM: x86: prevent integer overflows in KVM_MEMORY_ENCRYPT_REG_REGION · 86bf20cb

Authored by Dan Carpenter on May 19, 2018

This is a fix from reviewing the code, but it looks like it might be
able to lead to an Oops.  It affects 32bit systems.

The KVM_MEMORY_ENCRYPT_REG_REGION ioctl uses a u64 for range->addr and
range->size but the high 32 bits would be truncated away on a 32 bit
system.  This is harmless but it's also harmless to prevent it.

Then in sev_pin_memory() the "uaddr + ulen" calculation can wrap around.
The wrap around can happen on 32 bit or 64 bit systems, but I was only
able to figure out a problem for 32 bit systems.  We would pick a number
which results in "npages" being zero.  The sev_pin_memory() would then
return ZERO_SIZE_PTR without allocating anything.

I made it illegal to call sev_pin_memory() with "ulen" set to zero.
Hopefully, that doesn't cause any problems.  I also changed the type of
"first" and "last" to long, just for cosmetic reasons.  Otherwise on a
64 bit system you're saving "uaddr >> 12" in an int and it truncates the
high 20 bits away.  The math works in the current code so far as I can
see but it's just weird.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[Brijesh noted that the code is only reachable on X86_64.]
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

86bf20cb
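The two checks discussed in this message (rejecting a zero length and catching a wrapping "uaddr + ulen") can be sketched as below. The helper names are hypothetical and a 4 KiB page size (shift of 12) is assumed, matching the "uaddr >> 12" mentioned in the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Reject empty regions and regions whose end address wraps around. */
static bool region_is_valid(uintptr_t uaddr, uintptr_t ulen)
{
	if (ulen == 0 || uaddr + ulen < uaddr)
		return false;
	return true;
}

/*
 * Number of pages touched by [uaddr, uaddr + ulen).
 * Only meaningful when region_is_valid() returned true. The page
 * indices use a pointer-sized type so no high bits are truncated.
 */
static uintptr_t region_npages(uintptr_t uaddr, uintptr_t ulen)
{
	uintptr_t first = uaddr >> DEMO_PAGE_SHIFT;
	uintptr_t last = (uaddr + ulen - 1) >> DEMO_PAGE_SHIFT;

	return last - first + 1;
}
```

Because the arithmetic is unsigned, the wrap test is well defined in C: a sum smaller than one of its operands can only result from overflow.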

KVM: x86: remove obsolete EXPORT... of handle_mmio_page_fault · a1d588e9

Authored by Sean Christopherson on Mar 29, 2018

handle_mmio_page_fault() was recently moved to be an internal-only
MMU function, i.e. it's static and no longer defined in kvm_host.h.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

a1d588e9

23 May, 2018 4 commits

KVM: nVMX: Ensure that VMCS12 field offsets do not change · 21ebf53b

Authored by Jim Mattson on May 01, 2018

Enforce the invariant that existing VMCS12 field offsets must not
change. Experience has shown that without strict enforcement, this
invariant will not be maintained.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Changed the code to use BUILD_BUG_ON_MSG instead of the better
 _Static_assert, which requires GCC 4.6. - Radim.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

21ebf53b
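One way to enforce a layout invariant like this at build time is a compile-time offset check. The sketch below uses C11 `_Static_assert`, which the maintainer note says was replaced by BUILD_BUG_ON_MSG in the merged patch. The struct `demo_vmcs12`, its fields, and the `CHECK_OFFSET` macro are hypothetical stand-ins, not the real VMCS12 layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for a saved-state layout. */
struct demo_vmcs12 {
	uint32_t revision_id;	/* offset 0 */
	uint32_t abort_code;	/* offset 4 */
	uint64_t guest_rip;	/* offset 8 */
	uint64_t guest_rsp;	/* offset 16 */
};

/* Fail the build if a field moves away from its recorded offset. */
#define CHECK_OFFSET(field, loc)					\
	_Static_assert(offsetof(struct demo_vmcs12, field) == (loc),	\
		       "Offset of " #field " in struct demo_vmcs12 has changed")

CHECK_OFFSET(revision_id, 0);
CHECK_OFFSET(abort_code, 4);
CHECK_OFFSET(guest_rip, 8);
CHECK_OFFSET(guest_rsp, 16);
```

Inserting a new field before `guest_rip` would then break compilation rather than silently breaking save/restore compatibility.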

KVM: nVMX: Restore the VMCS12 offsets for v4.0 fields · b348e793

Authored by Jim Mattson on May 01, 2018

Changing the VMCS12 layout will break save/restore compatibility with
older kvm releases once the KVM_{GET,SET}_NESTED_STATE ioctls are
accepted upstream. Google has already been using these ioctls for some
time, and we implore the community not to disturb the existing layout.

Move the four most recently added fields to preserve the offsets of
the previously defined fields and reserve locations for the vmread and
vmwrite bitmaps, which will be used in the virtualization of VMCS
shadowing (to improve the performance of double-nesting).
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Kept the SDM order in vmcs_field_to_offset_table. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

b348e793

KVM: x86: use timespec64 for KVM_HC_CLOCK_PAIRING · 899a31f5

Authored by Arnd Bergmann on Apr 23, 2018

The hypercall was added using a struct timespec based implementation,
but we should not use timespec in new code.

This changes it to timespec64. There is no functional change
here since the implementation is only used in 64-bit kernels
that use the same definition for timespec and timespec64.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

899a31f5

kvm: nVMX: Use nested_run_pending rather than from_vmentry · 6514dc38

Authored by Jim Mattson on Apr 26, 2018

When saving a vCPU's nested state, the vmcs02 is discarded. Only the
shadow vmcs12 is saved. The shadow vmcs12 contains all of the
information needed to reconstruct an equivalent vmcs02 on restore, but
we have to be able to deal with two contexts:

1. The nested state was saved immediately after an emulated VM-entry,
   before the vmcs02 was ever launched.

2. The nested state was saved some time after the first successful
   launch of the vmcs02.

Though it's an implementation detail rather than an architected bit,
vmx->nested_run_pending serves to distinguish between these two
cases. Hence, we save it as part of the vCPU's nested state. (Yes,
this is ugly.)

Even when restoring from a checkpoint, it may be necessary to build
the vmcs02 as if prepare_vmcs02 was called from nested_vmx_run. So,
the 'from_vmentry' argument should be dropped, and
vmx->nested_run_pending should be consulted instead. The nested state
restoration code then has to set vmx->nested_run_pending prior to
calling prepare_vmcs02. It's important that the restoration code set
vmx->nested_run_pending anyway, since the flag impacts things like
interrupt delivery as well.

Fixes: cf8b84f4 ("kvm: nVMX: Prepare for checkpointing L2 state")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

6514dc38

20 May, 2018 1 commit

x86/Hyper-V/hv_apic: Build the Hyper-V APIC conditionally · 2d2ccf24

Authored by Thomas Gleixner on May 19, 2018

The Hyper-V APIC code is built when CONFIG_HYPERV is enabled but the actual
code in that file is guarded with CONFIG_X86_64. There is no point in doing
this. Neither is there a point in having the CONFIG_HYPERV guard in there
because the containing directory is not built when CONFIG_HYPERV=n.

Further, for the hv_init_apic() function a stub is provided only for
CONFIG_HYPERV=n, which is pointless as the callsite is not compiled at
all. But for X86_32 the stub is missing and the build fails.

Clean that up:

  - Compile hv_apic.c only when CONFIG_X86_64=y
  - Make the stub for hv_init_apic() available when CONFIG_X86_64=n

Fixes: 6b48cb5f ("X86/Hyper-V: Enlighten APIC access")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>

2d2ccf24

19 May, 2018 6 commits

x86/Hyper-V/hv_apic: Include asm/apic.h · 61eeb1f6

Authored by Thomas Gleixner on May 19, 2018

Not all configurations magically include asm/apic.h, but the Hyper-V code
requires it. Include it explicitly.

Fixes: 6b48cb5f ("X86/Hyper-V: Enlighten APIC access")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>

61eeb1f6

X86/Hyper-V: Consolidate the allocation of the hypercall input page · 9a2d78e2

Authored by K. Y. Srinivasan on May 16, 2018

Consolidate the allocation of the hypercall input page.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-5-kys@linuxonhyperv.com

9a2d78e2

X86/Hyper-V: Consolidate code for converting cpumask to vpset · 800b8f03

Authored by K. Y. Srinivasan on May 16, 2018

Consolidate code for converting cpumask to vpset.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-4-kys@linuxonhyperv.com

800b8f03

X86/Hyper-V: Enhanced IPI enlightenment · 366f03b0

Authored by K. Y. Srinivasan on May 16, 2018

Support enhanced IPI enlightenments (to target more than 64 CPUs).
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-3-kys@linuxonhyperv.com

366f03b0

X86/Hyper-V: Enable IPI enlightenments · 68bb7bfb

Authored by K. Y. Srinivasan on May 16, 2018

Hyper-V supports hypercalls to implement IPI; use them.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-2-kys@linuxonhyperv.com

68bb7bfb

X86/Hyper-V: Enlighten APIC access · 6b48cb5f

Authored by K. Y. Srinivasan on May 16, 2018

Hyper-V supports MSR based APIC access; implement
the enlightenment.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-1-kys@linuxonhyperv.com

6b48cb5f

18 May, 2018 1 commit

kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME · 633711e8

Authored by Michael S. Tsirkin on May 17, 2018

KVM_HINTS_DEDICATED seems to be somewhat confusing:

Guest doesn't really care whether it's the only task running on a host
CPU as long as it's not preempted.

And there are more reasons for Guest to be preempted than host CPU
sharing, for example, with memory overcommit it can get preempted on a
memory access, post copy migration can cause preemption, etc.

Let's call it KVM_HINTS_REALTIME which seems to better
match what guests expect.

Also, the flag must be set on all vCPUs - current guests assume this.
Note so in the documentation.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

633711e8

15 May, 2018 7 commits

KVM: X86: Lower the default timer frequency limit to 200us · 4c27625b

Authored by Wanpeng Li on May 05, 2018

Anthoine reported:
 The period used by Windows change over time but it can be 1
 milliseconds or less. I saw the limit_periodic_timer_frequency
 print so 500 microseconds is sometimes reached.

As suggested by Paolo, lower the default timer frequency limit to a
smaller interval of 200 us (5000 Hz) to leave some headroom. This
is required due to Windows 10 changing the scheduler tick limit
from 1024 Hz to 2048 Hz.
Reported-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
Cc: Darren Kenny <darren.kenny@oracle.com>
Cc: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4c27625b
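The numbers in this message can be checked with a little arithmetic. The helpers below are hypothetical and only convert between a timer period in microseconds and a frequency in Hz, then compare a requested period against a minimum allowed period.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_USEC_PER_SEC 1000000u

/* Frequency in Hz for a period in microseconds (period_us must be non-zero). */
static uint32_t period_us_to_hz(uint32_t period_us)
{
	return DEMO_USEC_PER_SEC / period_us;
}

/* Period in microseconds for a frequency in Hz, rounded down (hz must be non-zero). */
static uint32_t hz_to_period_us(uint32_t hz)
{
	return DEMO_USEC_PER_SEC / hz;
}

/* A requested period is acceptable if it is not shorter than the limit. */
static bool period_allowed(uint32_t period_us, uint32_t min_period_us)
{
	return period_us >= min_period_us;
}
```

A 2048 Hz tick corresponds to a period of about 488 us, which falls below the 500 us figure mentioned in the report but stays above the new 200 us (5000 Hz) limit.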

tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all} · 45dd9b06

Authored by Steven Rostedt (VMware) on May 09, 2018

Doing an audit of trace events, I discovered two trace events in the xen
subsystem that use a hack to create zero data size trace events. This is not
what trace events are for. Trace events add memory footprint overhead, and
if all you need to do is see if a function is hit or not, simply make that
function noinline and use function tracer filtering.

Worse yet, the hack used was:

 __array(char, x, 0)

This creates a static string of zero length. Ftrace assumes that such
constructs are dynamic strings that are NUL terminated. That is not the
case with these tracepoints, and it can cause problems in various parts
of ftrace.

Nuke the trace events!

Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 95a7d768 ("xen/mmu: Use Xen specific TLB flush instead of the generic one.")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

45dd9b06

kvm: mmu: Don't expose private memslots to L2 · 3a2936de

Authored by Jim Mattson on May 09, 2018

These private pages have special purposes in the virtualization of L1,
but not in the virtualization of L2. In particular, L1's APIC access
page should never be entered into L2's page tables, because this
causes a great deal of confusion when the APIC virtualization hardware
is being used to accelerate L2's accesses to its own APIC.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3a2936de

kvm: mmu: Add guest_mode to kvm_mmu_page_role · 1313cc2b

Authored by Jim Mattson on May 09, 2018

L1 and L2 need to have disjoint mappings, so that L1's APIC access
page (under VMX) can be omitted from L2's mappings.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1313cc2b

kvm: nVMX: Eliminate APIC access page sharing between L1 and L2 · ab5df31c

Authored by Jim Mattson on May 09, 2018

It is only possible to share the APIC access page between L1 and L2 if
they also share the virtual-APIC page.  If L2 has its own virtual-APIC
page, then MMIO accesses to L1's TPR from L2 will access L2's TPR
instead.  Moreover, L1's local APIC has to be in xAPIC mode, which is
another condition that hasn't been checked.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ab5df31c

kvm: vmx: Basic APIC virtualization controls have three settings · 8d860bbe

Authored by Jim Mattson on May 09, 2018

Previously, we toggled between SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE
and SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES, depending on whether or
not the EXTD bit was set in MSR_IA32_APICBASE. However, if the local
APIC is disabled, we should not set either of these APIC
virtualization control bits.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8d860bbe
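The three-way choice described in this message could look roughly like the sketch below. The enum `demo_apic_state`, the `DEMO_VIRT_*` flag values, and the helper name are all hypothetical, and the sketch ignores whether the hardware actually supports each control.

```c
#include <stdint.h>

/* Hypothetical local APIC states relevant to the control selection. */
enum demo_apic_state {
	DEMO_APIC_DISABLED,
	DEMO_APIC_XAPIC,
	DEMO_APIC_X2APIC,
};

/* Hypothetical control bits; the values are placeholders. */
#define DEMO_VIRT_APIC_ACCESSES	(1u << 0)
#define DEMO_VIRT_X2APIC_MODE	(1u << 4)

/*
 * Clear both APIC virtualization controls, then set at most one of
 * them depending on the APIC state. A disabled APIC gets neither.
 */
static uint32_t apic_virt_controls(uint32_t ctl, enum demo_apic_state state)
{
	ctl &= ~(DEMO_VIRT_APIC_ACCESSES | DEMO_VIRT_X2APIC_MODE);

	switch (state) {
	case DEMO_APIC_XAPIC:
		ctl |= DEMO_VIRT_APIC_ACCESSES;
		break;
	case DEMO_APIC_X2APIC:
		ctl |= DEMO_VIRT_X2APIC_MODE;
		break;
	case DEMO_APIC_DISABLED:
		break;
	}
	return ctl;
}
```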

kvm: vmx: Introduce lapic_mode enumeration · 58871649

Authored by Jim Mattson on May 09, 2018

The local APIC can be in one of three modes: disabled, xAPIC or
x2APIC. (A fourth mode, "invalid," is included for completeness.)

Using the new enumeration can make some of the APIC mode logic easier
to read. In kvm_set_apic_base, for instance, it is clear that one
cannot transition directly from x2APIC mode to xAPIC mode or directly
from APIC disabled to x2APIC mode.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
[Check invalid bits even if msr_info->host_initiated.  Reported by
 Wanpeng Li. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

58871649
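A sketch of how a four-value mode enumeration can make the transition rules in this message easy to read. The identifiers are hypothetical. The sketch assumes that the APIC base MSR carries the x2APIC (EXTD) flag in bit 10 and the global enable flag in bit 11, and it encodes only the two forbidden transitions the message names, plus a rejection of the invalid mode.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_APICBASE_EXTD	0x400u	/* assumed bit 10: x2APIC mode */
#define DEMO_APICBASE_ENABLE	0x800u	/* assumed bit 11: APIC enabled */

/* Each mode is the combination of the two mode bits. */
enum demo_lapic_mode {
	DEMO_MODE_DISABLED = 0,
	DEMO_MODE_INVALID  = DEMO_APICBASE_EXTD,
	DEMO_MODE_XAPIC    = DEMO_APICBASE_ENABLE,
	DEMO_MODE_X2APIC   = DEMO_APICBASE_ENABLE | DEMO_APICBASE_EXTD,
};

/* Extract the mode from an APIC base register value. */
static enum demo_lapic_mode mode_from_apic_base(uint64_t apic_base)
{
	return (enum demo_lapic_mode)(apic_base &
		(DEMO_APICBASE_ENABLE | DEMO_APICBASE_EXTD));
}

/* Check whether a direct transition between two modes is allowed. */
static bool mode_transition_ok(enum demo_lapic_mode old_mode,
			       enum demo_lapic_mode new_mode)
{
	if (new_mode == DEMO_MODE_INVALID)
		return false;
	if (old_mode == DEMO_MODE_X2APIC && new_mode == DEMO_MODE_XAPIC)
		return false;
	if (old_mode == DEMO_MODE_DISABLED && new_mode == DEMO_MODE_X2APIC)
		return false;
	return true;
}
```

Encoding the modes as the raw bit combinations means decoding is a single mask, and the "invalid" value falls out naturally as EXTD set without ENABLE.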

openeuler / Kernel synced successfully more than 1 year ago
