Commits · c70126764bf09c5dd95527808b647ec347b8a822 · openanolis / cloud-kernel

26 May, 2018 5 commits

KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation · c7012676

Authored by Vitaly Kuznetsov on May 16, 2018

Implement HvFlushVirtualAddress{List,Space}Ex hypercalls in the same way
we've implemented non-EX counterparts.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Initialized valid_bank_mask to silence misguided GCC warnings. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

c7012676

KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation · e2f11f42

Authored by Vitaly Kuznetsov on May 16, 2018

Implement HvFlushVirtualAddress{List,Space} hypercalls in a simplistic way:
do full TLB flush with KVM_REQ_TLB_FLUSH and kick vCPUs which are currently
IN_GUEST_MODE.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

e2f11f42

KVM: x86: hyperv: do rep check for each hypercall separately · 56b9ae78

Authored by Vitaly Kuznetsov on May 16, 2018

Prepare to support TLB flush hypercalls, some of which are REP hypercalls.
Also, return HV_STATUS_INVALID_HYPERCALL_INPUT as it seems more
appropriate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

56b9ae78

KVM: x86: hyperv: use defines when parsing hypercall parameters · 142c95da

Authored by Vitaly Kuznetsov on May 16, 2018

Avoid open-coding offsets for hypercall input parameters, we already
have defines for them.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

142c95da
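
As a rough illustration of the two commits above (named defines for the hypercall input fields, and a rep check done per hypercall), the following user-space C sketch decodes a 64-bit hypercall input value. The bit positions follow the Hyper-V TLFS layout as commonly documented; every macro, struct and function name here is local to the sketch rather than copied from the kernel headers, and the numeric status value 3 is an assumption that should be checked against hyperv-tlfs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout of the 64-bit hypercall input value (sketch-local names). */
#define HC_CODE_MASK         0xffffULL      /* bits 15:0, call code */
#define HC_FAST_BIT          (1ULL << 16)   /* bit 16, register-based call */
#define HC_REP_COMP_OFFSET   32             /* bits 43:32, rep count */
#define HC_REP_START_OFFSET  48             /* bits 59:48, rep start index */
#define HC_REP_MASK          0xfffULL       /* both rep fields are 12 bits */

#define HC_STATUS_SUCCESS        0
#define HC_STATUS_INVALID_INPUT  3          /* assumed value, see lead-in */

struct hc_params {
    uint16_t code;
    bool     fast;
    uint16_t rep_cnt;
    uint16_t rep_idx;
};

/* Decode the input value using the named field definitions above. */
static struct hc_params hc_parse(uint64_t param)
{
    struct hc_params p;

    p.code    = (uint16_t)(param & HC_CODE_MASK);
    p.fast    = (param & HC_FAST_BIT) != 0;
    p.rep_cnt = (uint16_t)((param >> HC_REP_COMP_OFFSET) & HC_REP_MASK);
    p.rep_idx = (uint16_t)((param >> HC_REP_START_OFFSET) & HC_REP_MASK);
    return p;
}

/* A non-rep hypercall must not carry a rep count or a rep start index. */
static int hc_check_rep(const struct hc_params *p, bool is_rep_call)
{
    assert(p != NULL);
    bool rep = p->rep_cnt != 0 || p->rep_idx != 0;

    if (!is_rep_call && rep)
        return HC_STATUS_INVALID_INPUT;
    return HC_STATUS_SUCCESS;
}
```

Each hypercall handler would call `hc_check_rep()` with its own expectation, which is the point of doing the check per hypercall instead of once up front.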

x86/hyper-v: move struct hv_flush_pcpu{,ex} definitions to common header · c9c92bee

Authored by Vitaly Kuznetsov on May 16, 2018

Hyper-V TLB flush hypercalls definitions will be required for KVM so move
them to hyperv-tlfs.h. Structures also need to be renamed as the '_pcpu'
suffix is irrelevant for a general-purpose definition.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

c9c92bee

25 May, 2018 6 commits

KVM: x86: Expose CLDEMOTE CPU feature to guest VM · 0ea3286e

Authored by Jingqi Liu on May 22, 2018

The CLDEMOTE instruction hints to hardware that the cache line that
contains the linear address should be moved ("demoted") from
the cache(s) closest to the processor core to a level more distant
from the processor core. This may accelerate subsequent accesses
to the line by other cores in the same coherence domain,
especially if the line was written by the core that demotes the line.

This patch exposes the cldemote feature to the guest.

See the release document at the link below:
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf
This patch has a dependency on https://lkml.org/lkml/2018/4/23/928
Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

0ea3286e

KVM: nVMX: Emulate L1 individual-address invvpid by L0 individual-address invvpid · cd9a491f

Authored by Liran Alon on May 22, 2018

When vmcs12 uses VPID, all TLB entries populated by L2 are tagged with
vmx->nested.vpid02. Currently, INVVPID executed by L1 is emulated by L0
by using INVVPID single/global-context to flush all TLB entries
tagged with vmx->nested.vpid02 regardless of INVVPID type executed by
L1.

However, we can easily optimize the case of L1 INVVPID on an
individual-address. Just INVVPID given individual-address tagged with
vmx->nested.vpid02.
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
[Squashed with a preparatory patch that added the !operand.vpid line.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

cd9a491f

KVM: nVMX: Don't flush TLB when vmcs12 uses VPID · 6f1e03bc

Authored by Liran Alon on May 22, 2018

Since commit 5c614b35 ("KVM: nVMX: nested VPID emulation"),
vmcs01 and vmcs02 don't share the same VPID. vmcs01 uses vmx->vpid
while vmcs02 uses vmx->nested.vpid02. This was done such that TLB
flush could be avoided when switching between L1 and L2.

However, the above mentioned commit only changed L2 VMEntry logic to
not flush TLB when switching from L1 to L2. It forgot to also remove
the TLB flush which is done when simulating a VMExit from L2 to L1.

To fix this issue, on VMExit from L2 to L1 we flush TLB only in case
vmcs01 enables VPID and vmcs01->vpid==vmcs02->vpid. This happens when
vmcs01 enables VPID and vmcs12 does not.

Fixes: 5c614b35 ("KVM: nVMX: nested VPID emulation")
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

6f1e03bc

KVM: nVMX: Use vmx local var for referencing vpid02 · 6bce30c7

Authored by Liran Alon on May 22, 2018

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

6bce30c7

KVM: x86: prevent integer overflows in KVM_MEMORY_ENCRYPT_REG_REGION · 86bf20cb

Authored by Dan Carpenter on May 19, 2018

This is a fix from reviewing the code, but it looks like it might be
able to lead to an Oops.  It affects 32bit systems.

The KVM_MEMORY_ENCRYPT_REG_REGION ioctl uses a u64 for range->addr and
range->size but the high 32 bits would be truncated away on a 32 bit
system.  This is harmless but it's also harmless to prevent it.

Then in sev_pin_memory() the "uaddr + ulen" calculation can wrap around.
The wrap around can happen on 32 bit or 64 bit systems, but I was only
able to figure out a problem for 32 bit systems.  We would pick a number
which results in "npages" being zero.  The sev_pin_memory() would then
return ZERO_SIZE_PTR without allocating anything.

I made it illegal to call sev_pin_memory() with "ulen" set to zero.
Hopefully, that doesn't cause any problems.  I also changed the type of
"first" and "last" to long, just for cosmetic reasons.  Otherwise on a
64 bit system you're saving "uaddr >> 12" in an int and it truncates the
high 20 bits away.  The math works in the current code so far as I can
see but it's just weird.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[Brijesh noted that the code is only reachable on X86_64.]
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

86bf20cb
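
The wrap-around described in the commit above can be modelled in user space. The sketch below is not the kernel's sev_pin_memory(); it is a hypothetical helper (`range_to_npages`, with a fixed 4 KiB page size as an assumption) that rejects a zero-length range and a range whose end wraps past zero before it computes the page count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12   /* assumed 4 KiB pages */

/*
 * Compute how many pages the byte range [uaddr, uaddr + ulen) touches.
 * Returns false, leaving *npages untouched, for an empty range or for a
 * range whose end address wraps around.
 */
static bool range_to_npages(uint64_t uaddr, uint64_t ulen, uint64_t *npages)
{
    assert(npages != NULL);

    if (ulen == 0 || uaddr + ulen < uaddr)
        return false;

    uint64_t first = uaddr >> SK_PAGE_SHIFT;
    uint64_t last  = (uaddr + ulen - 1) >> SK_PAGE_SHIFT;

    *npages = last - first + 1;
    return true;
}
```

Keeping `first` and `last` as wide as the address avoids the truncation of `uaddr >> 12` that the commit message calls "just weird".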

KVM: x86: remove obsolete EXPORT... of handle_mmio_page_fault · a1d588e9

Authored by Sean Christopherson on Mar 29, 2018

handle_mmio_page_fault() was recently moved to be an internal-only
MMU function, i.e. it's static and no longer defined in kvm_host.h.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

a1d588e9

23 May, 2018 4 commits

KVM: nVMX: Ensure that VMCS12 field offsets do not change · 21ebf53b

Authored by Jim Mattson on May 01, 2018

Enforce the invariant that existing VMCS12 field offsets must not
change. Experience has shown that without strict enforcement, this
invariant will not be maintained.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Changed the code to use BUILD_BUG_ON_MSG instead of the better
 _Static_assert, which requires GCC 4.6. - Radim.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

21ebf53b
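
A compile-time layout check of the kind this commit describes can be sketched in standard C11. The struct below is a made-up stand-in, not the real struct vmcs12, and the sketch uses `static_assert` from `<assert.h>` where the kernel commit uses BUILD_BUG_ON_MSG; the `CHECK_OFFSET` macro name is likewise just an illustrative choice.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical structure whose field offsets must never change. */
struct demo_vmcs12 {
    uint32_t revision_id;
    uint32_t abort;
    uint64_t io_bitmap_a;
    uint64_t io_bitmap_b;
};

/* Fail the build if a field is no longer at its recorded offset. */
#define CHECK_OFFSET(field, loc)                                     \
    static_assert(offsetof(struct demo_vmcs12, field) == (loc),      \
                  "Offset of " #field " in struct demo_vmcs12 has changed.")

CHECK_OFFSET(revision_id, 0);
CHECK_OFFSET(abort, 4);
CHECK_OFFSET(io_bitmap_a, 8);
CHECK_OFFSET(io_bitmap_b, 16);
```

Inserting a new field anywhere but at the end of the struct would then break the build instead of silently breaking save/restore compatibility.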

KVM: nVMX: Restore the VMCS12 offsets for v4.0 fields · b348e793

Authored by Jim Mattson on May 01, 2018

Changing the VMCS12 layout will break save/restore compatibility with
older kvm releases once the KVM_{GET,SET}_NESTED_STATE ioctls are
accepted upstream. Google has already been using these ioctls for some
time, and we implore the community not to disturb the existing layout.

Move the four most recently added fields to preserve the offsets of
the previously defined fields and reserve locations for the vmread and
vmwrite bitmaps, which will be used in the virtualization of VMCS
shadowing (to improve the performance of double-nesting).
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Kept the SDM order in vmcs_field_to_offset_table. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

b348e793

KVM: x86: use timespec64 for KVM_HC_CLOCK_PAIRING · 899a31f5

Authored by Arnd Bergmann on Apr 23, 2018

The hypercall was added using a struct timespec based implementation,
but we should not use timespec in new code.

This changes it to timespec64. There is no functional change
here since the implementation is only used in 64-bit kernels
that use the same definition for timespec and timespec64.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

899a31f5

kvm: nVMX: Use nested_run_pending rather than from_vmentry · 6514dc38

Authored by Jim Mattson on Apr 26, 2018

When saving a vCPU's nested state, the vmcs02 is discarded. Only the
shadow vmcs12 is saved. The shadow vmcs12 contains all of the
information needed to reconstruct an equivalent vmcs02 on restore, but
we have to be able to deal with two contexts:

1. The nested state was saved immediately after an emulated VM-entry,
   before the vmcs02 was ever launched.

2. The nested state was saved some time after the first successful
   launch of the vmcs02.

Though it's an implementation detail rather than an architected bit,
vmx->nested_run_pending serves to distinguish between these two
cases. Hence, we save it as part of the vCPU's nested state. (Yes,
this is ugly.)

Even when restoring from a checkpoint, it may be necessary to build
the vmcs02 as if prepare_vmcs02 was called from nested_vmx_run. So,
the 'from_vmentry' argument should be dropped, and
vmx->nested_run_pending should be consulted instead. The nested state
restoration code then has to set vmx->nested_run_pending prior to
calling prepare_vmcs02. It's important that the restoration code set
vmx->nested_run_pending anyway, since the flag impacts things like
interrupt delivery as well.

Fixes: cf8b84f4 ("kvm: nVMX: Prepare for checkpointing L2 state")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

6514dc38

20 May, 2018 1 commit

x86/Hyper-V/hv_apic: Build the Hyper-V APIC conditionally · 2d2ccf24

Authored by Thomas Gleixner on May 19, 2018

The Hyper-V APIC code is built when CONFIG_HYPERV is enabled but the actual
code in that file is guarded with CONFIG_X86_64. There is no point in doing
this. Neither is there a point in having the CONFIG_HYPERV guard in there
because the containing directory is not built when CONFIG_HYPERV=n.

Further for the hv_init_apic() function a stub is provided only for
CONFIG_HYPERV=n, which is pointless as the callsite is not compiled at
all. But for X86_32 the stub is missing and the build fails.

Clean that up:

  - Compile hv_apic.c only when CONFIG_X86_64=y
  - Make the stub for hv_init_apic() available when CONFIG_X86_64=n

Fixes: 6b48cb5f ("X86/Hyper-V: Enlighten APIC access")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>

2d2ccf24

19 May, 2018 6 commits

x86/Hyper-V/hv_apic: Include asm/apic.h · 61eeb1f6

Authored by Thomas Gleixner on May 19, 2018

Not all configurations magically include asm/apic.h, but the Hyper-V code
requires it. Include it explicitly.

Fixes: 6b48cb5f ("X86/Hyper-V: Enlighten APIC access")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>

61eeb1f6

X86/Hyper-V: Consolidate the allocation of the hypercall input page · 9a2d78e2

Authored by K. Y. Srinivasan on May 16, 2018

Consolidate the allocation of the hypercall input page.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-5-kys@linuxonhyperv.com

9a2d78e2

X86/Hyper-V: Consolidate code for converting cpumask to vpset · 800b8f03

Authored by K. Y. Srinivasan on May 16, 2018

Consolidate code for converting cpumask to vpset.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-4-kys@linuxonhyperv.com

800b8f03
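
For readers unfamiliar with the vpset format that this commit's conversion helper produces, here is a hedged user-space sketch of the idea: virtual processor numbers are grouped into banks of 64, each bank is a 64-bit bitmap, and a separate mask says which banks are present. The struct and function names are invented for this example, the input is a plain array of VP numbers instead of a cpumask, and marking every bank up to the highest used one as valid is just one simple policy, not necessarily what the kernel does today.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_VP_BANKS 64   /* 64 banks of 64 bits covers 4096 VPs */

struct vpset_sketch {
    uint64_t valid_bank_mask;
    uint64_t bank_contents[SK_VP_BANKS];
};

/* Build a vpset from n VP numbers; returns the number of banks in use. */
static int vps_to_vpset(struct vpset_sketch *vpset,
                        const unsigned *vps, int n)
{
    int nr_bank = 0;

    memset(vpset, 0, sizeof(*vpset));
    for (int i = 0; i < n; i++) {
        unsigned bank   = vps[i] / 64;   /* which 64-VP bank */
        unsigned offset = vps[i] % 64;   /* bit inside that bank */

        assert(bank < SK_VP_BANKS);
        vpset->bank_contents[bank] |= 1ULL << offset;
        if ((int)bank + 1 > nr_bank)
            nr_bank = (int)bank + 1;
    }

    /* Mark banks 0..nr_bank-1 as valid (avoid an undefined 64-bit shift). */
    vpset->valid_bank_mask =
        nr_bank == 64 ? ~0ULL : (1ULL << nr_bank) - 1;
    return nr_bank;
}
```

The same bank layout is what lets the enhanced ("EX") hypercalls address more than 64 processors.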

X86/Hyper-V: Enhanced IPI enlightenment · 366f03b0

Authored by K. Y. Srinivasan on May 16, 2018

Support enhanced IPI enlightenments (to target more than 64 CPUs).
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-3-kys@linuxonhyperv.com

366f03b0

X86/Hyper-V: Enable IPI enlightenments · 68bb7bfb

Authored by K. Y. Srinivasan on May 16, 2018

Hyper-V supports hypercalls to implement IPI; use them.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-2-kys@linuxonhyperv.com

68bb7bfb

X86/Hyper-V: Enlighten APIC access · 6b48cb5f

Authored by K. Y. Srinivasan on May 16, 2018

Hyper-V supports MSR based APIC access; implement
the enlightenment.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-1-kys@linuxonhyperv.com

6b48cb5f

18 May, 2018 1 commit

kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME · 633711e8

Authored by Michael S. Tsirkin on May 17, 2018

KVM_HINTS_DEDICATED seems to be somewhat confusing:

Guest doesn't really care whether it's the only task running on a host
CPU as long as it's not preempted.

And there are more reasons for Guest to be preempted than host CPU
sharing, for example, with memory overcommit it can get preempted on a
memory access, post copy migration can cause preemption, etc.

Let's call it KVM_HINTS_REALTIME which seems to better
match what guests expect.

Also, the flag must be set on all vCPUs - current guests assume this.
Note so in the documentation.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

633711e8

15 May, 2018 10 commits

KVM: X86: Lower the default timer frequency limit to 200us · 4c27625b

Authored by Wanpeng Li on May 05, 2018

Anthoine reported:
 The period used by Windows change over time but it can be 1
 milliseconds or less. I saw the limit_periodic_timer_frequency
 print so 500 microseconds is sometimes reached.

As suggested by Paolo, lower the default timer frequency limit to a
smaller interval of 200 us (5000 Hz) to leave some headroom. This
is required due to Windows 10 changing the scheduler tick limit
from 1024 Hz to 2048 Hz.
Reported-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
Cc: Darren Kenny <darren.kenny@oracle.com>
Cc: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4c27625b
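
The arithmetic behind the new limit can be checked with a small sketch. The helper names below are hypothetical and the clamp is a simplified model of a minimum-period limit, not the KVM code itself; the 500 us figure used in the usage note is the value mentioned in the report quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NSEC_PER_USEC 1000ULL
#define SK_USEC_PER_SEC  1000000ULL

/* Raise a requested timer period (ns) to the configured minimum (us). */
static uint64_t clamp_timer_period_ns(uint64_t period_ns,
                                      uint64_t min_period_us)
{
    assert(min_period_us > 0);
    uint64_t min_ns = min_period_us * SK_NSEC_PER_USEC;

    return period_ns < min_ns ? min_ns : period_ns;
}

/* Frequency in Hz that corresponds to a period given in microseconds. */
static uint64_t period_us_to_hz(uint64_t period_us)
{
    assert(period_us > 0);
    return SK_USEC_PER_SEC / period_us;
}
```

A 2048 Hz tick has a period of roughly 488281 ns: a 500 us minimum would stretch it to 500000 ns, while the 200 us minimum (5000 Hz) leaves it untouched.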

tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all} · 45dd9b06

Authored by Steven Rostedt (VMware) on May 09, 2018

Doing an audit of trace events, I discovered two trace events in the xen
subsystem that use a hack to create zero data size trace events. This is not
what trace events are for. Trace events add memory footprint overhead, and
if all you need to do is see if a function is hit or not, simply make that
function noinline and use function tracer filtering.

Worse yet, the hack used was:

 __array(char, x, 0)

Which creates a static string of zero in length. There's assumptions about
such constructs in ftrace that this is a dynamic string that is nul
terminated. This is not the case with these tracepoints and can cause
problems in various parts of ftrace.

Nuke the trace events!

Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 95a7d768 ("xen/mmu: Use Xen specific TLB flush instead of the generic one.")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

45dd9b06

kvm: mmu: Don't expose private memslots to L2 · 3a2936de

Authored by Jim Mattson on May 09, 2018

These private pages have special purposes in the virtualization of L1,
but not in the virtualization of L2. In particular, L1's APIC access
page should never be entered into L2's page tables, because this
causes a great deal of confusion when the APIC virtualization hardware
is being used to accelerate L2's accesses to its own APIC.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3a2936de

kvm: mmu: Add guest_mode to kvm_mmu_page_role · 1313cc2b

Authored by Jim Mattson on May 09, 2018

L1 and L2 need to have disjoint mappings, so that L1's APIC access
page (under VMX) can be omitted from L2's mappings.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1313cc2b

kvm: nVMX: Eliminate APIC access page sharing between L1 and L2 · ab5df31c

Authored by Jim Mattson on May 09, 2018

It is only possible to share the APIC access page between L1 and L2 if
they also share the virtual-APIC page.  If L2 has its own virtual-APIC
page, then MMIO accesses to L1's TPR from L2 will access L2's TPR
instead.  Moreover, L1's local APIC has to be in xAPIC mode, which is
another condition that hasn't been checked.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ab5df31c

kvm: vmx: Basic APIC virtualization controls have three settings · 8d860bbe

Authored by Jim Mattson on May 09, 2018

Previously, we toggled between SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE
and SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES, depending on whether or
not the EXTD bit was set in MSR_IA32_APICBASE. However, if the local
APIC is disabled, we should not set either of these APIC
virtualization control bits.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8d860bbe

kvm: vmx: Introduce lapic_mode enumeration · 58871649

Authored by Jim Mattson on May 09, 2018

The local APIC can be in one of three modes: disabled, xAPIC or
x2APIC. (A fourth mode, "invalid," is included for completeness.)

Using the new enumeration can make some of the APIC mode logic easier
to read. In kvm_set_apic_base, for instance, it is clear that one
cannot transition directly from x2APIC mode to xAPIC mode or directly
from APIC disabled to x2APIC mode.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
[Check invalid bits even if msr_info->host_initiated.  Reported by
 Wanpeng Li. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

58871649
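
The mode enumeration and the two forbidden transitions named in the commit message can be sketched as follows. This is a stand-alone model, not the KVM source: the identifiers are invented for the example, and the only hardware facts it relies on are the positions of the EXTD (bit 10) and EN (bit 11) bits in the IA32_APIC_BASE MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_X2APIC_ENABLE   (1u << 10)   /* EXTD bit of IA32_APIC_BASE */
#define SK_APICBASE_ENABLE (1u << 11)   /* EN bit of IA32_APIC_BASE */

/* Each mode is encoded directly as its EN/EXTD bit pattern. */
enum lapic_mode_sk {
    MODE_DISABLED = 0,
    MODE_INVALID  = SK_X2APIC_ENABLE,                 /* EXTD without EN */
    MODE_XAPIC    = SK_APICBASE_ENABLE,
    MODE_X2APIC   = SK_APICBASE_ENABLE | SK_X2APIC_ENABLE,
};

/* Extract the mode from an APIC base MSR value. */
static enum lapic_mode_sk apic_mode(uint64_t apic_base)
{
    return (enum lapic_mode_sk)(apic_base &
                                (SK_APICBASE_ENABLE | SK_X2APIC_ENABLE));
}

/* Check whether a write may move the APIC from old_mode to new_mode. */
static bool apic_transition_ok(enum lapic_mode_sk old_mode,
                               enum lapic_mode_sk new_mode)
{
    assert(old_mode != MODE_INVALID);   /* the current state is never invalid */

    if (new_mode == MODE_INVALID)
        return false;
    if (old_mode == MODE_X2APIC && new_mode == MODE_XAPIC)
        return false;
    if (old_mode == MODE_DISABLED && new_mode == MODE_X2APIC)
        return false;
    return true;
}
```

With the modes named, the two illegal transitions read as plain comparisons instead of bit tests on the raw MSR value.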

KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support · ceef7d10

Authored by Vitaly Kuznetsov on Apr 16, 2018

Enlightened MSR-Bitmap is a natural extension of Enlightened VMCS:
Hyper-V Top Level Functional Specification states:

"The L1 hypervisor may collaborate with the L0 hypervisor to make MSR
accesses more efficient. It can enable enlightened MSR bitmaps by setting
the corresponding field in the enlightened VMCS to 1. When enabled, the L0
hypervisor does not monitor the MSR bitmaps for changes. Instead, the L1
hypervisor must invalidate the corresponding clean field after making
changes to one of the MSR bitmaps."

I reached out to Hyper-V team for additional details and I got the
following information:

"Current Hyper-V implementation works as following:

If the enlightened MSR bitmap is not enabled:
- All MSR accesses of L2 guests cause physical VM-Exits

If the enlightened MSR bitmap is enabled:
- Physical VM-Exits for L2 accesses to certain MSRs (currently FS_BASE,
  GS_BASE and KERNEL_GS_BASE) are avoided, thus making these MSR accesses
  faster."

I tested my series with a tight rdmsrl loop in L2, for KERNEL_GS_BASE the
results are:

Without Enlightened MSR-Bitmap: 1300 cycles/read
With Enlightened MSR-Bitmap: 120 cycles/read
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ceef7d10

kvm: x86: Refactor mmu_free_roots() · 74b566e6

Authored by Junaid Shahid on May 04, 2018

Extract the logic to free a root page in a separate function to avoid code
duplication in mmu_free_roots(). Also, change it to an exported function
i.e. kvm_mmu_free_roots().
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

74b566e6

KVM: X86: Fix reserved bits check for MOV to CR3 · a780a3ea

Authored by Wanpeng Li on May 13, 2018

MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4.
It should be checked when PCIDE bit is not set, however commit
'd1cd3ce9 ("KVM: MMU: check guest CR3 reserved bits based on
its physical address width")' removes the bit 63 checking
unconditionally. This patch fixes it by checking bit 63 of CR3
when PCIDE bit is not set in CR4.

Fixes: d1cd3ce9 (KVM: MMU: check guest CR3 reserved bits based on its physical address width)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Reviewed-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a780a3ea
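
The rule stated in this commit (bit 63 of CR3 is reserved unless CR4.PCIDE is set) can be expressed as a short validity check. The sketch below is a simplified model with hypothetical names; it treats every bit from the guest's physical address width up to bit 63 as reserved and ignores any other CR3 constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_CR3_PCID_INVD (1ULL << 63)   /* "no flush" hint, valid only with PCIDE */

/* Mask with bits s..e (inclusive) set; requires 1 <= s <= e <= 63. */
static uint64_t rsvd_bits_sk(unsigned s, unsigned e)
{
    assert(s >= 1 && s <= e && e <= 63);
    return ((1ULL << (e - s + 1)) - 1) << s;
}

/*
 * Return true if a MOV to CR3 with this value should be accepted.
 * Bit 63 is masked out only when PCIDs are enabled; otherwise it is
 * checked like any other reserved bit.
 */
static bool cr3_is_valid(uint64_t cr3, bool pcide, unsigned maxphyaddr)
{
    if (pcide)
        cr3 &= ~SK_CR3_PCID_INVD;
    return (cr3 & rsvd_bits_sk(maxphyaddr, 63)) == 0;
}
```

A value with bit 63 set is therefore rejected when `pcide` is false and accepted when it is true, provided no other reserved bit is set.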

14 May, 2018 1 commit

x86/amd_nb: Add support for Raven Ridge CPUs · f9bc6b2d

Authored by Guenter Roeck on May 04, 2018

Add Raven Ridge root bridge and data fabric PCI IDs.
This is required for amd_pci_dev_to_node_id() and amd_smn_read().

Cc: stable@vger.kernel.org # v4.16+
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

f9bc6b2d

11 May, 2018 4 commits

KVM: vmx: update sec exec controls for UMIP iff emulating UMIP · 64f7a115

Authored by Sean Christopherson on Apr 30, 2018

Update SECONDARY_EXEC_DESC for UMIP emulation if and only if UMIP
is actually being emulated.  Skipping the VMCS update eliminates
unnecessary VMREAD/VMWRITE when UMIP is supported in hardware,
and on platforms that don't have SECONDARY_VM_EXEC_CONTROL.  The
latter case resolves a bug where KVM would fill the kernel log
with warnings due to failed VMWRITEs on older platforms.

Fixes: 0367f205 ("KVM: vmx: add support for emulating UMIP")
Cc: stable@vger.kernel.org #4.16
Reported-by: Paolo Zeppegno <pzeppegno@gmail.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

64f7a115

kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled · c19986fe

Authored by Junaid Shahid on May 04, 2018

If the PCIDE bit is not set in CR4, then the MSb of CR3 is a reserved
bit. If the guest tries to set it, that should cause a #GP fault. So
mask out the bit only when the PCIDE bit is set.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c19986fe

KVM: hyperv: idr_find needs RCU protection · 452a68d0

Authored by Paolo Bonzini on May 07, 2018

Even though the eventfd is released after the KVM SRCU grace period
elapses, the conn_to_evt data structure itself is not; it uses RCU
internally, instead.  Fix the read-side critical section to happen
under rcu_read_lock/unlock; the result is still protected by
vcpu->kvm->srcu.
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

452a68d0

x86: Delay skip of emulated hypercall instruction · 6356ee0c

Authored by Marian Rotariu on Apr 30, 2018

The IP increment should be done after the hypercall emulation, after
calling the various handlers. In this way, these handlers can accurately
identify the IP of the VMCALL if they need it.

This patch keeps the same functionality for the Hyper-V handler which does
not use the return code of the standard kvm_skip_emulated_instruction()
call.
Signed-off-by: Marian Rotariu <mrotariu@bitdefender.com>
[Hyper-V hypercalls also need kvm_skip_emulated_instruction() - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6356ee0c

08 May, 2018 1 commit

x86/xen: Reset VCPU0 info pointer after shared_info remap · d1ecfa9d

Authored by van der Linden, Frank on May 04, 2018

This patch fixes crashes during boot for HVM guests on older (pre HVM
vector callback) Xen versions. Without this, current kernels will always
fail to boot on those Xen versions.

Sample stack trace:

   BUG: unable to handle kernel paging request at ffffffffff200000
   IP: __xen_evtchn_do_upcall+0x1e/0x80
   PGD 1e0e067 P4D 1e0e067 PUD 1e10067 PMD 235c067 PTE 0
    Oops: 0002 [#1] SMP PTI
   Modules linked in:
   CPU: 0 PID: 512 Comm: kworker/u2:0 Not tainted 4.14.33-52.13.amzn1.x86_64 #1
   Hardware name: Xen HVM domU, BIOS 3.4.3.amazon 11/11/2016
   task: ffff88002531d700 task.stack: ffffc90000480000
   RIP: 0010:__xen_evtchn_do_upcall+0x1e/0x80
   RSP: 0000:ffff880025403ef0 EFLAGS: 00010046
   RAX: ffffffff813cc760 RBX: ffffffffff200000 RCX: ffffc90000483ef0
   RDX: ffff880020540a00 RSI: ffff880023c78000 RDI: 000000000000001c
   RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
   R13: ffff880025403f5c R14: 0000000000000000 R15: 0000000000000000
   FS:  0000000000000000(0000) GS:ffff880025400000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: ffffffffff200000 CR3: 0000000001e0a000 CR4: 00000000000006f0
    Call Trace:
   <IRQ>
   do_hvm_evtchn_intr+0xa/0x10
   __handle_irq_event_percpu+0x43/0x1a0
   handle_irq_event_percpu+0x20/0x50
   handle_irq_event+0x39/0x60
   handle_fasteoi_irq+0x80/0x140
   handle_irq+0xaf/0x120
   do_IRQ+0x41/0xd0
   common_interrupt+0x7d/0x7d
   </IRQ>

During boot, the HYPERVISOR_shared_info page gets remapped to make it work
with KASLR. This means that any pointer derived from it needs to be
adjusted.

The only value that this applies to is the vcpu_info pointer for VCPU 0.
For PV and HVM with the callback vector feature, this gets done via the
smp_ops prepare_boot_cpu callback. Older Xen versions do not support the
HVM callback vector, so there is no Xen-specific smp_ops set up in that
scenario. So, the vcpu_info pointer for VCPU 0 never gets set to the proper
value, and the first reference of it will be bad. Fix this by resetting it
immediately after the remap.
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Reviewed-by: Eduardo Valentin <eduval@amazon.com>
Reviewed-by: Alakesh Haloi <alakeshh@amazon.com>
Reviewed-by: Vallish Vaidyeshwara <vallish@amazon.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

d1ecfa9d

06 May, 2018 1 commit

KVM: x86: remove APIC Timer periodic/oneshot spikes · ecf08dad

Authored by Anthoine Bourgeois on Apr 29, 2018

Since the commit "8003c9ae: add APIC Timer periodic/oneshot mode VMX
preemption timer support", a Windows 10 guest has some erratic timer
spikes.

Here are the results for a 1ms timer fired 150000 times without any load:

            | Before 8003c9ae | After 8003c9ae
Max         |     1834us      |    86000us
Mean        |     1100us      |     1021us
Deviation   |       59us      |      149us

Here are the results for a 1ms timer fired 150000 times with a cpu-z stress test:

            | Before 8003c9ae | After 8003c9ae
Max         |    32000us      |   140000us
Mean        |     1006us      |     1997us
Deviation   |      140us      |    11095us

The root cause of the problem is that starting an hrtimer with an expiry
time already in the past can take more than 20 milliseconds to trigger the
timer function.  It can be solved by forwarding such past timers
immediately, rather than submitting them to hrtimer_start().
In case the timer is periodic, update the target expiration and call
hrtimer_start with it.

v2: Check if the tsc deadline is already expired. Thank you Mika.
v3: Execute the past timers immediately rather than submitting them to
hrtimer_start().
v4: Rearm the periodic timer with advance_periodic_target_expiration() a
simpler version of set_target_expiration(). Thank you Paolo.

Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
8003c9ae ("KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

ecf08dad
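
The fix described above (run an already-expired timer right away, and for a periodic timer move the target forward by one period before re-arming) can be modelled in user space. The sketch is an assumption-laden toy: `struct sk_timer` and `sk_start_timer` are invented names, time is a plain nanosecond counter, and "firing" is just a counter increment where the real code would deliver the timer interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sk_timer {
    uint64_t target_ns;   /* next expiration, absolute time */
    uint64_t period_ns;   /* 0 means one-shot */
    unsigned fired;       /* how many times the handler ran */
};

/*
 * Start the timer at time now_ns.
 * Returns true if the caller still has to arm a real timer for
 * t->target_ns, false if a one-shot timer was already handled.
 */
static bool sk_start_timer(struct sk_timer *t, uint64_t now_ns)
{
    assert(t != NULL);

    if (now_ns >= t->target_ns) {
        t->fired++;                      /* expiry is in the past: run now */
        if (t->period_ns == 0)
            return false;                /* one-shot: nothing left to arm */
        t->target_ns += t->period_ns;    /* periodic: advance by one period */
    }
    return true;
}
```

Note that the sketch advances a late periodic timer by a single period only; a timer that is several periods behind would expire immediately again on the next start.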
