提交 · 217aaff53c25f03b1d2fc23eff9dc2bae34f690e · openeuler / Kernel

12 2月, 2019 11 次提交

KVM: VMX: Invert the ordering of saving guest/host scratch reg at VM-Enter · 217aaff5

由 Sean Christopherson 提交于 1月 25, 2019

Switching the ordering allows for an out-of-line path for VM-Fail
that elides saving guest state but still shares the register clearing
with the VM-Exit path.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

217aaff5

KVM: VMX: Pass "launched" directly to the vCPU-run asm blob · c9afc58c

由 Sean Christopherson 提交于 1月 25, 2019

...and remove struct vcpu_vmx's temporary __launched variable.

Eliminating __launched is a bonus, the real motivation is to get to the
point where the only reference to struct vcpu_vmx in the asm code is
to vcpu.arch.regs, which will simplify moving the blob to a proper asm
file.  Note that also means this approach is deliberately different than
what is used in nested_vmx_check_vmentry_hw().

Use BL as it is a callee-save register in both 32-bit and 64-bit ABIs,
i.e. it can't be modified by vmx_update_host_rsp(), to avoid having to
temporarily save/restore the launched flag.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c9afc58c

KVM: VMX: Update VMCS.HOST_RSP via helper C function · c09b03eb

由 Sean Christopherson 提交于 1月 25, 2019

Providing a helper function to update HOST_RSP is visibly easier to
read, and more importantly (for the future) eliminates two arguments to
the VM-Enter assembly blob. Reducing the number of arguments to the asm
blob is for all intents and purposes a prerequisite to moving the code
to a proper assembly routine. It's not truly mandatory, but it greatly
simplifies the future code, and the cost of the extra CALL+RET is
negligible in the grand scheme.

Note that although _ASM_ARG[1-3] can be used in the inline asm itself,
the intput/output constraints need to be manually defined. gcc will
actually compile with _ASM_ARG[1-3] specified as constraints, but what
it actually ends up doing with the bogus constraint is unknown.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c09b03eb

KVM: VMX: Load/save guest CR2 via C code in __vmx_vcpu_run() · 47e97c09

由 Sean Christopherson 提交于 1月 25, 2019

...to eliminate its parameter and struct vcpu_vmx offset definition
from the assembly blob.  Accessing CR2 from C versus assembly doesn't
change the likelihood of taking a page fault (and modifying CR2) while
it's loaded with the guest's value, so long as we don't do anything
silly between accessing CR2 and VM-Enter/VM-Exit.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

47e97c09

KVM: nVMX: Cache host_rsp on a per-VMCS basis · 5a878160

由 Sean Christopherson 提交于 1月 25, 2019

Currently, host_rsp is cached on a per-vCPU basis, i.e. it's stored in
struct vcpu_vmx.  In non-nested usage the caching is for all intents
and purposes 100% effective, e.g. only the first VMLAUNCH needs to
synchronize VMCS.HOST_RSP since the call stack to vmx_vcpu_run() is
identical each and every time.  But when running a nested guest, KVM
must invalidate the cache when switching the current VMCS as it can't
guarantee the new VMCS has the same HOST_RSP as the previous VMCS.  In
other words, the cache loses almost all of its efficacy when running a
nested VM.

Move host_rsp to struct vmcs_host_state, which is per-VMCS, so that it
is cached on a per-VMCS basis and restores its 100% hit rate when
nested VMs are in play.

Note that the host_rsp cache for vmcs02 essentially "breaks" when
nested early checks are enabled as nested_vmx_check_vmentry_hw() will
see a different RSP at the time of its VM-Enter.  While it's possible
to avoid even that VMCS.HOST_RSP synchronization, e.g. by employing a
dedicated VM-Exit stack, there is little motivation for doing so as
the overhead of two VMWRITEs (~55 cycles) is dwarfed by the overhead
of the extra VMX transition (600+ cycles) and is a proverbial drop in
the ocean relative to the total cost of a nested transtion (10s of
thousands of cycles).
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5a878160

KVM: VMX: Let the compiler save/load RDX during vCPU-run · 6f7c6d23

由 Sean Christopherson 提交于 1月 25, 2019

Per commit c2036300 ("KVM: VMX: Let gcc to choose which registers
to save (x86_64)"), the only reason RDX is saved/loaded to/from the
stack is because it was specified as an input, i.e. couldn't be marked
as clobbered (ignoring the fact that "saving" it to a dummy output
would indirectly mark it as clobbered).

Now that RDX is no longer an input, clobber it.
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6f7c6d23

KVM: VMX: Manually load RDX in vCPU-run asm blob · ccf44743

由 Sean Christopherson 提交于 1月 25, 2019

Load RDX with the VMCS.HOST_RSP field encoding on-demand instead of
delegating to the compiler via an input constraint.  In addition to
saving one whole MOV instruction, this allows RDX to be properly
clobbered (in a future patch) instead of being saved/loaded to/from
the stack.

Despite nested_vmx_check_vmentry_hw() having similar code, leave it
alone, for now.  In that case, RDX is unconditionally used and isn't
clobbered, i.e. sending in HOST_RSP as an input is simpler.

Note that because HOST_RSP is an enum and not a define, it must be
redefined as an immediate instead of using __stringify(HOST_RSP).  The
naming "conflict" between host_rsp and HOST_RSP is slightly confusing,
but the former will be removed in a future patch, at which point
HOST_RSP is absolutely what is desired.
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ccf44743

KVM: VMX: Save RSI to an unused output in the vCPU-run asm blob · f3689e3f

由 Sean Christopherson 提交于 1月 25, 2019

RSI is clobbered by the vCPU-run asm blob, but it's not marked as such,
probably because GCC doesn't let you mark inputs as clobbered. "Save"
RSI to a dummy output so that GCC recognizes it as being clobbered.

Fixes: 773e8a04 ("x86/kvm: use Enlightened VMCS when running on Hyper-V")
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f3689e3f

KVM: VMX: Modify only RSP when creating a placeholder for guest's RCX · 831a3011

由 Sean Christopherson 提交于 1月 25, 2019

In the vCPU-run asm blob, the guest's RCX is temporarily saved onto the
stack after VM-Exit as the exit flow must first load a register with a
pointer to the vCPU's save area in order to save the guest's registers.
RCX is arbitrarily designated as the scratch register.

Since the stack usage is to (1)save host, (2)save guest, (3)load host
and (4)load guest, the code can't conform to the stack's natural FIFO
semantics, i.e. it can't simply do PUSH/POP. Regardless of whether it
is done for the host's value or guest's value, at some point the code
needs to access the stack using a non-traditional method, e.g. MOV
instead of POP. vCPU-run opts to create a placeholder on the stack for
guest's RCX (by adjusting RSP) and saves RCX to its place immediately
after VM-Exit (via MOV).

In other words, the purpose of the first 'PUSH RCX' at the start of
the vCPU-run asm blob is to adjust RSP down, i.e. there's no need to
actually access memory. Use 'SUB $wordsize, RSP' instead of 'PUSH RCX'
to make it more obvious that the intent is simply to create a gap on
the stack for the guest's RCX.
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

831a3011

KVM: VMX: Zero out *all* general purpose registers after VM-Exit · 0e0ab73c

由 Sean Christopherson 提交于 1月 25, 2019

...except RSP, which is restored by hardware as part of VM-Exit.

Paolo theorized that restoring registers from the stack after a VM-Exit
in lieu of zeroing them could lead to speculative execution with the
guest's values, e.g. if the stack accesses miss the L1 cache[1].
Zeroing XORs are dirt cheap, so just be ultra-paranoid.

Note that the scratch register (currently RCX) used to save/restore the
guest state is also zeroed as its host-defined value is loaded via the
stack, just with a MOV instead of a POP.

[1] https://patchwork.kernel.org/patch/10771539/#22441255

Fixes: 0cb5b306 ("kvm: vmx: Scrub hardware GPRs at VM-exit")
Cc: <stable@vger.kernel.org>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0e0ab73c

KVM: VMX: Compare only a single byte for VMCS' "launched" in vCPU-run · 61c08aa9

由 Sean Christopherson 提交于 1月 25, 2019

The vCPU-run asm blob does a manual comparison of a VMCS' launched
status to execute the correct VM-Enter instruction, i.e. VMLAUNCH vs.
VMRESUME.  The launched flag is a bool, which is a typedef of _Bool.
C99 does not define an exact size for _Bool, stating only that is must
be large enough to hold '0' and '1'.  Most, if not all, compilers use
a single byte for _Bool, including gcc[1].

Originally, 'launched' was of type 'int' and so the asm blob used 'cmpl'
to check the launch status.  When 'launched' was moved to be stored on a
per-VMCS basis, struct vcpu_vmx's "temporary" __launched flag was added
in order to avoid having to pass the current VMCS into the asm blob.
The new  '__launched' was defined as a 'bool' and not an 'int', but the
'cmp' instruction was not updated.

This has not caused any known problems, likely due to compilers aligning
variables to 4-byte or 8-byte boundaries and KVM zeroing out struct
vcpu_vmx during allocation.  I.e. vCPU-run accesses "junk" data, it just
happens to always be zero and so doesn't affect the result.

[1] https://gcc.gnu.org/ml/gcc-patches/2000-10/msg01127.html

Fixes: d462b819 ("KVM: VMX: Keep list of loaded VMCSs, instead of vcpus")
Cc: <stable@vger.kernel.org>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

61c08aa9

31 1月, 2019 1 次提交

cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM · b284909a

由 Josh Poimboeuf 提交于 1月 30, 2019

With the following commit:

  73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")

... the hotplug code attempted to detect when SMT was disabled by BIOS,
in which case it reported SMT as permanently disabled.  However, that
code broke a virt hotplug scenario, where the guest is booted with only
primary CPU threads, and a sibling is brought online later.

The problem is that there doesn't seem to be a way to reliably
distinguish between the HW "SMT disabled by BIOS" case and the virt
"sibling not yet brought online" case.  So the above-mentioned commit
was a bit misguided, as it permanently disabled SMT for both cases,
preventing future virt sibling hotplugs.

Going back and reviewing the original problems which were attempted to
be solved by that commit, when SMT was disabled in BIOS:

  1) /sys/devices/system/cpu/smt/control showed "on" instead of
     "notsupported"; and

  2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.

I'd propose that we instead consider #1 above to not actually be a
problem.  Because, at least in the virt case, it's possible that SMT
wasn't disabled by BIOS and a sibling thread could be brought online
later.  So it makes sense to just always default the smt control to "on"
to allow for that possibility (assuming cpuid indicates that the CPU
supports SMT).

The real problem is #2, which has a simple fix: change vmx_vm_init() to
query the actual current SMT state -- i.e., whether any siblings are
currently online -- instead of looking at the SMT "control" sysfs value.

So fix it by:

  a) reverting the original "fix" and its followup fix:

     73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
     bc2d8d26 ("cpu/hotplug: Fix SMT supported evaluation")

     and

  b) changing vmx_vm_init() to query the actual current SMT state --
     instead of the sysfs control value -- to determine whether the L1TF
     warning is needed.  This also requires the 'sched_smt_present'
     variable to exported, instead of 'cpu_smt_control'.

Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: NIgor Mammedov <imammedo@redhat.com>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Joe Mario <jmario@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com

b284909a

26 1月, 2019 4 次提交

KVM: x86: Mark expected switch fall-throughs · b2869f28

由 Gustavo A. R. Silva 提交于 1月 25, 2019

In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warnings:

arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough.
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b2869f28

KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function · 5ad6ece8

由 Sean Christopherson 提交于 1月 15, 2019

...along with the function's STACK_FRAME_NON_STANDARD tag.  Moving the
asm blob results in a significantly smaller amount of code that is
marked with STACK_FRAME_NON_STANDARD, which makes it far less likely
that gcc will split the function and trigger a spurious objtool warning.
As a bonus, removing STACK_FRAME_NON_STANDARD from vmx_vcpu_run() allows
the bulk of code to be properly checked by objtool.

Because %rbp is not loaded via VMCS fields, vmx_vcpu_run() must manually
save/restore the host's RBP and load the guest's RBP prior to calling
vmx_vmenter().  Modifying %rbp triggers objtool's stack validation code,
and so vmx_vcpu_run() is tagged with STACK_FRAME_NON_STANDARD since it's
impossible to avoid modifying %rbp.

Unfortunately, vmx_vcpu_run() is also a gigantic function that gcc will
split into separate functions, e.g. so that pieces of the function can
be inlined.  Splitting the function means that the compiled Elf file
will contain one or more vmx_vcpu_run.part.* functions in addition to
a vmx_vcpu_run function.  Depending on where the function is split,
objtool may warn about a "call without frame pointer save/setup" in
vmx_vcpu_run.part.* since objtool's stack validation looks for exact
names when whitelisting functions tagged with STACK_FRAME_NON_STANDARD.

Up until recently, the undesirable function splitting was effectively
blocked because vmx_vcpu_run() was tagged with __noclone.  At the time,
__noclone had an unintended side effect that put vmx_vcpu_run() into a
separate optimization unit, which in turn prevented gcc from inlining
the function (or any of its own function calls) and thus eliminated gcc's
motivation to split the function.  Removing the __noclone attribute
allowed gcc to optimize vmx_vcpu_run(), exposing the objtool warning.

Kudos to Qian Cai for root causing that the fnsplit optimization is what
caused objtool to complain.

Fixes: 453eafbe ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
Tested-by: NQian Cai <cai@lca.pw>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Nkbuild test robot <lkp@intel.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5ad6ece8

kvm: vmx: fix some -Wmissing-prototypes warnings · 8997f657

由 Yi Wang 提交于 1月 21, 2019

We get some warnings when building kernel with W=1:
arch/x86/kvm/vmx/vmx.c:426:5: warning: no previous prototype for ‘kvm_fill_hv_flush_list_func’ [-Wmissing-prototypes]
arch/x86/kvm/vmx/nested.c:58:6: warning: no previous prototype for ‘init_vmcs_shadow_fields’ [-Wmissing-prototypes]

Make them static to fix this.
Signed-off-by: NYi Wang <wang.yi59@zte.com.cn>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8997f657

KVM: VMX: Use the correct field var when clearing VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL · 85ba2b16

由 Sean Christopherson 提交于 1月 14, 2019

Fix a recently introduced bug that results in the wrong VMCS control
field being updated when applying a IA32_PERF_GLOBAL_CTRL errata.

Fixes: c73da3fc ("KVM: VMX: Properly handle dynamic VM Entry/Exit controls")
Reported-by: NHarald Arnesen <harald@skogtun.org>
Tested-by: NHarald Arnesen <harald@skogtun.org>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

85ba2b16

12 1月, 2019 2 次提交

KVM/VMX: Avoid return error when flush tlb successfully in the hv_remote_flush_tlb_with_range() · b7c1c226

由 Lan Tianyu 提交于 1月 04, 2019

The "ret" is initialized to be ENOTSUPP. The return value of
__hv_remote_flush_tlb_with_range() will be Or with "ret" when ept
table potiners are mismatched. This will cause return ENOTSUPP even if
flush tlb successfully. This patch is to fix the issue and set
"ret" to 0.

Fixes: a5c214da ("KVM/VMX: Change hv flush logic when ept tables are mismatched.")
Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

b7c1c226

KVM: x86: Fix bit shifting in update_intel_pt_cfg · d14eff1b

由 Gustavo A. R. Silva 提交于 12月 26, 2018

ctl_bitmask in pt_desc is of type u64. When an integer like 0xf is
being left shifted more than 32 bits, the behavior is undefined.

Fix this by adding suffix ULL to integer 0xf.

Addresses-Coverity-ID: 1476095 ("Bad bit shift operation")
Fixes: 6c0f0bba ("KVM: x86: Introduce a function to initialize the PT configuration")
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: NWei Yang <richardw.yang@linux.intel.com>
Reviewed-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

d14eff1b

21 12月, 2018 12 次提交

KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines · 453eafbe

由 Sean Christopherson 提交于 12月 20, 2018

Transitioning to/from a VMX guest requires KVM to manually save/load
the bulk of CPU state that the guest is allowed to direclty access,
e.g. XSAVE state, CR2, GPRs, etc...  For obvious reasons, loading the
guest's GPR snapshot prior to VM-Enter and saving the snapshot after
VM-Exit is done via handcoded assembly.  The assembly blob is written
as inline asm so that it can easily access KVM-defined structs that
are used to hold guest state, e.g. moving the blob to a standalone
assembly file would require generating defines for struct offsets.

The other relevant aspect of VMX transitions in KVM is the handling of
VM-Exits.  KVM doesn't employ a separate VM-Exit handler per se, but
rather treats the VMX transition as a mega instruction (with many side
effects), i.e. sets the VMCS.HOST_RIP to a label immediately following
VMLAUNCH/VMRESUME.  The label is then exposed to C code via a global
variable definition in the inline assembly.

Because of the global variable, KVM takes steps to (attempt to) ensure
only a single instance of the owning C function, e.g. vmx_vcpu_run, is
generated by the compiler.  The earliest approach placed the inline
assembly in a separate noinline function[1].  Later, the assembly was
folded back into vmx_vcpu_run() and tagged with __noclone[2][3], which
is still used today.

After moving to __noclone, an edge case was encountered where GCC's
-ftracer optimization resulted in the inline assembly blob being
duplicated.  This was "fixed" by explicitly disabling -ftracer in the
__noclone definition[4].

Recently, it was found that disabling -ftracer causes build warnings
for unsuspecting users of __noclone[5], and more importantly for KVM,
prevents the compiler for properly optimizing vmx_vcpu_run()[6].  And
perhaps most importantly of all, it was pointed out that there is no
way to prevent duplication of a function with 100% reliability[7],
i.e. more edge cases may be encountered in the future.

So to summarize, the only way to prevent the compiler from duplicating
the global variable definition is to move the variable out of inline
assembly, which has been suggested several times over[1][7][8].

Resolve the aforementioned issues by moving the VMLAUNCH+VRESUME and
VM-Exit "handler" to standalone assembly sub-routines.  Moving only
the core VMX transition codes allows the struct indexing to remain as
inline assembly and also allows the sub-routines to be used by
nested_vmx_check_vmentry_hw().  Reusing the sub-routines has a happy
side-effect of eliminating two VMWRITEs in the nested_early_check path
as there is no longer a need to dynamically change VMCS.HOST_RIP.

Note that callers to vmx_vmenter() must account for the CALL modifying
RSP, e.g. must subtract op-size from RSP when synchronizing RSP with
VMCS.HOST_RSP and "restore" RSP prior to the CALL.  There are no great
alternatives to fudging RSP.  Saving RSP in vmx_enter() is difficult
because doing so requires a second register (VMWRITE does not provide
an immediate encoding for the VMCS field and KVM supports Hyper-V's
memory-based eVMCS ABI).  The other more drastic alternative would be
to use eschew VMCS.HOST_RSP and manually save/load RSP using a per-cpu
variable (which can be encoded as e.g. gs:[imm]).  But because a valid
stack is needed at the time of VM-Exit (NMIs aren't blocked and a user
could theoretically insert INT3/INT1ICEBRK at the VM-Exit handler), a
dedicated per-cpu VM-Exit stack would be required.  A dedicated stack
isn't difficult to implement, but it would require at least one page
per CPU and knowledge of the stack in the dumpstack routines.  And in
most cases there is essentially zero overhead in dynamically updating
VMCS.HOST_RSP, e.g. the VMWRITE can be avoided for all but the first
VMLAUNCH unless nested_early_check=1, which is not a fast path.  In
other words, avoiding the VMCS.HOST_RSP by using a dedicated stack
would only make the code marginally less ugly while requiring at least
one page per CPU and forcing the kernel to be aware (and approve) of
the VM-Exit stack shenanigans.

[1] cea15c24ca39 ("KVM: Move KVM context switch into own function")
[2] a3b5ba49 ("KVM: VMX: add the __noclone attribute to vmx_vcpu_run")
[3] 104f226b ("KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()")
[4] 95272c29 ("compiler-gcc: disable -ftracer for __noclone functions")
[5] https://lkml.kernel.org/r/20181218140105.ajuiglkpvstt3qxs@treble
[6] https://patchwork.kernel.org/patch/8707981/#21817015
[7] https://lkml.kernel.org/r/ri6y38lo23g.fsf@suse.cz
[8] https://lkml.kernel.org/r/20181218212042.GE25620@tassilo.jf.intel.comSuggested-by: NAndi Kleen <ak@linux.intel.com>
Suggested-by: NMartin Jambor <mjambor@suse.cz>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

453eafbe

KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs · 051a2d3e

由 Sean Christopherson 提交于 12月 20, 2018

Use '%% " _ASM_CX"' instead of '%0' to dereference RCX, i.e. the
'struct vcpu_vmx' pointer, in the VM-Enter asm blobs of vmx_vcpu_run()
and nested_vmx_check_vmentry_hw().  Using the symbolic name means that
adding/removing an output parameter(s) requires "rewriting" almost all
of the asm blob, which makes it nearly impossible to understand what's
being changed in even the most minor patches.

Opportunistically improve the code comments.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

051a2d3e

KVM/VMX: Add hv tlb range flush support · 1f3a3e46

由 Lan Tianyu 提交于 12月 06, 2018

This patch is to register tlb_remote_flush_with_range callback with
hv tlb range flush interface.
Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1f3a3e46

KVM: x86: Disable Intel PT when VMXON in L1 guest · ee85dec2

由 Luwei Kang 提交于 10月 24, 2018

Currently, Intel Processor Trace do not support tracing in L1 guest
VMX operation(IA32_VMX_MISC[bit 14] is 0). As mentioned in SDM,
on these type of processors, execution of the VMXON instruction will
clears IA32_RTIT_CTL.TraceEn and any attempt to write IA32_RTIT_CTL
causes a general-protection exception (#GP).
Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ee85dec2

KVM: x86: Set intercept for Intel PT MSRs read/write · b08c2896

由 Chao Peng 提交于 10月 24, 2018

To save performance overhead, disable intercept Intel PT MSRs
read/write when Intel PT is enabled in guest.
MSR_IA32_RTIT_CTL is an exception that will always be intercepted.
Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b08c2896

KVM: x86: Implement Intel PT MSRs read/write emulation · bf8c55d8

由 Chao Peng 提交于 10月 24, 2018

This patch implement Intel Processor Trace MSRs read/write
emulation.
Intel PT MSRs read/write need to be emulated when Intel PT
MSRs is intercepted in guest and during live migration.
Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bf8c55d8

KVM: x86: Introduce a function to initialize the PT configuration · 6c0f0bba

由 Luwei Kang 提交于 10月 24, 2018

Initialize the Intel PT configuration when cpuid update.
Include cpuid inforamtion, rtit_ctl bit mask and the number of
address ranges.
Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6c0f0bba

KVM: x86: Add Intel PT context switch for each vcpu · 2ef444f1

由 Chao Peng 提交于 10月 24, 2018

Load/Store Intel Processor Trace register in context switch.
MSR IA32_RTIT_CTL is loaded/stored automatically from VMCS.
In Host-Guest mode, we need load/resore PT MSRs only when PT
is enabled in guest.
Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2ef444f1

KVM: x86: Add Intel Processor Trace cpuid emulation · 86f5201d

由 Chao Peng 提交于 10月 24, 2018

Expose Intel Processor Trace to guest only when
the PT works in Host-Guest mode.
Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

86f5201d

KVM: x86: Add Intel PT virtualization work mode · f99e3daf

由 Chao Peng 提交于 10月 24, 2018

Intel Processor Trace virtualization can be work in one
of 2 possible modes:

a. System-Wide mode (default):
   When the host configures Intel PT to collect trace packets
   of the entire system, it can leave the relevant VMX controls
   clear to allow VMX-specific packets to provide information
   across VMX transitions.
   KVM guest will not aware this feature in this mode and both
   host and KVM guest trace will output to host buffer.

b. Host-Guest mode:
   Host can configure trace-packet generation while in
   VMX non-root operation for guests and root operation
   for native executing normally.
   Intel PT will be exposed to KVM guest in this mode, and
   the trace output to respective buffer of host and guest.
   In this mode, tht status of PT will be saved and disabled
   before VM-entry and restored after VM-exit if trace
   a virtual machine.
Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f99e3daf

kvm: vmx: Allow guest read access to IA32_TSC · 788fc1e9

由 Jim Mattson 提交于 11月 09, 2018

Let the guest read the IA32_TSC MSR with the generic RDMSR instruction
as well as the specific RDTSC(P) instructions. Note that the hardware
applies the TSC multiplier and offset (when applicable) to the result of
RDMSR(IA32_TSC), just as it does to the result of RDTSC(P).
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Reviewed-by: NMarc Orr <marcorr@google.com>
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

788fc1e9

KVM: VMX: Remove duplicated include from vmx.c · ba7424b2

由 YueHaibing 提交于 12月 18, 2018

Remove duplicated include.
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

ba7424b2

15 12月, 2018 10 次提交

kvm: x86: Dynamically allocate guest_fpu · b666a4b6

由 Marc Orr 提交于 11月 06, 2018

Previously, the guest_fpu field was embedded in the kvm_vcpu_arch
struct. Unfortunately, the field is quite large, (e.g., 4352 bytes on my
current setup). This bloats the kvm_vcpu_arch struct for x86 into an
order 3 memory allocation, which can become a problem on overcommitted
machines. Thus, this patch moves the fpu state outside of the
kvm_vcpu_arch struct.

With this patch applied, the kvm_vcpu_arch struct is reduced to 15168
bytes for vmx on my setup when building the kernel with kvmconfig.
Suggested-by: NDave Hansen <dave.hansen@intel.com>
Signed-off-by: NMarc Orr <marcorr@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b666a4b6

KVM/VMX: Check ept_pointer before flushing ept tlb · 53963a70

由 Lan Tianyu 提交于 12月 06, 2018

This patch is to initialize ept_pointer to INVALID_PAGE and check it
before flushing ept tlb. If ept_pointer is invalid, bypass the flush
request.
Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

53963a70

kvm: vmx: add cpu into VMX preemption timer bug list · 3d82c565

由 Wei Huang 提交于 12月 03, 2018

This patch adds Intel "Xeon CPU E3-1220 V2", with CPUID.01H.EAX=0x000306A8,
into the list of known broken CPUs which fail to support VMX preemption
timer. This bug was found while running the APIC timer test of
kvm-unit-test on this specific CPU, even though the errata info can't be
located in the public domain for this CPU.
Signed-off-by: NWei Huang <wei@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3d82c565

kvm: vmx: Skip all SYSCALL MSRs in setup_msrs() when !EFER.SCE · 84c8c5b8

由 Jim Mattson 提交于 12月 05, 2018

Like IA32_STAR, IA32_LSTAR and IA32_FMASK only need to contain guest
values on VM-entry when the guest is in long mode and EFER.SCE is set.
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Reviewed-by: NMarc Orr <marcorr@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

84c8c5b8

kvm: vmx: Don't set hardware IA32_CSTAR MSR on VM-entry · db31c8f5

由 Jim Mattson 提交于 12月 05, 2018

SYSCALL raises #UD in compatibility mode on Intel CPUs, so it's
pointless to load the guest's IA32_CSTAR value into the hardware MSR.

IA32_CSTAR still provides 48 bits of storage on Intel CPUs that have
CPUID.80000001:EDX.LM[bit 29] set, so we cannot remove it from the
vmx_msr_index[] array.
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db31c8f5

kvm: vmx: Document the need for MSR_STAR in i386 builds · 898a811f

由 Jim Mattson 提交于 12月 05, 2018

Add a comment explaining why MSR_STAR must be included in
vmx_msr_index[] even for i386 builds.

The elided comment has not been relevant since move_msr_up() was
introduced in commit a75beee6 ("KVM: VMX: Avoid saving and
restoring msrs on lightweight vmexit").
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

898a811f

kvm: vmx: Set IA32_TSC_AUX for legacy mode guests · 0023ef39

由 Jim Mattson 提交于 12月 05, 2018

RDTSCP is supported in legacy mode as well as long mode. The
IA32_TSC_AUX MSR should be set to the correct guest value before
entering any guest that supports RDTSCP.

Fixes: 4e47c7a6 ("KVM: VMX: Add instruction rdtscp support for guest")
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Reviewed-by: NMarc Orr <marcorr@google.com>
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0023ef39

KVM: nVMX: Move nested code to dedicated files · 55d2375e

由 Sean Christopherson 提交于 12月 03, 2018

From a functional perspective, this is (supposed to be) a straight
copy-paste of code.  Code was moved piecemeal to nested.c as not all
code that could/should be moved was obviously nested-only.  The nested
code was then re-ordered as needed to compile, i.e. stats may not show
this is being a "pure" move despite there not being any intended changes
in functionality.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

55d2375e

KVM: VMX: Expose nested_vmx_allowed() to nested VMX as a non-inline · 7c97fcb3

由 Sean Christopherson 提交于 12月 03, 2018

Exposing only the function allows @nested, i.e. the module param, to be
statically defined in vmx.c, ensuring we aren't unnecessarily checking
said variable in the nested code. nested_vmx_allowed() is exposed due
to the need to verify nested support in vmx_{get,set}_nested_state().
The downside is that nested_vmx_allowed() likely won't be inlined in
vmx_{get,set}_nested_state(), but that should be a non-issue as they're
not a hot path. Keeping vmx_{get,set}_nested_state() in vmx.c isn't a
viable option as they need access to several nested-only functions.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7c97fcb3

KVM: VMX: Expose various getters and setters to nested VMX · 97b7ead3

由 Sean Christopherson 提交于 12月 03, 2018

...as they're used directly by the nested code.  This will allow
moving the bulk of the nested code out of vmx.c without concurrent
changes to vmx.h.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

97b7ead3

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功