Commits · 6e42782f516f05c8030f63308f2457681b1c9919 · openanolis / cloud-kernel

06 Aug, 2018 · 33 commits

kvm: x86: Introduce KVM_REQ_LOAD_CR3 · 6e42782f

Committed by Junaid Shahid on Jun 27, 2018

The KVM_REQ_LOAD_CR3 request loads the hardware CR3 using the
current root_hpa.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6e42782f

kvm: x86: Introduce kvm_mmu_calc_root_page_role() · 9fa72119

Committed by Junaid Shahid on Jun 27, 2018

These functions factor out the base role calculation from the
corresponding kvm_init_*_mmu() functions. The new functions return
what would be the role assigned to a root page in the current VCPU
state. This can be masked with mmu_base_role_mask to derive the base
role.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9fa72119

kvm: x86: Add fast CR3 switch code path · 7c390d35

Committed by Junaid Shahid on Jun 27, 2018

When using shadow paging, a CR3 switch in the guest results in a VM Exit.
In the common case, that VM exit doesn't require much processing by KVM.
However, it does acquire the MMU lock, which can start showing signs of
contention under some workloads even on a 2 VCPU VM when the guest is
using KPTI. Therefore, we add a fast path that avoids acquiring the MMU
lock in the most common cases e.g. when switching back and forth between
the kernel and user mode CR3s used by KPTI with no guest page table
changes in between.

For now, this fast path is implemented only for 64-bit guests and hosts
to avoid the handling of PDPTEs, but it can be extended later to 32-bit
guests and/or hosts as well.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7c390d35
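
The fast path described in this commit can be pictured with a small model. The following is a Python sketch, not the kernel code: the class `ShadowMmu`, its fields, and the one-entry "previous root" cache are assumptions made for illustration. The commit message only states that the common back-and-forth switch avoids the MMU lock.

```python
import threading


class ShadowMmu:
    """Toy model of a shadow MMU with a one-entry cache of the previous root.

    All names here are hypothetical; this is not the KVM implementation.
    """

    def __init__(self):
        self.mmu_lock = threading.Lock()
        self.lock_acquisitions = 0   # counts slow-path entries
        self.active = None           # (guest_cr3, shadow_root) currently loaded
        self.previous = None         # last root that was switched away from
        self._next_root = 0x1000

    def _build_root(self, guest_cr3):
        # Slow path: building a shadow root needs the MMU lock.
        with self.mmu_lock:
            self.lock_acquisitions += 1
            root = self._next_root
            self._next_root += 0x1000
            return (guest_cr3, root)

    def switch_cr3(self, guest_cr3):
        """Switch to guest_cr3 and return the shadow root that gets loaded."""
        if self.active is not None and self.active[0] == guest_cr3:
            return self.active[1]
        if self.previous is not None and self.previous[0] == guest_cr3:
            # Fast path: swap active and previous without taking the lock.
            self.active, self.previous = self.previous, self.active
            return self.active[1]
        # Slow path: remember the old root, then build a new one.
        new_root = self._build_root(guest_cr3)
        self.previous, self.active = self.active, new_root
        return self.active[1]


mmu = ShadowMmu()
kernel_root = mmu.switch_cr3(0xAAA000)   # slow path
user_root = mmu.switch_cr3(0xBBB000)     # slow path
for _ in range(100):                     # KPTI-style ping-pong
    assert mmu.switch_cr3(0xAAA000) == kernel_root
    assert mmu.switch_cr3(0xBBB000) == user_root
print(mmu.lock_acquisitions)
```

The model ignores the "no guest page table changes in between" condition from the commit message; a real fast path would have to fall back to the slow path when the cached root is stale.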

kvm: x86: Avoid taking MMU lock in kvm_mmu_sync_roots if no sync is needed · 578e1c4d

Committed by Junaid Shahid on Jun 27, 2018

kvm_mmu_sync_roots() can locklessly check whether a sync is needed and just
bail out if it isn't.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

578e1c4d
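
The "check locklessly, then bail out" pattern from this commit can be sketched as follows. This is a Python illustration under assumed names (`Roots`, `unsync`, `sync_roots`), not the kernel function; real code would also need appropriate memory ordering around the lockless read, which this model does not show.

```python
import threading


class Roots:
    """Toy model; `unsync` stands in for 'some shadow page needs a sync'."""

    def __init__(self):
        self.mmu_lock = threading.Lock()
        self.unsync = False
        self.lock_acquisitions = 0
        self.syncs = 0

    def sync_roots(self):
        # Lockless check: bail out early if there is nothing to do.
        if not self.unsync:
            return False
        with self.mmu_lock:
            self.lock_acquisitions += 1
            # Re-check under the lock, since another thread may have synced.
            if self.unsync:
                self.unsync = False
                self.syncs += 1
                return True
        return False


roots = Roots()
roots.sync_roots()          # nothing to do, lock is never taken
roots.unsync = True
roots.sync_roots()          # takes the lock once and syncs
print(roots.lock_acquisitions, roots.syncs)
```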

kvm: x86: Make sync_page() flush remote TLBs once only · 5ce4786f

Committed by Junaid Shahid on Jun 27, 2018

sync_page() calls set_spte() from a loop across a page table. It would
work better if set_spte() left the TLB flushing to its callers, so that
sync_page() can aggregate into a single call.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5ce4786f
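
The aggregation idea can be illustrated with a short model: the per-entry helper only reports whether a flush is needed, and the caller issues one flush after the loop. This is a Python sketch with hypothetical stand-ins (`Tlb`, and simplified `set_spte` / `sync_page` that share only their names with the kernel functions); the rule for when a flush is needed is an assumption of the model.

```python
class Tlb:
    """Counts remote TLB flushes so the effect of aggregation is visible."""

    def __init__(self):
        self.remote_flushes = 0

    def flush_remote(self):
        self.remote_flushes += 1


def set_spte(sptes, index, value):
    """Write one entry and return True if a remote TLB flush is needed.

    Model assumption: a flush is needed when a present (non-zero) entry changes.
    """
    old = sptes[index]
    sptes[index] = value
    return old != 0 and old != value


def sync_page(sptes, new_values, tlb):
    """Update every entry, then flush at most once for the whole page."""
    need_flush = False
    for i, value in enumerate(new_values):
        need_flush |= set_spte(sptes, i, value)
    if need_flush:
        tlb.flush_remote()


tlb = Tlb()
table = [1, 2, 3, 0]
sync_page(table, [1, 5, 6, 7], tlb)   # two present entries change
print(tlb.remote_flushes)
```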

KVM: MMU: drop vcpu param in gpte_access · 42522d08

Committed by Peter Xu on Jul 18, 2018

It's never used.  Drop it.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

42522d08

KVM: nVMX: Separate logic allocating shadow vmcs to a function · abfc52c6

Committed by Liran Alon on Jun 23, 2018

No functionality change.
This is done as a preparation for VMCS shadowing virtualization.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

abfc52c6

KVM: VMX: Mark vmcs header as shadow in case alloc_vmcs_cpu() allocate shadow vmcs · 491a6038

Committed by Liran Alon on Jun 23, 2018

No functionality change.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

491a6038

KVM: nVMX: Expose VMCS shadowing to L1 guest · 32c7acf0

Committed by Liran Alon on Jun 23, 2018

Expose VMCS shadowing to L1 as a VMX capability of the virtual CPU,
whether or not VMCS shadowing is supported by the physical CPU.
(VMCS shadowing emulation)

Shadowed VMREADs and VMWRITEs from L2 are handled by L0, without a
VM-exit to L1.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

32c7acf0

KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by... · a7cde481

Committed by Liran Alon on Jun 23, 2018

KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps

This is done as a preparation for VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a7cde481

KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2 · 6d894f49

Committed by Liran Alon on Jun 23, 2018

This is done as a preparation for VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6d894f49

KVM: selftests: add tests for shadow VMCS save/restore · 9a78bdf3

Committed by Paolo Bonzini on Jul 29, 2018

This includes setting up the shadow VMCS and the secondary execution
controls in lib/vmx.c.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9a78bdf3

KVM: nVMX: include shadow vmcs12 in nested state · fa58a9fa

Committed by Paolo Bonzini on Jul 18, 2018

The shadow vmcs12 cannot be flushed on KVM_GET_NESTED_STATE,
because at that point guest memory is assumed by userspace to
be immutable.  Capture the cache in vmx_get_nested_state, adding
another page at the end if there is an active shadow vmcs12.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fa58a9fa

KVM: nVMX: Cache shadow vmcs12 on VMEntry and flush to memory on VMExit · 61ada748

Committed by Liran Alon on Jun 23, 2018

This is done as a preparation for VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

61ada748

KVM: nVMX: Verify VMCS shadowing VMCS link pointer · f145d90d

Committed by Liran Alon on Jun 23, 2018

Intel SDM considers these checks to be part of
"Checks on Guest Non-Register State".

Note that it is legal for vmcs->vmcs_link_pointer to be -1ull
when VMCS shadowing is enabled. In this case, any VMREAD/VMWRITE to a
shadowed field sets the ALU flags for VMfailInvalid (i.e. CF=1).

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f145d90d
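
The "ALU flags for VMfailInvalid" mentioned above follow the VMX instruction status convention in the Intel SDM: all six arithmetic status flags are cleared, and then CF is set for VMfailInvalid or ZF for VMfailValid. The Python sketch below models only that flag arithmetic; the function names are made up for illustration and the VM-instruction error field written by VMfailValid is not modeled.

```python
# RFLAGS bit positions of the arithmetic status flags.
CF, PF, AF, ZF, SF, OF = 0, 2, 4, 6, 7, 11
STATUS_MASK = sum(1 << bit for bit in (CF, PF, AF, ZF, SF, OF))


def vm_succeed(rflags):
    """VMsucceed: all six status flags cleared."""
    return rflags & ~STATUS_MASK


def vm_fail_invalid(rflags):
    """VMfailInvalid: CF set, the other five status flags cleared."""
    return (rflags & ~STATUS_MASK) | (1 << CF)


def vm_fail_valid(rflags):
    """VMfailValid: ZF set, the other five status flags cleared."""
    return (rflags & ~STATUS_MASK) | (1 << ZF)


# Non-status bits (here IF, bit 9) are left untouched.
print(hex(vm_fail_invalid(STATUS_MASK | 0x200)))
```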

KVM: nVMX: Verify VMCS shadowing controls · a8a7c02b

Committed by Liran Alon on Jun 23, 2018

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a8a7c02b

KVM: nVMX: Introduce nested_cpu_has_shadow_vmcs() · f792d274

Committed by Liran Alon on Jun 23, 2018

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f792d274

KVM: nVMX: Fail VMLAUNCH and VMRESUME on shadow VMCS · a6192d40

Committed by Liran Alon on Jun 23, 2018

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a6192d40

KVM: nVMX: Allow VMPTRLD for shadow VMCS if vCPU supports VMCS shadowing · fa97d7db

Committed by Liran Alon on Jul 18, 2018

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fa97d7db

KVM: VMX: Change vmcs12_{read,write}_any() to receive vmcs12 as parameter · e2536742

Committed by Liran Alon on Jun 23, 2018

No functionality change.
This is done as a preparation for VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e2536742

KVM: VMX: Create struct for VMCS header · 392b2f25

Committed by Liran Alon on Jun 23, 2018

No functionality change.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

392b2f25

kvm: selftests: add test for nested state save/restore · cb547637

Committed by Paolo Bonzini on Jul 28, 2018

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cb547637

kvm: nVMX: Introduce KVM_CAP_NESTED_STATE · 8fcc4b59

Committed by Jim Mattson on Jul 10, 2018

For nested virtualization L0 KVM is managing a bit of state for L2 guests;
this state cannot be captured through the currently available IOCTLs. In
fact the state captured through all of these IOCTLs is usually a mix of L1
and L2 state. It is also dependent on whether the L2 guest was running at
the moment when the process was interrupted to save its state.

With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
that is in VMX operation.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
[karahmed@ - rename structs and functions and make them ready for AMD and
             address previous comments.
           - handle nested.smm state.
           - rebase & a bit of refactoring.
           - Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8fcc4b59

KVM: x86: do not load vmcs12 pages while still in SMM · 7f7f1ba3

Committed by Paolo Bonzini on Jul 18, 2018

If the vCPU enters system management mode while running a nested guest,
RSM starts processing the vmentry while still in SMM.  In that case,
however, the pages pointed to by the vmcs12 might be incorrectly
loaded from SMRAM.  To avoid this, delay the handling of the pages
until just before the next vmentry.  This is done with a new request
and a new entry in kvm_x86_ops, which we will be able to reuse for
nested VMX state migration.

Extracted from a patch by Jim Mattson and KarimAllah Ahmed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7f7f1ba3

kvm: selftests: add basic test for state save and restore · fa3899ad

Committed by Paolo Bonzini on Jul 26, 2018

The test calls KVM_RUN repeatedly, and creates an entirely new VM with the
old memory and vCPU state on every exit to userspace. The kvm_util API is
expanded with two functions that manage the lifetime of a kvm_vm struct:
the first closes the file descriptors and leaves the memory allocated,
and the second opens the file descriptors and reuses the memory from
the previous incarnation of the kvm_vm struct.

For now the test is very basic, as it does not test for example XSAVE or
vCPU events. However, it will test nested virtualization state starting
with the next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fa3899ad

kvm: selftests: ensure vcpu file is released · 0a505fe6

Committed by Paolo Bonzini on Jul 26, 2018

The selftests were not munmap-ing the kvm_run area from the vcpu file descriptor.
The result was that kvm_vcpu_release was not called and a reference was left in the
parent "struct kvm". Ultimately this was visible in the upcoming state save/restore
test as an error when KVM attempted to create a duplicate debugfs entry.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0a505fe6

kvm: selftests: actually use all of lib/vmx.c · 87ccb7db

Committed by Paolo Bonzini on Jul 28, 2018

The allocation of the VMXON and VMCS is currently done twice, in
lib/vmx.c and in vmx_tsc_adjust_test.c. Reorganize the code to
provide a cleaner and easier to use API to the tests. lib/vmx.c
now does the complete setup of the VMX data structures, but does not
create the VM or set CPUID. This has to be done by the caller.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

87ccb7db

kvm: selftests: create a GDT and TSS · 2305339e

Committed by Paolo Bonzini on Jul 28, 2018

The GDT and the TSS base were left to zero, and this has interesting effects
when the TSS descriptor is later read to set up a VMCS's TR_BASE.  Basically
it worked by chance, and this patch fixes it by setting up all the protected
mode data structures properly.

Because the GDT and TSS addresses are virtual, the page tables now always
exist at the time of vcpu setup.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2305339e

KVM: x86: ensure all MSRs can always be KVM_GET/SET_MSR'd · 44883f01

Committed by Paolo Bonzini on Jul 26, 2018

Some of the MSRs returned by GET_MSR_INDEX_LIST currently cannot be sent back
to KVM_GET_MSR and/or KVM_SET_MSR; either they can never be sent back, or
they are only accepted under special conditions.  This makes the API a pain to
use.

To avoid this pain, this patch makes it so that the result of the get-list
ioctl can always be used for host-initiated get and set.  Since we don't have
a separate way to check for read-only MSRs, this means some Hyper-V MSRs are
ignored when written.  Arguably they should not even be in the result of
GET_MSR_INDEX_LIST, but I am leaving them there in case userspace is using the
outcome of GET_MSR_INDEX_LIST to derive the support for the corresponding
Hyper-V feature.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

44883f01

KVM: vmx: remove save/restore of host BNDCGFS MSR · cf81a7e5

Committed by Sean Christopherson on Jul 11, 2018

Linux does not support Memory Protection Extensions (MPX) in the
kernel itself, thus the BNDCFGS (Bound Config Supervisor) MSR will
always be zero in the KVM host, i.e. RDMSR in vmx_save_host_state()
is superfluous.  KVM unconditionally sets VM_EXIT_CLEAR_BNDCFGS,
i.e. BNDCFGS will always be zero after VMEXIT, thus manually loading
BNDCFGS is also superfluous.

And in the event the MPX kernel support is added (unlikely given
that MPX for userspace is in its death throes[1]), BNDCFGS will
likely be common across all CPUs[2], and at the least shouldn't
change on a regular basis, i.e. saving the MSR on every VMENTRY is
completely unnecessary.

WARN_ONCE in hardware_setup() if the host's BNDCFGS is non-zero to
document that KVM does not preserve BNDCFGS and to serve as a hint
as to how BNDCFGS likely should be handled if MPX is used in the
kernel, e.g. BNDCFGS should be saved once during KVM setup.

[1] https://lkml.org/lkml/2018/4/27/1046
[2] http://www.openwall.com/lists/kernel-hardening/2017/07/24/28

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cf81a7e5

KVM: Switch 'requests' to be 64-bit (explicitly) · 86dafed5

Committed by KarimAllah Ahmed on Jul 10, 2018

Switch 'requests' to be explicitly 64-bit and update BUILD_BUG_ON check to
use the size of "requests" instead of the hard-coded '32'.

That gives us a bit more room again for arch-specific requests as we
already ran out of space for x86 due to the hard-coded check.

The only exception here is ARM32 as it is still 32-bits.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

86dafed5

kvm: selftests: add cr4_cpuid_sync_test · ca359066

Committed by Wei Huang on Jun 25, 2018

KVM is supposed to update some guest VM's CPUID bits (e.g. OSXSAVE) when
CR4 is changed. A bug was found in KVM recently and it was fixed by
Commit c4d21882 ("KVM: x86: Update cpuid properly when CR4.OSXAVE or
CR4.PKE is changed"). This patch adds a test to verify the synchronization
between guest VM's CR4 and CPUID bits.

Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ca359066

Merge tag 'v4.18-rc6' into HEAD · d2ce98ca

Committed by Paolo Bonzini on Aug 06, 2018

Pull bug fixes into the KVM development tree to avoid nasty conflicts.

d2ce98ca

02 Aug, 2018 · 2 commits

Merge tag 'kvm-s390-next-4.19-1' of... · 85eae57b

Committed by Paolo Bonzini on Aug 02, 2018

Merge tag 'kvm-s390-next-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Features for 4.19

- initial version for host large page support. Must be enabled with
  module parameter hpage=1 and will conflict with the nested=1
  parameter.
- enable etoken facility for guests
- Fixes

85eae57b

Merge tag 'kvm-ppc-next-4.19-1' of... · 3a1174cd

Committed by Paolo Bonzini on Aug 02, 2018

Merge tag 'kvm-ppc-next-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

PPC KVM update for 4.19.

This update adds no new features; it just has some minor code cleanups
and bug fixes, including a fix to allow us to create KVM_MAX_VCPUS
vCPUs on POWER9 in all CPU threading modes.

3a1174cd

31 Jul, 2018 · 4 commits

Merge tag 'hlp_stage1' of... · 23758461

Committed by Janosch Frank on Jul 30, 2018

Merge tag 'hlp_stage1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvms390/next

KVM: s390: initial host large page support

- must be enabled via module parameter hpage=1
- cannot be used together with nested
- does support migration
- does support hugetlbfs
- no THP yet

23758461

KVM: s390: Add huge page enablement control · a4499382

Committed by Janosch Frank on Jul 13, 2018

General KVM huge page support on s390 has to be enabled via the
kvm.hpage module parameter. Either nested or hpage can be enabled, as
we currently do not support vSIE for huge backed guests. Once the vSIE
support is added we will either drop the parameter or enable it as
default.

For a guest the feature has to be enabled through the new
KVM_CAP_S390_HPAGE_1M capability and the hpage module
parameter. Enabling it means that cmm can't be enabled for the vm and
disables pfmf and storage key interpretation.

This is due to the fact that in some cases, in upcoming patches, we
have to split huge pages in the guest mapping to be able to set more
granular memory protection on 4k pages. These split pages have fake
page tables that are not visible to the Linux memory management which
subsequently will not manage its PGSTEs, while the SIE will. Disabling
these features lets us manage PGSTE data in a consistent manner and
solve that problem.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>

a4499382

s390/mm: Add huge page gmap linking support · a9e00d83

Committed by Janosch Frank on Jul 13, 2018

Let's allow huge pmd linking when enabled through the
KVM_CAP_S390_HPAGE_1M capability. Also we can now restrict gmap
invalidation and notification to the cases where the capability has
been activated and save some cycles when that's not the case.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>

a9e00d83

s390/mm: hugetlb pages within a gmap can not be freed · 7d735b9a

Committed by Dominik Dingel on Jul 13, 2018

Guests backed by huge pages could theoretically free unused pages via
the diagnose 10 instruction. We currently don't allow that, so we
don't have to refault them once they are needed again.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>

7d735b9a

30 Jul, 2018 · 1 commit

KVM: s390: Beautify skey enable check · 57cb198c

Committed by Janosch Frank on Jul 20, 2018

Let's introduce an explicit check if skeys have already been enabled
for the vcpu, so we don't have to check the mm context if we don't have
the storage key facility.

This lets us check for enablement without having to take the mm
semaphore and thus speed up skey emulation.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

57cb198c

openanolis / cloud-kernel · synced successfully more than 1 year ago
