提交 · f7d31e65368aeef973fab788aa22c4f1d5a6af66 · openeuler / Kernel

01 6月, 2020 2 次提交

x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit · f7d31e65

由 Jon Doron 提交于 4月 24, 2020

The problem the patch is trying to address is the fact that 'struct
kvm_hyperv_exit' has different layout on when compiling in 32 and 64 bit
modes.

In 64-bit mode the default alignment boundary is 64 bits thus
forcing extra gaps after 'type' and 'msr' but in 32-bit mode the
boundary is at 32 bits thus no extra gaps.

This is an issue as even when the kernel is 64 bit, the userspace using
the interface can be both 32 and 64 bit but the same 32 bit userspace has
to work with 32 bit kernel.

The issue is fixed by forcing the 64 bit layout, this leads to ABI
change for 32 bit builds and while we are obviously breaking '32 bit
userspace with 32 bit kernel' case, we're fixing the '32 bit userspace
with 64 bit kernel' one.

As the interface has no (known) users and 32 bit KVM is rather baroque
nowadays, this seems like a reasonable decision.
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NJon Doron <arilou@gmail.com>
Message-Id: <20200424113746.3473563-2-arilou@gmail.com>
Reviewed-by: NRoman Kagan <rvkagan@yandex-team.ru>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f7d31e65

KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT · 72de5fa4

由 Vitaly Kuznetsov 提交于 5月 25, 2020

Introduce new capability to indicate that KVM supports interrupt based
delivery of 'page ready' APF events. This includes support for both
MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-8-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

72de5fa4

25 4月, 2020 1 次提交

kvm: add capability for halt polling · acd05785

由 David Matlack 提交于 4月 17, 2020

KVM_CAP_HALT_POLL is a per-VM capability that lets userspace
control the halt-polling time, allowing halt-polling to be tuned or
disabled on particular VMs.

With dynamic halt-polling, a VM's VCPUs can poll from anywhere from
[0, halt_poll_ns] on each halt. KVM_CAP_HALT_POLL sets the
upper limit on the poll time.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NJon Cargille <jcargill@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <20200417221446.108733-1-jcargill@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

acd05785

26 3月, 2020 1 次提交

KVM: PPC: Book3S HV: Add a capability for enabling secure guests · 9a5788c6

由 Paul Mackerras 提交于 3月 19, 2020

At present, on Power systems with Protected Execution Facility
hardware and an ultravisor, a KVM guest can transition to being a
secure guest at will.  Userspace (QEMU) has no way of knowing
whether a host system is capable of running secure guests.  This
will present a problem in future when the ultravisor is capable of
migrating secure guests from one host to another, because
virtualization management software will have no way to ensure that
secure guests only run in domains where all of the hosts can
support secure guests.

This adds a VM capability which has two functions: (a) userspace
can query it to find out whether the host can support secure guests,
and (b) userspace can enable it for a guest, which allows that
guest to become a secure guest.  If userspace does not enable it,
KVM will return an error when the ultravisor does the hypercall
that indicates that the guest is starting to transition to a
secure guest.  The ultravisor will then abort the transition and
the guest will terminate.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NRam Pai <linuxram@us.ibm.com>

9a5788c6

17 3月, 2020 1 次提交

KVM: x86: enable dirty log gradually in small chunks · 3c9bd400

由 Jay Zhou 提交于 2月 27, 2020

It could take kvm->mmu_lock for an extended period of time when
enabling dirty log for the first time. The main cost is to clear
all the D-bits of last level SPTEs. This situation can benefit from
manual dirty log protect as well, which can reduce the mmu_lock
time taken. The sequence is like this:

1. Initialize all the bits of the dirty bitmap to 1 when enabling
   dirty log for the first time
2. Only write protect the huge pages
3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
   SPTEs gradually in small chunks

Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
I did some tests with a 128G windows VM and counted the time taken
of memory_global_dirty_log_start, here is the numbers:

VM Size        Before    After optimization
128G           460ms     10ms
Signed-off-by: NJay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3c9bd400

28 2月, 2020 4 次提交

KVM: s390: protvirt: introduce and enable KVM_CAP_S390_PROTECTED · 13da9ae1

由 Christian Borntraeger 提交于 2月 18, 2020

Now that everything is in place, we can announce the feature.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>

13da9ae1

KVM: s390: protvirt: UV calls in support of diag308 0, 1 · e0d2773d

由 Janosch Frank 提交于 5月 09, 2019

diag 308 subcode 0 and 1 require several KVM and Ultravisor interactions.
Specific to these "soft" reboots are

* The "unshare all" UVC
* The "prepare for reset" UVC
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Acked-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

e0d2773d

KVM: S390: protvirt: Introduce instruction data area bounce buffer · 19e12277

由 Janosch Frank 提交于 4月 02, 2019

Now that we can't access guest memory anymore, we have a dedicated
satellite block that's a bounce buffer for instruction data.

We re-use the memop interface to copy the instruction data to / from
userspace. This lets us re-use a lot of QEMU code which used that
interface to make logical guest memory accesses which are not possible
anymore in protected mode anyway.
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

19e12277

KVM: s390: protvirt: Add initial vm and cpu lifecycle handling · 29b40f10

由 Janosch Frank 提交于 9月 30, 2019

This contains 3 main changes:
1. changes in SIE control block handling for secure guests
2. helper functions for create/destroy/unpack secure guests
3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
machines
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

29b40f10

31 1月, 2020 1 次提交

KVM: s390: Add new reset vcpu API · 7de3f142

由 Janosch Frank 提交于 1月 31, 2020

The architecture states that we need to reset local IRQs for all CPU
resets. Because the old reset interface did not support the normal CPU
reset we never did that on a normal reset.

Let's implement an interface for the missing normal and clear resets
and reset all local IRQs, registers and control structures as stated
in the architecture.

Userspace might already reset the registers via the vcpu run struct,
but as we need the interface for the interrupt clearing part anyway,
we implement the resets fully and don't rely on userspace to reset the
rest.
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20200131100205.74720-4-frankja@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

7de3f142

28 11月, 2019 1 次提交

KVM: PPC: Book3S HV: Support reset of secure guest · 22945688

由 Bharata B Rao 提交于 11月 25, 2019

Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF.
This ioctl will be issued by QEMU during reset and includes the
the following steps:

- Release all device pages of the secure guest.
- Ask UV to terminate the guest via UV_SVM_TERMINATE ucall
- Unpin the VPA pages so that they can be migrated back to secure
  side when guest becomes secure again. This is required because
  pinned pages can't be migrated.
- Reinit the partition scoped page tables

After these steps, guest is ready to issue UV_ESM call once again
to switch to secure mode.
Signed-off-by: NBharata B Rao <bharata@linux.ibm.com>
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
	[Implementation of uv_svm_terminate() and its call from
	guest shutdown path]
Signed-off-by: NRam Pai <linuxram@us.ibm.com>
	[Unpinning of VPA pages]
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

22945688

22 10月, 2019 3 次提交

KVM: arm64: Provide VCPU attributes for stolen time · 58772e9a

由 Steven Price 提交于 10月 21, 2019

Allow user space to inform the KVM host where in the physical memory
map the paravirtualized time structures should be located.

User space can set an attribute on the VCPU providing the IPA base
address of the stolen time structure for that VCPU. This must be
repeated for every VCPU in the VM.

The address is given in terms of the physical address visible to
the guest and must be 64 byte aligned. The guest will discover the
address via a hypercall.
Signed-off-by: NSteven Price <steven.price@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

58772e9a

KVM: arm/arm64: Allow user injection of external data aborts · da345174

由 Christoffer Dall 提交于 10月 11, 2019

In some scenarios, such as buggy guest or incorrect configuration of the
VMM and firmware description data, userspace will detect a memory access
to a portion of the IPA, which is not mapped to any MMIO region.

For this purpose, the appropriate action is to inject an external abort
to the guest.  The kernel already has functionality to inject an
external abort, but we need to wire up a signal from user space that
lets user space tell the kernel to do this.

It turns out, we already have the set event functionality which we can
perfectly reuse for this.
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

da345174

KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace · c726200d

由 Christoffer Dall 提交于 10月 11, 2019

For a long time, if a guest accessed memory outside of a memslot using
any of the load/store instructions in the architecture which doesn't
supply decoding information in the ESR_EL2 (the ISV bit is not set), the
kernel would print the following message and terminate the VM as a
result of returning -ENOSYS to userspace:

  load/store instruction decoding not implemented

The reason behind this message is that KVM assumes that all accesses
outside a memslot is an MMIO access which should be handled by
userspace, and we originally expected to eventually implement some sort
of decoding of load/store instructions where the ISV bit was not set.

However, it turns out that many of the instructions which don't provide
decoding information on abort are not safe to use for MMIO accesses, and
the remaining few that would potentially make sense to use on MMIO
accesses, such as those with register writeback, are not used in
practice.  It also turns out that fetching an instruction from guest
memory can be a pretty horrible affair, involving stopping all CPUs on
SMP systems, handling multiple corner cases of address translation in
software, and more.  It doesn't appear likely that we'll ever implement
this in the kernel.

What is much more common is that a user has misconfigured his/her guest
and is actually not accessing an MMIO region, but just hitting some
random hole in the IPA space.  In this scenario, the error message above
is almost misleading and has led to a great deal of confusion over the
years.

It is, nevertheless, ABI to userspace, and we therefore need to
introduce a new capability that userspace explicitly enables to change
behavior.

This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
which does exactly that, and introduces a new exit reason to report the
event to userspace.  User space can then emulate an exception to the
guest, restart the guest, suspend the guest, or take any other
appropriate action as per the policy of the running system.
Reported-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Reviewed-by: NAlexander Graf <graf@amazon.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

c726200d

21 10月, 2019 1 次提交

KVM: PPC: Report single stepping capability · 1a9167a2

由 Fabiano Rosas 提交于 6月 19, 2019

When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
the next instruction to be single stepped via the
KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.

This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
to inform userspace about the state of single stepping support.

We currently don't have support for guest single stepping implemented
in Book3S HV so the capability is only present for Book3S PR and
BookE.
Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

1a9167a2

24 9月, 2019 1 次提交

KVM/Hyper-V: Add new KVM capability KVM_CAP_HYPERV_DIRECT_TLBFLUSH · 344c6c80

由 Tianyu Lan 提交于 8月 22, 2019

Hyper-V direct tlb flush function should be enabled for
guest that only uses Hyper-V hypercall. User space
hypervisor(e.g, Qemu) can disable KVM identification in
CPUID and just exposes Hyper-V identification to make
sure the precondition. Add new KVM capability KVM_CAP_
HYPERV_DIRECT_TLBFLUSH for user space to enable Hyper-V
direct tlb function and this function is default to be
disabled in KVM.
Signed-off-by: NTianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

344c6c80

20 9月, 2019 1 次提交

KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface · dee04eee

由 Anup Patel 提交于 9月 04, 2019

We will be using ONE_REG interface accessing VCPU registers from
user-space hence we add KVM_REG_RISCV for RISC-V VCPU registers.
Signed-off-by: NAnup Patel <anup.patel@wdc.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NAlexander Graf <graf@amazon.com>
Signed-off-by: NPaul Walmsley <paul.walmsley@sifive.com>

dee04eee

11 9月, 2019 1 次提交

KVM: x86: Return to userspace with internal error on unexpected exit reason · 7396d337

由 Liran Alon 提交于 8月 26, 2019

Receiving an unexpected exit reason from hardware should be considered
as a severe bug in KVM. Therefore, instead of just injecting #UD to
guest and ignore it, exit to userspace on internal error so that
it could handle it properly (probably by terminating guest).

In addition, prefer to use vcpu_unimpl() instead of WARN_ONCE()
as handling unexpected exit reason should be a rare unexpected
event (that was expected to never happen) and we prefer to print
a message on it every time it occurs to guest.

Furthermore, dump VMCS/VMCB to dmesg to assist diagnosing such cases.
Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7396d337

09 9月, 2019 1 次提交

KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75

由 Marc Zyngier 提交于 8月 18, 2019

While parts of the VGIC support a large number of vcpus (we
bravely allow up to 512), other parts are more limited.

One of these limits is visible in the KVM_IRQ_LINE ioctl, which
only allows 256 vcpus to be signalled when using the CPU or PPI
types. Unfortunately, we've cornered ourselves badly by allocating
all the bits in the irq field.

Since the irq_type subfield (8 bit wide) is currently only taking
the values 0, 1 and 2 (and we have been careful not to allow anything
else), let's reduce this field to only 4 bits, and allocate the
remaining 4 bits to a vcpu2_index, which acts as a multiplier:

  vcpu_id = 256 * vcpu2_index + vcpu_index

With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
allowing this to be discovered, it becomes possible to inject
PPIs to up to 4096 vcpus. But please just don't.

Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
on arm, which is not completely conditionned by KVM_CAP_IRQCHIP.
Reported-by: NZenghui Yu <yuzenghui@huawei.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

92f35b75

24 7月, 2019 1 次提交

Documentation: move Documentation/virtual to Documentation/virt · 2f5947df

由 Christoph Hellwig 提交于 7月 24, 2019

Renaming docs seems to be en vogue at the moment, so fix on of the
grossly misnamed directories.  We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory.  Fix up the documentation
to match that.

Fixes: ed16648e ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2f5947df

11 7月, 2019 1 次提交

KVM: x86: PMU Event Filter · 66bb8a06

由 Eric Hankland 提交于 7月 10, 2019

Some events can provide a guest with information about other guests or the
host (e.g. L3 cache stats); providing the capability to restrict access
to a "safe" set of events would limit the potential for the PMU to be used
in any side channel attacks. This change introduces a new VM ioctl that
sets an event filter. If the guest attempts to program a counter for
any blacklisted or non-whitelisted event, the kernel counter won't be
created, so any RDPMC/RDMSR will show 0 instances of that event.
Signed-off-by: NEric Hankland <ehankland@google.com>
[Lots of changes. All remaining bugs are probably mine. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

66bb8a06

05 6月, 2019 1 次提交

KVM: X86: Provide a capability to disable cstate msr read intercepts · b5170063

由 Wanpeng Li 提交于 5月 21, 2019

Allow guest reads CORE cstate when exposing host CPU power management capabilities
to the guest. PKG cstate is restricted to avoid a guest to get the whole package
information in multi-tenant scenario.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b5170063

08 5月, 2019 1 次提交

KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 · d7547c55

由 Peter Xu 提交于 5月 08, 2019

The previous KVM_CAP_MANUAL_DIRTY_LOG_PROTECT has some problem which
blocks the correct usage from userspace.  Obsolete the old one and
introduce a new capability bit for it.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d7547c55

30 4月, 2019 2 次提交

KVM: PPC: Book3S HV: XIVE: Introduce a new capability KVM_CAP_PPC_IRQ_XIVE · eacc56bb

由 Cédric Le Goater 提交于 4月 18, 2019

The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to
let QEMU connect the vCPU presenters to the XIVE KVM device if
required. The capability is not advertised for now as the full support
for the XIVE native exploitation mode is not yet available. When this
is case, the capability will be advertised on PowerNV Hypervisors
only. Nested guests (pseries KVM Hypervisor) are not supported.

Internally, the interface to the new KVM device is protected with a
new interrupt mode: KVMPPC_IRQ_XIVE.
Signed-off-by: NCédric Le Goater <clg@kaod.org>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

eacc56bb

KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode · 90c73795

由 Cédric Le Goater 提交于 4月 18, 2019

This is the basic framework for the new KVM device supporting the XIVE
native exploitation mode. The user interface exposes a new KVM device
to be created by QEMU, only available when running on a L0 hypervisor.
Support for nested guests is not available yet.

The XIVE device reuses the device structure of the XICS-on-XIVE device
as they have a lot in common. That could possibly change in the future
if the need arise.
Signed-off-by: NCédric Le Goater <clg@kaod.org>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

90c73795

24 4月, 2019 1 次提交

KVM: arm64: Add capability to advertise ptrauth for guest · a243c16d

由 Amit Daniel Kachhap 提交于 4月 23, 2019

This patch advertises the capability of two cpu feature called address
pointer authentication and generic pointer authentication. These
capabilities depend upon system support for pointer authentication and
VHE mode.

The current arm64 KVM partially implements pointer authentication and
support of address/generic authentication are tied together. However,
separate ABI requirements for both of them is added so that any future
isolated implementation will not require any ABI changes.
Signed-off-by: NAmit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

a243c16d

29 3月, 2019 3 次提交

KVM: arm64: Add a capability to advertise SVE support · 555f3d03

由 Dave Martin 提交于 1月 15, 2019

To provide a uniform way to check for KVM SVE support amongst other
features, this patch adds a suitable capability KVM_CAP_ARM_SVE,
and reports it as present when SVE is available.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
Tested-by: Nzhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

555f3d03

KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctl · 7dd32a0d

由 Dave Martin 提交于 12月 19, 2018

Some aspects of vcpu configuration may be too complex to be
completed inside KVM_ARM_VCPU_INIT.  Thus, there may be a
requirement for userspace to do some additional configuration
before various other ioctls will work in a consistent way.

In particular this will be the case for SVE, where userspace will
need to negotiate the set of vector lengths to be made available to
the guest before the vcpu becomes fully usable.

In order to provide an explicit way for userspace to confirm that
it has finished setting up a particular vcpu feature, this patch
adds a new ioctl KVM_ARM_VCPU_FINALIZE.

When userspace has opted into a feature that requires finalization,
typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
matching call to KVM_ARM_VCPU_FINALIZE is now required before
KVM_RUN or KVM_GET_REG_LIST is allowed.  Individual features may
impose additional restrictions where appropriate.

No existing vcpu features are affected by this, so current
userspace implementations will continue to work exactly as before,
with no need to issue KVM_ARM_VCPU_FINALIZE.

As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
placeholder: no finalizable features exist yet, so ioctl is not
required and will always yield EINVAL.  Subsequent patches will add
the finalization logic to make use of this ioctl for SVE.

No functional change for existing userspace.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
Tested-by: Nzhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

7dd32a0d

KVM: Allow 2048-bit register access via ioctl interface · 2b953ea3

由 Dave Martin 提交于 9月 28, 2018

The Arm SVE architecture defines registers that are up to 2048 bits
in size (with some possibility of further future expansion).

In order to avoid the need for an excessively large number of
ioctls when saving and restoring a vcpu's registers, this patch
adds a #define to make support for individual 2048-bit registers
through the KVM_{GET,SET}_ONE_REG ioctl interface official.  This
will allow each SVE register to be accessed in a single call.

There are sufficient spare bits in the register id size field for
this change, so there is no ABI impact, providing that
KVM_GET_REG_LIST does not enumerate any 2048-bit register unless
userspace explicitly opts in to the relevant architecture-specific
features.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Tested-by: Nzhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

2b953ea3

15 12月, 2018 1 次提交

x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID · 2bc39970

由 Vitaly Kuznetsov 提交于 12月 10, 2018

With every new Hyper-V Enlightenment we implement we're forced to add a
KVM_CAP_HYPERV_* capability. While this approach works it is fairly
inconvenient: the majority of the enlightenments we do have corresponding
CPUID feature bit(s) and userspace has to know this anyways to be able to
expose the feature to the guest.

Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one
cap to rule them all!") returning all Hyper-V CPUID feature leaves.

Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible:
Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000,
0x40000001) and we would probably confuse userspace in case we decide to
return these twice.

KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop
KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2bc39970

14 12月, 2018 1 次提交

kvm: introduce manual dirty log reprotect · 2a31b9db

由 Paolo Bonzini 提交于 10月 23, 2018

There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
it can take kvm->mmu_lock for an extended period of time.  Second, its user
can actually see many false positives in some cases.  The latter is due
to a benign race like this:

  1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
     them.
  2. The guest modifies the pages, causing them to be marked ditry.
  3. Userspace actually copies the pages.
  4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
     they were not written to since (3).

This is especially a problem for large guests, where the time between
(1) and (3) can be substantial.  This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns.  Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2a31b9db

18 10月, 2018 1 次提交

kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD · c4f55198

由 Jim Mattson 提交于 10月 16, 2018

This is a per-VM capability which can be enabled by userspace so that
the faulting linear address will be included with the information
about a pending #PF in L2, and the "new DR6 bits" will be included
with the information about a pending #DB in L2. With this capability
enabled, the L1 hypervisor can now intercept #PF before CR2 is
modified. Under VMX, the L1 hypervisor can now intercept #DB before
DR6 and DR7 are modified.

When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
generally provide an appropriate payload when injecting a #PF or #DB
exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
checkpoints, this payload is not required.

Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
region while advanced debugging of RTM transactional regions was
enabled. This is the reverse of DR6.RTM, which is cleared in this
scenario.

This capability also enables exception.pending in struct
kvm_vcpu_events, which allows userspace to distinguish between pending
and injected exceptions.
Reported-by: NJim Mattson <jmattson@google.com>
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c4f55198

17 10月, 2018 3 次提交

KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability · 57b119da

由 Vitaly Kuznetsov 提交于 10月 16, 2018

Enlightened VMCS is opt-in. The current version does not contain all
fields supported by nested VMX so we must not advertise the
corresponding VMX features if enlightened VMCS is enabled.

Userspace is given the enlightened VMCS version supported by KVM as
part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to
be advertised to the nested hypervisor, currently done via a cpuid
leaf for Hyper-V.
Suggested-by: NLadi Prosek <lprosek@redhat.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

57b119da

kvm/x86 : add coalesced pio support · 0804c849

由 Peng Hao 提交于 10月 14, 2018

Coalesced pio is based on coalesced mmio and can be used for some port
like rtc port, pci-host config port and so on.

Specially in case of rtc as coalesced pio, some versions of windows guest
access rtc frequently because of rtc as system tick. guest access rtc like
this: write register index to 0x70, then write or read data from 0x71.
writing 0x70 port is just as index and do nothing else. So we can use
coalesced pio to handle this scene to reduce VM-EXIT time.

When starting and closing a virtual machine, it will access pci-host config
port frequently. So setting these port as coalesced pio can reduce startup
and shutdown time.

without my patch, get the vm-exit time of accessing rtc 0x70 and piix 0xcf8
using perf tools: (guest OS : windows 7 64bit)
IO Port Access Samples Samples% Time% Min Time Max Time Avg time
0x70:POUT 86 30.99% 74.59% 9us 29us 10.75us (+- 3.41%)
0xcf8:POUT 1119 2.60% 2.12% 2.79us 56.83us 3.41us (+- 2.23%)

with my patch
IO Port Access Samples Samples% Time% Min Time Max Time Avg time
0x70:POUT 106 32.02% 29.47% 0us 10us 1.57us (+- 7.38%)
0xcf8:POUT 1065 1.67% 0.28% 0.41us 65.44us 0.66us (+- 10.55%)
Signed-off-by: NPeng Hao <peng.hao2@zte.com.cn>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0804c849

KVM: x86: hyperv: implement PV IPI send hypercalls · 214ff83d

由 Vitaly Kuznetsov 提交于 9月 26, 2018

Using hypercall for sending IPIs is faster because this allows to specify
any number of vCPUs (even > 64 with sparse CPU set), the whole procedure
will take only one VMEXIT.

Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi
hypercall can't be 'fast' (passing parameters through registers) but
apparently this is not true, Windows always uses it as 'fast' so we need
to support that.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

214ff83d

09 10月, 2018 2 次提交

KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl result · 901f8c3f

由 Paul Mackerras 提交于 10月 08, 2018

This adds a KVM_PPC_NO_HASH flag to the flags field of the
kvm_ppc_smmu_info struct, and arranges for it to be set when
running as a nested hypervisor, as an unambiguous indication
to userspace that HPT guests are not supported.  Reporting the
KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as
indicating only that the new HPT features in ISA V3.0 are not
supported, leaving it ambiguous whether pre-V3.0 HPT features
are supported.
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

901f8c3f

KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization · aa069a99

由 Paul Mackerras 提交于 9月 21, 2018

With this, userspace can enable a KVM-HV guest to run nested guests
under it.

The administrator can control whether any nested guests can be run;
setting the "nested" module parameter to false prevents any guests
becoming nested hypervisors (that is, any attempt to enable the nested
capability on a guest will fail).  Guests which are already nested
hypervisors will continue to be so.
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

aa069a99

03 10月, 2018 1 次提交

kvm: arm64: Allow tuning the physical address size for VM · 233a7cb2

由 Suzuki K Poulose 提交于 9月 26, 2018

Allow specifying the physical address size limit for a new
VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This
allows us to finalise the stage2 page table as early as possible
and hence perform the right checks on the memory slots
without complication. The size is encoded as Log2(PA_Size) in
bits[7:0] of the type field. For backward compatibility the
value 0 is reserved and implies 40bits. Also, lift the limit
of the IPA to host limit and allow lower IPA sizes (e.g, 32).

The userspace could check the extension KVM_CAP_ARM_VM_IPA_SIZE
for the availability of this feature. The cap check returns the
maximum limit for the physical address shift supported by the host.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

233a7cb2

20 9月, 2018 1 次提交

KVM: x86: Control guest reads of MSR_PLATFORM_INFO · 6fbbde9a

由 Drew Schmitt 提交于 8月 20, 2018

Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
to reads of MSR_PLATFORM_INFO.

Disabling access to reads of this MSR gives userspace the control to "expose"
this platform-dependent information to guests in a clear way. As it exists
today, guests that read this MSR would get unpopulated information if userspace
hadn't already set it (and prior to this patch series, only the CPUID faulting
information could have been populated). This existing interface could be
confusing if guests don't handle the potential for incorrect/incomplete
information gracefully (e.g. zero reported for base frequency).
Signed-off-by: NDrew Schmitt <dasch@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6fbbde9a

06 8月, 2018 1 次提交

kvm: nVMX: Introduce KVM_CAP_NESTED_STATE · 8fcc4b59

由 Jim Mattson 提交于 7月 10, 2018

For nested virtualization L0 KVM is managing a bit of state for L2 guests,
this state can not be captured through the currently available IOCTLs. In
fact the state captured through all of these IOCTLs is usually a mix of L1
and L2 state. It is also dependent on whether the L2 guest was running at
the moment when the process was interrupted to save its state.

With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
that is in VMX operation.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NJim Mattson <jmattson@google.com>
[karahmed@ - rename structs and functions and make them ready for AMD and
             address previous comments.
           - handle nested.smm state.
           - rebase & a bit of refactoring.
           - Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8fcc4b59

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功