Commit c136b843 authored by Linus Torvalds

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC:
   - Better machine check handling for HV KVM
   - Ability to support guests with threads=2, 4 or 8 on POWER9
   - Fix for a race that could cause delayed recognition of signals
   - Fix for a bug where POWER9 guests could sleep with interrupts pending.

  ARM:
   - VCPU request overhaul
   - allow timer and PMU to have their interrupt number selected from userspace
   - workaround for Cavium erratum 30115
   - handling of memory poisoning
   - the usual crop of fixes and cleanups

  s390:
   - initial machine check forwarding
   - migration support for the CMMA page hinting information
   - cleanups and fixes

  x86:
   - nested VMX bugfixes and improvements
   - more reliable NMI window detection on AMD
   - APIC timer optimizations

  Generic:
   - VCPU request overhaul + documentation of common code patterns
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
  Update my email address
  kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
  x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  x86: kvm: mmu: make spte mmio mask more explicit
  x86: kvm: mmu: dead code thanks to access tracking
  KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
  KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
  KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
  KVM: x86: remove ignored type attribute
  KVM: LAPIC: Fix lapic timer injection delay
  KVM: lapic: reorganize restart_apic_timer
  KVM: lapic: reorganize start_hv_timer
  kvm: nVMX: Check memory operand to INVVPID
  KVM: s390: Inject machine check into the nested guest
  KVM: s390: Inject machine check into the guest
  tools/kvm_stat: add new interactive command 'b'
  tools/kvm_stat: add new command line switch '-i'
  tools/kvm_stat: fix error on interactive command 'g'
  KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
  ...
@@ -1862,6 +1862,18 @@
for all guests.
Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
kvm-arm.vgic_v3_group0_trap=
[KVM,ARM] Trap guest accesses to GICv3 group-0
system registers
kvm-arm.vgic_v3_group1_trap=
[KVM,ARM] Trap guest accesses to GICv3 group-1
system registers
kvm-arm.vgic_v3_common_trap=
[KVM,ARM] Trap guest accesses to GICv3 common
system registers
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
......
@@ -62,6 +62,7 @@ stable kernels.
| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 |
| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
| Cavium | ThunderX SMMUv2 | #27704 | N/A |
| Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
| | | | |
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
| | | | |
......
@@ -3255,6 +3255,141 @@ Otherwise, if the MCE is a corrected error, KVM will just
store it in the corresponding bank (provided this bank is
not holding a previously reported uncorrected error).
4.107 KVM_S390_GET_CMMA_BITS
Capability: KVM_CAP_S390_CMMA_MIGRATION
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_cmma_log (in, out)
Returns: 0 on success, a negative value on error
This ioctl is used to get the values of the CMMA bits on the s390
architecture. It is meant to be used in two scenarios:
- During live migration to save the CMMA values. Live migration needs
to be enabled via the KVM_REQ_START_MIGRATION VM property.
- To non-destructively peek at the CMMA values, with the flag
KVM_S390_CMMA_PEEK set.
The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired
values are written to a buffer whose location is indicated via the "values"
member in the kvm_s390_cmma_log struct. The values in the input struct are
also updated as needed.
Each CMMA value takes up one byte.
struct kvm_s390_cmma_log {
__u64 start_gfn;
__u32 count;
__u32 flags;
union {
__u64 remaining;
__u64 mask;
};
__u64 values;
};
start_gfn is the number of the first guest frame whose CMMA values are
to be retrieved,
count is the length of the buffer in bytes,
values points to the buffer where the result will be written to.
If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be
KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
other ioctls.
The result is written in the buffer pointed to by the field values, and
the values of the input parameter are updated as follows.
Depending on the flags, different actions are performed. The only
supported flag so far is KVM_S390_CMMA_PEEK.
The default behaviour if KVM_S390_CMMA_PEEK is not set is:
start_gfn will indicate the first page frame whose CMMA bits were dirty.
It is not necessarily the same as the one passed as input, as clean pages
are skipped.
count will indicate the number of bytes actually written in the buffer.
It can (and very often will) be smaller than the input value, since the
buffer is only filled until 16 bytes of clean values are found (which
are then not copied in the buffer). Since a CMMA migration block needs
the base address and the length, for a total of 16 bytes, we will send
back some clean data if there is some dirty data afterwards, as long as
the size of the clean data does not exceed the size of the header. This
minimizes the amount of data to be saved or transferred over the
network, at the expense of more roundtrips to userspace. The next
invocation of the ioctl will skip over all the clean values, saving
potentially more than just the 16 bytes we found.
If KVM_S390_CMMA_PEEK is set:
the existing storage attributes are read even when not in migration
mode, and no other action is performed;
the output start_gfn will be equal to the input start_gfn,
the output count will be equal to the input count, except if the end of
memory has been reached.
In both cases:
the field "remaining" will indicate the total number of dirty CMMA values
still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is
not enabled.
mask is unused.
values points to the userspace buffer where the result will be stored.
This ioctl can fail with -ENOMEM if not enough memory can be allocated to
complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
-EFAULT if the userspace address is invalid or if no page table is
present for the addresses (e.g. when using hugepages).
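
A minimal userspace sketch of driving this ioctl follows (illustration only;
it assumes vm_fd is an open VM file descriptor with migration mode already
enabled, and that KVM_S390_GET_CMMA_BITS and struct kvm_s390_cmma_log come
from <linux/kvm.h>; the helper name is made up):

#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: fetch one chunk of CMMA values starting at start_gfn.
 * The kernel caps buf_len at KVM_S390_SKEYS_MAX. */
static int get_cmma_chunk(int vm_fd, uint64_t start_gfn, uint8_t *buf,
			  uint32_t buf_len)
{
	struct kvm_s390_cmma_log log = {
		.start_gfn = start_gfn,
		.count = buf_len,
		.flags = 0,			/* or KVM_S390_CMMA_PEEK to peek */
		.values = (uint64_t)(uintptr_t)buf,
	};

	if (ioctl(vm_fd, KVM_S390_GET_CMMA_BITS, &log) < 0)
		return -1;

	/* On return: log.start_gfn is the first dirty frame actually read,
	 * log.count the number of bytes written to buf, and log.remaining
	 * the number of dirty values still pending. */
	printf("gfn %llu: %u bytes, %llu dirty values remaining\n",
	       (unsigned long long)log.start_gfn, log.count,
	       (unsigned long long)log.remaining);
	return 0;
}

Repeating such calls, advancing start_gfn past the values already read, walks
all dirty CMMA values while skipping long runs of clean pages in the kernel.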
4.108 KVM_S390_SET_CMMA_BITS
Capability: KVM_CAP_S390_CMMA_MIGRATION
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_cmma_log (in)
Returns: 0 on success, a negative value on error
This ioctl is used to set the values of the CMMA bits on the s390
architecture. It is meant to be used during live migration to restore
the CMMA values, but there are no restrictions on its use.
The ioctl takes parameters via the kvm_s390_cmma_values struct.
Each CMMA value takes up one byte.
struct kvm_s390_cmma_log {
__u64 start_gfn;
__u32 count;
__u32 flags;
union {
__u64 remaining;
__u64 mask;
};
__u64 values;
};
start_gfn indicates the starting guest frame number,
count indicates how many values are to be considered in the buffer,
flags is not used and must be 0.
mask indicates which PGSTE bits are to be considered.
remaining is not used.
values points to the buffer in userspace where to store the values.
This ioctl can fail with -ENOMEM if not enough memory can be allocated to
complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or
if the flags field was not 0, with -EFAULT if the userspace address is
invalid, if invalid pages are written to (e.g. after the end of memory)
or if no page table is present for the addresses (e.g. when using
hugepages).
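
For the restore direction, a hedged sketch (the helper name is made up; vm_fd
is assumed to be the destination VM's file descriptor):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical restore step: write count CMMA values taken from the
 * migration stream back to the guest frames starting at start_gfn. */
static int set_cmma_chunk(int vm_fd, uint64_t start_gfn,
			  const uint8_t *vals, uint32_t count)
{
	struct kvm_s390_cmma_log log = {
		.start_gfn = start_gfn,
		.count = count,
		.flags = 0,			/* must be 0 */
		.mask = ~0ULL,			/* consider all PGSTE bits */
		.values = (uint64_t)(uintptr_t)vals,
	};

	return ioctl(vm_fd, KVM_S390_SET_CMMA_BITS, &log);
}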
5. The kvm_run structure
------------------------
@@ -3996,6 +4131,34 @@ Parameters: none
Allow use of adapter-interruption suppression.
Returns: 0 on success; -EBUSY if a VCPU has already been created.
7.11 KVM_CAP_PPC_SMT
Architectures: ppc
Parameters: vsmt_mode, flags
Enabling this capability on a VM provides userspace with a way to set
the desired virtual SMT mode (i.e. the number of virtual CPUs per
virtual core). The virtual SMT mode, vsmt_mode, must be a power of 2
between 1 and 8. On POWER8, vsmt_mode must also be no greater than
the number of threads per subcore for the host. Currently flags must
be 0. A successful call to enable this capability will result in
vsmt_mode being returned when the KVM_CAP_PPC_SMT capability is
subsequently queried for the VM. This capability is only supported by
HV KVM, and can only be set before any VCPUs have been created.
The KVM_CAP_PPC_SMT_POSSIBLE capability indicates which virtual SMT
modes are available.
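
A sketch of how userspace might pick and enable a mode (illustrative only; it
assumes the mode goes in args[0] and the flags in args[1] of struct
kvm_enable_cap, and that vm_fd is an HV KVM VM with no VCPUs created yet):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: enable a virtual SMT mode on the VM. */
static int enable_vsmt(int vm_fd, unsigned long vsmt_mode)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_PPC_SMT,
		.args = { vsmt_mode, 0 },	/* assumed: args[0]=mode, args[1]=flags */
	};
	int possible = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SMT_POSSIBLE);

	/* For a power-of-two mode, bit log2(mode) of the bitmap equals mode. */
	if (possible > 0 && !(possible & vsmt_mode))
		return -1;			/* mode not advertised as possible */

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}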
7.12 KVM_CAP_PPC_FWNMI
Architectures: ppc
Parameters: none
With this capability a machine check exception in the guest address
space will cause KVM to exit the guest with an NMI exit reason. This
enables QEMU to build an error log and branch to the guest kernel's
registered machine check handling routine. Without this capability KVM
will branch to the guest's 0x200 interrupt vector.
8. Other capabilities.
----------------------
@@ -4157,3 +4320,12 @@ Currently the following bits are defined for the device_irq_level bitmap:
Future versions of kvm may implement additional events. These will get
indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
listed above.
8.10 KVM_CAP_PPC_SMT_POSSIBLE
Architectures: ppc
Querying this capability returns a bitmap indicating the possible
virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N
(counting from the right) is set, then a virtual SMT mode of 2^N is
available.
@@ -16,6 +16,7 @@ FLIC provides support to
- register and modify adapter interrupt sources (KVM_DEV_FLIC_ADAPTER_*)
- modify AIS (adapter-interruption-suppression) mode state (KVM_DEV_FLIC_AISM)
- inject adapter interrupts on a specified adapter (KVM_DEV_FLIC_AIRQ_INJECT)
- get/set all AIS mode states (KVM_DEV_FLIC_AISM_ALL)
Groups:
KVM_DEV_FLIC_ENQUEUE
@@ -136,6 +137,20 @@ struct kvm_s390_ais_req {
an isc according to the adapter-interruption-suppression mode on condition
that the AIS capability is enabled.
KVM_DEV_FLIC_AISM_ALL
Gets or sets the adapter-interruption-suppression mode for all ISCs. Takes
a kvm_s390_ais_all describing:
struct kvm_s390_ais_all {
__u8 simm; /* Single-Interruption-Mode mask */
__u8 nimm; /* No-Interruption-Mode mask */
};
simm contains the Single-Interruption-Mode mask for all ISCs, and nimm contains
the No-Interruption-Mode mask for all ISCs. Each bit in simm and nimm corresponds
to an ISC (MSB0 bit 0 to ISC 0 and so on). The combination of the simm bit and
the nimm bit gives the AIS mode for an ISC.
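
For illustration, a hedged sketch of reading the combined state through this
group (assuming flic_fd is the fd returned by KVM_CREATE_DEVICE for the FLIC,
and that the struct and constants come from the s390 uapi headers pulled in by
<linux/kvm.h>):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical query of the AIS modes of all ISCs at once. */
static int get_ais_all(int flic_fd, struct kvm_s390_ais_all *ais)
{
	struct kvm_device_attr attr = {
		.group = KVM_DEV_FLIC_AISM_ALL,
		.addr = (uint64_t)(uintptr_t)ais,
	};

	return ioctl(flic_fd, KVM_GET_DEVICE_ATTR, &attr);
}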
Note: The KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR device ioctls executed on
FLIC with an unknown group or attribute gives the error code EINVAL (instead of
ENXIO, as specified in the API documentation). It is not possible to conclude
......
@@ -16,7 +16,9 @@ Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
Returns: -EBUSY: The PMU overflow interrupt is already set
-ENXIO: The overflow interrupt not set when attempting to get it
-ENODEV: PMUv3 not supported
-EINVAL: Invalid PMU overflow interrupt number supplied
-EINVAL: Invalid PMU overflow interrupt number supplied or
trying to set the IRQ number without using an in-kernel
irqchip.
A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
@@ -25,11 +27,36 @@ all vcpus, while as an SPI it must be a separate number per vcpu.
1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
Parameters: no additional parameter in kvm_device_attr.addr
Returns: -ENODEV: PMUv3 not supported
Returns: -ENODEV: PMUv3 not supported or GIC not initialized
-ENXIO: PMUv3 not properly configured as required prior to calling this
attribute
-ENXIO: PMUv3 not properly configured or in-kernel irqchip not
configured as required prior to calling this attribute
-EBUSY: PMUv3 already initialized
Request the initialization of the PMUv3. This must be done after creating the
in-kernel irqchip. Creating a PMU with a userspace irqchip is currently not
supported.
Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel
virtual GIC implementation, this must be done after initializing the in-kernel
irqchip.
2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
Architectures: ARM,ARM64
2.1. ATTRIBUTE: KVM_ARM_VCPU_TIMER_IRQ_VTIMER
2.2. ATTRIBUTE: KVM_ARM_VCPU_TIMER_IRQ_PTIMER
Parameters: in kvm_device_attr.addr the address for the timer interrupt is a
pointer to an int
Returns: -EINVAL: Invalid timer interrupt number
-EBUSY: One or more VCPUs has already run
A value describing the architected timer interrupt number when connected to an
in-kernel virtual GIC. These must be a PPI (16 <= intid < 32). Setting the
attribute overrides the default values (see below).
KVM_ARM_VCPU_TIMER_IRQ_VTIMER: The EL1 virtual timer intid (default: 27)
KVM_ARM_VCPU_TIMER_IRQ_PTIMER: The EL1 physical timer intid (default: 30)
Setting the same PPI for different timers will prevent the VCPUs from running.
Setting the interrupt number on a VCPU configures all VCPUs created at that
time to use the number provided for a given timer, overwriting any previously
configured values on other VCPUs. Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.
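
A hedged userspace sketch of selecting a non-default interrupt number (for
illustration; PPI 28 and the helper name are arbitrary, and vcpu_fd is assumed
to be a VCPU created after the in-kernel GIC but not yet run):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical configuration step: route the EL1 virtual timer to PPI 28. */
static int set_vtimer_ppi(int vcpu_fd)
{
	int intid = 28;			/* must satisfy 16 <= intid < 32 */
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_TIMER_CTRL,
		.attr = KVM_ARM_VCPU_TIMER_IRQ_VTIMER,
		.addr = (uint64_t)(uintptr_t)&intid,
	};

	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}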
@@ -222,3 +222,36 @@ Allows user space to disable dea key wrapping, clearing the wrapping key.
Parameters: none
Returns: 0
5. GROUP: KVM_S390_VM_MIGRATION
Architectures: s390
5.1. ATTRIBUTE: KVM_S390_VM_MIGRATION_STOP (w/o)
Allows userspace to stop migration mode, needed for PGSTE migration.
Setting this attribute when migration mode is not active will have no
effect.
Parameters: none
Returns: 0
5.2. ATTRIBUTE: KVM_S390_VM_MIGRATION_START (w/o)
Allows userspace to start migration mode, needed for PGSTE migration.
Setting this attribute when migration mode is already active will have
no effect.
Parameters: none
Returns: -ENOMEM if there is not enough free memory to start migration mode
-EINVAL if the state of the VM is invalid (e.g. no memory defined)
0 in case of success.
5.3. ATTRIBUTE: KVM_S390_VM_MIGRATION_STATUS (r/o)
Allows userspace to query the status of migration mode.
Parameters: address of a buffer in user space to store the data (u64) to;
the data itself is either 0 if migration mode is disabled or 1
if it is enabled
Returns: -EFAULT if the given address is not accessible from kernel space
0 in case of success.
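
As a hedged sketch, starting migration mode before the first
KVM_S390_GET_CMMA_BITS call could look like this (vm_fd is assumed to be the
s390 VM fd; the helper name is made up):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical setup step on the migration source. */
static int start_migration_mode(int vm_fd)
{
	struct kvm_device_attr attr = {
		.group = KVM_S390_VM_MIGRATION,
		.attr = KVM_S390_VM_MIGRATION_START,
	};

	return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
}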
@@ -179,6 +179,10 @@ Shadow pages contain the following information:
shadow page; it is also used to go back from a struct kvm_mmu_page
to a memslot, through the kvm_memslots_for_spte_role macro and
__gfn_to_memslot.
role.ad_disabled:
Is 1 if the MMU instance cannot use A/D bits. EPT did not have A/D
bits before Haswell; shadow EPT page tables also cannot use A/D bits
if the L1 hypervisor does not enable them.
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
......
=================
KVM VCPU Requests
=================
Overview
========
KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity. For example, a thread may request a VCPU to flush
its TLB with a VCPU request. The API consists of the following functions::
/* Check if any requests are pending for VCPU @vcpu. */
bool kvm_request_pending(struct kvm_vcpu *vcpu);
/* Check if VCPU @vcpu has request @req pending. */
bool kvm_test_request(int req, struct kvm_vcpu *vcpu);
/* Clear request @req for VCPU @vcpu. */
void kvm_clear_request(int req, struct kvm_vcpu *vcpu);
/*
* Check if VCPU @vcpu has request @req pending. When the request is
* pending it will be cleared and a memory barrier, which pairs with
* another in kvm_make_request(), will be issued.
*/
bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
/*
* Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
* with another in kvm_check_request(), prior to setting the request.
*/
void kvm_make_request(int req, struct kvm_vcpu *vcpu);
/* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request. This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.
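
As a sketch of that typical pattern (KVM_REQ_EXAMPLE and handle_example() are
made-up names used only for illustration)::

  /* requesting thread */
  kvm_make_request(KVM_REQ_EXAMPLE, vcpu);
  kvm_vcpu_kick(vcpu);

  /* VCPU run loop, before entering guest mode */
  if (kvm_request_pending(vcpu)) {
          if (kvm_check_request(KVM_REQ_EXAMPLE, vcpu))
                  handle_example(vcpu);
  }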
VCPU Kicks
----------
The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance. To do so, an IPI is sent, forcing
a guest mode exit. However, a VCPU thread may not be in guest mode at the
time of the kick. Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take. All three actions
are listed below:
1) Send an IPI. This forces a guest mode exit.
2) Waking a sleeping VCPU. Sleeping VCPUs are VCPU threads outside guest
mode that wait on waitqueues. Waking them removes the threads from
the waitqueues, allowing the threads to run again. This behavior
may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
3) Nothing. When the VCPU is not in guest mode and the VCPU thread is not
sleeping, then there is nothing to do.
VCPU Mode
---------
VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
guest is running in guest mode or not, as well as some specific
outside guest mode states. The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements"). The following modes are defined:
OUTSIDE_GUEST_MODE
The VCPU thread is outside guest mode.
IN_GUEST_MODE
The VCPU thread is in guest mode.
EXITING_GUEST_MODE
The VCPU thread is transitioning from IN_GUEST_MODE to
OUTSIDE_GUEST_MODE.
READING_SHADOW_PAGE_TABLES
The VCPU thread is outside guest mode, but it wants the sender of
certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
thread is done reading the page tables.
VCPU Request Internals
======================
VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_ could
also be used, e.g. ::
clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);
However, VCPU request users should refrain from doing so, as it would
break the abstraction. The first 8 bits are reserved for architecture
independent requests; all additional bits are available for architecture
dependent requests.
Architecture Independent Requests
---------------------------------
KVM_REQ_TLB_FLUSH
KVM's common MMU notifier may need to flush all of a guest's TLB
entries, calling kvm_flush_remote_tlbs() to do so. Architectures that
choose to use the common kvm_flush_remote_tlbs() implementation will
need to handle this VCPU request.
KVM_REQ_MMU_RELOAD
When shadow page tables are used and memory slots are removed it's
necessary to inform each VCPU to completely refresh the tables. This
request is used for that.
KVM_REQ_PENDING_TIMER
This request may be made from a timer handler run on the host on behalf
of a VCPU. It informs the VCPU thread to inject a timer interrupt.
KVM_REQ_UNHALT
This request may be made from the KVM common function kvm_vcpu_block(),
which is used to emulate an instruction that causes a CPU to halt until
one of an architecture-specific set of events and/or interrupts is
received (determined by checking kvm_arch_vcpu_runnable()). When that
event or interrupt arrives kvm_vcpu_block() makes the request. This is
in contrast to when kvm_vcpu_block() returns due to any other reason,
such as a pending signal, which does not indicate the VCPU's halt
emulation should stop, and therefore does not make the request.
KVM_REQUEST_MASK
----------------
VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops. This is because only the lower 8 bits are used to represent the
request's number. The upper bits are used as flags. Currently only two
flags are defined.
VCPU Request Flags
------------------
KVM_REQUEST_NO_WAKEUP
This flag is applied to requests that only need immediate attention
from VCPUs running in guest mode. That is, sleeping VCPUs do not need
to be awakened for these requests. Sleeping VCPUs will handle the
requests when they are awakened later for some other reason.
KVM_REQUEST_WAIT
When requests with this flag are made with kvm_make_all_cpus_request(),
then the caller will wait for each VCPU to acknowledge its IPI before
proceeding. This flag only applies to VCPUs that would receive IPIs.
If, for example, the VCPU is sleeping, so no IPI is necessary, then
the requesting thread does not wait. This means that this flag may be
safely combined with KVM_REQUEST_NO_WAKEUP. See "Waiting for
Acknowledgements" for more information about requests with
KVM_REQUEST_WAIT.
VCPU Requests with Associated State
===================================
Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request. This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit. Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it. See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.
The pair of functions, kvm_check_request() and kvm_make_request(), provide
the memory barriers, allowing this requirement to be handled internally by
the API.
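
For illustration, with a made-up request name and field, the pattern looks
like::

  /* requester: publish the state, then make the request */
  vcpu->arch.example_data = value;
  kvm_make_request(KVM_REQ_EXAMPLE, vcpu);      /* barrier before setting the bit */
  kvm_vcpu_kick(vcpu);

  /* receiving VCPU thread */
  if (kvm_check_request(KVM_REQ_EXAMPLE, vcpu)) /* barrier after reading the bit */
          consume(vcpu->arch.example_data);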
Ensuring Requests Are Seen
==========================
When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request. We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode. This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI. One
solution, which all architectures except s390 apply, is to:
- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.
This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU. With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check. This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_). As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``. Substituting
them into the pattern gives::
CPU1 CPU2
================= =================
local_irq_disable();
WRITE_ONCE(vcpu->mode, IN_GUEST_MODE); kvm_make_request(REQ, vcpu);
smp_mb(); smp_mb();
if (kvm_request_pending(vcpu)) { if (READ_ONCE(vcpu->mode) ==
IN_GUEST_MODE) {
...abort guest entry... ...send IPI...
} }
As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts. This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE. WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.
IPI Reduction
-------------
As only one IPI is needed to get a VCPU to check for any/all requests,
then they may be coalesced. This is easily done by having the first
IPI-sending kick also change the VCPU mode to something !IN_GUEST_MODE. The
transitional state, EXITING_GUEST_MODE, is used for this purpose.
Waiting for Acknowledgements
----------------------------
Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE. For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts. To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.
Request-less VCPU Kicks
-----------------------
As the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, then it's clear that
request-less VCPU kicks are almost never correct. Without the assurance
that a non-IPI generating kick will still result in an action by the
receiving VCPU, as the final kvm_request_pending() check does for
request-accompanying kicks, then the kick may not do anything useful at
all. If, for instance, a request-less kick was made to a VCPU that was
just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
the VCPU thread may continue its entry without actually having done
whatever it was the kick was meant to initiate.
One exception is x86's posted interrupt mechanism. In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``. When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.
Additional Considerations
=========================
Sleeping VCPUs
--------------
VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block(). Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent. kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken. One reason to do so is to provide
architectures a function where requests may be checked if necessary.
Clearing Requests
-----------------
Generally it only makes sense for the receiving VCPU thread to clear a
request. However, in some circumstances, such as when the requesting
thread and the receiving VCPU thread are executed serially, such as when
they are the same thread, or when they are using some form of concurrency
control to temporarily execute synchronously, then it's possible to know
that the request may be cleared immediately, rather than waiting for the
receiving VCPU thread to handle the request in VCPU RUN. The only current
examples of this are kvm_vcpu_block() calls made by VCPUs to block
themselves. A possible side-effect of that call is to make the
KVM_REQ_UNHALT request, which may then be cleared immediately when the
VCPU returns from the call.
References
==========
.. [atomic-ops] Documentation/core-api/atomic_ops.rst
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/
@@ -7350,7 +7350,7 @@ F: arch/powerpc/kvm/
KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
M: Christian Borntraeger <borntraeger@de.ibm.com>
M: Cornelia Huck <cornelia.huck@de.ibm.com>
M: Cornelia Huck <cohuck@redhat.com>
L: linux-s390@vger.kernel.org
W: http://www.ibm.com/developerworks/linux/linux390/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
@@ -11268,7 +11268,7 @@ S: Supported
F: drivers/iommu/s390-iommu.c
S390 VFIO-CCW DRIVER
M: Cornelia Huck <cornelia.huck@de.ibm.com>
M: Cornelia Huck <cohuck@redhat.com>
M: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
L: linux-s390@vger.kernel.org
L: kvm@vger.kernel.org
@@ -13814,7 +13814,7 @@ F: include/uapi/linux/virtio_*.h
F: drivers/crypto/virtio/
VIRTIO DRIVERS FOR S390
M: Cornelia Huck <cornelia.huck@de.ibm.com>
M: Cornelia Huck <cohuck@redhat.com>
M: Halil Pasic <pasic@linux.vnet.ibm.com>
L: linux-s390@vger.kernel.org
L: virtualization@lists.linux-foundation.org
......
@@ -44,7 +44,9 @@
#define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
#endif
#define KVM_REQ_VCPU_EXIT (8 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_IRQ_PENDING KVM_ARCH_REQ(1)
u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
int __attribute_const__ kvm_target_cpu(void);
@@ -233,8 +235,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
void kvm_arm_halt_guest(struct kvm *kvm);
void kvm_arm_resume_guest(struct kvm *kvm);
void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
@@ -291,20 +291,12 @@ static inline void kvm_arm_init_debug(void) {}
static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
static inline int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr)
{
return -ENXIO;
}
static inline int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr)
{
return -ENXIO;
}
static inline int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr)
{
return -ENXIO;
}
int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
#endif /* __ARM_KVM_HOST_H__ */
@@ -203,6 +203,14 @@ struct kvm_arch_memory_slot {
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INTID_MASK 0x3ff
#define VGIC_LEVEL_INFO_LINE_LEVEL 0
/* Device Control API on vcpu fd */
#define KVM_ARM_VCPU_PMU_V3_CTRL 0
#define KVM_ARM_VCPU_PMU_V3_IRQ 0
#define KVM_ARM_VCPU_PMU_V3_INIT 1
#define KVM_ARM_VCPU_TIMER_CTRL 1
#define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
#define KVM_DEV_ARM_VGIC_CTRL_INIT 0
#define KVM_DEV_ARM_ITS_SAVE_TABLES 1
#define KVM_DEV_ARM_ITS_RESTORE_TABLES 2
......
@@ -301,3 +301,54 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
{
return -EINVAL;
}
int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr)
{
int ret;
switch (attr->group) {
case KVM_ARM_VCPU_TIMER_CTRL:
ret = kvm_arm_timer_set_attr(vcpu, attr);
break;
default:
ret = -ENXIO;
break;
}
return ret;
}
int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr)
{
int ret;
switch (attr->group) {
case KVM_ARM_VCPU_TIMER_CTRL:
ret = kvm_arm_timer_get_attr(vcpu, attr);
break;
default:
ret = -ENXIO;
break;
}
return ret;
}
int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr)
{
int ret;
switch (attr->group) {
case KVM_ARM_VCPU_TIMER_CTRL:
ret = kvm_arm_timer_has_attr(vcpu, attr);
break;
default:
ret = -ENXIO;
break;
}
return ret;
}
@@ -72,6 +72,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
trace_kvm_wfx(*vcpu_pc(vcpu), false);
vcpu->stat.wfi_exit_stat++;
kvm_vcpu_block(vcpu);
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
}
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
......
@@ -237,8 +237,10 @@ void __hyp_text __noreturn __hyp_panic(int cause)
vcpu = (struct kvm_vcpu *)read_sysreg(HTPIDR);
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
__timer_save_state(vcpu);
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
__banked_restore_state(host_ctxt);
__sysreg_restore_state(host_ctxt);
}
......
@@ -37,16 +37,6 @@ static struct kvm_regs cortexa_regs_reset = {
.usr_regs.ARM_cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
};
static const struct kvm_irq_level cortexa_ptimer_irq = {
{ .irq = 30 },
.level = 1,
};
static const struct kvm_irq_level cortexa_vtimer_irq = {
{ .irq = 27 },
.level = 1,
};
/*******************************************************************************
* Exported reset function
@@ -62,16 +52,12 @@ static const struct kvm_irq_level cortexa_vtimer_irq = {
int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
{
struct kvm_regs *reset_regs;
const struct kvm_irq_level *cpu_vtimer_irq;
const struct kvm_irq_level *cpu_ptimer_irq;
switch (vcpu->arch.target) {
case KVM_ARM_TARGET_CORTEX_A7:
case KVM_ARM_TARGET_CORTEX_A15:
reset_regs = &cortexa_regs_reset;
vcpu->arch.midr = read_cpuid_id();
cpu_vtimer_irq = &cortexa_vtimer_irq;
cpu_ptimer_irq = &cortexa_ptimer_irq;
break;
default:
return -ENODEV;
@@ -84,5 +70,5 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
kvm_reset_coprocs(vcpu);
/* Reset arch_timer context */
return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq, cpu_ptimer_irq);
return kvm_timer_vcpu_reset(vcpu);
}
@@ -488,6 +488,17 @@ config CAVIUM_ERRATUM_27456
If unsure, say Y.
config CAVIUM_ERRATUM_30115
bool "Cavium erratum 30115: Guest may disable interrupts in host"
default y
help
On ThunderX T88 pass 1.x through 2.2, T81 pass 1.0 through
1.2, and T83 Pass 1.0, KVM guest execution may disable
interrupts in host. Trapping both GICv3 group-0 and group-1
accesses sidesteps the issue.
If unsure, say Y.
config QCOM_FALKOR_ERRATUM_1003
bool "Falkor E1003: Incorrect translation due to ASID change"
default y
......
@@ -89,7 +89,7 @@ static inline void gic_write_ctlr(u32 val)
static inline void gic_write_grpen1(u32 val)
{
write_sysreg_s(val, SYS_ICC_GRPEN1_EL1);
write_sysreg_s(val, SYS_ICC_IGRPEN1_EL1);
isb();
}
......
@@ -38,7 +38,8 @@
#define ARM64_WORKAROUND_REPEAT_TLBI 17
#define ARM64_WORKAROUND_QCOM_FALKOR_E1003 18
#define ARM64_WORKAROUND_858921 19
#define ARM64_WORKAROUND_CAVIUM_30115 20
#define ARM64_NCAPS 20
#define ARM64_NCAPS 21
#endif /* __ASM_CPUCAPS_H */
@@ -86,6 +86,7 @@
#define CAVIUM_CPU_PART_THUNDERX 0x0A1
#define CAVIUM_CPU_PART_THUNDERX_81XX 0x0A2
#define CAVIUM_CPU_PART_THUNDERX_83XX 0x0A3
#define BRCM_CPU_PART_VULCAN 0x516
@@ -96,6 +97,7 @@
#define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73)
#define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
#define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
#define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
#define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
#ifndef __ASSEMBLY__
......
@@ -19,6 +19,7 @@
#define __ASM_ESR_H
#include <asm/memory.h>
#include <asm/sysreg.h>
#define ESR_ELx_EC_UNKNOWN (0x00)
#define ESR_ELx_EC_WFx (0x01)
@@ -182,6 +183,29 @@
#define ESR_ELx_SYS64_ISS_SYS_CNTFRQ (ESR_ELx_SYS64_ISS_SYS_VAL(3, 3, 0, 14, 0) | \
ESR_ELx_SYS64_ISS_DIR_READ)
#define esr_sys64_to_sysreg(e) \
sys_reg((((e) & ESR_ELx_SYS64_ISS_OP0_MASK) >> \
ESR_ELx_SYS64_ISS_OP0_SHIFT), \
(((e) & ESR_ELx_SYS64_ISS_OP1_MASK) >> \
ESR_ELx_SYS64_ISS_OP1_SHIFT), \
(((e) & ESR_ELx_SYS64_ISS_CRN_MASK) >> \
ESR_ELx_SYS64_ISS_CRN_SHIFT), \
(((e) & ESR_ELx_SYS64_ISS_CRM_MASK) >> \
ESR_ELx_SYS64_ISS_CRM_SHIFT), \
(((e) & ESR_ELx_SYS64_ISS_OP2_MASK) >> \
ESR_ELx_SYS64_ISS_OP2_SHIFT))
#define esr_cp15_to_sysreg(e) \
sys_reg(3, \
(((e) & ESR_ELx_SYS64_ISS_OP1_MASK) >> \
ESR_ELx_SYS64_ISS_OP1_SHIFT), \
(((e) & ESR_ELx_SYS64_ISS_CRN_MASK) >> \
ESR_ELx_SYS64_ISS_CRN_SHIFT), \
(((e) & ESR_ELx_SYS64_ISS_CRM_MASK) >> \
ESR_ELx_SYS64_ISS_CRM_SHIFT), \
(((e) & ESR_ELx_SYS64_ISS_OP2_MASK) >> \
ESR_ELx_SYS64_ISS_OP2_SHIFT))
#ifndef __ASSEMBLY__
#include <asm/types.h>
......
@@ -42,7 +42,9 @@
#define KVM_VCPU_MAX_FEATURES 4
#define KVM_REQ_VCPU_EXIT (8 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_IRQ_PENDING KVM_ARCH_REQ(1)
int __attribute_const__ kvm_target_cpu(void);
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
@@ -334,8 +336,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
void kvm_arm_halt_guest(struct kvm *kvm);
void kvm_arm_resume_guest(struct kvm *kvm);
void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
u64 __kvm_call_hyp(void *hypfn, ...);
#define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
......
@@ -127,6 +127,7 @@ int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
void __timer_save_state(struct kvm_vcpu *vcpu);
void __timer_restore_state(struct kvm_vcpu *vcpu);
......
@@ -180,14 +180,31 @@
#define SYS_VBAR_EL1 sys_reg(3, 0, 12, 0, 0)
#define SYS_ICC_IAR0_EL1 sys_reg(3, 0, 12, 8, 0)
#define SYS_ICC_EOIR0_EL1 sys_reg(3, 0, 12, 8, 1)
#define SYS_ICC_HPPIR0_EL1 sys_reg(3, 0, 12, 8, 2)
#define SYS_ICC_BPR0_EL1 sys_reg(3, 0, 12, 8, 3)
#define SYS_ICC_AP0Rn_EL1(n) sys_reg(3, 0, 12, 8, 4 | n)
#define SYS_ICC_AP0R0_EL1 SYS_ICC_AP0Rn_EL1(0)
#define SYS_ICC_AP0R1_EL1 SYS_ICC_AP0Rn_EL1(1)
#define SYS_ICC_AP0R2_EL1 SYS_ICC_AP0Rn_EL1(2)
#define SYS_ICC_AP0R3_EL1 SYS_ICC_AP0Rn_EL1(3)
#define SYS_ICC_AP1Rn_EL1(n) sys_reg(3, 0, 12, 9, n)
#define SYS_ICC_AP1R0_EL1 SYS_ICC_AP1Rn_EL1(0)
#define SYS_ICC_AP1R1_EL1 SYS_ICC_AP1Rn_EL1(1)
#define SYS_ICC_AP1R2_EL1 SYS_ICC_AP1Rn_EL1(2)
#define SYS_ICC_AP1R3_EL1 SYS_ICC_AP1Rn_EL1(3)
#define SYS_ICC_DIR_EL1 sys_reg(3, 0, 12, 11, 1)
#define SYS_ICC_RPR_EL1 sys_reg(3, 0, 12, 11, 3)
#define SYS_ICC_SGI1R_EL1 sys_reg(3, 0, 12, 11, 5)
#define SYS_ICC_IAR1_EL1 sys_reg(3, 0, 12, 12, 0)
#define SYS_ICC_EOIR1_EL1 sys_reg(3, 0, 12, 12, 1)
#define SYS_ICC_HPPIR1_EL1 sys_reg(3, 0, 12, 12, 2)
#define SYS_ICC_BPR1_EL1 sys_reg(3, 0, 12, 12, 3)
#define SYS_ICC_CTLR_EL1 sys_reg(3, 0, 12, 12, 4)
#define SYS_ICC_SRE_EL1 sys_reg(3, 0, 12, 12, 5)
#define SYS_ICC_GRPEN1_EL1 sys_reg(3, 0, 12, 12, 7)
#define SYS_ICC_IGRPEN0_EL1 sys_reg(3, 0, 12, 12, 6)
#define SYS_ICC_IGRPEN1_EL1 sys_reg(3, 0, 12, 12, 7)
#define SYS_CONTEXTIDR_EL1 sys_reg(3, 0, 13, 0, 1)
#define SYS_TPIDR_EL1 sys_reg(3, 0, 13, 0, 4)
@@ -287,8 +304,8 @@
#define SCTLR_ELx_M 1
#define SCTLR_EL2_RES1 ((1 << 4) | (1 << 5) | (1 << 11) | (1 << 16) | \
(1 << 16) | (1 << 18) | (1 << 22) | (1 << 23) | \
(1 << 28) | (1 << 29))
(1 << 18) | (1 << 22) | (1 << 23) | (1 << 28) | \
(1 << 29))
#define SCTLR_ELx_FLAGS (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | \
SCTLR_ELx_SA | SCTLR_ELx_I)
......
@@ -232,6 +232,9 @@ struct kvm_arch_memory_slot {
#define KVM_ARM_VCPU_PMU_V3_CTRL 0
#define KVM_ARM_VCPU_PMU_V3_IRQ 0
#define KVM_ARM_VCPU_PMU_V3_INIT 1
#define KVM_ARM_VCPU_TIMER_CTRL 1
#define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
/* KVM_IRQ_LINE irq field index values */
#define KVM_ARM_IRQ_TYPE_SHIFT 24
......
@@ -132,6 +132,27 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
.capability = ARM64_WORKAROUND_CAVIUM_27456,
MIDR_RANGE(MIDR_THUNDERX_81XX, 0x00, 0x00),
},
#endif
#ifdef CONFIG_CAVIUM_ERRATUM_30115
{
/* Cavium ThunderX, T88 pass 1.x - 2.2 */
.desc = "Cavium erratum 30115",
.capability = ARM64_WORKAROUND_CAVIUM_30115,
MIDR_RANGE(MIDR_THUNDERX, 0x00,
(1 << MIDR_VARIANT_SHIFT) | 2),
},
{
/* Cavium ThunderX, T81 pass 1.0 - 1.2 */
.desc = "Cavium erratum 30115",
.capability = ARM64_WORKAROUND_CAVIUM_30115,
MIDR_RANGE(MIDR_THUNDERX_81XX, 0x00, 0x02),
},
{
/* Cavium ThunderX, T83 pass 1.0 */
.desc = "Cavium erratum 30115",
.capability = ARM64_WORKAROUND_CAVIUM_30115,
MIDR_RANGE(MIDR_THUNDERX_83XX, 0x00, 0x00),
},
#endif
{
.desc = "Mismatched cache line size",
......
@@ -390,6 +390,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
case KVM_ARM_VCPU_PMU_V3_CTRL:
ret = kvm_arm_pmu_v3_set_attr(vcpu, attr);
break;
case KVM_ARM_VCPU_TIMER_CTRL:
ret = kvm_arm_timer_set_attr(vcpu, attr);
break;
default:
ret = -ENXIO;
break;
@@ -407,6 +410,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
case KVM_ARM_VCPU_PMU_V3_CTRL:
ret = kvm_arm_pmu_v3_get_attr(vcpu, attr);
break;
case KVM_ARM_VCPU_TIMER_CTRL:
ret = kvm_arm_timer_get_attr(vcpu, attr);
break;
default:
ret = -ENXIO;
break;
@@ -424,6 +430,9 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
case KVM_ARM_VCPU_PMU_V3_CTRL:
ret = kvm_arm_pmu_v3_has_attr(vcpu, attr);
break;
case KVM_ARM_VCPU_TIMER_CTRL:
ret = kvm_arm_timer_has_attr(vcpu, attr);
break;
default:
ret = -ENXIO;
break;
......
@@ -89,6 +89,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
vcpu->stat.wfi_exit_stat++;
kvm_vcpu_block(vcpu);
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
}
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
......
@@ -350,6 +350,20 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
}
}
if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
exit_code == ARM_EXCEPTION_TRAP &&
(kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
int ret = __vgic_v3_perform_cpuif_access(vcpu);
if (ret == 1) {
__skip_instr(vcpu);
goto again;
}
/* 0 falls through to be handled out of EL2 */
}
fp_enabled = __fpsimd_enabled();
__sysreg_save_guest_state(guest_ctxt);
@@ -422,6 +436,7 @@ void __hyp_text __noreturn __hyp_panic(void)
vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2);
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
__timer_save_state(vcpu);
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
__sysreg_restore_host_state(host_ctxt);
......
@@ -46,16 +46,6 @@ static const struct kvm_regs default_regs_reset32 = {
COMPAT_PSR_I_BIT | COMPAT_PSR_F_BIT),
};
static const struct kvm_irq_level default_ptimer_irq = {
.irq = 30,
.level = 1,
};
static const struct kvm_irq_level default_vtimer_irq = {
.irq = 27,
.level = 1,
};
static bool cpu_has_32bit_el1(void)
{
u64 pfr0;
@@ -108,8 +98,6 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
*/
int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
{
const struct kvm_irq_level *cpu_vtimer_irq;
const struct kvm_irq_level *cpu_ptimer_irq;
const struct kvm_regs *cpu_reset;
switch (vcpu->arch.target) {
@@ -122,8 +110,6 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
cpu_reset = &default_regs_reset;
}
cpu_vtimer_irq = &default_vtimer_irq;
cpu_ptimer_irq = &default_ptimer_irq;
break;
}
@@ -137,5 +123,5 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
kvm_pmu_vcpu_reset(vcpu);
/* Reset timer */
return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq, cpu_ptimer_irq);
return kvm_timer_vcpu_reset(vcpu);
}
@@ -56,7 +56,8 @@
*/
static bool read_from_write_only(struct kvm_vcpu *vcpu,
const struct sys_reg_params *params)
struct sys_reg_params *params,
const struct sys_reg_desc *r)
{
WARN_ONCE(1, "Unexpected sys_reg read to write-only register\n");
print_sys_reg_instr(params);
@@ -64,6 +65,16 @@ static bool read_from_write_only(struct kvm_vcpu *vcpu,
return false;
}
static bool write_to_read_only(struct kvm_vcpu *vcpu,
struct sys_reg_params *params,
const struct sys_reg_desc *r)
{
WARN_ONCE(1, "Unexpected sys_reg write to read-only register\n");
print_sys_reg_instr(params);
kvm_inject_undefined(vcpu);
return false;
}
/* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
static u32 cache_levels;
@@ -93,7 +104,7 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
if (!p->is_write)
return read_from_write_only(vcpu, p);
return read_from_write_only(vcpu, p, r);
kvm_set_way_flush(vcpu);
return true;
@@ -135,7 +146,7 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
if (!p->is_write)
return read_from_write_only(vcpu, p);
return read_from_write_only(vcpu, p, r);
vgic_v3_dispatch_sgi(vcpu, p->regval);
@@ -773,7 +784,7 @@ static bool access_pmswinc(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return trap_raz_wi(vcpu, p, r);
if (!p->is_write)
return read_from_write_only(vcpu, p);
return read_from_write_only(vcpu, p, r);
if (pmu_write_swinc_el0_disabled(vcpu))
return false;
@@ -953,7 +964,15 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
{ SYS_DESC(SYS_ICC_EOIR0_EL1), read_from_write_only },
{ SYS_DESC(SYS_ICC_HPPIR0_EL1), write_to_read_only },
{ SYS_DESC(SYS_ICC_DIR_EL1), read_from_write_only },
{ SYS_DESC(SYS_ICC_RPR_EL1), write_to_read_only },
{ SYS_DESC(SYS_ICC_SGI1R_EL1), access_gic_sgi }, { SYS_DESC(SYS_ICC_SGI1R_EL1), access_gic_sgi },
{ SYS_DESC(SYS_ICC_IAR1_EL1), write_to_read_only },
{ SYS_DESC(SYS_ICC_EOIR1_EL1), read_from_write_only },
{ SYS_DESC(SYS_ICC_HPPIR1_EL1), write_to_read_only },
{ SYS_DESC(SYS_ICC_SRE_EL1), access_gic_sre }, { SYS_DESC(SYS_ICC_SRE_EL1), access_gic_sre },
{ SYS_DESC(SYS_CONTEXTIDR_EL1), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 }, { SYS_DESC(SYS_CONTEXTIDR_EL1), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
......
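The new write_to_read_only() helper mirrors read_from_write_only(): a sys_reg_descs entry maps an encoding to an access callback, and the callback flags the bogus direction so an UNDEF can be injected into the guest. A toy standalone model of that dispatch pattern (plain C, not the kernel's actual structures; all names below are illustrative):

#include <stdbool.h>
#include <stdio.h>

struct params { bool is_write; unsigned long regval; };

struct desc {
	const char *name;
	bool (*access)(struct params *p);	/* false => inject UNDEF */
};

static bool write_to_read_only(struct params *p)
{
	(void)p;
	fprintf(stderr, "Unexpected sys_reg write to read-only register\n");
	return false;
}

static const struct desc descs[] = {
	{ "ICC_IAR1_EL1", write_to_read_only },
};

int main(void)
{
	struct params p = { .is_write = true };

	/* a trapped write to ICC_IAR1_EL1 would be routed here */
	return descs[0].access(&p) ? 0 : 1;
}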
...@@ -268,36 +268,21 @@ static bool access_gic_sre(struct kvm_vcpu *vcpu, struct sys_reg_params *p, ...@@ -268,36 +268,21 @@ static bool access_gic_sre(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return true; return true;
} }
static const struct sys_reg_desc gic_v3_icc_reg_descs[] = { static const struct sys_reg_desc gic_v3_icc_reg_descs[] = {
/* ICC_PMR_EL1 */ { SYS_DESC(SYS_ICC_PMR_EL1), access_gic_pmr },
{ Op0(3), Op1(0), CRn(4), CRm(6), Op2(0), access_gic_pmr }, { SYS_DESC(SYS_ICC_BPR0_EL1), access_gic_bpr0 },
/* ICC_BPR0_EL1 */ { SYS_DESC(SYS_ICC_AP0R0_EL1), access_gic_ap0r },
{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(3), access_gic_bpr0 }, { SYS_DESC(SYS_ICC_AP0R1_EL1), access_gic_ap0r },
/* ICC_AP0R0_EL1 */ { SYS_DESC(SYS_ICC_AP0R2_EL1), access_gic_ap0r },
{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(4), access_gic_ap0r }, { SYS_DESC(SYS_ICC_AP0R3_EL1), access_gic_ap0r },
/* ICC_AP0R1_EL1 */ { SYS_DESC(SYS_ICC_AP1R0_EL1), access_gic_ap1r },
{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(5), access_gic_ap0r }, { SYS_DESC(SYS_ICC_AP1R1_EL1), access_gic_ap1r },
/* ICC_AP0R2_EL1 */ { SYS_DESC(SYS_ICC_AP1R2_EL1), access_gic_ap1r },
{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(6), access_gic_ap0r }, { SYS_DESC(SYS_ICC_AP1R3_EL1), access_gic_ap1r },
/* ICC_AP0R3_EL1 */ { SYS_DESC(SYS_ICC_BPR1_EL1), access_gic_bpr1 },
{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(7), access_gic_ap0r }, { SYS_DESC(SYS_ICC_CTLR_EL1), access_gic_ctlr },
/* ICC_AP1R0_EL1 */ { SYS_DESC(SYS_ICC_SRE_EL1), access_gic_sre },
{ Op0(3), Op1(0), CRn(12), CRm(9), Op2(0), access_gic_ap1r }, { SYS_DESC(SYS_ICC_IGRPEN0_EL1), access_gic_grpen0 },
/* ICC_AP1R1_EL1 */ { SYS_DESC(SYS_ICC_IGRPEN1_EL1), access_gic_grpen1 },
{ Op0(3), Op1(0), CRn(12), CRm(9), Op2(1), access_gic_ap1r },
/* ICC_AP1R2_EL1 */
{ Op0(3), Op1(0), CRn(12), CRm(9), Op2(2), access_gic_ap1r },
/* ICC_AP1R3_EL1 */
{ Op0(3), Op1(0), CRn(12), CRm(9), Op2(3), access_gic_ap1r },
/* ICC_BPR1_EL1 */
{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(3), access_gic_bpr1 },
/* ICC_CTLR_EL1 */
{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(4), access_gic_ctlr },
/* ICC_SRE_EL1 */
{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(5), access_gic_sre },
/* ICC_IGRPEN0_EL1 */
{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(6), access_gic_grpen0 },
/* ICC_GRPEN1_EL1 */
{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(7), access_gic_grpen1 },
}; };
int vgic_v3_has_cpu_sysregs_attr(struct kvm_vcpu *vcpu, bool is_write, u64 id, int vgic_v3_has_cpu_sysregs_attr(struct kvm_vcpu *vcpu, bool is_write, u64 id,
......
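The rewritten gic_v3_icc_reg_descs table relies on SYS_DESC() to derive the Op0/Op1/CRn/CRm/Op2 fields from one encoded SYS_ICC_* constant instead of spelling them out per entry. A rough sketch of the idea using the standard ARMv8 sysreg field layout; the macro bodies and struct below are illustrative rather than the kernel's exact definitions:

#include <stdio.h>

/* encode and decode using the MRS/MSR field positions */
#define SYS_REG(op0, op1, crn, crm, op2) \
	(((op0) << 14) | ((op1) << 11) | ((crn) << 7) | ((crm) << 3) | (op2))

#define SYS_ICC_PMR_EL1		SYS_REG(3, 0, 4, 6, 0)

#define SYS_DESC(reg)				\
	.Op0 = ((reg) >> 14) & 0x3,		\
	.Op1 = ((reg) >> 11) & 0x7,		\
	.CRn = ((reg) >>  7) & 0xf,		\
	.CRm = ((reg) >>  3) & 0xf,		\
	.Op2 = (reg) & 0x7

struct sys_reg_desc_model { unsigned Op0, Op1, CRn, CRm, Op2; };

static const struct sys_reg_desc_model pmr = { SYS_DESC(SYS_ICC_PMR_EL1) };

int main(void)
{
	printf("ICC_PMR_EL1: Op0=%u Op1=%u CRn=%u CRm=%u Op2=%u\n",
	       pmr.Op0, pmr.Op1, pmr.CRn, pmr.CRm, pmr.Op2);
	return 0;
}

This prints the same Op0(3), Op1(0), CRn(4), CRm(6), Op2(0) values that the removed hand-written entry carried.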
...@@ -1094,7 +1094,7 @@ static void kvm_trap_emul_check_requests(struct kvm_vcpu *vcpu, int cpu, ...@@ -1094,7 +1094,7 @@ static void kvm_trap_emul_check_requests(struct kvm_vcpu *vcpu, int cpu,
struct mm_struct *mm; struct mm_struct *mm;
int i; int i;
if (likely(!vcpu->requests)) if (likely(!kvm_request_pending(vcpu)))
return; return;
if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) { if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
......
...@@ -2337,7 +2337,7 @@ static int kvm_vz_check_requests(struct kvm_vcpu *vcpu, int cpu) ...@@ -2337,7 +2337,7 @@ static int kvm_vz_check_requests(struct kvm_vcpu *vcpu, int cpu)
int ret = 0; int ret = 0;
int i; int i;
if (!vcpu->requests) if (!kvm_request_pending(vcpu))
return 0; return 0;
if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) { if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
......
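Both hunks above replace open-coded tests of vcpu->requests with the new kvm_request_pending() helper from the VCPU request overhaul. A self-contained sketch of what such a helper boils down to, assuming it reads the field exactly once (READ_ONCE in the kernel); the struct below is a stand-in, not the real kvm_vcpu:

#include <stdbool.h>

/* stand-in for struct kvm_vcpu; only the field we need */
struct kvm_vcpu { unsigned long requests; };

/* simplified READ_ONCE: force a single volatile load */
#define READ_ONCE(x) (*(const volatile typeof(x) *)&(x))

static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
{
	return READ_ONCE(vcpu->requests) != 0;
}

int main(void)
{
	struct kvm_vcpu vcpu = { .requests = 1UL << 3 };

	return kvm_request_pending(&vcpu) ? 0 : 1;
}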
...@@ -86,7 +86,6 @@ struct kvmppc_vcore { ...@@ -86,7 +86,6 @@ struct kvmppc_vcore {
u16 last_cpu; u16 last_cpu;
u8 vcore_state; u8 vcore_state;
u8 in_guest; u8 in_guest;
struct kvmppc_vcore *master_vcore;
struct kvm_vcpu *runnable_threads[MAX_SMT_THREADS]; struct kvm_vcpu *runnable_threads[MAX_SMT_THREADS];
struct list_head preempt_list; struct list_head preempt_list;
spinlock_t lock; spinlock_t lock;
......
...@@ -81,7 +81,7 @@ struct kvm_split_mode { ...@@ -81,7 +81,7 @@ struct kvm_split_mode {
u8 subcore_size; u8 subcore_size;
u8 do_nap; u8 do_nap;
u8 napped[MAX_SMT_THREADS]; u8 napped[MAX_SMT_THREADS];
struct kvmppc_vcore *master_vcs[MAX_SUBCORES]; struct kvmppc_vcore *vc[MAX_SUBCORES];
}; };
/* /*
......
...@@ -35,6 +35,7 @@ ...@@ -35,6 +35,7 @@
#include <asm/page.h> #include <asm/page.h>
#include <asm/cacheflush.h> #include <asm/cacheflush.h>
#include <asm/hvcall.h> #include <asm/hvcall.h>
#include <asm/mce.h>
#define KVM_MAX_VCPUS NR_CPUS #define KVM_MAX_VCPUS NR_CPUS
#define KVM_MAX_VCORES NR_CPUS #define KVM_MAX_VCORES NR_CPUS
...@@ -52,8 +53,8 @@ ...@@ -52,8 +53,8 @@
#define KVM_IRQCHIP_NUM_PINS 256 #define KVM_IRQCHIP_NUM_PINS 256
/* PPC-specific vcpu->requests bit members */ /* PPC-specific vcpu->requests bit members */
#define KVM_REQ_WATCHDOG 8 #define KVM_REQ_WATCHDOG KVM_ARCH_REQ(0)
#define KVM_REQ_EPR_EXIT 9 #define KVM_REQ_EPR_EXIT KVM_ARCH_REQ(1)
#include <linux/mmu_notifier.h> #include <linux/mmu_notifier.h>
...@@ -267,6 +268,8 @@ struct kvm_resize_hpt; ...@@ -267,6 +268,8 @@ struct kvm_resize_hpt;
struct kvm_arch { struct kvm_arch {
unsigned int lpid; unsigned int lpid;
unsigned int smt_mode; /* # vcpus per virtual core */
unsigned int emul_smt_mode; /* emulated SMT mode, on P9 */
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
unsigned int tlb_sets; unsigned int tlb_sets;
struct kvm_hpt_info hpt; struct kvm_hpt_info hpt;
...@@ -285,6 +288,7 @@ struct kvm_arch { ...@@ -285,6 +288,7 @@ struct kvm_arch {
cpumask_t need_tlb_flush; cpumask_t need_tlb_flush;
cpumask_t cpu_in_guest; cpumask_t cpu_in_guest;
u8 radix; u8 radix;
u8 fwnmi_enabled;
pgd_t *pgtable; pgd_t *pgtable;
u64 process_table; u64 process_table;
struct dentry *debugfs_dir; struct dentry *debugfs_dir;
...@@ -566,6 +570,7 @@ struct kvm_vcpu_arch { ...@@ -566,6 +570,7 @@ struct kvm_vcpu_arch {
ulong wort; ulong wort;
ulong tid; ulong tid;
ulong psscr; ulong psscr;
ulong hfscr;
ulong shadow_srr1; ulong shadow_srr1;
#endif #endif
u32 vrsave; /* also USPRG0 */ u32 vrsave; /* also USPRG0 */
...@@ -579,7 +584,7 @@ struct kvm_vcpu_arch { ...@@ -579,7 +584,7 @@ struct kvm_vcpu_arch {
ulong mcsrr0; ulong mcsrr0;
ulong mcsrr1; ulong mcsrr1;
ulong mcsr; ulong mcsr;
u32 dec; ulong dec;
#ifdef CONFIG_BOOKE #ifdef CONFIG_BOOKE
u32 decar; u32 decar;
#endif #endif
...@@ -710,6 +715,7 @@ struct kvm_vcpu_arch { ...@@ -710,6 +715,7 @@ struct kvm_vcpu_arch {
unsigned long pending_exceptions; unsigned long pending_exceptions;
u8 ceded; u8 ceded;
u8 prodded; u8 prodded;
u8 doorbell_request;
u32 last_inst; u32 last_inst;
struct swait_queue_head *wqp; struct swait_queue_head *wqp;
...@@ -722,6 +728,7 @@ struct kvm_vcpu_arch { ...@@ -722,6 +728,7 @@ struct kvm_vcpu_arch {
int prev_cpu; int prev_cpu;
bool timer_running; bool timer_running;
wait_queue_head_t cpu_run; wait_queue_head_t cpu_run;
struct machine_check_event mce_evt; /* Valid if trap == 0x200 */
struct kvm_vcpu_arch_shared *shared; struct kvm_vcpu_arch_shared *shared;
#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE) #if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
......
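The request bits above move from hard-coded numbers (8, 9) to KVM_ARCH_REQ(), which numbers architecture-specific requests after a generic base owned by common code. An illustrative sketch of the convention; the macro body and the base value of 8 are assumptions chosen so the resulting numbers stay identical to the old ones:

#define KVM_REQUEST_ARCH_BASE	8
#define KVM_ARCH_REQ(nr)	((nr) + KVM_REQUEST_ARCH_BASE)

#define KVM_REQ_WATCHDOG	KVM_ARCH_REQ(0)	/* == 8, as before */
#define KVM_REQ_EPR_EXIT	KVM_ARCH_REQ(1)	/* == 9, as before */

_Static_assert(KVM_REQ_WATCHDOG == 8 && KVM_REQ_EPR_EXIT == 9,
	       "arch requests keep their previous numbers");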
...@@ -315,6 +315,8 @@ struct kvmppc_ops { ...@@ -315,6 +315,8 @@ struct kvmppc_ops {
struct irq_bypass_producer *); struct irq_bypass_producer *);
int (*configure_mmu)(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg); int (*configure_mmu)(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg);
int (*get_rmmu_info)(struct kvm *kvm, struct kvm_ppc_rmmu_info *info); int (*get_rmmu_info)(struct kvm *kvm, struct kvm_ppc_rmmu_info *info);
int (*set_smt_mode)(struct kvm *kvm, unsigned long mode,
unsigned long flags);
}; };
extern struct kvmppc_ops *kvmppc_hv_ops; extern struct kvmppc_ops *kvmppc_hv_ops;
......
...@@ -103,6 +103,8 @@ ...@@ -103,6 +103,8 @@
#define OP_31_XOP_STBUX 247 #define OP_31_XOP_STBUX 247
#define OP_31_XOP_LHZX 279 #define OP_31_XOP_LHZX 279
#define OP_31_XOP_LHZUX 311 #define OP_31_XOP_LHZUX 311
#define OP_31_XOP_MSGSNDP 142
#define OP_31_XOP_MSGCLRP 174
#define OP_31_XOP_MFSPR 339 #define OP_31_XOP_MFSPR 339
#define OP_31_XOP_LWAX 341 #define OP_31_XOP_LWAX 341
#define OP_31_XOP_LHAX 343 #define OP_31_XOP_LHAX 343
......
...@@ -60,6 +60,12 @@ struct kvm_regs { ...@@ -60,6 +60,12 @@ struct kvm_regs {
#define KVM_SREGS_E_FSL_PIDn (1 << 0) /* PID1/PID2 */ #define KVM_SREGS_E_FSL_PIDn (1 << 0) /* PID1/PID2 */
/* flags for kvm_run.flags */
#define KVM_RUN_PPC_NMI_DISP_MASK (3 << 0)
#define KVM_RUN_PPC_NMI_DISP_FULLY_RECOV (1 << 0)
#define KVM_RUN_PPC_NMI_DISP_LIMITED_RECOV (2 << 0)
#define KVM_RUN_PPC_NMI_DISP_NOT_RECOV (3 << 0)
/* /*
* Feature bits indicate which sections of the sregs struct are valid, * Feature bits indicate which sections of the sregs struct are valid,
* both in KVM_GET_SREGS and KVM_SET_SREGS. On KVM_SET_SREGS, registers * both in KVM_GET_SREGS and KVM_SET_SREGS. On KVM_SET_SREGS, registers
......
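The new KVM_RUN_PPC_NMI_DISP_* flags let the kernel report the machine-check disposition in kvm_run->flags when it exits to userspace with KVM_EXIT_NMI under the FWNMI capability. A hedged userspace sketch of decoding them after KVM_RUN returns; report_nmi_disposition() is an illustrative helper and assumes a <linux/kvm.h> that already carries these definitions:

#include <stdio.h>
#include <linux/kvm.h>

static void report_nmi_disposition(const struct kvm_run *run)
{
	switch (run->flags & KVM_RUN_PPC_NMI_DISP_MASK) {
	case KVM_RUN_PPC_NMI_DISP_FULLY_RECOV:
		puts("machine check fully recovered");
		break;
	case KVM_RUN_PPC_NMI_DISP_LIMITED_RECOV:
		puts("machine check recovered with limitations");
		break;
	case KVM_RUN_PPC_NMI_DISP_NOT_RECOV:
		puts("machine check not recovered");
		break;
	default:
		puts("no disposition reported");
	}
}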
...@@ -485,6 +485,7 @@ int main(void) ...@@ -485,6 +485,7 @@ int main(void)
OFFSET(KVM_ENABLED_HCALLS, kvm, arch.enabled_hcalls); OFFSET(KVM_ENABLED_HCALLS, kvm, arch.enabled_hcalls);
OFFSET(KVM_VRMA_SLB_V, kvm, arch.vrma_slb_v); OFFSET(KVM_VRMA_SLB_V, kvm, arch.vrma_slb_v);
OFFSET(KVM_RADIX, kvm, arch.radix); OFFSET(KVM_RADIX, kvm, arch.radix);
OFFSET(KVM_FWNMI, kvm, arch.fwnmi_enabled);
OFFSET(VCPU_DSISR, kvm_vcpu, arch.shregs.dsisr); OFFSET(VCPU_DSISR, kvm_vcpu, arch.shregs.dsisr);
OFFSET(VCPU_DAR, kvm_vcpu, arch.shregs.dar); OFFSET(VCPU_DAR, kvm_vcpu, arch.shregs.dar);
OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr); OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr);
...@@ -513,6 +514,7 @@ int main(void) ...@@ -513,6 +514,7 @@ int main(void)
OFFSET(VCPU_PENDING_EXC, kvm_vcpu, arch.pending_exceptions); OFFSET(VCPU_PENDING_EXC, kvm_vcpu, arch.pending_exceptions);
OFFSET(VCPU_CEDED, kvm_vcpu, arch.ceded); OFFSET(VCPU_CEDED, kvm_vcpu, arch.ceded);
OFFSET(VCPU_PRODDED, kvm_vcpu, arch.prodded); OFFSET(VCPU_PRODDED, kvm_vcpu, arch.prodded);
OFFSET(VCPU_DBELL_REQ, kvm_vcpu, arch.doorbell_request);
OFFSET(VCPU_MMCR, kvm_vcpu, arch.mmcr); OFFSET(VCPU_MMCR, kvm_vcpu, arch.mmcr);
OFFSET(VCPU_PMC, kvm_vcpu, arch.pmc); OFFSET(VCPU_PMC, kvm_vcpu, arch.pmc);
OFFSET(VCPU_SPMC, kvm_vcpu, arch.spmc); OFFSET(VCPU_SPMC, kvm_vcpu, arch.spmc);
...@@ -542,6 +544,7 @@ int main(void) ...@@ -542,6 +544,7 @@ int main(void)
OFFSET(VCPU_WORT, kvm_vcpu, arch.wort); OFFSET(VCPU_WORT, kvm_vcpu, arch.wort);
OFFSET(VCPU_TID, kvm_vcpu, arch.tid); OFFSET(VCPU_TID, kvm_vcpu, arch.tid);
OFFSET(VCPU_PSSCR, kvm_vcpu, arch.psscr); OFFSET(VCPU_PSSCR, kvm_vcpu, arch.psscr);
OFFSET(VCPU_HFSCR, kvm_vcpu, arch.hfscr);
OFFSET(VCORE_ENTRY_EXIT, kvmppc_vcore, entry_exit_map); OFFSET(VCORE_ENTRY_EXIT, kvmppc_vcore, entry_exit_map);
OFFSET(VCORE_IN_GUEST, kvmppc_vcore, in_guest); OFFSET(VCORE_IN_GUEST, kvmppc_vcore, in_guest);
OFFSET(VCORE_NAPPING_THREADS, kvmppc_vcore, napping_threads); OFFSET(VCORE_NAPPING_THREADS, kvmppc_vcore, napping_threads);
......
...@@ -405,6 +405,7 @@ void machine_check_print_event_info(struct machine_check_event *evt, ...@@ -405,6 +405,7 @@ void machine_check_print_event_info(struct machine_check_event *evt,
break; break;
} }
} }
EXPORT_SYMBOL_GPL(machine_check_print_event_info);
uint64_t get_mce_fault_addr(struct machine_check_event *evt) uint64_t get_mce_fault_addr(struct machine_check_event *evt)
{ {
......
This diff has been collapsed.
...@@ -307,7 +307,7 @@ void kvmhv_commence_exit(int trap) ...@@ -307,7 +307,7 @@ void kvmhv_commence_exit(int trap)
return; return;
for (i = 0; i < MAX_SUBCORES; ++i) { for (i = 0; i < MAX_SUBCORES; ++i) {
vc = sip->master_vcs[i]; vc = sip->vc[i];
if (!vc) if (!vc)
break; break;
do { do {
......
...@@ -61,13 +61,6 @@ BEGIN_FTR_SECTION ...@@ -61,13 +61,6 @@ BEGIN_FTR_SECTION
std r3, HSTATE_DABR(r13) std r3, HSTATE_DABR(r13)
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
/* Hard-disable interrupts */
mfmsr r10
std r10, HSTATE_HOST_MSR(r13)
rldicl r10,r10,48,1
rotldi r10,r10,16
mtmsrd r10,1
/* Save host PMU registers */ /* Save host PMU registers */
BEGIN_FTR_SECTION BEGIN_FTR_SECTION
/* Work around P8 PMAE bug */ /* Work around P8 PMAE bug */
...@@ -153,6 +146,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) ...@@ -153,6 +146,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
* *
* R1 = host R1 * R1 = host R1
* R2 = host R2 * R2 = host R2
* R3 = trap number on this thread
* R12 = exit handler id * R12 = exit handler id
* R13 = PACA * R13 = PACA
*/ */
......
...@@ -130,12 +130,28 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu) ...@@ -130,12 +130,28 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
out: out:
/* /*
* For a guest that supports the FWNMI capability, hook the MCE event into
* the vcpu structure. We are going to exit the guest with a KVM_EXIT_NMI
* exit reason. On our way out we will pull this event from the vcpu
* structure and print it from thread 0 of the core/subcore.
*
* For a guest that does not support the FWNMI capability (old QEMU):
* We are now going enter guest either through machine check * We are now going enter guest either through machine check
* interrupt (for unhandled errors) or will continue from * interrupt (for unhandled errors) or will continue from
* current HSRR0 (for handled errors) in guest. Hence * current HSRR0 (for handled errors) in guest. Hence
* queue up the event so that we can log it from host console later. * queue up the event so that we can log it from host console later.
*/ */
machine_check_queue_event(); if (vcpu->kvm->arch.fwnmi_enabled) {
/*
* Hook the MCE event up to the vcpu structure.
* First clear any old event.
*/
memset(&vcpu->arch.mce_evt, 0, sizeof(vcpu->arch.mce_evt));
if (get_mce_event(&mce_evt, MCE_EVENT_RELEASE)) {
vcpu->arch.mce_evt = mce_evt;
}
} else
machine_check_queue_event();
return handled; return handled;
} }
......
...@@ -45,7 +45,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) ...@@ -45,7 +45,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define NAPPING_NOVCPU 2 #define NAPPING_NOVCPU 2
/* Stack frame offsets for kvmppc_hv_entry */ /* Stack frame offsets for kvmppc_hv_entry */
#define SFS 144 #define SFS 160
#define STACK_SLOT_TRAP (SFS-4) #define STACK_SLOT_TRAP (SFS-4)
#define STACK_SLOT_TID (SFS-16) #define STACK_SLOT_TID (SFS-16)
#define STACK_SLOT_PSSCR (SFS-24) #define STACK_SLOT_PSSCR (SFS-24)
...@@ -54,6 +54,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) ...@@ -54,6 +54,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_CIABR (SFS-48) #define STACK_SLOT_CIABR (SFS-48)
#define STACK_SLOT_DAWR (SFS-56) #define STACK_SLOT_DAWR (SFS-56)
#define STACK_SLOT_DAWRX (SFS-64) #define STACK_SLOT_DAWRX (SFS-64)
#define STACK_SLOT_HFSCR (SFS-72)
/* /*
* Call kvmppc_hv_entry in real mode. * Call kvmppc_hv_entry in real mode.
...@@ -68,6 +69,7 @@ _GLOBAL_TOC(kvmppc_hv_entry_trampoline) ...@@ -68,6 +69,7 @@ _GLOBAL_TOC(kvmppc_hv_entry_trampoline)
std r0, PPC_LR_STKOFF(r1) std r0, PPC_LR_STKOFF(r1)
stdu r1, -112(r1) stdu r1, -112(r1)
mfmsr r10 mfmsr r10
std r10, HSTATE_HOST_MSR(r13)
LOAD_REG_ADDR(r5, kvmppc_call_hv_entry) LOAD_REG_ADDR(r5, kvmppc_call_hv_entry)
li r0,MSR_RI li r0,MSR_RI
andc r0,r10,r0 andc r0,r10,r0
...@@ -152,20 +154,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) ...@@ -152,20 +154,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
stb r0, HSTATE_HWTHREAD_REQ(r13) stb r0, HSTATE_HWTHREAD_REQ(r13)
/* /*
* For external and machine check interrupts, we need * For external interrupts we need to call the Linux
* to call the Linux handler to process the interrupt. * handler to process the interrupt. We do that by jumping
* We do that by jumping to absolute address 0x500 for * to absolute address 0x500 for external interrupts.
* external interrupts, or the machine_check_fwnmi label * The [h]rfid at the end of the handler will return to
* for machine checks (since firmware might have patched * the book3s_hv_interrupts.S code. For other interrupts
* the vector area at 0x200). The [h]rfid at the end of the * we do the rfid to get back to the book3s_hv_interrupts.S
* handler will return to the book3s_hv_interrupts.S code. * code here.
* For other interrupts we do the rfid to get back
* to the book3s_hv_interrupts.S code here.
*/ */
ld r8, 112+PPC_LR_STKOFF(r1) ld r8, 112+PPC_LR_STKOFF(r1)
addi r1, r1, 112 addi r1, r1, 112
ld r7, HSTATE_HOST_MSR(r13) ld r7, HSTATE_HOST_MSR(r13)
/* Return the trap number on this thread as the return value */
mr r3, r12
/* /*
* If we came back from the guest via a relocation-on interrupt, * If we came back from the guest via a relocation-on interrupt,
* we will be in virtual mode at this point, which makes it a * we will be in virtual mode at this point, which makes it a
...@@ -175,59 +178,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) ...@@ -175,59 +178,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
andi. r0, r0, MSR_IR /* in real mode? */ andi. r0, r0, MSR_IR /* in real mode? */
bne .Lvirt_return bne .Lvirt_return
cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK /* RFI into the highmem handler */
cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL
beq 11f
cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL
beq 15f /* Invoke the H_DOORBELL handler */
cmpwi cr2, r12, BOOK3S_INTERRUPT_HMI
beq cr2, 14f /* HMI check */
/* RFI into the highmem handler, or branch to interrupt handler */
mfmsr r6 mfmsr r6
li r0, MSR_RI li r0, MSR_RI
andc r6, r6, r0 andc r6, r6, r0
mtmsrd r6, 1 /* Clear RI in MSR */ mtmsrd r6, 1 /* Clear RI in MSR */
mtsrr0 r8 mtsrr0 r8
mtsrr1 r7 mtsrr1 r7
beq cr1, 13f /* machine check */
RFI RFI
/* On POWER7, we have external interrupts set to use HSRR0/1 */ /* Virtual-mode return */
11: mtspr SPRN_HSRR0, r8
mtspr SPRN_HSRR1, r7
ba 0x500
13: b machine_check_fwnmi
14: mtspr SPRN_HSRR0, r8
mtspr SPRN_HSRR1, r7
b hmi_exception_after_realmode
15: mtspr SPRN_HSRR0, r8
mtspr SPRN_HSRR1, r7
ba 0xe80
/* Virtual-mode return - can't get here for HMI or machine check */
.Lvirt_return: .Lvirt_return:
cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL mtlr r8
beq 16f
cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL
beq 17f
andi. r0, r7, MSR_EE /* were interrupts hard-enabled? */
beq 18f
mtmsrd r7, 1 /* if so then re-enable them */
18: mtlr r8
blr blr
16: mtspr SPRN_HSRR0, r8 /* jump to reloc-on external vector */
mtspr SPRN_HSRR1, r7
b exc_virt_0x4500_hardware_interrupt
17: mtspr SPRN_HSRR0, r8
mtspr SPRN_HSRR1, r7
b exc_virt_0x4e80_h_doorbell
kvmppc_primary_no_guest: kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */ /* We handle this much like a ceded vcpu */
/* put the HDEC into the DEC, since HDEC interrupts don't wake us */ /* put the HDEC into the DEC, since HDEC interrupts don't wake us */
...@@ -769,6 +733,8 @@ BEGIN_FTR_SECTION ...@@ -769,6 +733,8 @@ BEGIN_FTR_SECTION
std r6, STACK_SLOT_PSSCR(r1) std r6, STACK_SLOT_PSSCR(r1)
std r7, STACK_SLOT_PID(r1) std r7, STACK_SLOT_PID(r1)
std r8, STACK_SLOT_IAMR(r1) std r8, STACK_SLOT_IAMR(r1)
mfspr r5, SPRN_HFSCR
std r5, STACK_SLOT_HFSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
BEGIN_FTR_SECTION BEGIN_FTR_SECTION
mfspr r5, SPRN_CIABR mfspr r5, SPRN_CIABR
...@@ -920,8 +886,10 @@ FTR_SECTION_ELSE ...@@ -920,8 +886,10 @@ FTR_SECTION_ELSE
ld r5, VCPU_TID(r4) ld r5, VCPU_TID(r4)
ld r6, VCPU_PSSCR(r4) ld r6, VCPU_PSSCR(r4)
oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */ oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */
ld r7, VCPU_HFSCR(r4)
mtspr SPRN_TIDR, r5 mtspr SPRN_TIDR, r5
mtspr SPRN_PSSCR, r6 mtspr SPRN_PSSCR, r6
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
8: 8:
...@@ -936,7 +904,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) ...@@ -936,7 +904,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
mftb r7 mftb r7
subf r3,r7,r8 subf r3,r7,r8
mtspr SPRN_DEC,r3 mtspr SPRN_DEC,r3
stw r3,VCPU_DEC(r4) std r3,VCPU_DEC(r4)
ld r5, VCPU_SPRG0(r4) ld r5, VCPU_SPRG0(r4)
ld r6, VCPU_SPRG1(r4) ld r6, VCPU_SPRG1(r4)
...@@ -1048,7 +1016,13 @@ kvmppc_cede_reentry: /* r4 = vcpu, r13 = paca */ ...@@ -1048,7 +1016,13 @@ kvmppc_cede_reentry: /* r4 = vcpu, r13 = paca */
li r0, BOOK3S_INTERRUPT_EXTERNAL li r0, BOOK3S_INTERRUPT_EXTERNAL
bne cr1, 12f bne cr1, 12f
mfspr r0, SPRN_DEC mfspr r0, SPRN_DEC
cmpwi r0, 0 BEGIN_FTR_SECTION
/* On POWER9 check whether the guest has large decrementer enabled */
andis. r8, r8, LPCR_LD@h
bne 15f
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
extsw r0, r0
15: cmpdi r0, 0
li r0, BOOK3S_INTERRUPT_DECREMENTER li r0, BOOK3S_INTERRUPT_DECREMENTER
bge 5f bge 5f
...@@ -1058,6 +1032,23 @@ kvmppc_cede_reentry: /* r4 = vcpu, r13 = paca */ ...@@ -1058,6 +1032,23 @@ kvmppc_cede_reentry: /* r4 = vcpu, r13 = paca */
mr r9, r4 mr r9, r4
bl kvmppc_msr_interrupt bl kvmppc_msr_interrupt
5: 5:
BEGIN_FTR_SECTION
b fast_guest_return
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
/* On POWER9, check for pending doorbell requests */
lbz r0, VCPU_DBELL_REQ(r4)
cmpwi r0, 0
beq fast_guest_return
ld r5, HSTATE_KVM_VCORE(r13)
/* Set DPDES register so the CPU will take a doorbell interrupt */
li r0, 1
mtspr SPRN_DPDES, r0
std r0, VCORE_DPDES(r5)
/* Make sure other cpus see vcore->dpdes set before dbell req clear */
lwsync
/* Clear the pending doorbell request */
li r0, 0
stb r0, VCPU_DBELL_REQ(r4)
/* /*
* Required state: * Required state:
...@@ -1232,6 +1223,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ...@@ -1232,6 +1223,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
stw r12,VCPU_TRAP(r9) stw r12,VCPU_TRAP(r9)
/*
* Now that we have saved away SRR0/1 and HSRR0/1,
* interrupts are recoverable in principle, so set MSR_RI.
* This becomes important for relocation-on interrupts from
* the guest, which we can get in radix mode on POWER9.
*/
li r0, MSR_RI
mtmsrd r0, 1
#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
addi r3, r9, VCPU_TB_RMINTR addi r3, r9, VCPU_TB_RMINTR
mr r4, r9 mr r4, r9
...@@ -1288,6 +1288,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ...@@ -1288,6 +1288,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
beq 4f beq 4f
b guest_exit_cont b guest_exit_cont
3: 3:
/* If it's a hypervisor facility unavailable interrupt, save HFSCR */
cmpwi r12, BOOK3S_INTERRUPT_H_FAC_UNAVAIL
bne 14f
mfspr r3, SPRN_HFSCR
std r3, VCPU_HFSCR(r9)
b guest_exit_cont
14:
/* External interrupt ? */ /* External interrupt ? */
cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL
bne+ guest_exit_cont bne+ guest_exit_cont
...@@ -1475,12 +1482,18 @@ mc_cont: ...@@ -1475,12 +1482,18 @@ mc_cont:
mtspr SPRN_SPURR,r4 mtspr SPRN_SPURR,r4
/* Save DEC */ /* Save DEC */
ld r3, HSTATE_KVM_VCORE(r13)
mfspr r5,SPRN_DEC mfspr r5,SPRN_DEC
mftb r6 mftb r6
/* On P9, if the guest has large decr enabled, don't sign extend */
BEGIN_FTR_SECTION
ld r4, VCORE_LPCR(r3)
andis. r4, r4, LPCR_LD@h
bne 16f
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
extsw r5,r5 extsw r5,r5
add r5,r5,r6 16: add r5,r5,r6
/* r5 is a guest timebase value here, convert to host TB */ /* r5 is a guest timebase value here, convert to host TB */
ld r3,HSTATE_KVM_VCORE(r13)
ld r4,VCORE_TB_OFFSET(r3) ld r4,VCORE_TB_OFFSET(r3)
subf r5,r4,r5 subf r5,r4,r5
std r5,VCPU_DEC_EXPIRES(r9) std r5,VCPU_DEC_EXPIRES(r9)
...@@ -1525,6 +1538,9 @@ FTR_SECTION_ELSE ...@@ -1525,6 +1538,9 @@ FTR_SECTION_ELSE
rldicl r6, r6, 4, 50 /* r6 &= PSSCR_GUEST_VIS */ rldicl r6, r6, 4, 50 /* r6 &= PSSCR_GUEST_VIS */
rotldi r6, r6, 60 rotldi r6, r6, 60
std r6, VCPU_PSSCR(r9) std r6, VCPU_PSSCR(r9)
/* Restore host HFSCR value */
ld r7, STACK_SLOT_HFSCR(r1)
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
/* /*
* Restore various registers to 0, where non-zero values * Restore various registers to 0, where non-zero values
...@@ -2402,8 +2418,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM) ...@@ -2402,8 +2418,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
mfspr r3, SPRN_DEC mfspr r3, SPRN_DEC
mfspr r4, SPRN_HDEC mfspr r4, SPRN_HDEC
mftb r5 mftb r5
BEGIN_FTR_SECTION
/* On P9 check whether the guest has large decrementer mode enabled */
ld r6, HSTATE_KVM_VCORE(r13)
ld r6, VCORE_LPCR(r6)
andis. r6, r6, LPCR_LD@h
bne 68f
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
extsw r3, r3 extsw r3, r3
EXTEND_HDEC(r4) 68: EXTEND_HDEC(r4)
cmpd r3, r4 cmpd r3, r4
ble 67f ble 67f
mtspr SPRN_DEC, r4 mtspr SPRN_DEC, r4
...@@ -2589,22 +2612,32 @@ machine_check_realmode: ...@@ -2589,22 +2612,32 @@ machine_check_realmode:
ld r9, HSTATE_KVM_VCPU(r13) ld r9, HSTATE_KVM_VCPU(r13)
li r12, BOOK3S_INTERRUPT_MACHINE_CHECK li r12, BOOK3S_INTERRUPT_MACHINE_CHECK
/* /*
* Deliver unhandled/fatal (e.g. UE) MCE errors to guest through * For the guest that is FWNMI capable, deliver all the MCE errors
* machine check interrupt (set HSRR0 to 0x200). And for handled * (handled/unhandled) by exiting the guest with KVM_EXIT_NMI exit
* errors (no-fatal), just go back to guest execution with current * reason. This new approach injects machine check errors in guest
* HSRR0 instead of exiting guest. This new approach will inject * address space to guest with additional information in the form
* machine check to guest for fatal error causing guest to crash. * of RTAS event, thus enabling guest kernel to suitably handle
* * such errors.
* The old code used to return to host for unhandled errors which
* was causing guest to hang with soft lockups inside guest and
* makes it difficult to recover guest instance.
* *
* For a guest that is not FWNMI capable (old QEMU), fall back
* to the old behaviour for backward compatibility:
* deliver unhandled/fatal (e.g. UE) MCE errors to the guest
* through a machine check interrupt (set HSRR0 to 0x200).
* For handled (non-fatal) errors, just go back to guest execution
* with the current HSRR0.
* if we receive machine check with MSR(RI=0) then deliver it to * if we receive machine check with MSR(RI=0) then deliver it to
* guest as machine check causing guest to crash. * guest as machine check causing guest to crash.
*/ */
ld r11, VCPU_MSR(r9) ld r11, VCPU_MSR(r9)
rldicl. r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */ rldicl. r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */
bne mc_cont /* if so, exit to host */ bne mc_cont /* if so, exit to host */
/* Check if guest is capable of handling NMI exit */
ld r10, VCPU_KVM(r9)
lbz r10, KVM_FWNMI(r10)
cmpdi r10, 1 /* FWNMI capable? */
beq mc_cont /* if so, exit with KVM_EXIT_NMI. */
/* if not, fall through for backward compatibility. */
andi. r10, r11, MSR_RI /* check for unrecoverable exception */ andi. r10, r11, MSR_RI /* check for unrecoverable exception */
beq 1f /* Deliver a machine check to guest */ beq 1f /* Deliver a machine check to guest */
ld r10, VCPU_PC(r9) ld r10, VCPU_PC(r9)
......
...@@ -1257,8 +1257,8 @@ static void xive_pre_save_scan(struct kvmppc_xive *xive) ...@@ -1257,8 +1257,8 @@ static void xive_pre_save_scan(struct kvmppc_xive *xive)
if (!xc) if (!xc)
continue; continue;
for (j = 0; j < KVMPPC_XIVE_Q_COUNT; j++) { for (j = 0; j < KVMPPC_XIVE_Q_COUNT; j++) {
if (xc->queues[i].qpage) if (xc->queues[j].qpage)
xive_pre_save_queue(xive, &xc->queues[i]); xive_pre_save_queue(xive, &xc->queues[j]);
} }
} }
......
...@@ -687,7 +687,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu) ...@@ -687,7 +687,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
kvmppc_core_check_exceptions(vcpu); kvmppc_core_check_exceptions(vcpu);
if (vcpu->requests) { if (kvm_request_pending(vcpu)) {
/* Exception delivery raised request; start over */ /* Exception delivery raised request; start over */
return 1; return 1;
} }
......
...@@ -39,7 +39,7 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu) ...@@ -39,7 +39,7 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
unsigned long dec_nsec; unsigned long dec_nsec;
unsigned long long dec_time; unsigned long long dec_time;
pr_debug("mtDEC: %x\n", vcpu->arch.dec); pr_debug("mtDEC: %lx\n", vcpu->arch.dec);
hrtimer_try_to_cancel(&vcpu->arch.dec_timer); hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
#ifdef CONFIG_PPC_BOOK3S #ifdef CONFIG_PPC_BOOK3S
...@@ -109,7 +109,7 @@ static int kvmppc_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs) ...@@ -109,7 +109,7 @@ static int kvmppc_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
case SPRN_TBWU: break; case SPRN_TBWU: break;
case SPRN_DEC: case SPRN_DEC:
vcpu->arch.dec = spr_val; vcpu->arch.dec = (u32) spr_val;
kvmppc_emulate_dec(vcpu); kvmppc_emulate_dec(vcpu);
break; break;
......
...@@ -55,8 +55,7 @@ EXPORT_SYMBOL_GPL(kvmppc_pr_ops); ...@@ -55,8 +55,7 @@ EXPORT_SYMBOL_GPL(kvmppc_pr_ops);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
{ {
return !!(v->arch.pending_exceptions) || return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
v->requests;
} }
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
...@@ -108,7 +107,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu) ...@@ -108,7 +107,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
*/ */
smp_mb(); smp_mb();
if (vcpu->requests) { if (kvm_request_pending(vcpu)) {
/* Make sure we process requests preemptable */ /* Make sure we process requests preemptable */
local_irq_enable(); local_irq_enable();
trace_kvm_check_requests(vcpu); trace_kvm_check_requests(vcpu);
...@@ -554,13 +553,28 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) ...@@ -554,13 +553,28 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_SMT: case KVM_CAP_PPC_SMT:
r = 0; r = 0;
if (hv_enabled) { if (kvm) {
if (kvm->arch.emul_smt_mode > 1)
r = kvm->arch.emul_smt_mode;
else
r = kvm->arch.smt_mode;
} else if (hv_enabled) {
if (cpu_has_feature(CPU_FTR_ARCH_300)) if (cpu_has_feature(CPU_FTR_ARCH_300))
r = 1; r = 1;
else else
r = threads_per_subcore; r = threads_per_subcore;
} }
break; break;
case KVM_CAP_PPC_SMT_POSSIBLE:
r = 1;
if (hv_enabled) {
if (!cpu_has_feature(CPU_FTR_ARCH_300))
r = ((threads_per_subcore << 1) - 1);
else
/* P9 can emulate dbells, so allow any mode */
r = 8 | 4 | 2 | 1;
}
break;
case KVM_CAP_PPC_RMA: case KVM_CAP_PPC_RMA:
r = 0; r = 0;
break; break;
...@@ -618,6 +632,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) ...@@ -618,6 +632,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
/* Disable this on POWER9 until code handles new HPTE format */ /* Disable this on POWER9 until code handles new HPTE format */
r = !!hv_enabled && !cpu_has_feature(CPU_FTR_ARCH_300); r = !!hv_enabled && !cpu_has_feature(CPU_FTR_ARCH_300);
break; break;
#endif
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_FWNMI:
r = hv_enabled;
break;
#endif #endif
case KVM_CAP_PPC_HTM: case KVM_CAP_PPC_HTM:
r = cpu_has_feature(CPU_FTR_TM_COMP) && r = cpu_has_feature(CPU_FTR_TM_COMP) &&
...@@ -1538,6 +1557,15 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, ...@@ -1538,6 +1557,15 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
break; break;
} }
#endif /* CONFIG_KVM_XICS */ #endif /* CONFIG_KVM_XICS */
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_FWNMI:
r = -EINVAL;
if (!is_kvmppc_hv_enabled(vcpu->kvm))
break;
r = 0;
vcpu->kvm->arch.fwnmi_enabled = true;
break;
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
default: default:
r = -EINVAL; r = -EINVAL;
break; break;
...@@ -1712,6 +1740,15 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, ...@@ -1712,6 +1740,15 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
r = 0; r = 0;
break; break;
} }
case KVM_CAP_PPC_SMT: {
unsigned long mode = cap->args[0];
unsigned long flags = cap->args[1];
r = -EINVAL;
if (kvm->arch.kvm_ops->set_smt_mode)
r = kvm->arch.kvm_ops->set_smt_mode(kvm, mode, flags);
break;
}
#endif #endif
default: default:
r = -EINVAL; r = -EINVAL;
......
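Both new behaviours above are opted into through KVM_ENABLE_CAP: KVM_CAP_PPC_FWNMI on a vcpu fd and KVM_CAP_PPC_SMT (mode in args[0], flags in args[1]) on the VM fd. A hedged userspace sketch; the helper names and fd handles are illustrative:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_fwnmi(int vcpu_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_PPC_FWNMI;
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}

static int set_smt_mode(int vm_fd, unsigned long mode)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_PPC_SMT;
	cap.args[0] = mode;	/* e.g. 4 for an emulated 4-way SMT core */
	cap.args[1] = 0;	/* flags */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}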
...@@ -59,7 +59,9 @@ union ctlreg0 { ...@@ -59,7 +59,9 @@ union ctlreg0 {
unsigned long lap : 1; /* Low-address-protection control */ unsigned long lap : 1; /* Low-address-protection control */
unsigned long : 4; unsigned long : 4;
unsigned long edat : 1; /* Enhanced-DAT-enablement control */ unsigned long edat : 1; /* Enhanced-DAT-enablement control */
unsigned long : 4; unsigned long : 2;
unsigned long iep : 1; /* Instruction-Execution-Protection */
unsigned long : 1;
unsigned long afp : 1; /* AFP-register control */ unsigned long afp : 1; /* AFP-register control */
unsigned long vx : 1; /* Vector enablement control */ unsigned long vx : 1; /* Vector enablement control */
unsigned long : 7; unsigned long : 7;
......
...@@ -42,9 +42,11 @@ ...@@ -42,9 +42,11 @@
#define KVM_HALT_POLL_NS_DEFAULT 80000 #define KVM_HALT_POLL_NS_DEFAULT 80000
/* s390-specific vcpu->requests bit members */ /* s390-specific vcpu->requests bit members */
#define KVM_REQ_ENABLE_IBS 8 #define KVM_REQ_ENABLE_IBS KVM_ARCH_REQ(0)
#define KVM_REQ_DISABLE_IBS 9 #define KVM_REQ_DISABLE_IBS KVM_ARCH_REQ(1)
#define KVM_REQ_ICPT_OPEREXC 10 #define KVM_REQ_ICPT_OPEREXC KVM_ARCH_REQ(2)
#define KVM_REQ_START_MIGRATION KVM_ARCH_REQ(3)
#define KVM_REQ_STOP_MIGRATION KVM_ARCH_REQ(4)
#define SIGP_CTRL_C 0x80 #define SIGP_CTRL_C 0x80
#define SIGP_CTRL_SCN_MASK 0x3f #define SIGP_CTRL_SCN_MASK 0x3f
...@@ -56,7 +58,7 @@ union bsca_sigp_ctrl { ...@@ -56,7 +58,7 @@ union bsca_sigp_ctrl {
__u8 r : 1; __u8 r : 1;
__u8 scn : 6; __u8 scn : 6;
}; };
} __packed; };
union esca_sigp_ctrl { union esca_sigp_ctrl {
__u16 value; __u16 value;
...@@ -65,14 +67,14 @@ union esca_sigp_ctrl { ...@@ -65,14 +67,14 @@ union esca_sigp_ctrl {
__u8 reserved: 7; __u8 reserved: 7;
__u8 scn; __u8 scn;
}; };
} __packed; };
struct esca_entry { struct esca_entry {
union esca_sigp_ctrl sigp_ctrl; union esca_sigp_ctrl sigp_ctrl;
__u16 reserved1[3]; __u16 reserved1[3];
__u64 sda; __u64 sda;
__u64 reserved2[6]; __u64 reserved2[6];
} __packed; };
struct bsca_entry { struct bsca_entry {
__u8 reserved0; __u8 reserved0;
...@@ -80,7 +82,7 @@ struct bsca_entry { ...@@ -80,7 +82,7 @@ struct bsca_entry {
__u16 reserved[3]; __u16 reserved[3];
__u64 sda; __u64 sda;
__u64 reserved2[2]; __u64 reserved2[2];
} __attribute__((packed)); };
union ipte_control { union ipte_control {
unsigned long val; unsigned long val;
...@@ -97,7 +99,7 @@ struct bsca_block { ...@@ -97,7 +99,7 @@ struct bsca_block {
__u64 mcn; __u64 mcn;
__u64 reserved2; __u64 reserved2;
struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS]; struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
} __attribute__((packed)); };
struct esca_block { struct esca_block {
union ipte_control ipte_control; union ipte_control ipte_control;
...@@ -105,7 +107,7 @@ struct esca_block { ...@@ -105,7 +107,7 @@ struct esca_block {
__u64 mcn[4]; __u64 mcn[4];
__u64 reserved2[20]; __u64 reserved2[20];
struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS]; struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
} __packed; };
/* /*
* This struct is used to store some machine check info from lowcore * This struct is used to store some machine check info from lowcore
...@@ -274,7 +276,7 @@ struct kvm_s390_sie_block { ...@@ -274,7 +276,7 @@ struct kvm_s390_sie_block {
struct kvm_s390_itdb { struct kvm_s390_itdb {
__u8 data[256]; __u8 data[256];
} __packed; };
struct sie_page { struct sie_page {
struct kvm_s390_sie_block sie_block; struct kvm_s390_sie_block sie_block;
...@@ -282,7 +284,7 @@ struct sie_page { ...@@ -282,7 +284,7 @@ struct sie_page {
__u8 reserved218[1000]; /* 0x0218 */ __u8 reserved218[1000]; /* 0x0218 */
struct kvm_s390_itdb itdb; /* 0x0600 */ struct kvm_s390_itdb itdb; /* 0x0600 */
__u8 reserved700[2304]; /* 0x0700 */ __u8 reserved700[2304]; /* 0x0700 */
} __packed; };
struct kvm_vcpu_stat { struct kvm_vcpu_stat {
u64 exit_userspace; u64 exit_userspace;
...@@ -695,7 +697,7 @@ struct sie_page2 { ...@@ -695,7 +697,7 @@ struct sie_page2 {
__u64 fac_list[S390_ARCH_FAC_LIST_SIZE_U64]; /* 0x0000 */ __u64 fac_list[S390_ARCH_FAC_LIST_SIZE_U64]; /* 0x0000 */
struct kvm_s390_crypto_cb crycb; /* 0x0800 */ struct kvm_s390_crypto_cb crycb; /* 0x0800 */
u8 reserved900[0x1000 - 0x900]; /* 0x0900 */ u8 reserved900[0x1000 - 0x900]; /* 0x0900 */
} __packed; };
struct kvm_s390_vsie { struct kvm_s390_vsie {
struct mutex mutex; struct mutex mutex;
...@@ -705,6 +707,12 @@ struct kvm_s390_vsie { ...@@ -705,6 +707,12 @@ struct kvm_s390_vsie {
struct page *pages[KVM_MAX_VCPUS]; struct page *pages[KVM_MAX_VCPUS];
}; };
struct kvm_s390_migration_state {
unsigned long bitmap_size; /* in bits (number of guest pages) */
atomic64_t dirty_pages; /* number of dirty pages */
unsigned long *pgste_bitmap;
};
struct kvm_arch{ struct kvm_arch{
void *sca; void *sca;
int use_esca; int use_esca;
...@@ -732,6 +740,7 @@ struct kvm_arch{ ...@@ -732,6 +740,7 @@ struct kvm_arch{
struct kvm_s390_crypto crypto; struct kvm_s390_crypto crypto;
struct kvm_s390_vsie vsie; struct kvm_s390_vsie vsie;
u64 epoch; u64 epoch;
struct kvm_s390_migration_state *migration_state;
/* subset of available cpu features enabled by user space */ /* subset of available cpu features enabled by user space */
DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS); DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
}; };
......
...@@ -26,6 +26,12 @@ ...@@ -26,6 +26,12 @@
#define MCCK_CODE_PSW_MWP_VALID _BITUL(63 - 20) #define MCCK_CODE_PSW_MWP_VALID _BITUL(63 - 20)
#define MCCK_CODE_PSW_IA_VALID _BITUL(63 - 23) #define MCCK_CODE_PSW_IA_VALID _BITUL(63 - 23)
#define MCCK_CR14_CR_PENDING_SUB_MASK (1 << 28)
#define MCCK_CR14_RECOVERY_SUB_MASK (1 << 27)
#define MCCK_CR14_DEGRAD_SUB_MASK (1 << 26)
#define MCCK_CR14_EXT_DAMAGE_SUB_MASK (1 << 25)
#define MCCK_CR14_WARN_SUB_MASK (1 << 24)
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
union mci { union mci {
......
...@@ -28,6 +28,7 @@ ...@@ -28,6 +28,7 @@
#define KVM_DEV_FLIC_CLEAR_IO_IRQ 8 #define KVM_DEV_FLIC_CLEAR_IO_IRQ 8
#define KVM_DEV_FLIC_AISM 9 #define KVM_DEV_FLIC_AISM 9
#define KVM_DEV_FLIC_AIRQ_INJECT 10 #define KVM_DEV_FLIC_AIRQ_INJECT 10
#define KVM_DEV_FLIC_AISM_ALL 11
/* /*
* We can have up to 4*64k pending subchannels + 8 adapter interrupts, * We can have up to 4*64k pending subchannels + 8 adapter interrupts,
* as well as up to ASYNC_PF_PER_VCPU*KVM_MAX_VCPUS pfault done interrupts. * as well as up to ASYNC_PF_PER_VCPU*KVM_MAX_VCPUS pfault done interrupts.
...@@ -53,6 +54,11 @@ struct kvm_s390_ais_req { ...@@ -53,6 +54,11 @@ struct kvm_s390_ais_req {
__u16 mode; __u16 mode;
}; };
struct kvm_s390_ais_all {
__u8 simm;
__u8 nimm;
};
#define KVM_S390_IO_ADAPTER_MASK 1 #define KVM_S390_IO_ADAPTER_MASK 1
#define KVM_S390_IO_ADAPTER_MAP 2 #define KVM_S390_IO_ADAPTER_MAP 2
#define KVM_S390_IO_ADAPTER_UNMAP 3 #define KVM_S390_IO_ADAPTER_UNMAP 3
...@@ -70,6 +76,7 @@ struct kvm_s390_io_adapter_req { ...@@ -70,6 +76,7 @@ struct kvm_s390_io_adapter_req {
#define KVM_S390_VM_TOD 1 #define KVM_S390_VM_TOD 1
#define KVM_S390_VM_CRYPTO 2 #define KVM_S390_VM_CRYPTO 2
#define KVM_S390_VM_CPU_MODEL 3 #define KVM_S390_VM_CPU_MODEL 3
#define KVM_S390_VM_MIGRATION 4
/* kvm attributes for mem_ctrl */ /* kvm attributes for mem_ctrl */
#define KVM_S390_VM_MEM_ENABLE_CMMA 0 #define KVM_S390_VM_MEM_ENABLE_CMMA 0
...@@ -151,6 +158,11 @@ struct kvm_s390_vm_cpu_subfunc { ...@@ -151,6 +158,11 @@ struct kvm_s390_vm_cpu_subfunc {
#define KVM_S390_VM_CRYPTO_DISABLE_AES_KW 2 #define KVM_S390_VM_CRYPTO_DISABLE_AES_KW 2
#define KVM_S390_VM_CRYPTO_DISABLE_DEA_KW 3 #define KVM_S390_VM_CRYPTO_DISABLE_DEA_KW 3
/* kvm attributes for migration mode */
#define KVM_S390_VM_MIGRATION_STOP 0
#define KVM_S390_VM_MIGRATION_START 1
#define KVM_S390_VM_MIGRATION_STATUS 2
/* for KVM_GET_REGS and KVM_SET_REGS */ /* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs { struct kvm_regs {
/* general purpose regs for s390 */ /* general purpose regs for s390 */
......
...@@ -89,7 +89,7 @@ struct region3_table_entry_fc1 { ...@@ -89,7 +89,7 @@ struct region3_table_entry_fc1 {
unsigned long f : 1; /* Fetch-Protection Bit */ unsigned long f : 1; /* Fetch-Protection Bit */
unsigned long fc : 1; /* Format-Control */ unsigned long fc : 1; /* Format-Control */
unsigned long p : 1; /* DAT-Protection Bit */ unsigned long p : 1; /* DAT-Protection Bit */
unsigned long co : 1; /* Change-Recording Override */ unsigned long iep: 1; /* Instruction-Execution-Protection */
unsigned long : 2; unsigned long : 2;
unsigned long i : 1; /* Region-Invalid Bit */ unsigned long i : 1; /* Region-Invalid Bit */
unsigned long cr : 1; /* Common-Region Bit */ unsigned long cr : 1; /* Common-Region Bit */
...@@ -131,7 +131,7 @@ struct segment_entry_fc1 { ...@@ -131,7 +131,7 @@ struct segment_entry_fc1 {
unsigned long f : 1; /* Fetch-Protection Bit */ unsigned long f : 1; /* Fetch-Protection Bit */
unsigned long fc : 1; /* Format-Control */ unsigned long fc : 1; /* Format-Control */
unsigned long p : 1; /* DAT-Protection Bit */ unsigned long p : 1; /* DAT-Protection Bit */
unsigned long co : 1; /* Change-Recording Override */ unsigned long iep: 1; /* Instruction-Execution-Protection */
unsigned long : 2; unsigned long : 2;
unsigned long i : 1; /* Segment-Invalid Bit */ unsigned long i : 1; /* Segment-Invalid Bit */
unsigned long cs : 1; /* Common-Segment Bit */ unsigned long cs : 1; /* Common-Segment Bit */
...@@ -168,7 +168,8 @@ union page_table_entry { ...@@ -168,7 +168,8 @@ union page_table_entry {
unsigned long z : 1; /* Zero Bit */ unsigned long z : 1; /* Zero Bit */
unsigned long i : 1; /* Page-Invalid Bit */ unsigned long i : 1; /* Page-Invalid Bit */
unsigned long p : 1; /* DAT-Protection Bit */ unsigned long p : 1; /* DAT-Protection Bit */
unsigned long : 9; unsigned long iep: 1; /* Instruction-Execution-Protection */
unsigned long : 8;
}; };
}; };
...@@ -241,7 +242,7 @@ struct ale { ...@@ -241,7 +242,7 @@ struct ale {
unsigned long asteo : 25; /* ASN-Second-Table-Entry Origin */ unsigned long asteo : 25; /* ASN-Second-Table-Entry Origin */
unsigned long : 6; unsigned long : 6;
unsigned long astesn : 32; /* ASTE Sequence Number */ unsigned long astesn : 32; /* ASTE Sequence Number */
} __packed; };
struct aste { struct aste {
unsigned long i : 1; /* ASX-Invalid Bit */ unsigned long i : 1; /* ASX-Invalid Bit */
...@@ -257,7 +258,7 @@ struct aste { ...@@ -257,7 +258,7 @@ struct aste {
unsigned long ald : 32; unsigned long ald : 32;
unsigned long astesn : 32; unsigned long astesn : 32;
/* .. more fields there */ /* .. more fields there */
} __packed; };
int ipte_lock_held(struct kvm_vcpu *vcpu) int ipte_lock_held(struct kvm_vcpu *vcpu)
{ {
...@@ -485,6 +486,7 @@ enum prot_type { ...@@ -485,6 +486,7 @@ enum prot_type {
PROT_TYPE_KEYC = 1, PROT_TYPE_KEYC = 1,
PROT_TYPE_ALC = 2, PROT_TYPE_ALC = 2,
PROT_TYPE_DAT = 3, PROT_TYPE_DAT = 3,
PROT_TYPE_IEP = 4,
}; };
static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
...@@ -500,6 +502,9 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, ...@@ -500,6 +502,9 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
switch (code) { switch (code) {
case PGM_PROTECTION: case PGM_PROTECTION:
switch (prot) { switch (prot) {
case PROT_TYPE_IEP:
tec->b61 = 1;
/* FALL THROUGH */
case PROT_TYPE_LA: case PROT_TYPE_LA:
tec->b56 = 1; tec->b56 = 1;
break; break;
...@@ -591,6 +596,7 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val) ...@@ -591,6 +596,7 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
* @gpa: points to where guest physical (absolute) address should be stored * @gpa: points to where guest physical (absolute) address should be stored
* @asce: effective asce * @asce: effective asce
* @mode: indicates the access mode to be used * @mode: indicates the access mode to be used
* @prot: returns the type for protection exceptions
* *
* Translate a guest virtual address into a guest absolute address by means * Translate a guest virtual address into a guest absolute address by means
* of dynamic address translation as specified by the architecture. * of dynamic address translation as specified by the architecture.
...@@ -606,19 +612,21 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val) ...@@ -606,19 +612,21 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
*/ */
static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva, static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
unsigned long *gpa, const union asce asce, unsigned long *gpa, const union asce asce,
enum gacc_mode mode) enum gacc_mode mode, enum prot_type *prot)
{ {
union vaddress vaddr = {.addr = gva}; union vaddress vaddr = {.addr = gva};
union raddress raddr = {.addr = gva}; union raddress raddr = {.addr = gva};
union page_table_entry pte; union page_table_entry pte;
int dat_protection = 0; int dat_protection = 0;
int iep_protection = 0;
union ctlreg0 ctlreg0; union ctlreg0 ctlreg0;
unsigned long ptr; unsigned long ptr;
int edat1, edat2; int edat1, edat2, iep;
ctlreg0.val = vcpu->arch.sie_block->gcr[0]; ctlreg0.val = vcpu->arch.sie_block->gcr[0];
edat1 = ctlreg0.edat && test_kvm_facility(vcpu->kvm, 8); edat1 = ctlreg0.edat && test_kvm_facility(vcpu->kvm, 8);
edat2 = edat1 && test_kvm_facility(vcpu->kvm, 78); edat2 = edat1 && test_kvm_facility(vcpu->kvm, 78);
iep = ctlreg0.iep && test_kvm_facility(vcpu->kvm, 130);
if (asce.r) if (asce.r)
goto real_address; goto real_address;
ptr = asce.origin * 4096; ptr = asce.origin * 4096;
...@@ -702,6 +710,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva, ...@@ -702,6 +710,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
return PGM_TRANSLATION_SPEC; return PGM_TRANSLATION_SPEC;
if (rtte.fc && edat2) { if (rtte.fc && edat2) {
dat_protection |= rtte.fc1.p; dat_protection |= rtte.fc1.p;
iep_protection = rtte.fc1.iep;
raddr.rfaa = rtte.fc1.rfaa; raddr.rfaa = rtte.fc1.rfaa;
goto absolute_address; goto absolute_address;
} }
...@@ -729,6 +738,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva, ...@@ -729,6 +738,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
return PGM_TRANSLATION_SPEC; return PGM_TRANSLATION_SPEC;
if (ste.fc && edat1) { if (ste.fc && edat1) {
dat_protection |= ste.fc1.p; dat_protection |= ste.fc1.p;
iep_protection = ste.fc1.iep;
raddr.sfaa = ste.fc1.sfaa; raddr.sfaa = ste.fc1.sfaa;
goto absolute_address; goto absolute_address;
} }
...@@ -745,12 +755,19 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva, ...@@ -745,12 +755,19 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
if (pte.z) if (pte.z)
return PGM_TRANSLATION_SPEC; return PGM_TRANSLATION_SPEC;
dat_protection |= pte.p; dat_protection |= pte.p;
iep_protection = pte.iep;
raddr.pfra = pte.pfra; raddr.pfra = pte.pfra;
real_address: real_address:
raddr.addr = kvm_s390_real_to_abs(vcpu, raddr.addr); raddr.addr = kvm_s390_real_to_abs(vcpu, raddr.addr);
absolute_address: absolute_address:
if (mode == GACC_STORE && dat_protection) if (mode == GACC_STORE && dat_protection) {
*prot = PROT_TYPE_DAT;
return PGM_PROTECTION; return PGM_PROTECTION;
}
if (mode == GACC_IFETCH && iep_protection && iep) {
*prot = PROT_TYPE_IEP;
return PGM_PROTECTION;
}
if (kvm_is_error_gpa(vcpu->kvm, raddr.addr)) if (kvm_is_error_gpa(vcpu->kvm, raddr.addr))
return PGM_ADDRESSING; return PGM_ADDRESSING;
*gpa = raddr.addr; *gpa = raddr.addr;
...@@ -782,6 +799,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, ...@@ -782,6 +799,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
{ {
psw_t *psw = &vcpu->arch.sie_block->gpsw; psw_t *psw = &vcpu->arch.sie_block->gpsw;
int lap_enabled, rc = 0; int lap_enabled, rc = 0;
enum prot_type prot;
lap_enabled = low_address_protection_enabled(vcpu, asce); lap_enabled = low_address_protection_enabled(vcpu, asce);
while (nr_pages) { while (nr_pages) {
...@@ -791,7 +809,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, ...@@ -791,7 +809,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
PROT_TYPE_LA); PROT_TYPE_LA);
ga &= PAGE_MASK; ga &= PAGE_MASK;
if (psw_bits(*psw).dat) { if (psw_bits(*psw).dat) {
rc = guest_translate(vcpu, ga, pages, asce, mode); rc = guest_translate(vcpu, ga, pages, asce, mode, &prot);
if (rc < 0) if (rc < 0)
return rc; return rc;
} else { } else {
...@@ -800,7 +818,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, ...@@ -800,7 +818,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
rc = PGM_ADDRESSING; rc = PGM_ADDRESSING;
} }
if (rc) if (rc)
return trans_exc(vcpu, rc, ga, ar, mode, PROT_TYPE_DAT); return trans_exc(vcpu, rc, ga, ar, mode, prot);
ga += PAGE_SIZE; ga += PAGE_SIZE;
pages++; pages++;
nr_pages--; nr_pages--;
...@@ -886,6 +904,7 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar, ...@@ -886,6 +904,7 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
unsigned long *gpa, enum gacc_mode mode) unsigned long *gpa, enum gacc_mode mode)
{ {
psw_t *psw = &vcpu->arch.sie_block->gpsw; psw_t *psw = &vcpu->arch.sie_block->gpsw;
enum prot_type prot;
union asce asce; union asce asce;
int rc; int rc;
...@@ -900,9 +919,9 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar, ...@@ -900,9 +919,9 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
} }
if (psw_bits(*psw).dat && !asce.r) { /* Use DAT? */ if (psw_bits(*psw).dat && !asce.r) { /* Use DAT? */
rc = guest_translate(vcpu, gva, gpa, asce, mode); rc = guest_translate(vcpu, gva, gpa, asce, mode, &prot);
if (rc > 0) if (rc > 0)
return trans_exc(vcpu, rc, gva, 0, mode, PROT_TYPE_DAT); return trans_exc(vcpu, rc, gva, 0, mode, prot);
} else { } else {
*gpa = kvm_s390_real_to_abs(vcpu, gva); *gpa = kvm_s390_real_to_abs(vcpu, gva);
if (kvm_is_error_gpa(vcpu->kvm, *gpa)) if (kvm_is_error_gpa(vcpu->kvm, *gpa))
......
...@@ -251,8 +251,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu) ...@@ -251,8 +251,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
__clear_bit(IRQ_PEND_EXT_SERVICE, &active_mask); __clear_bit(IRQ_PEND_EXT_SERVICE, &active_mask);
if (psw_mchk_disabled(vcpu)) if (psw_mchk_disabled(vcpu))
active_mask &= ~IRQ_PEND_MCHK_MASK; active_mask &= ~IRQ_PEND_MCHK_MASK;
/*
* Check both the floating and the local interrupt's cr14 because
* the IRQ_PEND_MCHK_REP bit could be set in either case.
*/
if (!(vcpu->arch.sie_block->gcr[14] & if (!(vcpu->arch.sie_block->gcr[14] &
vcpu->kvm->arch.float_int.mchk.cr14)) (vcpu->kvm->arch.float_int.mchk.cr14 |
vcpu->arch.local_int.irq.mchk.cr14)))
__clear_bit(IRQ_PEND_MCHK_REP, &active_mask); __clear_bit(IRQ_PEND_MCHK_REP, &active_mask);
/* /*
...@@ -1876,6 +1881,28 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len) ...@@ -1876,6 +1881,28 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
return ret < 0 ? ret : n; return ret < 0 ? ret : n;
} }
static int flic_ais_mode_get_all(struct kvm *kvm, struct kvm_device_attr *attr)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
struct kvm_s390_ais_all ais;
if (attr->attr < sizeof(ais))
return -EINVAL;
if (!test_kvm_facility(kvm, 72))
return -ENOTSUPP;
mutex_lock(&fi->ais_lock);
ais.simm = fi->simm;
ais.nimm = fi->nimm;
mutex_unlock(&fi->ais_lock);
if (copy_to_user((void __user *)attr->addr, &ais, sizeof(ais)))
return -EFAULT;
return 0;
}
static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr) static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
{ {
int r; int r;
...@@ -1885,6 +1912,9 @@ static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr) ...@@ -1885,6 +1912,9 @@ static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
r = get_all_floating_irqs(dev->kvm, (u8 __user *) attr->addr, r = get_all_floating_irqs(dev->kvm, (u8 __user *) attr->addr,
attr->attr); attr->attr);
break; break;
case KVM_DEV_FLIC_AISM_ALL:
r = flic_ais_mode_get_all(dev->kvm, attr);
break;
default: default:
r = -EINVAL; r = -EINVAL;
} }
...@@ -2235,6 +2265,25 @@ static int flic_inject_airq(struct kvm *kvm, struct kvm_device_attr *attr) ...@@ -2235,6 +2265,25 @@ static int flic_inject_airq(struct kvm *kvm, struct kvm_device_attr *attr)
return kvm_s390_inject_airq(kvm, adapter); return kvm_s390_inject_airq(kvm, adapter);
} }
static int flic_ais_mode_set_all(struct kvm *kvm, struct kvm_device_attr *attr)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
struct kvm_s390_ais_all ais;
if (!test_kvm_facility(kvm, 72))
return -ENOTSUPP;
if (copy_from_user(&ais, (void __user *)attr->addr, sizeof(ais)))
return -EFAULT;
mutex_lock(&fi->ais_lock);
fi->simm = ais.simm;
fi->nimm = ais.nimm;
mutex_unlock(&fi->ais_lock);
return 0;
}
static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr) static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
{ {
int r = 0; int r = 0;
...@@ -2277,6 +2326,9 @@ static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr) ...@@ -2277,6 +2326,9 @@ static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
case KVM_DEV_FLIC_AIRQ_INJECT: case KVM_DEV_FLIC_AIRQ_INJECT:
r = flic_inject_airq(dev->kvm, attr); r = flic_inject_airq(dev->kvm, attr);
break; break;
case KVM_DEV_FLIC_AISM_ALL:
r = flic_ais_mode_set_all(dev->kvm, attr);
break;
default: default:
r = -EINVAL; r = -EINVAL;
} }
...@@ -2298,6 +2350,7 @@ static int flic_has_attr(struct kvm_device *dev, ...@@ -2298,6 +2350,7 @@ static int flic_has_attr(struct kvm_device *dev,
case KVM_DEV_FLIC_CLEAR_IO_IRQ: case KVM_DEV_FLIC_CLEAR_IO_IRQ:
case KVM_DEV_FLIC_AISM: case KVM_DEV_FLIC_AISM:
case KVM_DEV_FLIC_AIRQ_INJECT: case KVM_DEV_FLIC_AIRQ_INJECT:
case KVM_DEV_FLIC_AISM_ALL:
return 0; return 0;
} }
return -ENXIO; return -ENXIO;
...@@ -2415,6 +2468,42 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e, ...@@ -2415,6 +2468,42 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
return ret; return ret;
} }
/*
* Inject the machine check to the guest.
*/
void kvm_s390_reinject_machine_check(struct kvm_vcpu *vcpu,
struct mcck_volatile_info *mcck_info)
{
struct kvm_s390_interrupt_info inti;
struct kvm_s390_irq irq;
struct kvm_s390_mchk_info *mchk;
union mci mci;
__u64 cr14 = 0; /* upper bits are not used */
mci.val = mcck_info->mcic;
if (mci.sr)
cr14 |= MCCK_CR14_RECOVERY_SUB_MASK;
if (mci.dg)
cr14 |= MCCK_CR14_DEGRAD_SUB_MASK;
if (mci.w)
cr14 |= MCCK_CR14_WARN_SUB_MASK;
mchk = mci.ck ? &inti.mchk : &irq.u.mchk;
mchk->cr14 = cr14;
mchk->mcic = mcck_info->mcic;
mchk->ext_damage_code = mcck_info->ext_damage_code;
mchk->failing_storage_address = mcck_info->failing_storage_address;
if (mci.ck) {
/* Inject the floating machine check */
inti.type = KVM_S390_MCHK;
WARN_ON_ONCE(__inject_vm(vcpu->kvm, &inti));
} else {
/* Inject the machine check to specified vcpu */
irq.type = KVM_S390_MCHK;
WARN_ON_ONCE(kvm_s390_inject_vcpu(vcpu, &irq));
}
}
int kvm_set_routing_entry(struct kvm *kvm, int kvm_set_routing_entry(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *e, struct kvm_kernel_irq_routing_entry *e,
const struct kvm_irq_routing_entry *ue) const struct kvm_irq_routing_entry *ue)
......
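KVM_DEV_FLIC_AISM_ALL gives userspace a single attribute to save or restore the whole adapter-interruption-suppression state (simm/nimm), which the migration support in this series relies on. A hedged sketch of the read side through the FLIC device fd; get_ais_all() is illustrative and assumes flic_fd came from KVM_CREATE_DEVICE with KVM_DEV_TYPE_FLIC:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int get_ais_all(int flic_fd, struct kvm_s390_ais_all *ais)
{
	struct kvm_device_attr attr = {
		.group = KVM_DEV_FLIC_AISM_ALL,
		.attr  = sizeof(*ais),	/* buffer size, per the check above */
		.addr  = (uint64_t)(unsigned long)ais,
	};

	return ioctl(flic_fd, KVM_GET_DEVICE_ATTR, &attr);
}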
...@@ -30,6 +30,7 @@ ...@@ -30,6 +30,7 @@
#include <linux/vmalloc.h> #include <linux/vmalloc.h>
#include <linux/bitmap.h> #include <linux/bitmap.h>
#include <linux/sched/signal.h> #include <linux/sched/signal.h>
#include <linux/string.h>
#include <asm/asm-offsets.h> #include <asm/asm-offsets.h>
#include <asm/lowcore.h> #include <asm/lowcore.h>
...@@ -386,6 +387,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) ...@@ -386,6 +387,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_SKEYS: case KVM_CAP_S390_SKEYS:
case KVM_CAP_S390_IRQ_STATE: case KVM_CAP_S390_IRQ_STATE:
case KVM_CAP_S390_USER_INSTR0: case KVM_CAP_S390_USER_INSTR0:
case KVM_CAP_S390_CMMA_MIGRATION:
case KVM_CAP_S390_AIS: case KVM_CAP_S390_AIS:
r = 1; r = 1;
break; break;
...@@ -749,6 +751,129 @@ static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr) ...@@ -749,6 +751,129 @@ static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr)
return 0; return 0;
} }
static void kvm_s390_sync_request_broadcast(struct kvm *kvm, int req)
{
int cx;
struct kvm_vcpu *vcpu;
kvm_for_each_vcpu(cx, vcpu, kvm)
kvm_s390_sync_request(req, vcpu);
}
/*
* Must be called with kvm->srcu held to avoid races on memslots, and with
* kvm->lock to avoid races with ourselves and kvm_s390_vm_stop_migration.
*/
static int kvm_s390_vm_start_migration(struct kvm *kvm)
{
struct kvm_s390_migration_state *mgs;
struct kvm_memory_slot *ms;
/* should be the only one */
struct kvm_memslots *slots;
unsigned long ram_pages;
int slotnr;
/* migration mode already enabled */
if (kvm->arch.migration_state)
return 0;
slots = kvm_memslots(kvm);
if (!slots || !slots->used_slots)
return -EINVAL;
mgs = kzalloc(sizeof(*mgs), GFP_KERNEL);
if (!mgs)
return -ENOMEM;
kvm->arch.migration_state = mgs;
if (kvm->arch.use_cmma) {
/*
* Get the last slot. They should be sorted by base_gfn, so the
* last slot is also the one at the end of the address space.
* We have verified above that at least one slot is present.
*/
ms = slots->memslots + slots->used_slots - 1;
/* round up so we only use full longs */
ram_pages = roundup(ms->base_gfn + ms->npages, BITS_PER_LONG);
/* allocate enough bytes to store all the bits */
mgs->pgste_bitmap = vmalloc(ram_pages / 8);
if (!mgs->pgste_bitmap) {
kfree(mgs);
kvm->arch.migration_state = NULL;
return -ENOMEM;
}
mgs->bitmap_size = ram_pages;
atomic64_set(&mgs->dirty_pages, ram_pages);
/* mark all the pages in active slots as dirty */
for (slotnr = 0; slotnr < slots->used_slots; slotnr++) {
ms = slots->memslots + slotnr;
bitmap_set(mgs->pgste_bitmap, ms->base_gfn, ms->npages);
}
kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
}
return 0;
}
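A rough sizing sketch (not from the patch) for the pgste_bitmap allocated above, assuming 4 KiB pages and one bit per guest page:

	unsigned long guest_bytes = 4UL << 30;		/* 4 GiB guest */
	unsigned long ram_pages   = guest_bytes >> 12;	/* 1048576 pages; already a multiple of BITS_PER_LONG */
	unsigned long bitmap_size = ram_pages / 8;	/* 131072 bytes = 128 KiB */

so even large guests cost only a few hundred kilobytes of tracking state.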
/*
* Must be called with kvm->lock to avoid races with ourselves and
* kvm_s390_vm_start_migration.
*/
static int kvm_s390_vm_stop_migration(struct kvm *kvm)
{
struct kvm_s390_migration_state *mgs;
/* migration mode already disabled */
if (!kvm->arch.migration_state)
return 0;
mgs = kvm->arch.migration_state;
kvm->arch.migration_state = NULL;
if (kvm->arch.use_cmma) {
kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
vfree(mgs->pgste_bitmap);
}
kfree(mgs);
return 0;
}
static int kvm_s390_vm_set_migration(struct kvm *kvm,
struct kvm_device_attr *attr)
{
int idx, res = -ENXIO;
mutex_lock(&kvm->lock);
switch (attr->attr) {
case KVM_S390_VM_MIGRATION_START:
idx = srcu_read_lock(&kvm->srcu);
res = kvm_s390_vm_start_migration(kvm);
srcu_read_unlock(&kvm->srcu, idx);
break;
case KVM_S390_VM_MIGRATION_STOP:
res = kvm_s390_vm_stop_migration(kvm);
break;
default:
break;
}
mutex_unlock(&kvm->lock);
return res;
}
static int kvm_s390_vm_get_migration(struct kvm *kvm,
struct kvm_device_attr *attr)
{
u64 mig = (kvm->arch.migration_state != NULL);
if (attr->attr != KVM_S390_VM_MIGRATION_STATUS)
return -ENXIO;
if (copy_to_user((void __user *)attr->addr, &mig, sizeof(mig)))
return -EFAULT;
return 0;
}
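A hedged sketch of the matching userspace side; vm_fd (the KVM VM file descriptor) is an assumption and error handling is reduced to perror():

	struct kvm_device_attr attr = {
		.group = KVM_S390_VM_MIGRATION,
		.attr  = KVM_S390_VM_MIGRATION_START,
	};

	if (ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
		perror("enable migration mode");

	/* ... stream CMMA data with KVM_S390_GET_CMMA_BITS ... */

	attr.attr = KVM_S390_VM_MIGRATION_STOP;
	if (ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
		perror("disable migration mode");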
static int kvm_s390_set_tod_high(struct kvm *kvm, struct kvm_device_attr *attr) static int kvm_s390_set_tod_high(struct kvm *kvm, struct kvm_device_attr *attr)
{ {
u8 gtod_high; u8 gtod_high;
...@@ -1089,6 +1214,9 @@ static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr) ...@@ -1089,6 +1214,9 @@ static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
case KVM_S390_VM_CRYPTO: case KVM_S390_VM_CRYPTO:
ret = kvm_s390_vm_set_crypto(kvm, attr); ret = kvm_s390_vm_set_crypto(kvm, attr);
break; break;
case KVM_S390_VM_MIGRATION:
ret = kvm_s390_vm_set_migration(kvm, attr);
break;
default: default:
ret = -ENXIO; ret = -ENXIO;
break; break;
...@@ -1111,6 +1239,9 @@ static int kvm_s390_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr) ...@@ -1111,6 +1239,9 @@ static int kvm_s390_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
case KVM_S390_VM_CPU_MODEL: case KVM_S390_VM_CPU_MODEL:
ret = kvm_s390_get_cpu_model(kvm, attr); ret = kvm_s390_get_cpu_model(kvm, attr);
break; break;
case KVM_S390_VM_MIGRATION:
ret = kvm_s390_vm_get_migration(kvm, attr);
break;
default: default:
ret = -ENXIO; ret = -ENXIO;
break; break;
...@@ -1178,6 +1309,9 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr) ...@@ -1178,6 +1309,9 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
break; break;
} }
break; break;
case KVM_S390_VM_MIGRATION:
ret = 0;
break;
default: default:
ret = -ENXIO; ret = -ENXIO;
break; break;
...@@ -1285,6 +1419,182 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args) ...@@ -1285,6 +1419,182 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
return r; return r;
} }
/*
* Base address and length must be sent at the start of each block, therefore
* it's cheaper to send some clean data, as long as it's less than the size of
* two longs.
*/
#define KVM_S390_MAX_BIT_DISTANCE (2 * sizeof(void *))
/* for consistency */
#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
/*
* This function searches for the next page with dirty CMMA attributes, and
* saves the attributes in the buffer up to either the end of the buffer or
* until a block of at least KVM_S390_MAX_BIT_DISTANCE clean bits is found;
* no trailing clean bytes are saved.
* In case no dirty bits were found, or if CMMA was not enabled or used, the
* output buffer will indicate 0 as length.
*/
static int kvm_s390_get_cmma_bits(struct kvm *kvm,
struct kvm_s390_cmma_log *args)
{
struct kvm_s390_migration_state *s = kvm->arch.migration_state;
unsigned long bufsize, hva, pgstev, i, next, cur;
int srcu_idx, peek, r = 0, rr;
u8 *res;
cur = args->start_gfn;
i = next = pgstev = 0;
if (unlikely(!kvm->arch.use_cmma))
return -ENXIO;
/* Invalid/unsupported flags were specified */
if (args->flags & ~KVM_S390_CMMA_PEEK)
return -EINVAL;
/* Migration mode query, and we are not doing a migration */
peek = !!(args->flags & KVM_S390_CMMA_PEEK);
if (!peek && !s)
return -EINVAL;
/* CMMA is disabled or was not used, or the buffer has length zero */
bufsize = min(args->count, KVM_S390_CMMA_SIZE_MAX);
if (!bufsize || !kvm->mm->context.use_cmma) {
memset(args, 0, sizeof(*args));
return 0;
}
if (!peek) {
/* We are not peeking, and there are no dirty pages */
if (!atomic64_read(&s->dirty_pages)) {
memset(args, 0, sizeof(*args));
return 0;
}
cur = find_next_bit(s->pgste_bitmap, s->bitmap_size,
args->start_gfn);
if (cur >= s->bitmap_size) /* nothing found, loop back */
cur = find_next_bit(s->pgste_bitmap, s->bitmap_size, 0);
if (cur >= s->bitmap_size) { /* again! (very unlikely) */
memset(args, 0, sizeof(*args));
return 0;
}
next = find_next_bit(s->pgste_bitmap, s->bitmap_size, cur + 1);
}
res = vmalloc(bufsize);
if (!res)
return -ENOMEM;
args->start_gfn = cur;
down_read(&kvm->mm->mmap_sem);
srcu_idx = srcu_read_lock(&kvm->srcu);
while (i < bufsize) {
hva = gfn_to_hva(kvm, cur);
if (kvm_is_error_hva(hva)) {
r = -EFAULT;
break;
}
/* decrement only if we actually flipped the bit to 0 */
if (!peek && test_and_clear_bit(cur, s->pgste_bitmap))
atomic64_dec(&s->dirty_pages);
r = get_pgste(kvm->mm, hva, &pgstev);
if (r < 0)
pgstev = 0;
/* save the value */
res[i++] = (pgstev >> 24) & 0x3;
/*
* if the next bit is too far away, stop.
* if we reached the previous "next", find the next one
*/
if (!peek) {
if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
break;
if (cur == next)
next = find_next_bit(s->pgste_bitmap,
s->bitmap_size, cur + 1);
/* reached the end of the bitmap or of the buffer, stop */
if ((next >= s->bitmap_size) ||
(next >= args->start_gfn + bufsize))
break;
}
cur++;
}
srcu_read_unlock(&kvm->srcu, srcu_idx);
up_read(&kvm->mm->mmap_sem);
args->count = i;
args->remaining = s ? atomic64_read(&s->dirty_pages) : 0;
rr = copy_to_user((void __user *)args->values, res, args->count);
if (rr)
r = -EFAULT;
vfree(res);
return r;
}
/*
* This function sets the CMMA attributes for the given pages. If the input
* buffer has zero length, no action is taken, otherwise the attributes are
* set and the mm->context.use_cmma flag is set.
*/
static int kvm_s390_set_cmma_bits(struct kvm *kvm,
const struct kvm_s390_cmma_log *args)
{
unsigned long hva, mask, pgstev, i;
uint8_t *bits;
int srcu_idx, r = 0;
mask = args->mask;
if (!kvm->arch.use_cmma)
return -ENXIO;
/* invalid/unsupported flags */
if (args->flags != 0)
return -EINVAL;
/* Enforce sane limit on memory allocation */
if (args->count > KVM_S390_CMMA_SIZE_MAX)
return -EINVAL;
/* Nothing to do */
if (args->count == 0)
return 0;
bits = vmalloc(sizeof(*bits) * args->count);
if (!bits)
return -ENOMEM;
r = copy_from_user(bits, (void __user *)args->values, args->count);
if (r) {
r = -EFAULT;
goto out;
}
down_read(&kvm->mm->mmap_sem);
srcu_idx = srcu_read_lock(&kvm->srcu);
for (i = 0; i < args->count; i++) {
hva = gfn_to_hva(kvm, args->start_gfn + i);
if (kvm_is_error_hva(hva)) {
r = -EFAULT;
break;
}
pgstev = bits[i];
pgstev = pgstev << 24;
mask &= _PGSTE_GPS_USAGE_MASK;
set_pgste_bits(kvm->mm, hva, mask, pgstev);
}
srcu_read_unlock(&kvm->srcu, srcu_idx);
up_read(&kvm->mm->mmap_sem);
if (!kvm->mm->context.use_cmma) {
down_write(&kvm->mm->mmap_sem);
kvm->mm->context.use_cmma = 1;
up_write(&kvm->mm->mmap_sem);
}
out:
vfree(bits);
return r;
}
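For illustration, a minimal consumer of the new ioctl; vm_fd and the buffer size are assumptions, includes are omitted, and each returned byte carries one page's CMMA state in its low bits:

	__u8 values[4096];
	struct kvm_s390_cmma_log log = {
		.start_gfn = 0,
		.count     = sizeof(values),
		.flags     = 0,	/* without KVM_S390_CMMA_PEEK the dirty bits are consumed */
		.values    = (__u64)(unsigned long)values,
	};

	if (ioctl(vm_fd, KVM_S390_GET_CMMA_BITS, &log) == 0)
		printf("%u values from gfn %llu, %llu dirty pages remaining\n",
		       log.count, log.start_gfn, log.remaining);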
long kvm_arch_vm_ioctl(struct file *filp, long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg) unsigned int ioctl, unsigned long arg)
{ {
...@@ -1363,6 +1673,29 @@ long kvm_arch_vm_ioctl(struct file *filp, ...@@ -1363,6 +1673,29 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_s390_set_skeys(kvm, &args); r = kvm_s390_set_skeys(kvm, &args);
break; break;
} }
case KVM_S390_GET_CMMA_BITS: {
struct kvm_s390_cmma_log args;
r = -EFAULT;
if (copy_from_user(&args, argp, sizeof(args)))
break;
r = kvm_s390_get_cmma_bits(kvm, &args);
if (!r) {
r = copy_to_user(argp, &args, sizeof(args));
if (r)
r = -EFAULT;
}
break;
}
case KVM_S390_SET_CMMA_BITS: {
struct kvm_s390_cmma_log args;
r = -EFAULT;
if (copy_from_user(&args, argp, sizeof(args)))
break;
r = kvm_s390_set_cmma_bits(kvm, &args);
break;
}
default: default:
r = -ENOTTY; r = -ENOTTY;
} }
...@@ -1631,6 +1964,10 @@ void kvm_arch_destroy_vm(struct kvm *kvm) ...@@ -1631,6 +1964,10 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_s390_destroy_adapters(kvm); kvm_s390_destroy_adapters(kvm);
kvm_s390_clear_float_irqs(kvm); kvm_s390_clear_float_irqs(kvm);
kvm_s390_vsie_destroy(kvm); kvm_s390_vsie_destroy(kvm);
if (kvm->arch.migration_state) {
vfree(kvm->arch.migration_state->pgste_bitmap);
kfree(kvm->arch.migration_state);
}
KVM_EVENT(3, "vm 0x%pK destroyed", kvm); KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
} }
...@@ -1975,7 +2312,6 @@ int kvm_s390_vcpu_setup_cmma(struct kvm_vcpu *vcpu) ...@@ -1975,7 +2312,6 @@ int kvm_s390_vcpu_setup_cmma(struct kvm_vcpu *vcpu)
if (!vcpu->arch.sie_block->cbrlo) if (!vcpu->arch.sie_block->cbrlo)
return -ENOMEM; return -ENOMEM;
vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
vcpu->arch.sie_block->ecb2 &= ~ECB2_PFMFI; vcpu->arch.sie_block->ecb2 &= ~ECB2_PFMFI;
return 0; return 0;
} }
...@@ -2439,7 +2775,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu) ...@@ -2439,7 +2775,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
{ {
retry: retry:
kvm_s390_vcpu_request_handled(vcpu); kvm_s390_vcpu_request_handled(vcpu);
if (!vcpu->requests) if (!kvm_request_pending(vcpu))
return 0; return 0;
/* /*
* We use MMU_RELOAD just to re-arm the ipte notifier for the * We use MMU_RELOAD just to re-arm the ipte notifier for the
...@@ -2488,6 +2824,27 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu) ...@@ -2488,6 +2824,27 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
goto retry; goto retry;
} }
if (kvm_check_request(KVM_REQ_START_MIGRATION, vcpu)) {
/*
* Disable CMMA virtualization; we will emulate the ESSA
* instruction manually, in order to provide additional
* functionalities needed for live migration.
*/
vcpu->arch.sie_block->ecb2 &= ~ECB2_CMMA;
goto retry;
}
if (kvm_check_request(KVM_REQ_STOP_MIGRATION, vcpu)) {
/*
* Re-enable CMMA virtualization if CMMA is available and
* was used.
*/
if ((vcpu->kvm->arch.use_cmma) &&
(vcpu->kvm->mm->context.use_cmma))
vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
goto retry;
}
/* nothing to do, just clear the request */ /* nothing to do, just clear the request */
kvm_clear_request(KVM_REQ_UNHALT, vcpu); kvm_clear_request(KVM_REQ_UNHALT, vcpu);
...@@ -2682,6 +3039,9 @@ static int vcpu_post_run_fault_in_sie(struct kvm_vcpu *vcpu) ...@@ -2682,6 +3039,9 @@ static int vcpu_post_run_fault_in_sie(struct kvm_vcpu *vcpu)
static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason) static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
{ {
struct mcck_volatile_info *mcck_info;
struct sie_page *sie_page;
VCPU_EVENT(vcpu, 6, "exit sie icptcode %d", VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
vcpu->arch.sie_block->icptcode); vcpu->arch.sie_block->icptcode);
trace_kvm_s390_sie_exit(vcpu, vcpu->arch.sie_block->icptcode); trace_kvm_s390_sie_exit(vcpu, vcpu->arch.sie_block->icptcode);
...@@ -2692,6 +3052,15 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason) ...@@ -2692,6 +3052,15 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14; vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14;
vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15; vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;
if (exit_reason == -EINTR) {
VCPU_EVENT(vcpu, 3, "%s", "machine check");
sie_page = container_of(vcpu->arch.sie_block,
struct sie_page, sie_block);
mcck_info = &sie_page->mcck_info;
kvm_s390_reinject_machine_check(vcpu, mcck_info);
return 0;
}
if (vcpu->arch.sie_block->icptcode > 0) { if (vcpu->arch.sie_block->icptcode > 0) {
int rc = kvm_handle_sie_intercept(vcpu); int rc = kvm_handle_sie_intercept(vcpu);
......
...@@ -397,4 +397,6 @@ static inline int kvm_s390_use_sca_entries(void) ...@@ -397,4 +397,6 @@ static inline int kvm_s390_use_sca_entries(void)
*/ */
return sclp.has_sigpif; return sclp.has_sigpif;
} }
void kvm_s390_reinject_machine_check(struct kvm_vcpu *vcpu,
struct mcck_volatile_info *mcck_info);
#endif #endif
...@@ -24,6 +24,7 @@ ...@@ -24,6 +24,7 @@
#include <asm/ebcdic.h> #include <asm/ebcdic.h>
#include <asm/sysinfo.h> #include <asm/sysinfo.h>
#include <asm/pgtable.h> #include <asm/pgtable.h>
#include <asm/page-states.h>
#include <asm/pgalloc.h> #include <asm/pgalloc.h>
#include <asm/gmap.h> #include <asm/gmap.h>
#include <asm/io.h> #include <asm/io.h>
...@@ -949,13 +950,72 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) ...@@ -949,13 +950,72 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
return 0; return 0;
} }
static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
{
struct kvm_s390_migration_state *ms = vcpu->kvm->arch.migration_state;
int r1, r2, nappended, entries;
unsigned long gfn, hva, res, pgstev, ptev;
unsigned long *cbrlo;
/*
* We don't need to set SD.FPF.SK to 1 here, because if we have a
* machine check here we either handle it or crash
*/
kvm_s390_get_regs_rre(vcpu, &r1, &r2);
gfn = vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT;
hva = gfn_to_hva(vcpu->kvm, gfn);
entries = (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3;
if (kvm_is_error_hva(hva))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
nappended = pgste_perform_essa(vcpu->kvm->mm, hva, orc, &ptev, &pgstev);
if (nappended < 0) {
res = orc ? 0x10 : 0;
vcpu->run->s.regs.gprs[r1] = res; /* Exception Indication */
return 0;
}
res = (pgstev & _PGSTE_GPS_USAGE_MASK) >> 22;
/*
* Set the block-content state part of the result. 0 means resident, so
* nothing to do if the page is valid. 2 is for preserved pages
* (non-present and non-zero), and 3 for zero pages (non-present and
* zero).
*/
if (ptev & _PAGE_INVALID) {
res |= 2;
if (pgstev & _PGSTE_GPS_ZERO)
res |= 1;
}
vcpu->run->s.regs.gprs[r1] = res;
/*
* It is possible that all the normal 511 slots were full, in which case
* we will now write in the 512th slot, which is reserved for host use.
* In both cases we let the normal essa handling code process all the
* slots, including the reserved one, if needed.
*/
if (nappended > 0) {
cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo & PAGE_MASK);
cbrlo[entries] = gfn << PAGE_SHIFT;
}
if (orc) {
/* increment only if we are really flipping the bit to 1 */
if (!test_and_set_bit(gfn, ms->pgste_bitmap))
atomic64_inc(&ms->dirty_pages);
}
return nappended;
}
static int handle_essa(struct kvm_vcpu *vcpu) static int handle_essa(struct kvm_vcpu *vcpu)
{ {
/* entries expected to be 1FF */ /* entries expected to be 1FF */
int entries = (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3; int entries = (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3;
unsigned long *cbrlo; unsigned long *cbrlo;
struct gmap *gmap; struct gmap *gmap;
int i; int i, orc;
VCPU_EVENT(vcpu, 4, "ESSA: release %d pages", entries); VCPU_EVENT(vcpu, 4, "ESSA: release %d pages", entries);
gmap = vcpu->arch.gmap; gmap = vcpu->arch.gmap;
...@@ -965,12 +1025,45 @@ static int handle_essa(struct kvm_vcpu *vcpu) ...@@ -965,12 +1025,45 @@ static int handle_essa(struct kvm_vcpu *vcpu)
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE) if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP); return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
-	/* Check for invalid operation request code */
-	if (((vcpu->arch.sie_block->ipb & 0xf0000000) >> 28) > 6)
+	orc = (vcpu->arch.sie_block->ipb & 0xf0000000) >> 28;
+	/* Check for invalid operation request code */
+	if (orc > ESSA_MAX)
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
-	/* Retry the ESSA instruction */
-	kvm_s390_retry_instr(vcpu);
+	if (likely(!vcpu->kvm->arch.migration_state)) {
+		/*

* CMMA is enabled in the KVM settings, but is disabled in
* the SIE block and in the mm_context, and we are not doing
* a migration. Enable CMMA in the mm_context.
* Since we need to take a write lock to write to the context
* to avoid races with storage keys handling, we check if the
* value really needs to be written to; if the value is
* already correct, we do nothing and avoid the lock.
*/
if (vcpu->kvm->mm->context.use_cmma == 0) {
down_write(&vcpu->kvm->mm->mmap_sem);
vcpu->kvm->mm->context.use_cmma = 1;
up_write(&vcpu->kvm->mm->mmap_sem);
}
/*
* If we are here, we are supposed to have CMMA enabled in
* the SIE block. Enabling CMMA works on a per-CPU basis,
* while the context use_cmma flag is per process.
* It's possible that the context flag is enabled and the
* SIE flag is not, so we set the flag always; if it was
* already set, nothing changes, otherwise we enable it
* on this CPU too.
*/
vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
/* Retry the ESSA instruction */
kvm_s390_retry_instr(vcpu);
} else {
/* Account for the possible extra cbrl entry */
i = do_essa(vcpu, orc);
if (i < 0)
return i;
entries += i;
}
vcpu->arch.sie_block->cbrlo &= PAGE_MASK; /* reset nceo */ vcpu->arch.sie_block->cbrlo &= PAGE_MASK; /* reset nceo */
cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo); cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo);
down_read(&gmap->mm->mmap_sem); down_read(&gmap->mm->mmap_sem);
......
...@@ -26,16 +26,21 @@ ...@@ -26,16 +26,21 @@
struct vsie_page { struct vsie_page {
struct kvm_s390_sie_block scb_s; /* 0x0000 */ struct kvm_s390_sie_block scb_s; /* 0x0000 */
/*
* the backup info for machine check. ensure it's at
* the same offset as that in struct sie_page!
*/
struct mcck_volatile_info mcck_info; /* 0x0200 */
 	/* the pinned original scb */
-	struct kvm_s390_sie_block *scb_o;	/* 0x0200 */
+	struct kvm_s390_sie_block *scb_o;	/* 0x0218 */
 	/* the shadow gmap in use by the vsie_page */
-	struct gmap *gmap;			/* 0x0208 */
+	struct gmap *gmap;			/* 0x0220 */
 	/* address of the last reported fault to guest2 */
-	unsigned long fault_addr;		/* 0x0210 */
+	unsigned long fault_addr;		/* 0x0228 */
-	__u8 reserved[0x0700 - 0x0218];		/* 0x0218 */
+	__u8 reserved[0x0700 - 0x0230];		/* 0x0230 */
 	struct kvm_s390_crypto_cb crycb;	/* 0x0700 */
 	__u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE];	/* 0x0800 */
-} __packed;
+};
/* trigger a validity icpt for the given scb */ /* trigger a validity icpt for the given scb */
static int set_validity_icpt(struct kvm_s390_sie_block *scb, static int set_validity_icpt(struct kvm_s390_sie_block *scb,
...@@ -801,6 +806,8 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) ...@@ -801,6 +806,8 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
{ {
struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s; struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
struct kvm_s390_sie_block *scb_o = vsie_page->scb_o; struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
struct mcck_volatile_info *mcck_info;
struct sie_page *sie_page;
int rc; int rc;
handle_last_fault(vcpu, vsie_page); handle_last_fault(vcpu, vsie_page);
...@@ -822,6 +829,14 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) ...@@ -822,6 +829,14 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
local_irq_enable(); local_irq_enable();
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
if (rc == -EINTR) {
VCPU_EVENT(vcpu, 3, "%s", "machine check");
sie_page = container_of(scb_s, struct sie_page, sie_block);
mcck_info = &sie_page->mcck_info;
kvm_s390_reinject_machine_check(vcpu, mcck_info);
return 0;
}
if (rc > 0) if (rc > 0)
rc = 0; /* we could still have an icpt */ rc = 0; /* we could still have an icpt */
else if (rc == -EFAULT) else if (rc == -EFAULT)
......
...@@ -48,28 +48,31 @@ ...@@ -48,28 +48,31 @@
#define KVM_IRQCHIP_NUM_PINS KVM_IOAPIC_NUM_PINS #define KVM_IRQCHIP_NUM_PINS KVM_IOAPIC_NUM_PINS
/* x86-specific vcpu->requests bit members */ /* x86-specific vcpu->requests bit members */
-#define KVM_REQ_MIGRATE_TIMER		8
-#define KVM_REQ_REPORT_TPR_ACCESS	9
-#define KVM_REQ_TRIPLE_FAULT		10
-#define KVM_REQ_MMU_SYNC		11
-#define KVM_REQ_CLOCK_UPDATE		12
-#define KVM_REQ_EVENT			14
-#define KVM_REQ_APF_HALT		15
-#define KVM_REQ_STEAL_UPDATE		16
-#define KVM_REQ_NMI			17
-#define KVM_REQ_PMU			18
-#define KVM_REQ_PMI			19
-#define KVM_REQ_SMI			20
-#define KVM_REQ_MASTERCLOCK_UPDATE	21
-#define KVM_REQ_MCLOCK_INPROGRESS	(22 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_SCAN_IOAPIC		(23 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_GLOBAL_CLOCK_UPDATE	24
-#define KVM_REQ_APIC_PAGE_RELOAD	(25 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_HV_CRASH		26
-#define KVM_REQ_IOAPIC_EOI_EXIT		27
-#define KVM_REQ_HV_RESET		28
-#define KVM_REQ_HV_EXIT			29
-#define KVM_REQ_HV_STIMER		30
+#define KVM_REQ_MIGRATE_TIMER		KVM_ARCH_REQ(0)
+#define KVM_REQ_REPORT_TPR_ACCESS	KVM_ARCH_REQ(1)
+#define KVM_REQ_TRIPLE_FAULT		KVM_ARCH_REQ(2)
+#define KVM_REQ_MMU_SYNC		KVM_ARCH_REQ(3)
+#define KVM_REQ_CLOCK_UPDATE		KVM_ARCH_REQ(4)
+#define KVM_REQ_EVENT			KVM_ARCH_REQ(6)
+#define KVM_REQ_APF_HALT		KVM_ARCH_REQ(7)
+#define KVM_REQ_STEAL_UPDATE		KVM_ARCH_REQ(8)
+#define KVM_REQ_NMI			KVM_ARCH_REQ(9)
+#define KVM_REQ_PMU			KVM_ARCH_REQ(10)
+#define KVM_REQ_PMI			KVM_ARCH_REQ(11)
+#define KVM_REQ_SMI			KVM_ARCH_REQ(12)
+#define KVM_REQ_MASTERCLOCK_UPDATE	KVM_ARCH_REQ(13)
+#define KVM_REQ_MCLOCK_INPROGRESS \
+	KVM_ARCH_REQ_FLAGS(14, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_SCAN_IOAPIC \
+	KVM_ARCH_REQ_FLAGS(15, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_GLOBAL_CLOCK_UPDATE	KVM_ARCH_REQ(16)
+#define KVM_REQ_APIC_PAGE_RELOAD \
+	KVM_ARCH_REQ_FLAGS(17, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_HV_CRASH		KVM_ARCH_REQ(18)
+#define KVM_REQ_IOAPIC_EOI_EXIT		KVM_ARCH_REQ(19)
+#define KVM_REQ_HV_RESET		KVM_ARCH_REQ(20)
+#define KVM_REQ_HV_EXIT			KVM_ARCH_REQ(21)
+#define KVM_REQ_HV_STIMER		KVM_ARCH_REQ(22)
#define CR0_RESERVED_BITS \ #define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
...@@ -254,7 +257,8 @@ union kvm_mmu_page_role { ...@@ -254,7 +257,8 @@ union kvm_mmu_page_role {
unsigned cr0_wp:1; unsigned cr0_wp:1;
unsigned smep_andnot_wp:1; unsigned smep_andnot_wp:1;
unsigned smap_andnot_wp:1; unsigned smap_andnot_wp:1;
unsigned :8; unsigned ad_disabled:1;
unsigned :7;
/* /*
* This is left at the top of the word so that * This is left at the top of the word so that
......
...@@ -426,6 +426,8 @@ ...@@ -426,6 +426,8 @@
#define MSR_IA32_TSC_ADJUST 0x0000003b #define MSR_IA32_TSC_ADJUST 0x0000003b
#define MSR_IA32_BNDCFGS 0x00000d90 #define MSR_IA32_BNDCFGS 0x00000d90
#define MSR_IA32_BNDCFGS_RSVD 0x00000ffc
#define MSR_IA32_XSS 0x00000da0 #define MSR_IA32_XSS 0x00000da0
#define FEATURE_CONTROL_LOCKED (1<<0) #define FEATURE_CONTROL_LOCKED (1<<0)
......
...@@ -144,6 +144,14 @@ static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu) ...@@ -144,6 +144,14 @@ static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu)
return best && (best->ebx & bit(X86_FEATURE_RTM)); return best && (best->ebx & bit(X86_FEATURE_RTM));
} }
static inline bool guest_cpuid_has_mpx(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_MPX));
}
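The helper above is consumed by the BNDCFGS MSR handling elsewhere in this series; roughly, as a sketch of the vmx_set_msr() hunk (data and msr_info come from that context and are not shown here):

	case MSR_IA32_BNDCFGS:
		if (!kvm_mpx_supported() ||
		    (!msr_info->host_initiated && !guest_cpuid_has_mpx(vcpu)))
			return 1;	/* #GP: MPX not exposed to this guest */
		if (is_noncanonical_address(data & PAGE_MASK) ||
		    (data & MSR_IA32_BNDCFGS_RSVD))
			return 1;	/* #GP: bad base or reserved bits set */
		vmcs_write64(GUEST_BNDCFGS, data);
		break;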
static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu) static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu)
{ {
struct kvm_cpuid_entry2 *best; struct kvm_cpuid_entry2 *best;
......
...@@ -900,7 +900,7 @@ static __always_inline int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, ...@@ -900,7 +900,7 @@ static __always_inline int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt,
if (rc != X86EMUL_CONTINUE) \ if (rc != X86EMUL_CONTINUE) \
goto done; \ goto done; \
ctxt->_eip += sizeof(_type); \ ctxt->_eip += sizeof(_type); \
_x = *(_type __aligned(1) *) ctxt->fetch.ptr; \ memcpy(&_x, ctxt->fetch.ptr, sizeof(_type)); \
ctxt->fetch.ptr += sizeof(_type); \ ctxt->fetch.ptr += sizeof(_type); \
_x; \ _x; \
}) })
...@@ -3941,6 +3941,25 @@ static int check_fxsr(struct x86_emulate_ctxt *ctxt) ...@@ -3941,6 +3941,25 @@ static int check_fxsr(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE; return X86EMUL_CONTINUE;
} }
/*
* Hardware doesn't save and restore XMM 0-7 without CR4.OSFXSR, but does save
* and restore MXCSR.
*/
static size_t __fxstate_size(int nregs)
{
return offsetof(struct fxregs_state, xmm_space[0]) + nregs * 16;
}
static inline size_t fxstate_size(struct x86_emulate_ctxt *ctxt)
{
bool cr4_osfxsr;
if (ctxt->mode == X86EMUL_MODE_PROT64)
return __fxstate_size(16);
cr4_osfxsr = ctxt->ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR;
return __fxstate_size(cr4_osfxsr ? 8 : 0);
}
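Assuming the usual struct fxregs_state layout (160-byte legacy area followed by 16 bytes per XMM register), the three cases work out to:

	__fxstate_size(0);	/* 160 bytes: CR4.OSFXSR clear, no XMM state */
	__fxstate_size(8);	/* 288 bytes: 32-bit guest with OSFXSR set, XMM0-7 */
	__fxstate_size(16);	/* 416 bytes: 64-bit mode, XMM0-15 */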
/* /*
* FXSAVE and FXRSTOR have 4 different formats depending on execution mode, * FXSAVE and FXRSTOR have 4 different formats depending on execution mode,
* 1) 16 bit mode * 1) 16 bit mode
...@@ -3962,7 +3981,6 @@ static int check_fxsr(struct x86_emulate_ctxt *ctxt) ...@@ -3962,7 +3981,6 @@ static int check_fxsr(struct x86_emulate_ctxt *ctxt)
static int em_fxsave(struct x86_emulate_ctxt *ctxt) static int em_fxsave(struct x86_emulate_ctxt *ctxt)
{ {
struct fxregs_state fx_state; struct fxregs_state fx_state;
size_t size;
int rc; int rc;
rc = check_fxsr(ctxt); rc = check_fxsr(ctxt);
...@@ -3978,68 +3996,42 @@ static int em_fxsave(struct x86_emulate_ctxt *ctxt) ...@@ -3978,68 +3996,42 @@ static int em_fxsave(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE) if (rc != X86EMUL_CONTINUE)
return rc; return rc;
-	if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR)
-		size = offsetof(struct fxregs_state, xmm_space[8 * 16/4]);
-	else
-		size = offsetof(struct fxregs_state, xmm_space[0]);
-
-	return segmented_write_std(ctxt, ctxt->memop.addr.mem, &fx_state, size);
+	return segmented_write_std(ctxt, ctxt->memop.addr.mem, &fx_state,
+				   fxstate_size(ctxt));
}
static int fxrstor_fixup(struct x86_emulate_ctxt *ctxt,
struct fxregs_state *new)
{
int rc = X86EMUL_CONTINUE;
struct fxregs_state old;
rc = asm_safe("fxsave %[fx]", , [fx] "+m"(old));
if (rc != X86EMUL_CONTINUE)
return rc;
/*
* 64 bit host will restore XMM 8-15, which is not correct on non-64
* bit guests. Load the current values in order to preserve 64 bit
* XMMs after fxrstor.
*/
#ifdef CONFIG_X86_64
/* XXX: accessing XMM 8-15 very awkwardly */
memcpy(&new->xmm_space[8 * 16/4], &old.xmm_space[8 * 16/4], 8 * 16);
#endif
/*
* Hardware doesn't save and restore XMM 0-7 without CR4.OSFXSR, but
* does save and restore MXCSR.
*/
if (!(ctxt->ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))
memcpy(new->xmm_space, old.xmm_space, 8 * 16);
return rc;
} }
static int em_fxrstor(struct x86_emulate_ctxt *ctxt) static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
{ {
struct fxregs_state fx_state; struct fxregs_state fx_state;
int rc; int rc;
size_t size;
rc = check_fxsr(ctxt); rc = check_fxsr(ctxt);
if (rc != X86EMUL_CONTINUE) if (rc != X86EMUL_CONTINUE)
return rc; return rc;
-	rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, &fx_state, 512);
-	if (rc != X86EMUL_CONTINUE)
-		return rc;
+	ctxt->ops->get_fpu(ctxt);
 
-	if (fx_state.mxcsr >> 16)
-		return emulate_gp(ctxt, 0);
+	size = fxstate_size(ctxt);
+	if (size < __fxstate_size(16)) {
+		rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state));
+		if (rc != X86EMUL_CONTINUE)
+			goto out;
+	}
 
-	ctxt->ops->get_fpu(ctxt);
+	rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, &fx_state, size);
+	if (rc != X86EMUL_CONTINUE)
+		goto out;
 
-	if (ctxt->mode < X86EMUL_MODE_PROT64)
-		rc = fxrstor_fixup(ctxt, &fx_state);
+	if (fx_state.mxcsr >> 16) {
+		rc = emulate_gp(ctxt, 0);
+		goto out;
+	}
 
 	if (rc == X86EMUL_CONTINUE)
 		rc = asm_safe("fxrstor %[fx]", : [fx] "m"(fx_state));
 
+out:
 	ctxt->ops->put_fpu(ctxt);
return rc; return rc;
......
...@@ -1495,6 +1495,7 @@ EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use); ...@@ -1495,6 +1495,7 @@ EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
static void cancel_hv_timer(struct kvm_lapic *apic) static void cancel_hv_timer(struct kvm_lapic *apic)
{ {
WARN_ON(!apic->lapic_timer.hv_timer_in_use);
preempt_disable(); preempt_disable();
kvm_x86_ops->cancel_hv_timer(apic->vcpu); kvm_x86_ops->cancel_hv_timer(apic->vcpu);
apic->lapic_timer.hv_timer_in_use = false; apic->lapic_timer.hv_timer_in_use = false;
...@@ -1503,25 +1504,56 @@ static void cancel_hv_timer(struct kvm_lapic *apic) ...@@ -1503,25 +1504,56 @@ static void cancel_hv_timer(struct kvm_lapic *apic)
static bool start_hv_timer(struct kvm_lapic *apic) static bool start_hv_timer(struct kvm_lapic *apic)
{ {
-	u64 tscdeadline = apic->lapic_timer.tscdeadline;
-
-	if ((atomic_read(&apic->lapic_timer.pending) &&
-		!apic_lvtt_period(apic)) ||
-		kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
-		if (apic->lapic_timer.hv_timer_in_use)
-			cancel_hv_timer(apic);
-	} else {
-		apic->lapic_timer.hv_timer_in_use = true;
-		hrtimer_cancel(&apic->lapic_timer.timer);
-
-		/* In case the sw timer triggered in the window */
-		if (atomic_read(&apic->lapic_timer.pending) &&
-			!apic_lvtt_period(apic))
-			cancel_hv_timer(apic);
-	}
-	trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
-			apic->lapic_timer.hv_timer_in_use);
-	return apic->lapic_timer.hv_timer_in_use;
+	struct kvm_timer *ktimer = &apic->lapic_timer;
+	int r;
+
+	if (!kvm_x86_ops->set_hv_timer)
+		return false;
+
+	if (!apic_lvtt_period(apic) && atomic_read(&ktimer->pending))
+		return false;
+
+	r = kvm_x86_ops->set_hv_timer(apic->vcpu, ktimer->tscdeadline);
+	if (r < 0)
+		return false;
+
+	ktimer->hv_timer_in_use = true;
+	hrtimer_cancel(&ktimer->timer);
+
+	/*
+	 * Also recheck ktimer->pending, in case the sw timer triggered in
+	 * the window.  For periodic timer, leave the hv timer running for
+	 * simplicity, and the deadline will be recomputed on the next vmexit.
+	 */
+	if (!apic_lvtt_period(apic) && (r || atomic_read(&ktimer->pending))) {
+		if (r)
+			apic_timer_expired(apic);
+		return false;
+	}
+
+	trace_kvm_hv_timer_state(apic->vcpu->vcpu_id, true);
+	return true;
}
static void start_sw_timer(struct kvm_lapic *apic)
{
struct kvm_timer *ktimer = &apic->lapic_timer;
if (apic->lapic_timer.hv_timer_in_use)
cancel_hv_timer(apic);
if (!apic_lvtt_period(apic) && atomic_read(&ktimer->pending))
return;
if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
start_sw_period(apic);
else if (apic_lvtt_tscdeadline(apic))
start_sw_tscdeadline(apic);
trace_kvm_hv_timer_state(apic->vcpu->vcpu_id, false);
}
static void restart_apic_timer(struct kvm_lapic *apic)
{
if (!start_hv_timer(apic))
start_sw_timer(apic);
} }
void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu) void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
...@@ -1535,19 +1567,14 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu) ...@@ -1535,19 +1567,14 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
if (apic_lvtt_period(apic) && apic->lapic_timer.period) { if (apic_lvtt_period(apic) && apic->lapic_timer.period) {
advance_periodic_target_expiration(apic); advance_periodic_target_expiration(apic);
-		if (!start_hv_timer(apic))
-			start_sw_period(apic);
+		restart_apic_timer(apic);
} }
} }
EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer); EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu) void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
{ {
-	struct kvm_lapic *apic = vcpu->arch.apic;
-
-	WARN_ON(apic->lapic_timer.hv_timer_in_use);
-	start_hv_timer(apic);
+	restart_apic_timer(vcpu->arch.apic);
} }
EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer); EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
...@@ -1556,33 +1583,28 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu) ...@@ -1556,33 +1583,28 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu)
struct kvm_lapic *apic = vcpu->arch.apic; struct kvm_lapic *apic = vcpu->arch.apic;
/* Possibly the TSC deadline timer is not enabled yet */ /* Possibly the TSC deadline timer is not enabled yet */
-	if (!apic->lapic_timer.hv_timer_in_use)
-		return;
-
-	cancel_hv_timer(apic);
-
-	if (atomic_read(&apic->lapic_timer.pending))
-		return;
-
-	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-		start_sw_period(apic);
-	else if (apic_lvtt_tscdeadline(apic))
-		start_sw_tscdeadline(apic);
+	if (apic->lapic_timer.hv_timer_in_use)
+		start_sw_timer(apic);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_sw_timer);
+
+void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	WARN_ON(!apic->lapic_timer.hv_timer_in_use);
+	restart_apic_timer(apic);
+}
static void start_apic_timer(struct kvm_lapic *apic) static void start_apic_timer(struct kvm_lapic *apic)
{ {
atomic_set(&apic->lapic_timer.pending, 0); atomic_set(&apic->lapic_timer.pending, 0);
-	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
-		if (set_target_expiration(apic) &&
-		    !(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
-			start_sw_period(apic);
-	} else if (apic_lvtt_tscdeadline(apic)) {
-		if (!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
-			start_sw_tscdeadline(apic);
-	}
+	if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
+	    && !set_target_expiration(apic))
+		return;
+
+	restart_apic_timer(apic);
} }
static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val) static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
...@@ -1813,16 +1835,6 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu) ...@@ -1813,16 +1835,6 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu)
* LAPIC interface * LAPIC interface
*---------------------------------------------------------------------- *----------------------------------------------------------------------
*/ */
u64 kvm_get_lapic_target_expiration_tsc(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
if (!lapic_in_kernel(vcpu))
return 0;
return apic->lapic_timer.tscdeadline;
}
u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu) u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu)
{ {
struct kvm_lapic *apic = vcpu->arch.apic; struct kvm_lapic *apic = vcpu->arch.apic;
......
...@@ -87,7 +87,6 @@ int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); ...@@ -87,7 +87,6 @@ int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
u64 kvm_get_lapic_target_expiration_tsc(struct kvm_vcpu *vcpu);
u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
...@@ -216,4 +215,5 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu); ...@@ -216,4 +215,5 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu);
void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu); void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu);
void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu); void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu);
bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu); bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu);
void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu);
#endif #endif
...@@ -183,13 +183,13 @@ static u64 __read_mostly shadow_user_mask; ...@@ -183,13 +183,13 @@ static u64 __read_mostly shadow_user_mask;
static u64 __read_mostly shadow_accessed_mask; static u64 __read_mostly shadow_accessed_mask;
static u64 __read_mostly shadow_dirty_mask; static u64 __read_mostly shadow_dirty_mask;
static u64 __read_mostly shadow_mmio_mask; static u64 __read_mostly shadow_mmio_mask;
static u64 __read_mostly shadow_mmio_value;
static u64 __read_mostly shadow_present_mask; static u64 __read_mostly shadow_present_mask;
/* /*
- * The mask/value to distinguish a PTE that has been marked not-present for
- * access tracking purposes.
- * The mask would be either 0 if access tracking is disabled, or
- * SPTE_SPECIAL_MASK|VMX_EPT_RWX_MASK if access tracking is enabled.
+ * SPTEs used by MMUs without A/D bits are marked with shadow_acc_track_value.
+ * Non-present SPTEs with shadow_acc_track_value set are in place for access
+ * tracking.
*/ */
static u64 __read_mostly shadow_acc_track_mask; static u64 __read_mostly shadow_acc_track_mask;
static const u64 shadow_acc_track_value = SPTE_SPECIAL_MASK; static const u64 shadow_acc_track_value = SPTE_SPECIAL_MASK;
...@@ -207,16 +207,40 @@ static const u64 shadow_acc_track_saved_bits_shift = PT64_SECOND_AVAIL_BITS_SHIF ...@@ -207,16 +207,40 @@ static const u64 shadow_acc_track_saved_bits_shift = PT64_SECOND_AVAIL_BITS_SHIF
static void mmu_spte_set(u64 *sptep, u64 spte); static void mmu_spte_set(u64 *sptep, u64 spte);
static void mmu_free_roots(struct kvm_vcpu *vcpu); static void mmu_free_roots(struct kvm_vcpu *vcpu);
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask) void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
{ {
BUG_ON((mmio_mask & mmio_value) != mmio_value);
shadow_mmio_value = mmio_value | SPTE_SPECIAL_MASK;
shadow_mmio_mask = mmio_mask | SPTE_SPECIAL_MASK; shadow_mmio_mask = mmio_mask | SPTE_SPECIAL_MASK;
} }
EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
{
return sp->role.ad_disabled;
}
static inline bool spte_ad_enabled(u64 spte)
{
MMU_WARN_ON((spte & shadow_mmio_mask) == shadow_mmio_value);
return !(spte & shadow_acc_track_value);
}
static inline u64 spte_shadow_accessed_mask(u64 spte)
{
MMU_WARN_ON((spte & shadow_mmio_mask) == shadow_mmio_value);
return spte_ad_enabled(spte) ? shadow_accessed_mask : 0;
}
static inline u64 spte_shadow_dirty_mask(u64 spte)
{
MMU_WARN_ON((spte & shadow_mmio_mask) == shadow_mmio_value);
return spte_ad_enabled(spte) ? shadow_dirty_mask : 0;
}
static inline bool is_access_track_spte(u64 spte) static inline bool is_access_track_spte(u64 spte)
{ {
-	/* Always false if shadow_acc_track_mask is zero. */
-	return (spte & shadow_acc_track_mask) == shadow_acc_track_value;
+	return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
} }
/* /*
...@@ -270,7 +294,7 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn, ...@@ -270,7 +294,7 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
u64 mask = generation_mmio_spte_mask(gen); u64 mask = generation_mmio_spte_mask(gen);
access &= ACC_WRITE_MASK | ACC_USER_MASK; access &= ACC_WRITE_MASK | ACC_USER_MASK;
mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT; mask |= shadow_mmio_value | access | gfn << PAGE_SHIFT;
trace_mark_mmio_spte(sptep, gfn, access, gen); trace_mark_mmio_spte(sptep, gfn, access, gen);
mmu_spte_set(sptep, mask); mmu_spte_set(sptep, mask);
...@@ -278,7 +302,7 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn, ...@@ -278,7 +302,7 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
static bool is_mmio_spte(u64 spte) static bool is_mmio_spte(u64 spte)
{ {
return (spte & shadow_mmio_mask) == shadow_mmio_mask; return (spte & shadow_mmio_mask) == shadow_mmio_value;
} }
static gfn_t get_mmio_spte_gfn(u64 spte) static gfn_t get_mmio_spte_gfn(u64 spte)
...@@ -315,12 +339,20 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte) ...@@ -315,12 +339,20 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
return likely(kvm_gen == spte_gen); return likely(kvm_gen == spte_gen);
} }
/*
* Sets the shadow PTE masks used by the MMU.
*
* Assumptions:
* - Setting either @accessed_mask or @dirty_mask requires setting both
* - At least one of @accessed_mask or @acc_track_mask must be set
*/
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask, u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
u64 acc_track_mask) u64 acc_track_mask)
{ {
-	if (acc_track_mask != 0)
-		acc_track_mask |= SPTE_SPECIAL_MASK;
+	BUG_ON(!dirty_mask != !accessed_mask);
+	BUG_ON(!accessed_mask && !acc_track_mask);
+	BUG_ON(acc_track_mask & shadow_acc_track_value);
shadow_user_mask = user_mask; shadow_user_mask = user_mask;
shadow_accessed_mask = accessed_mask; shadow_accessed_mask = accessed_mask;
...@@ -329,7 +361,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, ...@@ -329,7 +361,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
shadow_x_mask = x_mask; shadow_x_mask = x_mask;
shadow_present_mask = p_mask; shadow_present_mask = p_mask;
shadow_acc_track_mask = acc_track_mask; shadow_acc_track_mask = acc_track_mask;
WARN_ON(shadow_accessed_mask != 0 && shadow_acc_track_mask != 0);
} }
EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
...@@ -549,7 +580,7 @@ static bool spte_has_volatile_bits(u64 spte) ...@@ -549,7 +580,7 @@ static bool spte_has_volatile_bits(u64 spte)
is_access_track_spte(spte)) is_access_track_spte(spte))
return true; return true;
if (shadow_accessed_mask) { if (spte_ad_enabled(spte)) {
if ((spte & shadow_accessed_mask) == 0 || if ((spte & shadow_accessed_mask) == 0 ||
(is_writable_pte(spte) && (spte & shadow_dirty_mask) == 0)) (is_writable_pte(spte) && (spte & shadow_dirty_mask) == 0))
return true; return true;
...@@ -560,14 +591,17 @@ static bool spte_has_volatile_bits(u64 spte) ...@@ -560,14 +591,17 @@ static bool spte_has_volatile_bits(u64 spte)
static bool is_accessed_spte(u64 spte) static bool is_accessed_spte(u64 spte)
{ {
return shadow_accessed_mask ? spte & shadow_accessed_mask u64 accessed_mask = spte_shadow_accessed_mask(spte);
: !is_access_track_spte(spte);
return accessed_mask ? spte & accessed_mask
: !is_access_track_spte(spte);
} }
static bool is_dirty_spte(u64 spte) static bool is_dirty_spte(u64 spte)
{ {
return shadow_dirty_mask ? spte & shadow_dirty_mask u64 dirty_mask = spte_shadow_dirty_mask(spte);
: spte & PT_WRITABLE_MASK;
return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK;
} }
/* Rules for using mmu_spte_set: /* Rules for using mmu_spte_set:
...@@ -707,10 +741,10 @@ static u64 mmu_spte_get_lockless(u64 *sptep) ...@@ -707,10 +741,10 @@ static u64 mmu_spte_get_lockless(u64 *sptep)
static u64 mark_spte_for_access_track(u64 spte) static u64 mark_spte_for_access_track(u64 spte)
{ {
if (shadow_accessed_mask != 0) if (spte_ad_enabled(spte))
return spte & ~shadow_accessed_mask; return spte & ~shadow_accessed_mask;
if (shadow_acc_track_mask == 0 || is_access_track_spte(spte)) if (is_access_track_spte(spte))
return spte; return spte;
/* /*
...@@ -729,7 +763,6 @@ static u64 mark_spte_for_access_track(u64 spte) ...@@ -729,7 +763,6 @@ static u64 mark_spte_for_access_track(u64 spte)
spte |= (spte & shadow_acc_track_saved_bits_mask) << spte |= (spte & shadow_acc_track_saved_bits_mask) <<
shadow_acc_track_saved_bits_shift; shadow_acc_track_saved_bits_shift;
spte &= ~shadow_acc_track_mask; spte &= ~shadow_acc_track_mask;
spte |= shadow_acc_track_value;
return spte; return spte;
} }
...@@ -741,6 +774,7 @@ static u64 restore_acc_track_spte(u64 spte) ...@@ -741,6 +774,7 @@ static u64 restore_acc_track_spte(u64 spte)
u64 saved_bits = (spte >> shadow_acc_track_saved_bits_shift) u64 saved_bits = (spte >> shadow_acc_track_saved_bits_shift)
& shadow_acc_track_saved_bits_mask; & shadow_acc_track_saved_bits_mask;
WARN_ON_ONCE(spte_ad_enabled(spte));
WARN_ON_ONCE(!is_access_track_spte(spte)); WARN_ON_ONCE(!is_access_track_spte(spte));
new_spte &= ~shadow_acc_track_mask; new_spte &= ~shadow_acc_track_mask;
...@@ -759,7 +793,7 @@ static bool mmu_spte_age(u64 *sptep) ...@@ -759,7 +793,7 @@ static bool mmu_spte_age(u64 *sptep)
if (!is_accessed_spte(spte)) if (!is_accessed_spte(spte))
return false; return false;
if (shadow_accessed_mask) { if (spte_ad_enabled(spte)) {
clear_bit((ffs(shadow_accessed_mask) - 1), clear_bit((ffs(shadow_accessed_mask) - 1),
(unsigned long *)sptep); (unsigned long *)sptep);
} else { } else {
...@@ -1390,6 +1424,22 @@ static bool spte_clear_dirty(u64 *sptep) ...@@ -1390,6 +1424,22 @@ static bool spte_clear_dirty(u64 *sptep)
return mmu_spte_update(sptep, spte); return mmu_spte_update(sptep, spte);
} }
static bool wrprot_ad_disabled_spte(u64 *sptep)
{
bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
(unsigned long *)sptep);
if (was_writable)
kvm_set_pfn_dirty(spte_to_pfn(*sptep));
return was_writable;
}
/*
* Gets the GFN ready for another round of dirty logging by clearing the
* - D bit on ad-enabled SPTEs, and
* - W bit on ad-disabled SPTEs.
* Returns true iff any D or W bits were cleared.
*/
static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
{ {
u64 *sptep; u64 *sptep;
...@@ -1397,7 +1447,10 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) ...@@ -1397,7 +1447,10 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
bool flush = false; bool flush = false;
for_each_rmap_spte(rmap_head, &iter, sptep) for_each_rmap_spte(rmap_head, &iter, sptep)
flush |= spte_clear_dirty(sptep); if (spte_ad_enabled(*sptep))
flush |= spte_clear_dirty(sptep);
else
flush |= wrprot_ad_disabled_spte(sptep);
return flush; return flush;
} }
...@@ -1420,7 +1473,8 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) ...@@ -1420,7 +1473,8 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
bool flush = false; bool flush = false;
for_each_rmap_spte(rmap_head, &iter, sptep) for_each_rmap_spte(rmap_head, &iter, sptep)
flush |= spte_set_dirty(sptep); if (spte_ad_enabled(*sptep))
flush |= spte_set_dirty(sptep);
return flush; return flush;
} }
...@@ -1452,7 +1506,8 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, ...@@ -1452,7 +1506,8 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
} }
/** /**
* kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages, or write
* protect the page if the D-bit isn't supported.
* @kvm: kvm instance * @kvm: kvm instance
* @slot: slot to clear D-bit * @slot: slot to clear D-bit
* @gfn_offset: start of the BITS_PER_LONG pages we care about * @gfn_offset: start of the BITS_PER_LONG pages we care about
...@@ -1766,18 +1821,9 @@ static int kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head, ...@@ -1766,18 +1821,9 @@ static int kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
u64 *sptep; u64 *sptep;
struct rmap_iterator iter; struct rmap_iterator iter;
/*
* If there's no access bit in the secondary pte set by the hardware and
* fast access tracking is also not enabled, it's up to gup-fast/gup to
* set the access bit in the primary pte or in the page structure.
*/
if (!shadow_accessed_mask && !shadow_acc_track_mask)
goto out;
for_each_rmap_spte(rmap_head, &iter, sptep) for_each_rmap_spte(rmap_head, &iter, sptep)
if (is_accessed_spte(*sptep)) if (is_accessed_spte(*sptep))
return 1; return 1;
out:
return 0; return 0;
} }
...@@ -1798,18 +1844,6 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) ...@@ -1798,18 +1844,6 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end) int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
{ {
/*
* In case of absence of EPT Access and Dirty Bits supports,
* emulate the accessed bit for EPT, by checking if this page has
* an EPT mapping, and clearing it if it does. On the next access,
* a new EPT mapping will be established.
* This has some overhead, but not as much as the cost of swapping
* out actively used pages or breaking up actively used hugepages.
*/
if (!shadow_accessed_mask && !shadow_acc_track_mask)
return kvm_handle_hva_range(kvm, start, end, 0,
kvm_unmap_rmapp);
return kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp); return kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp);
} }
...@@ -2398,7 +2432,12 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, ...@@ -2398,7 +2432,12 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK); BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK | spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK |
shadow_user_mask | shadow_x_mask | shadow_accessed_mask; shadow_user_mask | shadow_x_mask;
if (sp_ad_disabled(sp))
spte |= shadow_acc_track_value;
else
spte |= shadow_accessed_mask;
mmu_spte_set(sptep, spte); mmu_spte_set(sptep, spte);
...@@ -2666,10 +2705,15 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, ...@@ -2666,10 +2705,15 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
{ {
u64 spte = 0; u64 spte = 0;
int ret = 0; int ret = 0;
struct kvm_mmu_page *sp;
if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access)) if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
return 0; return 0;
sp = page_header(__pa(sptep));
if (sp_ad_disabled(sp))
spte |= shadow_acc_track_value;
/* /*
* For the EPT case, shadow_present_mask is 0 if hardware * For the EPT case, shadow_present_mask is 0 if hardware
* supports exec-only page table entries. In that case, * supports exec-only page table entries. In that case,
...@@ -2678,7 +2722,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, ...@@ -2678,7 +2722,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
*/ */
spte |= shadow_present_mask; spte |= shadow_present_mask;
if (!speculative) if (!speculative)
spte |= shadow_accessed_mask; spte |= spte_shadow_accessed_mask(spte);
if (pte_access & ACC_EXEC_MASK) if (pte_access & ACC_EXEC_MASK)
spte |= shadow_x_mask; spte |= shadow_x_mask;
...@@ -2735,7 +2779,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, ...@@ -2735,7 +2779,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (pte_access & ACC_WRITE_MASK) { if (pte_access & ACC_WRITE_MASK) {
kvm_vcpu_mark_page_dirty(vcpu, gfn); kvm_vcpu_mark_page_dirty(vcpu, gfn);
spte |= shadow_dirty_mask; spte |= spte_shadow_dirty_mask(spte);
} }
if (speculative) if (speculative)
...@@ -2877,16 +2921,16 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) ...@@ -2877,16 +2921,16 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
{ {
struct kvm_mmu_page *sp; struct kvm_mmu_page *sp;
+	sp = page_header(__pa(sptep));
+
 	/*
-	 * Since it's no accessed bit on EPT, it's no way to
-	 * distinguish between actually accessed translations
-	 * and prefetched, so disable pte prefetch if EPT is
-	 * enabled.
+	 * Without accessed bits, there's no way to distinguish between
+	 * actually accessed translations and prefetched, so disable pte
+	 * prefetch if accessed bits aren't available.
 	 */
-	if (!shadow_accessed_mask)
+	if (sp_ad_disabled(sp))
 		return;
-
-	sp = page_header(__pa(sptep));
if (sp->role.level > PT_PAGE_TABLE_LEVEL) if (sp->role.level > PT_PAGE_TABLE_LEVEL)
return; return;
...@@ -4290,6 +4334,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu) ...@@ -4290,6 +4334,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
context->base_role.word = 0; context->base_role.word = 0;
context->base_role.smm = is_smm(vcpu); context->base_role.smm = is_smm(vcpu);
context->base_role.ad_disabled = (shadow_accessed_mask == 0);
context->page_fault = tdp_page_fault; context->page_fault = tdp_page_fault;
context->sync_page = nonpaging_sync_page; context->sync_page = nonpaging_sync_page;
context->invlpg = nonpaging_invlpg; context->invlpg = nonpaging_invlpg;
...@@ -4377,6 +4422,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, ...@@ -4377,6 +4422,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
context->root_level = context->shadow_root_level; context->root_level = context->shadow_root_level;
context->root_hpa = INVALID_PAGE; context->root_hpa = INVALID_PAGE;
context->direct_map = false; context->direct_map = false;
context->base_role.ad_disabled = !accessed_dirty;
update_permission_bitmask(vcpu, context, true); update_permission_bitmask(vcpu, context, true);
update_pkru_bitmask(vcpu, context, true); update_pkru_bitmask(vcpu, context, true);
...@@ -4636,6 +4682,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, ...@@ -4636,6 +4682,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
mask.smep_andnot_wp = 1; mask.smep_andnot_wp = 1;
mask.smap_andnot_wp = 1; mask.smap_andnot_wp = 1;
mask.smm = 1; mask.smm = 1;
mask.ad_disabled = 1;
/* /*
* If we don't have indirect shadow pages, it means no page is * If we don't have indirect shadow pages, it means no page is
......
...@@ -51,7 +51,7 @@ static inline u64 rsvd_bits(int s, int e) ...@@ -51,7 +51,7 @@ static inline u64 rsvd_bits(int s, int e)
return ((1ULL << (e - s + 1)) - 1) << s; return ((1ULL << (e - s + 1)) - 1) << s;
} }
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask); void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value);
void void
reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context); reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
......
...@@ -30,8 +30,9 @@ ...@@ -30,8 +30,9 @@
\ \
role.word = __entry->role; \ role.word = __entry->role; \
\ \
trace_seq_printf(p, "sp gen %lx gfn %llx %u%s q%u%s %s%s" \ trace_seq_printf(p, "sp gen %lx gfn %llx l%u%s q%u%s %s%s" \
" %snxe root %u %s%c", __entry->mmu_valid_gen, \ " %snxe %sad root %u %s%c", \
__entry->mmu_valid_gen, \
__entry->gfn, role.level, \ __entry->gfn, role.level, \
role.cr4_pae ? " pae" : "", \ role.cr4_pae ? " pae" : "", \
role.quadrant, \ role.quadrant, \
...@@ -39,6 +40,7 @@ ...@@ -39,6 +40,7 @@
access_str[role.access], \ access_str[role.access], \
role.invalid ? " invalid" : "", \ role.invalid ? " invalid" : "", \
role.nxe ? "" : "!", \ role.nxe ? "" : "!", \
role.ad_disabled ? "!" : "", \
__entry->root_count, \ __entry->root_count, \
__entry->unsync ? "unsync" : "sync", 0); \ __entry->unsync ? "unsync" : "sync", 0); \
saved_ptr; \ saved_ptr; \
......
@@ -190,6 +190,7 @@ struct vcpu_svm {
 	struct nested_state nested;
 	bool nmi_singlestep;
+	u64 nmi_singlestep_guest_rflags;
 	unsigned int3_injected;
 	unsigned long int3_rip;
@@ -964,6 +965,18 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
 	set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
 }
+static void disable_nmi_singlestep(struct vcpu_svm *svm)
+{
+	svm->nmi_singlestep = false;
+	if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
+		/* Clear our flags if they were not set by the guest */
+		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
+			svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
+		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
+			svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
+	}
+}
 /* Note:
  * This hash table is used to map VM_ID to a struct kvm_arch,
  * when handling AMD IOMMU GALOG notification to schedule in
@@ -1713,11 +1726,24 @@ static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 {
-	return to_svm(vcpu)->vmcb->save.rflags;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned long rflags = svm->vmcb->save.rflags;
+
+	if (svm->nmi_singlestep) {
+		/* Hide our flags if they were not set by the guest */
+		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
+			rflags &= ~X86_EFLAGS_TF;
+		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
+			rflags &= ~X86_EFLAGS_RF;
+	}
+	return rflags;
 }
 static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
+	if (to_svm(vcpu)->nmi_singlestep)
+		rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+
 	/*
 	 * Any change of EFLAGS.VM is accompanied by a reload of SS
 	 * (caused by either a task switch or an inter-privilege IRET),
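Taken together, the rflags hunks above implement a "force TF/RF for NMI single-stepping, but hide it from the guest" scheme. A compressed user-space model of that idea, with illustrative structure and names rather than the kernel's:

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF (1ull << 8)
#define X86_EFLAGS_RF (1ull << 16)

struct vcpu_model {
	uint64_t hw_rflags;		/* what the VMCB would really hold */
	uint64_t guest_rflags_snapshot;	/* guest view saved before forcing TF/RF */
	bool nmi_singlestep;
};

/* Begin single-stepping: remember the guest view, then force TF and RF. */
static void start_singlestep(struct vcpu_model *v)
{
	v->guest_rflags_snapshot = v->hw_rflags;
	v->nmi_singlestep = true;
	v->hw_rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
}

/* Guest-visible read: strip the bits we forced that the guest never set. */
static uint64_t read_guest_rflags(const struct vcpu_model *v)
{
	uint64_t rflags = v->hw_rflags;

	if (v->nmi_singlestep) {
		if (!(v->guest_rflags_snapshot & X86_EFLAGS_TF))
			rflags &= ~X86_EFLAGS_TF;
		if (!(v->guest_rflags_snapshot & X86_EFLAGS_RF))
			rflags &= ~X86_EFLAGS_RF;
	}
	return rflags;
}
```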
@@ -2112,10 +2138,7 @@ static int db_interception(struct vcpu_svm *svm)
 	}
 	if (svm->nmi_singlestep) {
-		svm->nmi_singlestep = false;
-		if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
-			svm->vmcb->save.rflags &=
-				~(X86_EFLAGS_TF | X86_EFLAGS_RF);
+		disable_nmi_singlestep(svm);
 	}
 	if (svm->vcpu.guest_debug &
@@ -2370,8 +2393,8 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
 static int nested_svm_check_permissions(struct vcpu_svm *svm)
 {
-	if (!(svm->vcpu.arch.efer & EFER_SVME)
-	    || !is_paging(&svm->vcpu)) {
+	if (!(svm->vcpu.arch.efer & EFER_SVME) ||
+	    !is_paging(&svm->vcpu)) {
 		kvm_queue_exception(&svm->vcpu, UD_VECTOR);
 		return 1;
 	}
@@ -2381,7 +2404,7 @@ static int nested_svm_check_permissions(struct vcpu_svm *svm)
 		return 1;
 	}
 	return 0;
 }
 static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
@@ -2534,6 +2557,31 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 	return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
 }
+/* DB exceptions for our internal use must not cause vmexit */
+static int nested_svm_intercept_db(struct vcpu_svm *svm)
+{
+	unsigned long dr6;
+
+	/* if we're not singlestepping, it's not ours */
+	if (!svm->nmi_singlestep)
+		return NESTED_EXIT_DONE;
+
+	/* if it's not a singlestep exception, it's not ours */
+	if (kvm_get_dr(&svm->vcpu, 6, &dr6))
+		return NESTED_EXIT_DONE;
+	if (!(dr6 & DR6_BS))
+		return NESTED_EXIT_DONE;
+
+	/* if the guest is singlestepping, it should get the vmexit */
+	if (svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF) {
+		disable_nmi_singlestep(svm);
+		return NESTED_EXIT_DONE;
+	}
+
+	/* it's ours, the nested hypervisor must not see this one */
+	return NESTED_EXIT_HOST;
+}
 static int nested_svm_exit_special(struct vcpu_svm *svm)
 {
 	u32 exit_code = svm->vmcb->control.exit_code;
@@ -2589,8 +2637,12 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 	}
 	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
 		u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
-		if (svm->nested.intercept_exceptions & excp_bits)
-			vmexit = NESTED_EXIT_DONE;
+		if (svm->nested.intercept_exceptions & excp_bits) {
+			if (exit_code == SVM_EXIT_EXCP_BASE + DB_VECTOR)
+				vmexit = nested_svm_intercept_db(svm);
+			else
+				vmexit = NESTED_EXIT_DONE;
+		}
 		/* async page fault always cause vmexit */
 		else if ((exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR) &&
 			 svm->apf_reason != 0)
@@ -4627,10 +4679,17 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
 	    == HF_NMI_MASK)
 		return; /* IRET will cause a vm exit */
+	if ((svm->vcpu.arch.hflags & HF_GIF_MASK) == 0)
+		return; /* STGI will cause a vm exit */
+
+	if (svm->nested.exit_required)
+		return; /* we're not going to run the guest yet */
+
 	/*
 	 * Something prevents NMI from been injected. Single step over possible
 	 * problem (IRET or exception injection or interrupt shadow)
 	 */
+	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
 	svm->nmi_singlestep = true;
 	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
 }
@@ -4771,6 +4830,22 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	if (unlikely(svm->nested.exit_required))
 		return;
+	/*
+	 * Disable singlestep if we're injecting an interrupt/exception.
+	 * We don't want our modified rflags to be pushed on the stack where
+	 * we might not be able to easily reset them if we disabled NMI
+	 * singlestep later.
+	 */
+	if (svm->nmi_singlestep && svm->vmcb->control.event_inj) {
+		/*
+		 * Event injection happens before external interrupts cause a
+		 * vmexit and interrupts are disabled here, so smp_send_reschedule
+		 * is enough to force an immediate vmexit.
+		 */
+		disable_nmi_singlestep(svm);
+		smp_send_reschedule(vcpu->cpu);
+	}
+
 	pre_svm_run(svm);
 	sync_lapic_to_cr8(vcpu);
......
@@ -913,8 +913,9 @@ static void nested_release_page_clean(struct page *page)
 	kvm_release_page_clean(page);
 }
+static bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu);
 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu);
-static u64 construct_eptp(unsigned long root_hpa);
+static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa);
 static bool vmx_xsaves_supported(void);
 static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
 static void vmx_set_segment(struct kvm_vcpu *vcpu,
@@ -2772,7 +2773,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		if (enable_ept_ad_bits) {
 			vmx->nested.nested_vmx_secondary_ctls_high |=
 				SECONDARY_EXEC_ENABLE_PML;
 			vmx->nested.nested_vmx_ept_caps |= VMX_EPT_AD_BIT;
 		}
 	} else
 		vmx->nested.nested_vmx_ept_caps = 0;
@@ -3198,7 +3199,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP);
 		break;
 	case MSR_IA32_BNDCFGS:
-		if (!kvm_mpx_supported())
+		if (!kvm_mpx_supported() ||
+		    (!msr_info->host_initiated && !guest_cpuid_has_mpx(vcpu)))
 			return 1;
 		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
 		break;
@@ -3280,7 +3282,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmcs_writel(GUEST_SYSENTER_ESP, data);
 		break;
 	case MSR_IA32_BNDCFGS:
-		if (!kvm_mpx_supported())
+		if (!kvm_mpx_supported() ||
+		    (!msr_info->host_initiated && !guest_cpuid_has_mpx(vcpu)))
+			return 1;
+		if (is_noncanonical_address(data & PAGE_MASK) ||
+		    (data & MSR_IA32_BNDCFGS_RSVD))
 			return 1;
 		vmcs_write64(GUEST_BNDCFGS, data);
 		break;
@@ -4013,7 +4019,7 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
 	if (enable_ept) {
 		if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 			return;
-		ept_sync_context(construct_eptp(vcpu->arch.mmu.root_hpa));
+		ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));
 	} else {
 		vpid_sync_context(vpid);
 	}
@@ -4188,14 +4194,15 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	vmx->emulation_required = emulation_required(vcpu);
 }
-static u64 construct_eptp(unsigned long root_hpa)
+static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
 {
 	u64 eptp;
 	/* TODO write the value reading from MSR */
 	eptp = VMX_EPT_DEFAULT_MT |
 		VMX_EPT_DEFAULT_GAW << VMX_EPT_GAW_EPTP_SHIFT;
-	if (enable_ept_ad_bits)
+	if (enable_ept_ad_bits &&
+	    (!is_guest_mode(vcpu) || nested_ept_ad_enabled(vcpu)))
 		eptp |= VMX_EPT_AD_ENABLE_BIT;
 	eptp |= (root_hpa & PAGE_MASK);
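The construct_eptp() change above can be summarized as a single predicate; the helper below is illustrative, not kernel code: accessed/dirty bits go into the EPTP only when the host supports them and, while running a nested guest, only if L1's EPTP also asked for them.

```c
#include <stdbool.h>

/* Returns true when the EPTP being built should set the A/D enable bit. */
static bool eptp_uses_ad_bits(bool host_ad_supported, bool in_guest_mode,
			      bool l1_eptp_ad_enabled)
{
	return host_ad_supported && (!in_guest_mode || l1_eptp_ad_enabled);
}
```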
@@ -4209,7 +4216,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	guest_cr3 = cr3;
 	if (enable_ept) {
-		eptp = construct_eptp(cr3);
+		eptp = construct_eptp(vcpu, cr3);
 		vmcs_write64(EPT_POINTER, eptp);
 		if (is_paging(vcpu) || is_guest_mode(vcpu))
 			guest_cr3 = kvm_read_cr3(vcpu);
@@ -5170,7 +5177,8 @@ static void ept_set_mmio_spte_mask(void)
 	 * EPT Misconfigurations can be generated if the value of bits 2:0
 	 * of an EPT paging-structure entry is 110b (write/execute).
 	 */
-	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE);
+	kvm_mmu_set_mmio_spte_mask(VMX_EPT_RWX_MASK,
+				   VMX_EPT_MISCONFIG_WX_VALUE);
 }
 #define VMX_XSS_EXIT_BITMAP 0
@@ -6220,17 +6228,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
-	if (is_guest_mode(vcpu)
-	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
-		/*
-		 * Fix up exit_qualification according to whether guest
-		 * page table accesses are reads or writes.
-		 */
-		u64 eptp = nested_ept_get_cr3(vcpu);
-
-		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
-			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
-	}
 	/*
 	 * EPT violation happened while executing iret from NMI,
 	 * "blocked by NMI" bit has to be set before next VM entry.
@@ -6453,7 +6450,7 @@ void vmx_enable_tdp(void)
 		enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull,
 		0ull, VMX_EPT_EXECUTABLE_MASK,
 		cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK,
-		enable_ept_ad_bits ? 0ull : VMX_EPT_RWX_MASK);
+		VMX_EPT_RWX_MASK);
 	ept_set_mmio_spte_mask();
 	kvm_enable_tdp();
@@ -6557,7 +6554,6 @@ static __init int hardware_setup(void)
 	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
 	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
 	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
-	vmx_disable_intercept_for_msr(MSR_IA32_BNDCFGS, true);
 	memcpy(vmx_msr_bitmap_legacy_x2apic_apicv,
 			vmx_msr_bitmap_legacy, PAGE_SIZE);
@@ -7661,7 +7657,10 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 	unsigned long type, types;
 	gva_t gva;
 	struct x86_exception e;
-	int vpid;
+	struct {
+		u64 vpid;
+		u64 gla;
+	} operand;
 	if (!(vmx->nested.nested_vmx_secondary_ctls_high &
 	      SECONDARY_EXEC_ENABLE_VPID) ||
@@ -7691,17 +7690,28 @@
 	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
 			vmx_instruction_info, false, &gva))
 		return 1;
-	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vpid,
-			sizeof(u32), &e)) {
+	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
+			sizeof(operand), &e)) {
 		kvm_inject_page_fault(vcpu, &e);
 		return 1;
 	}
+	if (operand.vpid >> 16) {
+		nested_vmx_failValid(vcpu,
+			VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		return kvm_skip_emulated_instruction(vcpu);
+	}
 	switch (type) {
 	case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
+		if (is_noncanonical_address(operand.gla)) {
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+			return kvm_skip_emulated_instruction(vcpu);
+		}
+		/* fall through */
 	case VMX_VPID_EXTENT_SINGLE_CONTEXT:
 	case VMX_VPID_EXTENT_SINGLE_NON_GLOBAL:
-		if (!vpid) {
+		if (!operand.vpid) {
 			nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
 			return kvm_skip_emulated_instruction(vcpu);
@@ -9394,6 +9404,11 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
+static bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
+{
+	return nested_ept_get_cr3(vcpu) & VMX_EPT_AD_ENABLE_BIT;
+}
+
 /* Callbacks for nested_ept_init_mmu_context: */
 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
@@ -9404,18 +9419,18 @@ static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
 static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
 {
-	u64 eptp;
+	bool wants_ad;
 	WARN_ON(mmu_is_nested(vcpu));
-	eptp = nested_ept_get_cr3(vcpu);
-	if ((eptp & VMX_EPT_AD_ENABLE_BIT) && !enable_ept_ad_bits)
+	wants_ad = nested_ept_ad_enabled(vcpu);
+	if (wants_ad && !enable_ept_ad_bits)
 		return 1;
 	kvm_mmu_unload(vcpu);
 	kvm_init_shadow_ept_mmu(vcpu,
 			to_vmx(vcpu)->nested.nested_vmx_ept_caps &
 			VMX_EPT_EXECUTE_ONLY_BIT,
-			eptp & VMX_EPT_AD_ENABLE_BIT);
+			wants_ad);
 	vcpu->arch.mmu.set_cr3 = vmx_set_cr3;
 	vcpu->arch.mmu.get_cr3 = nested_ept_get_cr3;
 	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
@@ -10728,8 +10743,7 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3);
 	}
-	if (nested_cpu_has_ept(vmcs12))
-		vmcs12->guest_linear_address = vmcs_readl(GUEST_LINEAR_ADDRESS);
+	vmcs12->guest_linear_address = vmcs_readl(GUEST_LINEAR_ADDRESS);
 	if (nested_cpu_has_vid(vmcs12))
 		vmcs12->guest_intr_status = vmcs_read16(GUEST_INTR_STATUS);
@@ -10754,8 +10768,6 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->guest_sysenter_eip = vmcs_readl(GUEST_SYSENTER_EIP);
 	if (kvm_mpx_supported())
 		vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
-	if (nested_cpu_has_xsaves(vmcs12))
-		vmcs12->xss_exit_bitmap = vmcs_read64(XSS_EXIT_BITMAP);
 }
 /*
@@ -11152,7 +11164,8 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
 	vmx->hv_deadline_tsc = tscl + delta_tsc;
 	vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
 			PIN_BASED_VMX_PREEMPTION_TIMER);
-	return 0;
+
+	return delta_tsc == 0;
 }
 static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
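The changed return value lets callers distinguish "preemption timer armed" from "requested deadline already passed". A hedged sketch of the caller pattern this enables, using stand-in helpers rather than the real lapic code:

```c
#include <stdio.h>

/*
 * Hypothetical stand-in: return a negative errno when the hardware timer
 * cannot be used, 0 when it was armed, and a positive value when the
 * deadline has already been reached.
 */
static int set_hv_timer(unsigned long long deadline, unsigned long long now)
{
	if (deadline <= now)
		return 1;	/* already expired */
	/* ... program the VMX preemption timer here ... */
	return 0;
}

static void arm_timer(unsigned long long deadline, unsigned long long now)
{
	int r = set_hv_timer(deadline, now);

	if (r < 0)
		printf("fall back to the software hrtimer\n");
	else if (r > 0)
		printf("deadline already passed: expire the timer immediately\n");
	else
		printf("preemption timer armed\n");
}

int main(void)
{
	arm_timer(100, 200);	/* already expired */
	arm_timer(300, 200);	/* armed */
	return 0;
}
```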
......
@@ -2841,10 +2841,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		kvm_vcpu_write_tsc_offset(vcpu, offset);
 		vcpu->arch.tsc_catchup = 1;
 	}
-	if (kvm_lapic_hv_timer_in_use(vcpu) &&
-		kvm_x86_ops->set_hv_timer(vcpu,
-			kvm_get_lapic_target_expiration_tsc(vcpu)))
-		kvm_lapic_switch_to_sw_timer(vcpu);
+
+	if (kvm_lapic_hv_timer_in_use(vcpu))
+		kvm_lapic_restart_hv_timer(vcpu);
 	/*
 	 * On a host with synchronized TSC, there is no need to update
 	 * kvmclock on vcpu->cpu migration
@@ -6011,7 +6011,7 @@ static void kvm_set_mmio_spte_mask(void)
 		mask &= ~1ull;
 #endif
-	kvm_mmu_set_mmio_spte_mask(mask);
+	kvm_mmu_set_mmio_spte_mask(mask, mask);
 }
 #ifdef CONFIG_X86_64
@@ -6733,7 +6733,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	bool req_immediate_exit = false;
-	if (vcpu->requests) {
+	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
 			kvm_mmu_unload(vcpu);
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
@@ -6897,7 +6897,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 	}
-	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
+	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
 	    || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
......
@@ -57,9 +57,7 @@ struct arch_timer_cpu {
 int kvm_timer_hyp_init(void);
 int kvm_timer_enable(struct kvm_vcpu *vcpu);
-int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
-			 const struct kvm_irq_level *virt_irq,
-			 const struct kvm_irq_level *phys_irq);
+int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu);
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu);
@@ -70,6 +68,10 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu);
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
+int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+
 bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
 void kvm_timer_schedule(struct kvm_vcpu *vcpu);
 void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
......
@@ -35,6 +35,7 @@ struct kvm_pmu {
 	int irq_num;
 	struct kvm_pmc pmc[ARMV8_PMU_MAX_COUNTERS];
 	bool ready;
+	bool created;
 	bool irq_level;
 };
@@ -63,6 +64,7 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu,
 			    struct kvm_device_attr *attr);
 int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu,
 			    struct kvm_device_attr *attr);
+int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu);
 #else
 struct kvm_pmu {
 };
@@ -112,6 +114,10 @@ static inline int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu,
 {
 	return -ENXIO;
 }
+static inline int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
 #endif
 #endif
1 file diff collapsed (not shown).
@@ -405,6 +405,7 @@
 #define ICH_LR_PHYS_ID_SHIFT		32
 #define ICH_LR_PHYS_ID_MASK		(0x3ffULL << ICH_LR_PHYS_ID_SHIFT)
 #define ICH_LR_PRIORITY_SHIFT		48
+#define ICH_LR_PRIORITY_MASK		(0xffULL << ICH_LR_PRIORITY_SHIFT)
 /* These are for GICv2 emulation only */
 #define GICH_LR_VIRTUALID		(0x3ffUL << 0)
@@ -416,6 +417,11 @@
 #define ICH_HCR_EN			(1 << 0)
 #define ICH_HCR_UIE			(1 << 1)
+#define ICH_HCR_TC			(1 << 10)
+#define ICH_HCR_TALL0			(1 << 11)
+#define ICH_HCR_TALL1			(1 << 12)
+#define ICH_HCR_EOIcount_SHIFT		27
+#define ICH_HCR_EOIcount_MASK		(0x1f << ICH_HCR_EOIcount_SHIFT)
 #define ICH_VMCR_ACK_CTL_SHIFT		2
 #define ICH_VMCR_ACK_CTL_MASK		(1 << ICH_VMCR_ACK_CTL_SHIFT)
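The new ICH_HCR bits support trapping guest GICv3 system-register accesses (used, among other things, by the Cavium erratum 30115 workaround). A sketch of how they might be combined when building the hypervisor control value; the helper name is hypothetical, and the bit meanings follow the GICv3 architecture, where TC traps the common ICC registers and TALL0/TALL1 trap all group-0/group-1 accesses:

```c
#include <stdint.h>

#define ICH_HCR_TC	(1u << 10)
#define ICH_HCR_TALL0	(1u << 11)
#define ICH_HCR_TALL1	(1u << 12)

/* Build the trap bits to OR into ICH_HCR for the configured policy. */
static uint32_t vgic_v3_trap_bits(int trap_group0, int trap_group1, int trap_common)
{
	uint32_t hcr = 0;

	if (trap_group0)
		hcr |= ICH_HCR_TALL0;
	if (trap_group1)
		hcr |= ICH_HCR_TALL1;
	if (trap_common)
		hcr |= ICH_HCR_TC;
	return hcr;
}
```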
......
4 file diffs collapsed (not shown).
@@ -60,7 +60,7 @@ static const unsigned short cc_map[16] = {
 /*
  * Check if a trapped instruction should have been executed or not.
  */
-bool kvm_condition_valid32(const struct kvm_vcpu *vcpu)
+bool __hyp_text kvm_condition_valid32(const struct kvm_vcpu *vcpu)
 {
 	unsigned long cpsr;
 	u32 cpsr_cond;
......
6 file diffs collapsed (not shown).
@@ -34,7 +34,7 @@ static int vgic_irqfd_set_irq(struct kvm_kernel_irq_routing_entry *e,
 	if (!vgic_valid_spi(kvm, spi_id))
 		return -EINVAL;
-	return kvm_vgic_inject_irq(kvm, 0, spi_id, level);
+	return kvm_vgic_inject_irq(kvm, 0, spi_id, level, NULL);
 }
 /**
......
7 file diffs collapsed (not shown).