提交 7cbb39d4 编写于 作者: L Linus Torvalds

Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "PPC and ARM do not have much going on this time.  Most of the cool
  stuff, instead, is in s390 and (after a few releases) x86.

  ARM has some caching fixes and PPC has transactional memory support in
  guests.  MIPS has some fixes, with more probably coming in 3.16 as
  QEMU will soon get support for MIPS KVM.

  For x86 there are optimizations for debug registers, which trigger on
  some Windows games, and other important fixes for Windows guests.  We
  now expose to the guest Broadwell instruction set extensions and also
  Intel MPX.  There's also a fix/workaround for OS X guests, nested
  virtualization features (preemption timer), and a couple kvmclock
  refinements.

  For s390, the main news is asynchronous page faults, together with
  improvements to IRQs (floating irqs and adapter irqs) that speed up
  virtio devices"

* tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
  KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
  KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
  KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
  KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
  KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
  KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
  KVM: PPC: Book3S HV: Add transactional memory support
  KVM: Specify byte order for KVM_EXIT_MMIO
  KVM: vmx: fix MPX detection
  KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
  KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
  KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
  KVM: s390: clear local interrupts at cpu initial reset
  KVM: s390: Fix possible memory leak in SIGP functions
  KVM: s390: fix calculation of idle_mask array size
  KVM: s390: randomize sca address
  KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
  KVM: Bump KVM_MAX_IRQ_ROUTES for s390
  KVM: s390: irq routing for adapter interrupts.
  KVM: s390: adapter interrupt sources
  ...
......@@ -586,8 +586,8 @@ struct kvm_fpu {
4.24 KVM_CREATE_IRQCHIP
Capability: KVM_CAP_IRQCHIP
Architectures: x86, ia64, ARM, arm64
Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
Architectures: x86, ia64, ARM, arm64, s390
Type: vm ioctl
Parameters: none
Returns: 0 on success, -1 on error
......@@ -596,7 +596,10 @@ Creates an interrupt controller model in the kernel. On x86, creates a virtual
ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
only go to the IOAPIC. On ia64, a IOSAPIC is created. On ARM/arm64, a GIC is
created.
created. On s390, a dummy irq routing table is created.
Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.
4.25 KVM_IRQ_LINE
......@@ -612,6 +615,20 @@ On some architectures it is required that an interrupt controller model has
been previously created with KVM_CREATE_IRQCHIP. Note that edge-triggered
interrupts require the level to be set to 1 and then back to 0.
On real hardware, interrupt pins can be active-low or active-high. This
does not matter for the level field of struct kvm_irq_level: 1 always
means active (asserted), 0 means inactive (deasserted).
x86 allows the operating system to program the interrupt polarity
(active-low/active-high) for level-triggered interrupts, and KVM used
to consider the polarity. However, due to bitrot in the handling of
active-low interrupts, the above convention is now valid on x86 too.
This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED. Userspace
should not present interrupts to the guest as active-low unless this
capability is present (or unless it is not using the in-kernel irqchip,
of course).
ARM/arm64 can signal an interrupt either at the CPU level, or at the
in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
use PPIs designated for specific cpus. The irq field is interpreted
......@@ -628,7 +645,7 @@ The irq_type field has the following values:
(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
In both cases, level is used to raise/lower the line.
In both cases, level is used to assert/deassert the line.
struct kvm_irq_level {
union {
......@@ -918,9 +935,9 @@ documentation when it pops into existence).
4.37 KVM_ENABLE_CAP
Capability: KVM_CAP_ENABLE_CAP
Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
Architectures: ppc, s390
Type: vcpu ioctl
Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error
......@@ -951,6 +968,8 @@ function properly, this is the place to put them.
__u8 pad[64];
};
The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
4.38 KVM_GET_MP_STATE
......@@ -1320,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
4.52 KVM_SET_GSI_ROUTING
Capability: KVM_CAP_IRQ_ROUTING
Architectures: x86 ia64
Architectures: x86 ia64 s390
Type: vm ioctl
Parameters: struct kvm_irq_routing (in)
Returns: 0 on success, -1 on error
......@@ -1343,6 +1362,7 @@ struct kvm_irq_routing_entry {
union {
struct kvm_irq_routing_irqchip irqchip;
struct kvm_irq_routing_msi msi;
struct kvm_irq_routing_s390_adapter adapter;
__u32 pad[8];
} u;
};
......@@ -1350,6 +1370,7 @@ struct kvm_irq_routing_entry {
/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3
No flags are specified so far, the corresponding field must be set to zero.
......@@ -1365,6 +1386,14 @@ struct kvm_irq_routing_msi {
__u32 pad;
};
struct kvm_irq_routing_s390_adapter {
__u64 ind_addr;
__u64 summary_addr;
__u64 ind_offset;
__u32 summary_offset;
__u32 adapter_id;
};
4.53 KVM_ASSIGN_SET_MSIX_NR
......@@ -2566,6 +2595,10 @@ executed a memory-mapped I/O instruction which could not be satisfied
by kvm. The 'data' member contains the written data if 'is_write' is
true, and should be filled by application code otherwise.
The 'data' member contains, in its first 'len' bytes, the value as it would
appear if the VCPU performed a load or store of the appropriate width directly
to the byte array.
NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding
operations are complete (and guest state is consistent) only after userspace
......
FLIC (floating interrupt controller)
====================================
FLIC handles floating (non per-cpu) interrupts, i.e. I/O, service and some
machine check interruptions. All interrupts are stored in a per-vm list of
pending interrupts. FLIC performs operations on this list.
Only one FLIC instance may be instantiated.
FLIC provides support to
- add interrupts (KVM_DEV_FLIC_ENQUEUE)
- inspect currently pending interrupts (KVM_FLIC_GET_ALL_IRQS)
- purge all pending floating interrupts (KVM_DEV_FLIC_CLEAR_IRQS)
- enable/disable for the guest transparent async page faults
- register and modify adapter interrupt sources (KVM_DEV_FLIC_ADAPTER_*)
Groups:
KVM_DEV_FLIC_ENQUEUE
Passes a buffer and length into the kernel which are then injected into
the list of pending interrupts.
attr->addr contains the pointer to the buffer and attr->attr contains
the length of the buffer.
The format of the data structure kvm_s390_irq as it is copied from userspace
is defined in usr/include/linux/kvm.h.
KVM_DEV_FLIC_GET_ALL_IRQS
Copies all floating interrupts into a buffer provided by userspace.
When the buffer is too small it returns -ENOMEM, which is the indication
for userspace to try again with a bigger buffer.
All interrupts remain pending, i.e. are not deleted from the list of
currently pending interrupts.
attr->addr contains the userspace address of the buffer into which all
interrupt data will be copied.
attr->attr contains the size of the buffer in bytes.
KVM_DEV_FLIC_CLEAR_IRQS
Simply deletes all elements from the list of currently pending floating
interrupts. No interrupts are injected into the guest.
KVM_DEV_FLIC_APF_ENABLE
Enables async page faults for the guest. So in case of a major page fault
the host is allowed to handle this async and continues the guest.
KVM_DEV_FLIC_APF_DISABLE_WAIT
Disables async page faults for the guest and waits until already pending
async page faults are done. This is necessary to trigger a completion interrupt
for every init interrupt before migrating the interrupt list.
KVM_DEV_FLIC_ADAPTER_REGISTER
Register an I/O adapter interrupt source. Takes a kvm_s390_io_adapter
describing the adapter to register:
struct kvm_s390_io_adapter {
__u32 id;
__u8 isc;
__u8 maskable;
__u8 swap;
__u8 pad;
};
id contains the unique id for the adapter, isc the I/O interruption subclass
to use, maskable whether this adapter may be masked (interrupts turned off)
and swap whether the indicators need to be byte swapped.
KVM_DEV_FLIC_ADAPTER_MODIFY
Modifies attributes of an existing I/O adapter interrupt source. Takes
a kvm_s390_io_adapter_req specifiying the adapter and the operation:
struct kvm_s390_io_adapter_req {
__u32 id;
__u8 type;
__u8 mask;
__u16 pad0;
__u64 addr;
};
id specifies the adapter and type the operation. The supported operations
are:
KVM_S390_IO_ADAPTER_MASK
mask or unmask the adapter, as specified in mask
KVM_S390_IO_ADAPTER_MAP
perform a gmap translation for the guest address provided in addr,
pin a userspace page for the translated address and add it to the
list of mappings
KVM_S390_IO_ADAPTER_UNMAP
release a userspace page for the translated address specified in addr
from the list of mappings
......@@ -55,6 +55,7 @@
* The bits we set in HCR:
* TAC: Trap ACTLR
* TSC: Trap SMC
* TVM: Trap VM ops (until MMU and caches are on)
* TSW: Trap cache operations by set/way
* TWI: Trap WFI
* TWE: Trap WFE
......@@ -68,8 +69,7 @@
*/
#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
HCR_TWE | HCR_SWIO | HCR_TIDCP)
#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
HCR_TVM | HCR_TWE | HCR_SWIO | HCR_TIDCP)
/* System Control Register (SCTLR) bits */
#define SCTLR_TE (1 << 30)
......
......@@ -48,7 +48,9 @@
#define c13_TID_URO 26 /* Thread ID, User R/O */
#define c13_TID_PRIV 27 /* Thread ID, Privileged */
#define c14_CNTKCTL 28 /* Timer Control Register (PL1) */
#define NR_CP15_REGS 29 /* Number of regs (incl. invalid) */
#define c10_AMAIR0 29 /* Auxilary Memory Attribute Indirection Reg0 */
#define c10_AMAIR1 30 /* Auxilary Memory Attribute Indirection Reg1 */
#define NR_CP15_REGS 31 /* Number of regs (incl. invalid) */
#define ARM_EXCEPTION_RESET 0
#define ARM_EXCEPTION_UNDEFINED 1
......
......@@ -101,6 +101,12 @@ struct kvm_vcpu_arch {
/* The CPU type we expose to the VM */
u32 midr;
/* HYP trapping configuration */
u32 hcr;
/* Interrupt related fields */
u32 irq_lines; /* IRQ and FIQ levels */
/* Exception Information */
struct kvm_vcpu_fault_info fault;
......@@ -128,9 +134,6 @@ struct kvm_vcpu_arch {
/* IO related fields */
struct kvm_decode mmio_decode;
/* Interrupt related fields */
u32 irq_lines; /* IRQ and FIQ levels */
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
......
......@@ -114,11 +114,34 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
pmd_val(*pmd) |= L_PMD_S2_RDWR;
}
/* Open coded p*d_addr_end that can deal with 64bit addresses */
#define kvm_pgd_addr_end(addr, end) \
({ u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK; \
(__boundary - 1 < (end) - 1)? __boundary: (end); \
})
#define kvm_pud_addr_end(addr,end) (end)
#define kvm_pmd_addr_end(addr, end) \
({ u64 __boundary = ((addr) + PMD_SIZE) & PMD_MASK; \
(__boundary - 1 < (end) - 1)? __boundary: (end); \
})
struct kvm;
static inline void coherent_icache_guest_page(struct kvm *kvm, hva_t hva,
unsigned long size)
#define kvm_flush_dcache_to_poc(a,l) __cpuc_flush_dcache_area((a), (l))
static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
{
return (vcpu->arch.cp15[c1_SCTLR] & 0b101) == 0b101;
}
static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
unsigned long size)
{
if (!vcpu_has_cache_enabled(vcpu))
kvm_flush_dcache_to_poc((void *)hva, size);
/*
* If we are going to insert an instruction page and the icache is
* either VIPT or PIPT, there is a potential problem where the host
......@@ -139,9 +162,10 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, hva_t hva,
}
}
#define kvm_flush_dcache_to_poc(a,l) __cpuc_flush_dcache_area((a), (l))
#define kvm_virt_to_phys(x) virt_to_idmap((unsigned long)(x))
void stage2_flush_vm(struct kvm *kvm);
#endif /* !__ASSEMBLY__ */
#endif /* __ARM_KVM_MMU_H__ */
......@@ -174,6 +174,7 @@ int main(void)
DEFINE(VCPU_FIQ_REGS, offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
DEFINE(VCPU_CPSR, offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
DEFINE(VCPU_HCR, offsetof(struct kvm_vcpu, arch.hcr));
DEFINE(VCPU_IRQ_LINES, offsetof(struct kvm_vcpu, arch.irq_lines));
DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.fault.hsr));
DEFINE(VCPU_HxFAR, offsetof(struct kvm_vcpu, arch.fault.hxfar));
......
......@@ -23,6 +23,7 @@
#include <asm/kvm_host.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_coproc.h>
#include <asm/kvm_mmu.h>
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <trace/events/kvm.h>
......@@ -204,6 +205,44 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
return true;
}
/*
* Generic accessor for VM registers. Only called as long as HCR_TVM
* is set.
*/
static bool access_vm_reg(struct kvm_vcpu *vcpu,
const struct coproc_params *p,
const struct coproc_reg *r)
{
BUG_ON(!p->is_write);
vcpu->arch.cp15[r->reg] = *vcpu_reg(vcpu, p->Rt1);
if (p->is_64bit)
vcpu->arch.cp15[r->reg + 1] = *vcpu_reg(vcpu, p->Rt2);
return true;
}
/*
* SCTLR accessor. Only called as long as HCR_TVM is set. If the
* guest enables the MMU, we stop trapping the VM sys_regs and leave
* it in complete control of the caches.
*
* Used by the cpu-specific code.
*/
bool access_sctlr(struct kvm_vcpu *vcpu,
const struct coproc_params *p,
const struct coproc_reg *r)
{
access_vm_reg(vcpu, p, r);
if (vcpu_has_cache_enabled(vcpu)) { /* MMU+Caches enabled? */
vcpu->arch.hcr &= ~HCR_TVM;
stage2_flush_vm(vcpu->kvm);
}
return true;
}
/*
* We could trap ID_DFR0 and tell the guest we don't support performance
* monitoring. Unfortunately the patch to make the kernel check ID_DFR0 was
......@@ -261,33 +300,36 @@ static const struct coproc_reg cp15_regs[] = {
{ CRn( 1), CRm( 0), Op1( 0), Op2( 2), is32,
NULL, reset_val, c1_CPACR, 0x00000000 },
/* TTBR0/TTBR1: swapped by interrupt.S. */
{ CRm64( 2), Op1( 0), is64, NULL, reset_unknown64, c2_TTBR0 },
{ CRm64( 2), Op1( 1), is64, NULL, reset_unknown64, c2_TTBR1 },
/* TTBCR: swapped by interrupt.S. */
/* TTBR0/TTBR1/TTBCR: swapped by interrupt.S. */
{ CRm64( 2), Op1( 0), is64, access_vm_reg, reset_unknown64, c2_TTBR0 },
{ CRn(2), CRm( 0), Op1( 0), Op2( 0), is32,
access_vm_reg, reset_unknown, c2_TTBR0 },
{ CRn(2), CRm( 0), Op1( 0), Op2( 1), is32,
access_vm_reg, reset_unknown, c2_TTBR1 },
{ CRn( 2), CRm( 0), Op1( 0), Op2( 2), is32,
NULL, reset_val, c2_TTBCR, 0x00000000 },
access_vm_reg, reset_val, c2_TTBCR, 0x00000000 },
{ CRm64( 2), Op1( 1), is64, access_vm_reg, reset_unknown64, c2_TTBR1 },
/* DACR: swapped by interrupt.S. */
{ CRn( 3), CRm( 0), Op1( 0), Op2( 0), is32,
NULL, reset_unknown, c3_DACR },
access_vm_reg, reset_unknown, c3_DACR },
/* DFSR/IFSR/ADFSR/AIFSR: swapped by interrupt.S. */
{ CRn( 5), CRm( 0), Op1( 0), Op2( 0), is32,
NULL, reset_unknown, c5_DFSR },
access_vm_reg, reset_unknown, c5_DFSR },
{ CRn( 5), CRm( 0), Op1( 0), Op2( 1), is32,
NULL, reset_unknown, c5_IFSR },
access_vm_reg, reset_unknown, c5_IFSR },
{ CRn( 5), CRm( 1), Op1( 0), Op2( 0), is32,
NULL, reset_unknown, c5_ADFSR },
access_vm_reg, reset_unknown, c5_ADFSR },
{ CRn( 5), CRm( 1), Op1( 0), Op2( 1), is32,
NULL, reset_unknown, c5_AIFSR },
access_vm_reg, reset_unknown, c5_AIFSR },
/* DFAR/IFAR: swapped by interrupt.S. */
{ CRn( 6), CRm( 0), Op1( 0), Op2( 0), is32,
NULL, reset_unknown, c6_DFAR },
access_vm_reg, reset_unknown, c6_DFAR },
{ CRn( 6), CRm( 0), Op1( 0), Op2( 2), is32,
NULL, reset_unknown, c6_IFAR },
access_vm_reg, reset_unknown, c6_IFAR },
/* PAR swapped by interrupt.S */
{ CRm64( 7), Op1( 0), is64, NULL, reset_unknown64, c7_PAR },
......@@ -324,9 +366,15 @@ static const struct coproc_reg cp15_regs[] = {
/* PRRR/NMRR (aka MAIR0/MAIR1): swapped by interrupt.S. */
{ CRn(10), CRm( 2), Op1( 0), Op2( 0), is32,
NULL, reset_unknown, c10_PRRR},
access_vm_reg, reset_unknown, c10_PRRR},
{ CRn(10), CRm( 2), Op1( 0), Op2( 1), is32,
NULL, reset_unknown, c10_NMRR},
access_vm_reg, reset_unknown, c10_NMRR},
/* AMAIR0/AMAIR1: swapped by interrupt.S. */
{ CRn(10), CRm( 3), Op1( 0), Op2( 0), is32,
access_vm_reg, reset_unknown, c10_AMAIR0},
{ CRn(10), CRm( 3), Op1( 0), Op2( 1), is32,
access_vm_reg, reset_unknown, c10_AMAIR1},
/* VBAR: swapped by interrupt.S. */
{ CRn(12), CRm( 0), Op1( 0), Op2( 0), is32,
......@@ -334,7 +382,7 @@ static const struct coproc_reg cp15_regs[] = {
/* CONTEXTIDR/TPIDRURW/TPIDRURO/TPIDRPRW: swapped by interrupt.S. */
{ CRn(13), CRm( 0), Op1( 0), Op2( 1), is32,
NULL, reset_val, c13_CID, 0x00000000 },
access_vm_reg, reset_val, c13_CID, 0x00000000 },
{ CRn(13), CRm( 0), Op1( 0), Op2( 2), is32,
NULL, reset_unknown, c13_TID_URW },
{ CRn(13), CRm( 0), Op1( 0), Op2( 3), is32,
......@@ -443,7 +491,7 @@ int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
struct coproc_params params;
params.CRm = (kvm_vcpu_get_hsr(vcpu) >> 1) & 0xf;
params.CRn = (kvm_vcpu_get_hsr(vcpu) >> 1) & 0xf;
params.Rt1 = (kvm_vcpu_get_hsr(vcpu) >> 5) & 0xf;
params.is_write = ((kvm_vcpu_get_hsr(vcpu) & 1) == 0);
params.is_64bit = true;
......@@ -451,7 +499,7 @@ int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
params.Op1 = (kvm_vcpu_get_hsr(vcpu) >> 16) & 0xf;
params.Op2 = 0;
params.Rt2 = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
params.CRn = 0;
params.CRm = 0;
return emulate_cp15(vcpu, &params);
}
......
......@@ -58,8 +58,8 @@ static inline void print_cp_instr(const struct coproc_params *p)
{
/* Look, we even formatted it for you to paste into the table! */
if (p->is_64bit) {
kvm_pr_unimpl(" { CRm(%2lu), Op1(%2lu), is64, func_%s },\n",
p->CRm, p->Op1, p->is_write ? "write" : "read");
kvm_pr_unimpl(" { CRm64(%2lu), Op1(%2lu), is64, func_%s },\n",
p->CRn, p->Op1, p->is_write ? "write" : "read");
} else {
kvm_pr_unimpl(" { CRn(%2lu), CRm(%2lu), Op1(%2lu), Op2(%2lu), is32,"
" func_%s },\n",
......@@ -135,13 +135,13 @@ static inline int cmp_reg(const struct coproc_reg *i1,
return -1;
if (i1->CRn != i2->CRn)
return i1->CRn - i2->CRn;
if (i1->is_64 != i2->is_64)
return i2->is_64 - i1->is_64;
if (i1->CRm != i2->CRm)
return i1->CRm - i2->CRm;
if (i1->Op1 != i2->Op1)
return i1->Op1 - i2->Op1;
return i1->Op2 - i2->Op2;
if (i1->Op2 != i2->Op2)
return i1->Op2 - i2->Op2;
return i2->is_64 - i1->is_64;
}
......@@ -153,4 +153,8 @@ static inline int cmp_reg(const struct coproc_reg *i1,
#define is64 .is_64 = true
#define is32 .is_64 = false
bool access_sctlr(struct kvm_vcpu *vcpu,
const struct coproc_params *p,
const struct coproc_reg *r);
#endif /* __ARM_KVM_COPROC_LOCAL_H__ */
......@@ -34,7 +34,7 @@
static const struct coproc_reg a15_regs[] = {
/* SCTLR: swapped by interrupt.S. */
{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
NULL, reset_val, c1_SCTLR, 0x00C50078 },
access_sctlr, reset_val, c1_SCTLR, 0x00C50078 },
};
static struct kvm_coproc_target_table a15_target_table = {
......
......@@ -37,7 +37,7 @@
static const struct coproc_reg a7_regs[] = {
/* SCTLR: swapped by interrupt.S. */
{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
NULL, reset_val, c1_SCTLR, 0x00C50878 },
access_sctlr, reset_val, c1_SCTLR, 0x00C50878 },
};
static struct kvm_coproc_target_table a7_target_table = {
......
......@@ -38,6 +38,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr = HCR_GUEST_MASK;
return 0;
}
......
......@@ -303,13 +303,17 @@ vcpu .req r0 @ vcpu pointer always in r0
mrc p15, 0, r2, c14, c1, 0 @ CNTKCTL
mrrc p15, 0, r4, r5, c7 @ PAR
mrc p15, 0, r6, c10, c3, 0 @ AMAIR0
mrc p15, 0, r7, c10, c3, 1 @ AMAIR1
.if \store_to_vcpu == 0
push {r2,r4-r5}
push {r2,r4-r7}
.else
str r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
add r12, vcpu, #CP15_OFFSET(c7_PAR)
strd r4, r5, [r12]
str r6, [vcpu, #CP15_OFFSET(c10_AMAIR0)]
str r7, [vcpu, #CP15_OFFSET(c10_AMAIR1)]
.endif
.endm
......@@ -322,15 +326,19 @@ vcpu .req r0 @ vcpu pointer always in r0
*/
.macro write_cp15_state read_from_vcpu
.if \read_from_vcpu == 0
pop {r2,r4-r5}
pop {r2,r4-r7}
.else
ldr r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
add r12, vcpu, #CP15_OFFSET(c7_PAR)
ldrd r4, r5, [r12]
ldr r6, [vcpu, #CP15_OFFSET(c10_AMAIR0)]
ldr r7, [vcpu, #CP15_OFFSET(c10_AMAIR1)]
.endif
mcr p15, 0, r2, c14, c1, 0 @ CNTKCTL
mcrr p15, 0, r4, r5, c7 @ PAR
mcr p15, 0, r6, c10, c3, 0 @ AMAIR0
mcr p15, 0, r7, c10, c3, 1 @ AMAIR1
.if \read_from_vcpu == 0
pop {r2-r12}
......@@ -597,17 +605,14 @@ vcpu .req r0 @ vcpu pointer always in r0
/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
.macro configure_hyp_role operation
mrc p15, 4, r2, c1, c1, 0 @ HCR
bic r2, r2, #HCR_VIRT_EXCP_MASK
ldr r3, =HCR_GUEST_MASK
.if \operation == vmentry
orr r2, r2, r3
ldr r2, [vcpu, #VCPU_HCR]
ldr r3, [vcpu, #VCPU_IRQ_LINES]
orr r2, r2, r3
.else
bic r2, r2, r3
mov r2, #0
.endif
mcr p15, 4, r2, c1, c1, 0
mcr p15, 4, r2, c1, c1, 0 @ HCR
.endm
.macro load_vcpu
......
......@@ -144,8 +144,9 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
while (addr < end) {
pgd = pgdp + pgd_index(addr);
pud = pud_offset(pgd, addr);
pte = NULL;
if (pud_none(*pud)) {
addr = pud_addr_end(addr, end);
addr = kvm_pud_addr_end(addr, end);
continue;
}
......@@ -155,13 +156,13 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
* move on.
*/
clear_pud_entry(kvm, pud, addr);
addr = pud_addr_end(addr, end);
addr = kvm_pud_addr_end(addr, end);
continue;
}
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd)) {
addr = pmd_addr_end(addr, end);
addr = kvm_pmd_addr_end(addr, end);
continue;
}
......@@ -174,12 +175,12 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
/*
* If the pmd entry is to be cleared, walk back up the ladder
*/
if (kvm_pmd_huge(*pmd) || page_empty(pte)) {
if (kvm_pmd_huge(*pmd) || (pte && page_empty(pte))) {
clear_pmd_entry(kvm, pmd, addr);
next = pmd_addr_end(addr, end);
next = kvm_pmd_addr_end(addr, end);
if (page_empty(pmd) && !page_empty(pud)) {
clear_pud_entry(kvm, pud, addr);
next = pud_addr_end(addr, end);
next = kvm_pud_addr_end(addr, end);
}
}
......@@ -187,6 +188,99 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
}
}
static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
phys_addr_t addr, phys_addr_t end)
{
pte_t *pte;
pte = pte_offset_kernel(pmd, addr);
do {
if (!pte_none(*pte)) {
hva_t hva = gfn_to_hva(kvm, addr >> PAGE_SHIFT);
kvm_flush_dcache_to_poc((void*)hva, PAGE_SIZE);
}
} while (pte++, addr += PAGE_SIZE, addr != end);
}
static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
phys_addr_t addr, phys_addr_t end)
{
pmd_t *pmd;
phys_addr_t next;
pmd = pmd_offset(pud, addr);
do {
next = kvm_pmd_addr_end(addr, end);
if (!pmd_none(*pmd)) {
if (kvm_pmd_huge(*pmd)) {
hva_t hva = gfn_to_hva(kvm, addr >> PAGE_SHIFT);
kvm_flush_dcache_to_poc((void*)hva, PMD_SIZE);
} else {
stage2_flush_ptes(kvm, pmd, addr, next);
}
}
} while (pmd++, addr = next, addr != end);
}
static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
phys_addr_t addr, phys_addr_t end)
{
pud_t *pud;
phys_addr_t next;
pud = pud_offset(pgd, addr);
do {
next = kvm_pud_addr_end(addr, end);
if (!pud_none(*pud)) {
if (pud_huge(*pud)) {
hva_t hva = gfn_to_hva(kvm, addr >> PAGE_SHIFT);
kvm_flush_dcache_to_poc((void*)hva, PUD_SIZE);
} else {
stage2_flush_pmds(kvm, pud, addr, next);
}
}
} while (pud++, addr = next, addr != end);
}
static void stage2_flush_memslot(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
phys_addr_t next;
pgd_t *pgd;
pgd = kvm->arch.pgd + pgd_index(addr);
do {
next = kvm_pgd_addr_end(addr, end);
stage2_flush_puds(kvm, pgd, addr, next);
} while (pgd++, addr = next, addr != end);
}
/**
* stage2_flush_vm - Invalidate cache for pages mapped in stage 2
* @kvm: The struct kvm pointer
*
* Go through the stage 2 page tables and invalidate any cache lines
* backing memory already mapped to the VM.
*/
void stage2_flush_vm(struct kvm *kvm)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
int idx;
idx = srcu_read_lock(&kvm->srcu);
spin_lock(&kvm->mmu_lock);
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, slots)
stage2_flush_memslot(kvm, memslot);
spin_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, idx);
}
/**
* free_boot_hyp_pgd - free HYP boot page tables
*
......@@ -715,7 +809,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_set_s2pmd_writable(&new_pmd);
kvm_set_pfn_dirty(pfn);
}
coherent_icache_guest_page(kvm, hva & PMD_MASK, PMD_SIZE);
coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE);
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
} else {
pte_t new_pte = pfn_pte(pfn, PAGE_S2);
......@@ -723,7 +817,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_set_s2pte_writable(&new_pte);
kvm_set_pfn_dirty(pfn);
}
coherent_icache_guest_page(kvm, hva, PAGE_SIZE);
coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
}
......
......@@ -62,6 +62,7 @@
* RW: 64bit by default, can be overriden for 32bit VMs
* TAC: Trap ACTLR
* TSC: Trap SMC
* TVM: Trap VM ops (until M+C set in SCTLR_EL1)
* TSW: Trap cache operations by set/way
* TWE: Trap WFE
* TWI: Trap WFI
......@@ -74,7 +75,7 @@
* SWIO: Turn set/way invalidates into set/way clean+invalidate
*/
#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
HCR_BSU_IS | HCR_FB | HCR_TAC | \
HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
HCR_AMO | HCR_IMO | HCR_FMO | \
HCR_SWIO | HCR_TIDCP | HCR_RW)
#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
......
......@@ -79,7 +79,8 @@
#define c13_TID_URW (TPIDR_EL0 * 2) /* Thread ID, User R/W */
#define c13_TID_URO (TPIDRRO_EL0 * 2)/* Thread ID, User R/O */
#define c13_TID_PRIV (TPIDR_EL1 * 2) /* Thread ID, Privileged */
#define c10_AMAIR (AMAIR_EL1 * 2) /* Aux Memory Attr Indirection Reg */
#define c10_AMAIR0 (AMAIR_EL1 * 2) /* Aux Memory Attr Indirection Reg */
#define c10_AMAIR1 (c10_AMAIR0 + 1)/* Aux Memory Attr Indirection Reg */
#define c14_CNTKCTL (CNTKCTL_EL1 * 2) /* Timer Control Register (PL1) */
#define NR_CP15_REGS (NR_SYS_REGS * 2)
......
......@@ -106,7 +106,6 @@ static inline bool kvm_is_write_fault(unsigned long esr)
return true;
}
static inline void kvm_clean_dcache_area(void *addr, size_t size) {}
static inline void kvm_clean_pgd(pgd_t *pgd) {}
static inline void kvm_clean_pmd_entry(pmd_t *pmd) {}
static inline void kvm_clean_pte(pte_t *pte) {}
......@@ -122,11 +121,25 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
pmd_val(*pmd) |= PMD_S2_RDWR;
}
#define kvm_pgd_addr_end(addr, end) pgd_addr_end(addr, end)
#define kvm_pud_addr_end(addr, end) pud_addr_end(addr, end)
#define kvm_pmd_addr_end(addr, end) pmd_addr_end(addr, end)
struct kvm;
static inline void coherent_icache_guest_page(struct kvm *kvm, hva_t hva,
unsigned long size)
#define kvm_flush_dcache_to_poc(a,l) __flush_dcache_area((a), (l))
static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
{
return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
}
static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
unsigned long size)
{
if (!vcpu_has_cache_enabled(vcpu))
kvm_flush_dcache_to_poc((void *)hva, size);
if (!icache_is_aliasing()) { /* PIPT */
flush_icache_range(hva, hva + size);
} else if (!icache_is_aivivt()) { /* non ASID-tagged VIVT */
......@@ -135,8 +148,9 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, hva_t hva,
}
}
#define kvm_flush_dcache_to_poc(a,l) __flush_dcache_area((a), (l))
#define kvm_virt_to_phys(x) __virt_to_phys((unsigned long)(x))
void stage2_flush_vm(struct kvm *kvm);
#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
......@@ -27,6 +27,7 @@
#include <asm/kvm_host.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_coproc.h>
#include <asm/kvm_mmu.h>
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <trace/events/kvm.h>
......@@ -120,6 +121,48 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
return true;
}
/*
* Generic accessor for VM registers. Only called as long as HCR_TVM
* is set.
*/
static bool access_vm_reg(struct kvm_vcpu *vcpu,
const struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
unsigned long val;
BUG_ON(!p->is_write);
val = *vcpu_reg(vcpu, p->Rt);
if (!p->is_aarch32) {
vcpu_sys_reg(vcpu, r->reg) = val;
} else {
vcpu_cp15(vcpu, r->reg) = val & 0xffffffffUL;
if (!p->is_32bit)
vcpu_cp15(vcpu, r->reg + 1) = val >> 32;
}
return true;
}
/*
* SCTLR_EL1 accessor. Only called as long as HCR_TVM is set. If the
* guest enables the MMU, we stop trapping the VM sys_regs and leave
* it in complete control of the caches.
*/
static bool access_sctlr(struct kvm_vcpu *vcpu,
const struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
access_vm_reg(vcpu, p, r);
if (vcpu_has_cache_enabled(vcpu)) { /* MMU+Caches enabled? */
vcpu->arch.hcr_el2 &= ~HCR_TVM;
stage2_flush_vm(vcpu->kvm);
}
return true;
}
/*
* We could trap ID_DFR0 and tell the guest we don't support performance
* monitoring. Unfortunately the patch to make the kernel check ID_DFR0 was
......@@ -185,32 +228,32 @@ static const struct sys_reg_desc sys_reg_descs[] = {
NULL, reset_mpidr, MPIDR_EL1 },
/* SCTLR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b0000), Op2(0b000),
NULL, reset_val, SCTLR_EL1, 0x00C50078 },
access_sctlr, reset_val, SCTLR_EL1, 0x00C50078 },
/* CPACR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b0000), Op2(0b010),
NULL, reset_val, CPACR_EL1, 0 },
/* TTBR0_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0010), CRm(0b0000), Op2(0b000),
NULL, reset_unknown, TTBR0_EL1 },
access_vm_reg, reset_unknown, TTBR0_EL1 },
/* TTBR1_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0010), CRm(0b0000), Op2(0b001),
NULL, reset_unknown, TTBR1_EL1 },
access_vm_reg, reset_unknown, TTBR1_EL1 },
/* TCR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0010), CRm(0b0000), Op2(0b010),
NULL, reset_val, TCR_EL1, 0 },
access_vm_reg, reset_val, TCR_EL1, 0 },
/* AFSR0_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0001), Op2(0b000),
NULL, reset_unknown, AFSR0_EL1 },
access_vm_reg, reset_unknown, AFSR0_EL1 },
/* AFSR1_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0001), Op2(0b001),
NULL, reset_unknown, AFSR1_EL1 },
access_vm_reg, reset_unknown, AFSR1_EL1 },
/* ESR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0010), Op2(0b000),
NULL, reset_unknown, ESR_EL1 },
access_vm_reg, reset_unknown, ESR_EL1 },
/* FAR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0110), CRm(0b0000), Op2(0b000),
NULL, reset_unknown, FAR_EL1 },
access_vm_reg, reset_unknown, FAR_EL1 },
/* PAR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0111), CRm(0b0100), Op2(0b000),
NULL, reset_unknown, PAR_EL1 },
......@@ -224,17 +267,17 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* MAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
NULL, reset_unknown, MAIR_EL1 },
access_vm_reg, reset_unknown, MAIR_EL1 },
/* AMAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0011), Op2(0b000),
NULL, reset_amair_el1, AMAIR_EL1 },
access_vm_reg, reset_amair_el1, AMAIR_EL1 },
/* VBAR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b0000), Op2(0b000),
NULL, reset_val, VBAR_EL1, 0 },
/* CONTEXTIDR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1101), CRm(0b0000), Op2(0b001),
NULL, reset_val, CONTEXTIDR_EL1, 0 },
access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
/* TPIDR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1101), CRm(0b0000), Op2(0b100),
NULL, reset_unknown, TPIDR_EL1 },
......@@ -305,14 +348,32 @@ static const struct sys_reg_desc sys_reg_descs[] = {
NULL, reset_val, FPEXC32_EL2, 0x70 },
};
/* Trapped cp15 registers */
/*
* Trapped cp15 registers. TTBR0/TTBR1 get a double encoding,
* depending on the way they are accessed (as a 32bit or a 64bit
* register).
*/
static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_sctlr, NULL, c1_SCTLR },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, c2_TTBR1 },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 2), access_vm_reg, NULL, c2_TTBCR },
{ Op1( 0), CRn( 3), CRm( 0), Op2( 0), access_vm_reg, NULL, c3_DACR },
{ Op1( 0), CRn( 5), CRm( 0), Op2( 0), access_vm_reg, NULL, c5_DFSR },
{ Op1( 0), CRn( 5), CRm( 0), Op2( 1), access_vm_reg, NULL, c5_IFSR },
{ Op1( 0), CRn( 5), CRm( 1), Op2( 0), access_vm_reg, NULL, c5_ADFSR },
{ Op1( 0), CRn( 5), CRm( 1), Op2( 1), access_vm_reg, NULL, c5_AIFSR },
{ Op1( 0), CRn( 6), CRm( 0), Op2( 0), access_vm_reg, NULL, c6_DFAR },
{ Op1( 0), CRn( 6), CRm( 0), Op2( 2), access_vm_reg, NULL, c6_IFAR },
/*
* DC{C,I,CI}SW operations:
*/
{ Op1( 0), CRn( 7), CRm( 6), Op2( 2), access_dcsw },
{ Op1( 0), CRn( 7), CRm(10), Op2( 2), access_dcsw },
{ Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw },
{ Op1( 0), CRn( 9), CRm(12), Op2( 0), pm_fake },
{ Op1( 0), CRn( 9), CRm(12), Op2( 1), pm_fake },
{ Op1( 0), CRn( 9), CRm(12), Op2( 2), pm_fake },
......@@ -326,6 +387,14 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(14), Op2( 0), pm_fake },
{ Op1( 0), CRn( 9), CRm(14), Op2( 1), pm_fake },
{ Op1( 0), CRn( 9), CRm(14), Op2( 2), pm_fake },
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
{ Op1( 0), CRn(10), CRm( 3), Op2( 0), access_vm_reg, NULL, c10_AMAIR0 },
{ Op1( 0), CRn(10), CRm( 3), Op2( 1), access_vm_reg, NULL, c10_AMAIR1 },
{ Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, c13_CID },
{ Op1( 1), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, c2_TTBR1 },
};
/* Target specific emulation tables */
......@@ -437,6 +506,8 @@ int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
u32 hsr = kvm_vcpu_get_hsr(vcpu);
int Rt2 = (hsr >> 10) & 0xf;
params.is_aarch32 = true;
params.is_32bit = false;
params.CRm = (hsr >> 1) & 0xf;
params.Rt = (hsr >> 5) & 0xf;
params.is_write = ((hsr & 1) == 0);
......@@ -480,6 +551,8 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
struct sys_reg_params params;
u32 hsr = kvm_vcpu_get_hsr(vcpu);
params.is_aarch32 = true;
params.is_32bit = true;
params.CRm = (hsr >> 1) & 0xf;
params.Rt = (hsr >> 5) & 0xf;
params.is_write = ((hsr & 1) == 0);
......@@ -549,6 +622,8 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
struct sys_reg_params params;
unsigned long esr = kvm_vcpu_get_hsr(vcpu);
params.is_aarch32 = false;
params.is_32bit = false;
params.Op0 = (esr >> 20) & 3;
params.Op1 = (esr >> 14) & 0x7;
params.CRn = (esr >> 10) & 0xf;
......
......@@ -30,6 +30,8 @@ struct sys_reg_params {
u8 Op2;
u8 Rt;
bool is_write;
bool is_aarch32;
bool is_32bit; /* Only valid if is_aarch32 is true */
};
struct sys_reg_desc {
......
......@@ -199,6 +199,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_IRQCHIP:
case KVM_CAP_MP_STATE:
case KVM_CAP_IRQ_INJECT_STATUS:
case KVM_CAP_IOAPIC_POLARITY_IGNORED:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
......
此差异已折叠。
......@@ -436,13 +436,6 @@ kvm_mips_emulate_CP0(uint32_t inst, uint32_t *opc, uint32_t cause,
sel = inst & 0x7;
co_bit = (inst >> 25) & 1;
/* Verify that the register is valid */
if (rd > MIPS_CP0_DESAVE) {
printk("Invalid rd: %d\n", rd);
er = EMULATE_FAIL;
goto done;
}
if (co_bit) {
op = (inst) & 0xff;
......@@ -1542,8 +1535,15 @@ kvm_mips_handle_ri(unsigned long cause, uint32_t *opc,
}
if ((inst & OPCODE) == SPEC3 && (inst & FUNC) == RDHWR) {
int usermode = !KVM_GUEST_KERNEL_MODE(vcpu);
int rd = (inst & RD) >> 11;
int rt = (inst & RT) >> 16;
/* If usermode, check RDHWR rd is allowed by guest HWREna */
if (usermode && !(kvm_read_c0_guest_hwrena(cop0) & BIT(rd))) {
kvm_debug("RDHWR %#x disallowed by HWREna @ %p\n",
rd, opc);
goto emulate_ri;
}
switch (rd) {
case 0: /* CPU number */
arch->gprs[rt] = 0;
......@@ -1567,31 +1567,27 @@ kvm_mips_handle_ri(unsigned long cause, uint32_t *opc,
}
break;
case 29:
#if 1
arch->gprs[rt] = kvm_read_c0_guest_userlocal(cop0);
#else
/* UserLocal not implemented */
er = kvm_mips_emulate_ri_exc(cause, opc, run, vcpu);
#endif
break;
default:
printk("RDHWR not supported\n");
er = EMULATE_FAIL;
break;
kvm_debug("RDHWR %#x not supported @ %p\n", rd, opc);
goto emulate_ri;
}
} else {
printk("Emulate RI not supported @ %p: %#x\n", opc, inst);
er = EMULATE_FAIL;
kvm_debug("Emulate RI not supported @ %p: %#x\n", opc, inst);
goto emulate_ri;
}
return EMULATE_DONE;
emulate_ri:
/*
* Rollback PC only if emulation was unsuccessful
* Rollback PC (if in branch delay slot then the PC already points to
* branch target), and pass the RI exception to the guest OS.
*/
if (er == EMULATE_FAIL) {
vcpu->arch.pc = curr_pc;
}
return er;
vcpu->arch.pc = curr_pc;
return kvm_mips_emulate_ri_exc(cause, opc, run, vcpu);
}
enum emulation_result
......
......@@ -304,6 +304,11 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
return vcpu->arch.fault_dar;
}
static inline bool is_kvmppc_resume_guest(int r)
{
return (r == RESUME_GUEST || r == RESUME_GUEST_NV);
}
/* Magic register values loaded into r3 and r4 before the 'sc' assembly
* instruction for the OSI hypercalls */
#define OSI_SC_MAGIC_R3 0x113724FA
......
......@@ -289,6 +289,18 @@ static inline void note_hpte_modification(struct kvm *kvm,
if (atomic_read(&kvm->arch.hpte_mod_interest))
rev->guest_rpte |= HPTE_GR_MODIFIED;
}
/*
* Like kvm_memslots(), but for use in real mode when we can't do
* any RCU stuff (since the secondary threads are offline from the
* kernel's point of view), and we can't print anything.
* Thus we use rcu_dereference_raw() rather than rcu_dereference_check().
*/
static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
{
return rcu_dereference_raw_notrace(kvm->memslots);
}
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
#endif /* __ASM_KVM_BOOK3S_64_H__ */
......@@ -94,7 +94,7 @@ struct kvmppc_host_state {
unsigned long xics_phys;
u32 saved_xirr;
u64 dabr;
u64 host_mmcr[3];
u64 host_mmcr[7]; /* MMCR 0,1,A, SIAR, SDAR, MMCR2, SIER */
u32 host_pmc[8];
u64 host_purr;
u64 host_spurr;
......
......@@ -129,6 +129,8 @@ extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
unsigned long ioba, unsigned long tce);
extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
unsigned long ioba);
extern struct kvm_rma_info *kvm_alloc_rma(void);
extern void kvm_release_rma(struct kvm_rma_info *ri);
extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
......
......@@ -213,6 +213,7 @@
#define SPRN_ACOP 0x1F /* Available Coprocessor Register */
#define SPRN_TFIAR 0x81 /* Transaction Failure Inst Addr */
#define SPRN_TEXASR 0x82 /* Transaction EXception & Summary */
#define TEXASR_FS __MASK(63-36) /* Transaction Failure Summary */
#define SPRN_TEXASRU 0x83 /* '' '' '' Upper 32 */
#define SPRN_TFHAR 0x80 /* Transaction Failure Handler Addr */
#define SPRN_CTRLF 0x088
......
......@@ -7,6 +7,8 @@
#include <uapi/asm/tm.h>
#ifndef __ASSEMBLY__
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
extern void do_load_up_transact_fpu(struct thread_struct *thread);
extern void do_load_up_transact_altivec(struct thread_struct *thread);
......@@ -21,3 +23,5 @@ extern void tm_recheckpoint(struct thread_struct *thread,
extern void tm_abort(uint8_t cause);
extern void tm_save_sprs(struct thread_struct *thread);
extern void tm_restore_sprs(struct thread_struct *thread);
#endif /* __ASSEMBLY__ */
......@@ -262,7 +262,14 @@ int kvmppc_mmu_hv_init(void)
static void kvmppc_mmu_book3s_64_hv_reset_msr(struct kvm_vcpu *vcpu)
{
kvmppc_set_msr(vcpu, vcpu->arch.intr_msr);
unsigned long msr = vcpu->arch.intr_msr;
/* If transactional, change to suspend mode on IRQ delivery */
if (MSR_TM_TRANSACTIONAL(vcpu->arch.shregs.msr))
msr |= MSR_TS_S;
else
msr |= vcpu->arch.shregs.msr & MSR_TS_MASK;
kvmppc_set_msr(vcpu, msr);
}
/*
......
......@@ -75,3 +75,31 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
return H_TOO_HARD;
}
EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
unsigned long ioba)
{
struct kvm *kvm = vcpu->kvm;
struct kvmppc_spapr_tce_table *stt;
list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
if (stt->liobn == liobn) {
unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
struct page *page;
u64 *tbl;
if (ioba >= stt->window_size)
return H_PARAMETER;
page = stt->pages[idx / TCES_PER_PAGE];
tbl = (u64 *)page_address(page);
vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE];
return H_SUCCESS;
}
}
/* Didn't find the liobn, punt it to userspace */
return H_TOO_HARD;
}
EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
......@@ -86,7 +86,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
/* CPU points to the first thread of the core */
if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
#ifdef CONFIG_KVM_XICS
#ifdef CONFIG_PPC_ICP_NATIVE
int real_cpu = cpu + vcpu->arch.ptid;
if (paca[real_cpu].kvm_hstate.xics_phys)
xics_wake_cpu(real_cpu);
......@@ -879,17 +879,6 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_IAMR:
*val = get_reg_val(id, vcpu->arch.iamr);
break;
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case KVM_REG_PPC_TFHAR:
*val = get_reg_val(id, vcpu->arch.tfhar);
break;
case KVM_REG_PPC_TFIAR:
*val = get_reg_val(id, vcpu->arch.tfiar);
break;
case KVM_REG_PPC_TEXASR:
*val = get_reg_val(id, vcpu->arch.texasr);
break;
#endif
case KVM_REG_PPC_FSCR:
*val = get_reg_val(id, vcpu->arch.fscr);
break;
......@@ -970,6 +959,69 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_PPR:
*val = get_reg_val(id, vcpu->arch.ppr);
break;
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case KVM_REG_PPC_TFHAR:
*val = get_reg_val(id, vcpu->arch.tfhar);
break;
case KVM_REG_PPC_TFIAR:
*val = get_reg_val(id, vcpu->arch.tfiar);
break;
case KVM_REG_PPC_TEXASR:
*val = get_reg_val(id, vcpu->arch.texasr);
break;
case KVM_REG_PPC_TM_GPR0 ... KVM_REG_PPC_TM_GPR31:
i = id - KVM_REG_PPC_TM_GPR0;
*val = get_reg_val(id, vcpu->arch.gpr_tm[i]);
break;
case KVM_REG_PPC_TM_VSR0 ... KVM_REG_PPC_TM_VSR63:
{
int j;
i = id - KVM_REG_PPC_TM_VSR0;
if (i < 32)
for (j = 0; j < TS_FPRWIDTH; j++)
val->vsxval[j] = vcpu->arch.fp_tm.fpr[i][j];
else {
if (cpu_has_feature(CPU_FTR_ALTIVEC))
val->vval = vcpu->arch.vr_tm.vr[i-32];
else
r = -ENXIO;
}
break;
}
case KVM_REG_PPC_TM_CR:
*val = get_reg_val(id, vcpu->arch.cr_tm);
break;
case KVM_REG_PPC_TM_LR:
*val = get_reg_val(id, vcpu->arch.lr_tm);
break;
case KVM_REG_PPC_TM_CTR:
*val = get_reg_val(id, vcpu->arch.ctr_tm);
break;
case KVM_REG_PPC_TM_FPSCR:
*val = get_reg_val(id, vcpu->arch.fp_tm.fpscr);
break;
case KVM_REG_PPC_TM_AMR:
*val = get_reg_val(id, vcpu->arch.amr_tm);
break;
case KVM_REG_PPC_TM_PPR:
*val = get_reg_val(id, vcpu->arch.ppr_tm);
break;
case KVM_REG_PPC_TM_VRSAVE:
*val = get_reg_val(id, vcpu->arch.vrsave_tm);
break;
case KVM_REG_PPC_TM_VSCR:
if (cpu_has_feature(CPU_FTR_ALTIVEC))
*val = get_reg_val(id, vcpu->arch.vr_tm.vscr.u[3]);
else
r = -ENXIO;
break;
case KVM_REG_PPC_TM_DSCR:
*val = get_reg_val(id, vcpu->arch.dscr_tm);
break;
case KVM_REG_PPC_TM_TAR:
*val = get_reg_val(id, vcpu->arch.tar_tm);
break;
#endif
case KVM_REG_PPC_ARCH_COMPAT:
*val = get_reg_val(id, vcpu->arch.vcore->arch_compat);
break;
......@@ -1039,17 +1091,6 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_IAMR:
vcpu->arch.iamr = set_reg_val(id, *val);
break;
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case KVM_REG_PPC_TFHAR:
vcpu->arch.tfhar = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TFIAR:
vcpu->arch.tfiar = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TEXASR:
vcpu->arch.texasr = set_reg_val(id, *val);
break;
#endif
case KVM_REG_PPC_FSCR:
vcpu->arch.fscr = set_reg_val(id, *val);
break;
......@@ -1144,6 +1185,68 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_PPR:
vcpu->arch.ppr = set_reg_val(id, *val);
break;
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case KVM_REG_PPC_TFHAR:
vcpu->arch.tfhar = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TFIAR:
vcpu->arch.tfiar = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TEXASR:
vcpu->arch.texasr = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_GPR0 ... KVM_REG_PPC_TM_GPR31:
i = id - KVM_REG_PPC_TM_GPR0;
vcpu->arch.gpr_tm[i] = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_VSR0 ... KVM_REG_PPC_TM_VSR63:
{
int j;
i = id - KVM_REG_PPC_TM_VSR0;
if (i < 32)
for (j = 0; j < TS_FPRWIDTH; j++)
vcpu->arch.fp_tm.fpr[i][j] = val->vsxval[j];
else
if (cpu_has_feature(CPU_FTR_ALTIVEC))
vcpu->arch.vr_tm.vr[i-32] = val->vval;
else
r = -ENXIO;
break;
}
case KVM_REG_PPC_TM_CR:
vcpu->arch.cr_tm = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_LR:
vcpu->arch.lr_tm = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_CTR:
vcpu->arch.ctr_tm = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_FPSCR:
vcpu->arch.fp_tm.fpscr = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_AMR:
vcpu->arch.amr_tm = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_PPR:
vcpu->arch.ppr_tm = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_VRSAVE:
vcpu->arch.vrsave_tm = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_VSCR:
if (cpu_has_feature(CPU_FTR_ALTIVEC))
vcpu->arch.vr.vscr.u[3] = set_reg_val(id, *val);
else
r = - ENXIO;
break;
case KVM_REG_PPC_TM_DSCR:
vcpu->arch.dscr_tm = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TM_TAR:
vcpu->arch.tar_tm = set_reg_val(id, *val);
break;
#endif
case KVM_REG_PPC_ARCH_COMPAT:
r = kvmppc_set_arch_compat(vcpu, set_reg_val(id, *val));
break;
......@@ -1360,9 +1463,7 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
smp_wmb();
#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
if (cpu != smp_processor_id()) {
#ifdef CONFIG_KVM_XICS
xics_wake_cpu(cpu);
#endif
if (vcpu->arch.ptid)
++vc->n_woken;
}
......@@ -1530,7 +1631,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
vcpu->arch.trap = 0;
if (vcpu->arch.ceded) {
if (ret != RESUME_GUEST)
if (!is_kvmppc_resume_guest(ret))
kvmppc_end_cede(vcpu);
else
kvmppc_set_timer(vcpu);
......@@ -1541,7 +1642,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
vc->vcore_state = VCORE_INACTIVE;
list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
arch.run_list) {
if (vcpu->arch.ret != RESUME_GUEST) {
if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
kvmppc_remove_runnable(vc, vcpu);
wake_up(&vcpu->arch.cpu_run);
}
......@@ -1731,7 +1832,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
}
} while (r == RESUME_GUEST);
} while (is_kvmppc_resume_guest(r));
out:
vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
......@@ -2366,7 +2467,7 @@ static int kvmppc_book3s_init_hv(void)
*/
r = kvmppc_core_check_processor_compat_hv();
if (r < 0)
return r;
return -ENODEV;
kvm_ops_hv.owner = THIS_MODULE;
kvmppc_hv_ops = &kvm_ops_hv;
......
......@@ -71,6 +71,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
mtmsrd r10,1
/* Save host PMU registers */
BEGIN_FTR_SECTION
/* Work around P8 PMAE bug */
li r3, -1
clrrdi r3, r3, 10
mfspr r8, SPRN_MMCR2
mtspr SPRN_MMCR2, r3 /* freeze all counters using MMCR2 */
isync
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
li r3, 1
sldi r3, r3, 31 /* MMCR0_FC (freeze counters) bit */
mfspr r7, SPRN_MMCR0 /* save MMCR0 */
......@@ -87,9 +95,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
cmpwi r5, 0
beq 31f /* skip if not */
mfspr r5, SPRN_MMCR1
mfspr r9, SPRN_SIAR
mfspr r10, SPRN_SDAR
std r7, HSTATE_MMCR(r13)
std r5, HSTATE_MMCR + 8(r13)
std r6, HSTATE_MMCR + 16(r13)
std r9, HSTATE_MMCR + 24(r13)
std r10, HSTATE_MMCR + 32(r13)
BEGIN_FTR_SECTION
mfspr r9, SPRN_SIER
std r8, HSTATE_MMCR + 40(r13)
std r9, HSTATE_MMCR + 48(r13)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mfspr r3, SPRN_PMC1
mfspr r5, SPRN_PMC2
mfspr r6, SPRN_PMC3
......@@ -110,6 +127,11 @@ BEGIN_FTR_SECTION
stw r10, HSTATE_PMC + 24(r13)
stw r11, HSTATE_PMC + 28(r13)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
BEGIN_FTR_SECTION
mfspr r9, SPRN_SIER
std r8, HSTATE_MMCR + 40(r13)
std r9, HSTATE_MMCR + 48(r13)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
31:
/*
......
......@@ -111,7 +111,7 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
ptel = rev->guest_rpte |= rcbits;
gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
if (!memslot)
return;
......@@ -192,7 +192,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
/* Find the memslot (if any) for this address */
gpa = (ptel & HPTE_R_RPN) & ~(psize - 1);
gfn = gpa >> PAGE_SHIFT;
memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
pa = 0;
is_io = ~0ul;
rmap = NULL;
......@@ -670,7 +670,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
psize = hpte_page_size(v, r);
gfn = ((r & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
if (memslot) {
hva = __gfn_to_hva_memslot(memslot, gfn);
pte = lookup_linux_pte_and_update(pgdir, hva,
......
......@@ -28,6 +28,9 @@
#include <asm/exception-64s.h>
#include <asm/kvm_book3s_asm.h>
#include <asm/mmu-hash64.h>
#include <asm/tm.h>
#define VCPU_GPRS_TM(reg) (((reg) * ULONG_SIZE) + VCPU_GPR_TM)
#ifdef __LITTLE_ENDIAN__
#error Need to fix lppaca and SLB shadow accesses in little endian mode
......@@ -106,8 +109,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
ld r3, HSTATE_MMCR(r13)
ld r4, HSTATE_MMCR + 8(r13)
ld r5, HSTATE_MMCR + 16(r13)
ld r6, HSTATE_MMCR + 24(r13)
ld r7, HSTATE_MMCR + 32(r13)
mtspr SPRN_MMCR1, r4
mtspr SPRN_MMCRA, r5
mtspr SPRN_SIAR, r6
mtspr SPRN_SDAR, r7
BEGIN_FTR_SECTION
ld r8, HSTATE_MMCR + 40(r13)
ld r9, HSTATE_MMCR + 48(r13)
mtspr SPRN_MMCR2, r8
mtspr SPRN_SIER, r9
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr SPRN_MMCR0, r3
isync
23:
......@@ -597,6 +610,116 @@ BEGIN_FTR_SECTION
END_FTR_SECTION_NESTED(CPU_FTR_ARCH_206, CPU_FTR_ARCH_206, 89)
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
BEGIN_FTR_SECTION
b skip_tm
END_FTR_SECTION_IFCLR(CPU_FTR_TM)
/* Turn on TM/FP/VSX/VMX so we can restore them. */
mfmsr r5
li r6, MSR_TM >> 32
sldi r6, r6, 32
or r5, r5, r6
ori r5, r5, MSR_FP
oris r5, r5, (MSR_VEC | MSR_VSX)@h
mtmsrd r5
/*
* The user may change these outside of a transaction, so they must
* always be context switched.
*/
ld r5, VCPU_TFHAR(r4)
ld r6, VCPU_TFIAR(r4)
ld r7, VCPU_TEXASR(r4)
mtspr SPRN_TFHAR, r5
mtspr SPRN_TFIAR, r6
mtspr SPRN_TEXASR, r7
ld r5, VCPU_MSR(r4)
rldicl. r5, r5, 64 - MSR_TS_S_LG, 62
beq skip_tm /* TM not active in guest */
/* Make sure the failure summary is set, otherwise we'll program check
* when we trechkpt. It's possible that this might have been not set
* on a kvmppc_set_one_reg() call but we shouldn't let this crash the
* host.
*/
oris r7, r7, (TEXASR_FS)@h
mtspr SPRN_TEXASR, r7
/*
* We need to load up the checkpointed state for the guest.
* We need to do this early as it will blow away any GPRs, VSRs and
* some SPRs.
*/
mr r31, r4
addi r3, r31, VCPU_FPRS_TM
bl .load_fp_state
addi r3, r31, VCPU_VRS_TM
bl .load_vr_state
mr r4, r31
lwz r7, VCPU_VRSAVE_TM(r4)
mtspr SPRN_VRSAVE, r7
ld r5, VCPU_LR_TM(r4)
lwz r6, VCPU_CR_TM(r4)
ld r7, VCPU_CTR_TM(r4)
ld r8, VCPU_AMR_TM(r4)
ld r9, VCPU_TAR_TM(r4)
mtlr r5
mtcr r6
mtctr r7
mtspr SPRN_AMR, r8
mtspr SPRN_TAR, r9
/*
* Load up PPR and DSCR values but don't put them in the actual SPRs
* till the last moment to avoid running with userspace PPR and DSCR for
* too long.
*/
ld r29, VCPU_DSCR_TM(r4)
ld r30, VCPU_PPR_TM(r4)
std r2, PACATMSCRATCH(r13) /* Save TOC */
/* Clear the MSR RI since r1, r13 are all going to be foobar. */
li r5, 0
mtmsrd r5, 1
/* Load GPRs r0-r28 */
reg = 0
.rept 29
ld reg, VCPU_GPRS_TM(reg)(r31)
reg = reg + 1
.endr
mtspr SPRN_DSCR, r29
mtspr SPRN_PPR, r30
/* Load final GPRs */
ld 29, VCPU_GPRS_TM(29)(r31)
ld 30, VCPU_GPRS_TM(30)(r31)
ld 31, VCPU_GPRS_TM(31)(r31)
/* TM checkpointed state is now setup. All GPRs are now volatile. */
TRECHKPT
/* Now let's get back the state we need. */
HMT_MEDIUM
GET_PACA(r13)
ld r29, HSTATE_DSCR(r13)
mtspr SPRN_DSCR, r29
ld r4, HSTATE_KVM_VCPU(r13)
ld r1, HSTATE_HOST_R1(r13)
ld r2, PACATMSCRATCH(r13)
/* Set the MSR RI since we have our registers back. */
li r5, MSR_RI
mtmsrd r5, 1
skip_tm:
#endif
/* Load guest PMU registers */
/* R4 is live here (vcpu pointer) */
li r3, 1
......@@ -704,14 +827,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
ld r6, VCPU_VTB(r4)
mtspr SPRN_IC, r5
mtspr SPRN_VTB, r6
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
ld r5, VCPU_TFHAR(r4)
ld r6, VCPU_TFIAR(r4)
ld r7, VCPU_TEXASR(r4)
mtspr SPRN_TFHAR, r5
mtspr SPRN_TFIAR, r6
mtspr SPRN_TEXASR, r7
#endif
ld r8, VCPU_EBBHR(r4)
mtspr SPRN_EBBHR, r8
ld r5, VCPU_EBBRR(r4)
......@@ -736,6 +851,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
* Set the decrementer to the guest decrementer.
*/
ld r8,VCPU_DEC_EXPIRES(r4)
/* r8 is a host timebase value here, convert to guest TB */
ld r5,HSTATE_KVM_VCORE(r13)
ld r6,VCORE_TB_OFFSET(r5)
add r8,r8,r6
mftb r7
subf r3,r7,r8
mtspr SPRN_DEC,r3
......@@ -817,7 +936,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
12: mtspr SPRN_SRR0, r10
mr r10,r0
mtspr SPRN_SRR1, r11
ld r11, VCPU_INTR_MSR(r4)
mr r9, r4
bl kvmppc_msr_interrupt
5:
/*
......@@ -1098,17 +1218,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_201)
mftb r6
extsw r5,r5
add r5,r5,r6
/* r5 is a guest timebase value here, convert to host TB */
ld r3,HSTATE_KVM_VCORE(r13)
ld r4,VCORE_TB_OFFSET(r3)
subf r5,r4,r5
std r5,VCPU_DEC_EXPIRES(r9)
BEGIN_FTR_SECTION
b 8f
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
/* Turn on TM so we can access TFHAR/TFIAR/TEXASR */
mfmsr r8
li r0, 1
rldimi r8, r0, MSR_TM_LG, 63-MSR_TM_LG
mtmsrd r8
/* Save POWER8-specific registers */
mfspr r5, SPRN_IAMR
mfspr r6, SPRN_PSPB
......@@ -1122,14 +1240,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
std r5, VCPU_IC(r9)
std r6, VCPU_VTB(r9)
std r7, VCPU_TAR(r9)
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
mfspr r5, SPRN_TFHAR
mfspr r6, SPRN_TFIAR
mfspr r7, SPRN_TEXASR
std r5, VCPU_TFHAR(r9)
std r6, VCPU_TFIAR(r9)
std r7, VCPU_TEXASR(r9)
#endif
mfspr r8, SPRN_EBBHR
std r8, VCPU_EBBHR(r9)
mfspr r5, SPRN_EBBRR
......@@ -1387,7 +1497,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
ld r8,VCORE_TB_OFFSET(r5)
cmpdi r8,0
beq 17f
mftb r6 /* current host timebase */
mftb r6 /* current guest timebase */
subf r8,r8,r6
mtspr SPRN_TBU40,r8 /* update upper 40 bits */
mftb r7 /* check if lower 24 bits overflowed */
......@@ -1557,7 +1667,7 @@ kvmppc_hdsi:
mtspr SPRN_SRR0, r10
mtspr SPRN_SRR1, r11
li r10, BOOK3S_INTERRUPT_DATA_STORAGE
ld r11, VCPU_INTR_MSR(r9)
bl kvmppc_msr_interrupt
fast_interrupt_c_return:
6: ld r7, VCPU_CTR(r9)
lwz r8, VCPU_XER(r9)
......@@ -1626,7 +1736,7 @@ kvmppc_hisi:
1: mtspr SPRN_SRR0, r10
mtspr SPRN_SRR1, r11
li r10, BOOK3S_INTERRUPT_INST_STORAGE
ld r11, VCPU_INTR_MSR(r9)
bl kvmppc_msr_interrupt
b fast_interrupt_c_return
3: ld r6, VCPU_KVM(r9) /* not relocated, use VRMA */
......@@ -1669,7 +1779,7 @@ sc_1_fast_return:
mtspr SPRN_SRR0,r10
mtspr SPRN_SRR1,r11
li r10, BOOK3S_INTERRUPT_SYSCALL
ld r11, VCPU_INTR_MSR(r9)
bl kvmppc_msr_interrupt
mr r4,r9
b fast_guest_return
......@@ -1691,7 +1801,7 @@ hcall_real_table:
.long 0 /* 0x10 - H_CLEAR_MOD */
.long 0 /* 0x14 - H_CLEAR_REF */
.long .kvmppc_h_protect - hcall_real_table
.long 0 /* 0x1c - H_GET_TCE */
.long .kvmppc_h_get_tce - hcall_real_table
.long .kvmppc_h_put_tce - hcall_real_table
.long 0 /* 0x24 - H_SET_SPRG0 */
.long .kvmppc_h_set_dabr - hcall_real_table
......@@ -1997,7 +2107,7 @@ machine_check_realmode:
beq mc_cont
/* If not, deliver a machine check. SRR0/1 are already set */
li r10, BOOK3S_INTERRUPT_MACHINE_CHECK
ld r11, VCPU_INTR_MSR(r9)
bl kvmppc_msr_interrupt
b fast_interrupt_c_return
/*
......@@ -2138,8 +2248,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
mfspr r6,SPRN_VRSAVE
stw r6,VCPU_VRSAVE(r31)
mtlr r30
mtmsrd r5
isync
blr
/*
......@@ -2186,3 +2294,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
*/
kvmppc_bad_host_intr:
b .
/*
* This mimics the MSR transition on IRQ delivery. The new guest MSR is taken
* from VCPU_INTR_MSR and is modified based on the required TM state changes.
* r11 has the guest MSR value (in/out)
* r9 has a vcpu pointer (in)
* r0 is used as a scratch register
*/
kvmppc_msr_interrupt:
rldicl r0, r11, 64 - MSR_TS_S_LG, 62
cmpwi r0, 2 /* Check if we are in transactional state.. */
ld r11, VCPU_INTR_MSR(r9)
bne 1f
/* ... if transactional, change to suspended */
li r0, 1
1: rldimi r11, r0, MSR_TS_S_LG, 63 - MSR_TS_T_LG
blr
......@@ -213,8 +213,11 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
gpa_t args_phys;
int rc;
/* r4 contains the guest physical address of the RTAS args */
args_phys = kvmppc_get_gpr(vcpu, 4);
/*
* r4 contains the guest physical address of the RTAS args
* Mask off the top 4 bits since this is a guest real address
*/
args_phys = kvmppc_get_gpr(vcpu, 4) & KVM_PAM;
rc = kvm_read_guest(vcpu->kvm, args_phys, &args, sizeof(args));
if (rc)
......
......@@ -53,6 +53,7 @@ enum interruption_class {
IRQIO_PCI,
IRQIO_MSI,
IRQIO_VIR,
IRQIO_VAI,
NMI_NMI,
CPU_RST,
NR_ARCH_IRQS
......
......@@ -16,12 +16,22 @@
#include <linux/hrtimer.h>
#include <linux/interrupt.h>
#include <linux/kvm_host.h>
#include <linux/kvm.h>
#include <asm/debug.h>
#include <asm/cpu.h>
#include <asm/isc.h>
#define KVM_MAX_VCPUS 64
#define KVM_USER_MEM_SLOTS 32
/*
* These seem to be used for allocating ->chip in the routing table,
* which we don't use. 4096 is an out-of-thin-air value. If we need
* to look at ->chip later on, we'll need to revisit this.
*/
#define KVM_NR_IRQCHIPS 1
#define KVM_IRQCHIP_NUM_PINS 4096
struct sca_entry {
atomic_t scn;
__u32 reserved;
......@@ -108,7 +118,9 @@ struct kvm_s390_sie_block {
__u32 fac; /* 0x01a0 */
__u8 reserved1a4[20]; /* 0x01a4 */
__u64 cbrlo; /* 0x01b8 */
__u8 reserved1c0[40]; /* 0x01c0 */
__u8 reserved1c0[30]; /* 0x01c0 */
__u64 pp; /* 0x01de */
__u8 reserved1e6[2]; /* 0x01e6 */
__u64 itdba; /* 0x01e8 */
__u8 reserved1f0[16]; /* 0x01f0 */
} __attribute__((packed));
......@@ -171,18 +183,6 @@ struct kvm_vcpu_stat {
u32 diagnose_9c;
};
struct kvm_s390_io_info {
__u16 subchannel_id; /* 0x0b8 */
__u16 subchannel_nr; /* 0x0ba */
__u32 io_int_parm; /* 0x0bc */
__u32 io_int_word; /* 0x0c0 */
};
struct kvm_s390_ext_info {
__u32 ext_params;
__u64 ext_params2;
};
#define PGM_OPERATION 0x01
#define PGM_PRIVILEGED_OP 0x02
#define PGM_EXECUTE 0x03
......@@ -191,27 +191,6 @@ struct kvm_s390_ext_info {
#define PGM_SPECIFICATION 0x06
#define PGM_DATA 0x07
struct kvm_s390_pgm_info {
__u16 code;
};
struct kvm_s390_prefix_info {
__u32 address;
};
struct kvm_s390_extcall_info {
__u16 code;
};
struct kvm_s390_emerg_info {
__u16 code;
};
struct kvm_s390_mchk_info {
__u64 cr14;
__u64 mcic;
};
struct kvm_s390_interrupt_info {
struct list_head list;
u64 type;
......@@ -246,9 +225,8 @@ struct kvm_s390_float_interrupt {
struct list_head list;
atomic_t active;
int next_rr_cpu;
unsigned long idle_mask[(KVM_MAX_VCPUS + sizeof(long) - 1)
/ sizeof(long)];
struct kvm_s390_local_interrupt *local_int[KVM_MAX_VCPUS];
unsigned long idle_mask[BITS_TO_LONGS(KVM_MAX_VCPUS)];
unsigned int irq_count;
};
......@@ -265,6 +243,10 @@ struct kvm_vcpu_arch {
u64 stidp_data;
};
struct gmap *gmap;
#define KVM_S390_PFAULT_TOKEN_INVALID (-1UL)
unsigned long pfault_token;
unsigned long pfault_select;
unsigned long pfault_compare;
};
struct kvm_vm_stat {
......@@ -274,12 +256,36 @@ struct kvm_vm_stat {
struct kvm_arch_memory_slot {
};
struct s390_map_info {
struct list_head list;
__u64 guest_addr;
__u64 addr;
struct page *page;
};
struct s390_io_adapter {
unsigned int id;
int isc;
bool maskable;
bool masked;
bool swap;
struct rw_semaphore maps_lock;
struct list_head maps;
atomic_t nr_maps;
};
#define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8)
#define MAX_S390_ADAPTER_MAPS 256
struct kvm_arch{
struct sca_block *sca;
debug_info_t *dbf;
struct kvm_s390_float_interrupt float_int;
struct kvm_device *flic;
struct gmap *gmap;
int css_support;
int use_irqchip;
struct s390_io_adapter *adapters[MAX_S390_IO_ADAPTERS];
};
#define KVM_HVA_ERR_BAD (-1UL)
......@@ -290,6 +296,24 @@ static inline bool kvm_is_error_hva(unsigned long addr)
return IS_ERR_VALUE(addr);
}
#define ASYNC_PF_PER_VCPU 64
struct kvm_vcpu;
struct kvm_async_pf;
struct kvm_arch_async_pf {
unsigned long pfault_token;
};
bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work);
void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work);
void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work);
extern int sie64a(struct kvm_s390_sie_block *, u64 *);
extern char sie_exit;
#endif
......@@ -782,6 +782,7 @@ static inline void pgste_set_pte(pte_t *ptep, pte_t entry)
* @table: pointer to the page directory
* @asce: address space control element for gmap page table
* @crst_list: list of all crst tables used in the guest address space
* @pfault_enabled: defines if pfaults are applicable for the guest
*/
struct gmap {
struct list_head list;
......@@ -790,6 +791,7 @@ struct gmap {
unsigned long asce;
void *private;
struct list_head crst_list;
bool pfault_enabled;
};
/**
......
......@@ -79,6 +79,7 @@ struct thread_struct {
unsigned long ksp; /* kernel stack pointer */
mm_segment_t mm_segment;
unsigned long gmap_addr; /* address of last gmap fault. */
unsigned int gmap_pfault; /* signal of a pending guest pfault */
struct per_regs per_user; /* User specified PER registers */
struct per_event per_event; /* Cause of the last PER trap */
unsigned long per_flags; /* Flags to control debug behavior */
......
......@@ -16,6 +16,44 @@
#define __KVM_S390
/* Device control API: s390-specific devices */
#define KVM_DEV_FLIC_GET_ALL_IRQS 1
#define KVM_DEV_FLIC_ENQUEUE 2
#define KVM_DEV_FLIC_CLEAR_IRQS 3
#define KVM_DEV_FLIC_APF_ENABLE 4
#define KVM_DEV_FLIC_APF_DISABLE_WAIT 5
#define KVM_DEV_FLIC_ADAPTER_REGISTER 6
#define KVM_DEV_FLIC_ADAPTER_MODIFY 7
/*
* We can have up to 4*64k pending subchannels + 8 adapter interrupts,
* as well as up to ASYNC_PF_PER_VCPU*KVM_MAX_VCPUS pfault done interrupts.
* There are also sclp and machine checks. This gives us
* sizeof(kvm_s390_irq)*(4*65536+8+64*64+1+1) = 72 * 266250 = 19170000
* Lets round up to 8192 pages.
*/
#define KVM_S390_MAX_FLOAT_IRQS 266250
#define KVM_S390_FLIC_MAX_BUFFER 0x2000000
struct kvm_s390_io_adapter {
__u32 id;
__u8 isc;
__u8 maskable;
__u8 swap;
__u8 pad;
};
#define KVM_S390_IO_ADAPTER_MASK 1
#define KVM_S390_IO_ADAPTER_MAP 2
#define KVM_S390_IO_ADAPTER_UNMAP 3
struct kvm_s390_io_adapter_req {
__u32 id;
__u8 type;
__u8 mask;
__u16 pad0;
__u64 addr;
};
/* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs {
/* general purpose regs for s390 */
......@@ -57,4 +95,9 @@ struct kvm_sync_regs {
#define KVM_REG_S390_EPOCHDIFF (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x2)
#define KVM_REG_S390_CPU_TIMER (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x3)
#define KVM_REG_S390_CLOCK_COMP (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x4)
#define KVM_REG_S390_PFTOKEN (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x5)
#define KVM_REG_S390_PFCOMPARE (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x6)
#define KVM_REG_S390_PFSELECT (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x7)
#define KVM_REG_S390_PP (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x8)
#define KVM_REG_S390_GBEA (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x9)
#endif
......@@ -85,6 +85,7 @@ static const struct irq_class irqclass_sub_desc[NR_ARCH_IRQS] = {
[IRQIO_PCI] = {.name = "PCI", .desc = "[I/O] PCI Interrupt" },
[IRQIO_MSI] = {.name = "MSI", .desc = "[I/O] MSI Interrupt" },
[IRQIO_VIR] = {.name = "VIR", .desc = "[I/O] Virtual I/O Devices"},
[IRQIO_VAI] = {.name = "VAI", .desc = "[I/O] Virtual I/O Devices AI"},
[NMI_NMI] = {.name = "NMI", .desc = "[NMI] Machine Check"},
[CPU_RST] = {.name = "RST", .desc = "[CPU] CPU Restart"},
};
......
......@@ -23,6 +23,10 @@ config KVM
select ANON_INODES
select HAVE_KVM_CPU_RELAX_INTERCEPT
select HAVE_KVM_EVENTFD
select KVM_ASYNC_PF
select KVM_ASYNC_PF_SYNC
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQ_ROUTING
---help---
Support hosting paravirtualized guest machines using the SIE
virtualization capability on the mainframe. This should work
......
......@@ -7,7 +7,7 @@
# as published by the Free Software Foundation.
KVM := ../../../virt/kvm
common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o
common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/async_pf.o $(KVM)/irqchip.o
ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
......
......@@ -18,6 +18,7 @@
#include "kvm-s390.h"
#include "trace.h"
#include "trace-s390.h"
#include "gaccess.h"
static int diag_release_pages(struct kvm_vcpu *vcpu)
{
......@@ -47,6 +48,87 @@ static int diag_release_pages(struct kvm_vcpu *vcpu)
return 0;
}
static int __diag_page_ref_service(struct kvm_vcpu *vcpu)
{
struct prs_parm {
u16 code;
u16 subcode;
u16 parm_len;
u16 parm_version;
u64 token_addr;
u64 select_mask;
u64 compare_mask;
u64 zarch;
};
struct prs_parm parm;
int rc;
u16 rx = (vcpu->arch.sie_block->ipa & 0xf0) >> 4;
u16 ry = (vcpu->arch.sie_block->ipa & 0x0f);
unsigned long hva_token = KVM_HVA_ERR_BAD;
if (vcpu->run->s.regs.gprs[rx] & 7)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
if (copy_from_guest(vcpu, &parm, vcpu->run->s.regs.gprs[rx], sizeof(parm)))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
if (parm.parm_version != 2 || parm.parm_len < 5 || parm.code != 0x258)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
switch (parm.subcode) {
case 0: /* TOKEN */
if (vcpu->arch.pfault_token != KVM_S390_PFAULT_TOKEN_INVALID) {
/*
* If the pagefault handshake is already activated,
* the token must not be changed. We have to return
* decimal 8 instead, as mandated in SC24-6084.
*/
vcpu->run->s.regs.gprs[ry] = 8;
return 0;
}
if ((parm.compare_mask & parm.select_mask) != parm.compare_mask ||
parm.token_addr & 7 || parm.zarch != 0x8000000000000000ULL)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
hva_token = gfn_to_hva(vcpu->kvm, gpa_to_gfn(parm.token_addr));
if (kvm_is_error_hva(hva_token))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
vcpu->arch.pfault_token = parm.token_addr;
vcpu->arch.pfault_select = parm.select_mask;
vcpu->arch.pfault_compare = parm.compare_mask;
vcpu->run->s.regs.gprs[ry] = 0;
rc = 0;
break;
case 1: /*
* CANCEL
* Specification allows to let already pending tokens survive
* the cancel, therefore to reduce code complexity, we assume
* all outstanding tokens are already pending.
*/
if (parm.token_addr || parm.select_mask ||
parm.compare_mask || parm.zarch)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
vcpu->run->s.regs.gprs[ry] = 0;
/*
* If the pfault handling was not established or is already
* canceled SC24-6084 requests to return decimal 4.
*/
if (vcpu->arch.pfault_token == KVM_S390_PFAULT_TOKEN_INVALID)
vcpu->run->s.regs.gprs[ry] = 4;
else
vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
rc = 0;
break;
default:
rc = -EOPNOTSUPP;
break;
}
return rc;
}
static int __diag_time_slice_end(struct kvm_vcpu *vcpu)
{
VCPU_EVENT(vcpu, 5, "%s", "diag time slice end");
......@@ -153,6 +235,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
return __diag_time_slice_end(vcpu);
case 0x9c:
return __diag_time_slice_end_directed(vcpu);
case 0x258:
return __diag_page_ref_service(vcpu);
case 0x308:
return __diag_ipl_functions(vcpu);
case 0x500:
......
此差异已折叠。
/*
* s390 irqchip routines
*
* Copyright IBM Corp. 2014
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Cornelia Huck <cornelia.huck@de.ibm.com>
*/
#ifndef __KVM_IRQ_H
#define __KVM_IRQ_H
#include <linux/kvm_host.h>
static inline int irqchip_in_kernel(struct kvm *kvm)
{
return 1;
}
#endif
......@@ -153,11 +153,14 @@ int kvm_dev_ioctl_check_extension(long ext)
#ifdef CONFIG_KVM_S390_UCONTROL
case KVM_CAP_S390_UCONTROL:
#endif
case KVM_CAP_ASYNC_PF:
case KVM_CAP_SYNC_REGS:
case KVM_CAP_ONE_REG:
case KVM_CAP_ENABLE_CAP:
case KVM_CAP_S390_CSS_SUPPORT:
case KVM_CAP_IOEVENTFD:
case KVM_CAP_DEVICE_CTRL:
case KVM_CAP_ENABLE_CAP_VM:
r = 1;
break;
case KVM_CAP_NR_VCPUS:
......@@ -186,6 +189,25 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
return 0;
}
static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
{
int r;
if (cap->flags)
return -EINVAL;
switch (cap->cap) {
case KVM_CAP_S390_IRQCHIP:
kvm->arch.use_irqchip = 1;
r = 0;
break;
default:
r = -EINVAL;
break;
}
return r;
}
long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
......@@ -203,6 +225,26 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_s390_inject_vm(kvm, &s390int);
break;
}
case KVM_ENABLE_CAP: {
struct kvm_enable_cap cap;
r = -EFAULT;
if (copy_from_user(&cap, argp, sizeof(cap)))
break;
r = kvm_vm_ioctl_enable_cap(kvm, &cap);
break;
}
case KVM_CREATE_IRQCHIP: {
struct kvm_irq_routing_entry routing;
r = -EINVAL;
if (kvm->arch.use_irqchip) {
/* Set up dummy routing. */
memset(&routing, 0, sizeof(routing));
kvm_set_irq_routing(kvm, &routing, 0, 0);
r = 0;
}
break;
}
default:
r = -ENOTTY;
}
......@@ -214,6 +256,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
int rc;
char debug_name[16];
static unsigned long sca_offset;
rc = -EINVAL;
#ifdef CONFIG_KVM_S390_UCONTROL
......@@ -235,6 +278,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm->arch.sca = (struct sca_block *) get_zeroed_page(GFP_KERNEL);
if (!kvm->arch.sca)
goto out_err;
spin_lock(&kvm_lock);
sca_offset = (sca_offset + 16) & 0x7f0;
kvm->arch.sca = (struct sca_block *) ((char *) kvm->arch.sca + sca_offset);
spin_unlock(&kvm_lock);
sprintf(debug_name, "kvm-%u", current->pid);
......@@ -255,9 +302,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (!kvm->arch.gmap)
goto out_nogmap;
kvm->arch.gmap->private = kvm;
kvm->arch.gmap->pfault_enabled = 0;
}
kvm->arch.css_support = 0;
kvm->arch.use_irqchip = 0;
return 0;
out_nogmap:
......@@ -272,6 +321,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
VCPU_EVENT(vcpu, 3, "%s", "free cpu");
trace_kvm_s390_destroy_vcpu(vcpu->vcpu_id);
kvm_clear_async_pf_completion_queue(vcpu);
if (!kvm_is_ucontrol(vcpu->kvm)) {
clear_bit(63 - vcpu->vcpu_id,
(unsigned long *) &vcpu->kvm->arch.sca->mcn);
......@@ -320,11 +370,14 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
debug_unregister(kvm->arch.dbf);
if (!kvm_is_ucontrol(kvm))
gmap_free(kvm->arch.gmap);
kvm_s390_destroy_adapters(kvm);
}
/* Section: vcpu related */
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
kvm_clear_async_pf_completion_queue(vcpu);
if (kvm_is_ucontrol(vcpu->kvm)) {
vcpu->arch.gmap = gmap_alloc(current->mm);
if (!vcpu->arch.gmap)
......@@ -385,7 +438,11 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
vcpu->arch.guest_fpregs.fpc = 0;
asm volatile("lfpc %0" : : "Q" (vcpu->arch.guest_fpregs.fpc));
vcpu->arch.sie_block->gbea = 1;
vcpu->arch.sie_block->pp = 0;
vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
kvm_clear_async_pf_completion_queue(vcpu);
atomic_set_mask(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
kvm_s390_clear_local_irqs(vcpu);
}
int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
......@@ -466,11 +523,8 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
spin_lock_init(&vcpu->arch.local_int.lock);
INIT_LIST_HEAD(&vcpu->arch.local_int.list);
vcpu->arch.local_int.float_int = &kvm->arch.float_int;
spin_lock(&kvm->arch.float_int.lock);
kvm->arch.float_int.local_int[id] = &vcpu->arch.local_int;
vcpu->arch.local_int.wq = &vcpu->wq;
vcpu->arch.local_int.cpuflags = &vcpu->arch.sie_block->cpuflags;
spin_unlock(&kvm->arch.float_int.lock);
rc = kvm_vcpu_init(vcpu, kvm, id);
if (rc)
......@@ -490,9 +544,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
/* kvm common code refers to this, but never calls it */
BUG();
return 0;
return kvm_cpu_has_interrupt(vcpu);
}
void s390_vcpu_block(struct kvm_vcpu *vcpu)
......@@ -568,6 +620,26 @@ static int kvm_arch_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
r = put_user(vcpu->arch.sie_block->ckc,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_PFTOKEN:
r = put_user(vcpu->arch.pfault_token,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_PFCOMPARE:
r = put_user(vcpu->arch.pfault_compare,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_PFSELECT:
r = put_user(vcpu->arch.pfault_select,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_PP:
r = put_user(vcpu->arch.sie_block->pp,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_GBEA:
r = put_user(vcpu->arch.sie_block->gbea,
(u64 __user *)reg->addr);
break;
default:
break;
}
......@@ -597,6 +669,26 @@ static int kvm_arch_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu,
r = get_user(vcpu->arch.sie_block->ckc,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_PFTOKEN:
r = get_user(vcpu->arch.pfault_token,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_PFCOMPARE:
r = get_user(vcpu->arch.pfault_compare,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_PFSELECT:
r = get_user(vcpu->arch.pfault_select,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_PP:
r = get_user(vcpu->arch.sie_block->pp,
(u64 __user *)reg->addr);
break;
case KVM_REG_S390_GBEA:
r = get_user(vcpu->arch.sie_block->gbea,
(u64 __user *)reg->addr);
break;
default:
break;
}
......@@ -715,10 +807,100 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
return 0;
}
static long kvm_arch_fault_in_sync(struct kvm_vcpu *vcpu)
{
long rc;
hva_t fault = gmap_fault(current->thread.gmap_addr, vcpu->arch.gmap);
struct mm_struct *mm = current->mm;
down_read(&mm->mmap_sem);
rc = get_user_pages(current, mm, fault, 1, 1, 0, NULL, NULL);
up_read(&mm->mmap_sem);
return rc;
}
static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
unsigned long token)
{
struct kvm_s390_interrupt inti;
inti.parm64 = token;
if (start_token) {
inti.type = KVM_S390_INT_PFAULT_INIT;
WARN_ON_ONCE(kvm_s390_inject_vcpu(vcpu, &inti));
} else {
inti.type = KVM_S390_INT_PFAULT_DONE;
WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti));
}
}
void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work)
{
trace_kvm_s390_pfault_init(vcpu, work->arch.pfault_token);
__kvm_inject_pfault_token(vcpu, true, work->arch.pfault_token);
}
void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work)
{
trace_kvm_s390_pfault_done(vcpu, work->arch.pfault_token);
__kvm_inject_pfault_token(vcpu, false, work->arch.pfault_token);
}
void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work)
{
/* s390 will always inject the page directly */
}
bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
{
/*
* s390 will always inject the page directly,
* but we still want check_async_completion to cleanup
*/
return true;
}
static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu)
{
hva_t hva;
struct kvm_arch_async_pf arch;
int rc;
if (vcpu->arch.pfault_token == KVM_S390_PFAULT_TOKEN_INVALID)
return 0;
if ((vcpu->arch.sie_block->gpsw.mask & vcpu->arch.pfault_select) !=
vcpu->arch.pfault_compare)
return 0;
if (psw_extint_disabled(vcpu))
return 0;
if (kvm_cpu_has_interrupt(vcpu))
return 0;
if (!(vcpu->arch.sie_block->gcr[0] & 0x200ul))
return 0;
if (!vcpu->arch.gmap->pfault_enabled)
return 0;
hva = gmap_fault(current->thread.gmap_addr, vcpu->arch.gmap);
if (copy_from_guest(vcpu, &arch.pfault_token, vcpu->arch.pfault_token, 8))
return 0;
rc = kvm_setup_async_pf(vcpu, current->thread.gmap_addr, hva, &arch);
return rc;
}
static int vcpu_pre_run(struct kvm_vcpu *vcpu)
{
int rc, cpuflags;
/*
* On s390 notifications for arriving pages will be delivered directly
* to the guest but the house keeping for completed pfaults is
* handled outside the worker.
*/
kvm_check_async_pf_completion(vcpu);
memcpy(&vcpu->arch.sie_block->gg14, &vcpu->run->s.regs.gprs[14], 16);
if (need_resched())
......@@ -744,7 +926,7 @@ static int vcpu_pre_run(struct kvm_vcpu *vcpu)
static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
{
int rc;
int rc = -1;
VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
vcpu->arch.sie_block->icptcode);
......@@ -758,7 +940,16 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
current->thread.gmap_addr;
vcpu->run->s390_ucontrol.pgm_code = 0x10;
rc = -EREMOTE;
} else {
} else if (current->thread.gmap_pfault) {
trace_kvm_s390_major_guest_pfault(vcpu);
current->thread.gmap_pfault = 0;
if (kvm_arch_setup_async_pf(vcpu) ||
(kvm_arch_fault_in_sync(vcpu) >= 0))
rc = 0;
}
if (rc == -1) {
VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
trace_kvm_s390_sie_fault(vcpu);
rc = kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
......@@ -768,7 +959,8 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
if (rc == 0) {
if (kvm_is_ucontrol(vcpu->kvm))
rc = -EOPNOTSUPP;
/* Don't exit for host interrupts. */
rc = vcpu->arch.sie_block->icptcode ? -EOPNOTSUPP : 0;
else
rc = kvm_handle_sie_intercept(vcpu);
}
......@@ -831,8 +1023,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
atomic_clear_mask(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
BUG_ON(vcpu->kvm->arch.float_int.local_int[vcpu->vcpu_id] == NULL);
switch (kvm_run->exit_reason) {
case KVM_EXIT_S390_SIEIC:
case KVM_EXIT_UNKNOWN:
......
......@@ -129,6 +129,7 @@ enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer);
void kvm_s390_tasklet(unsigned long parm);
void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu);
void kvm_s390_deliver_pending_machine_checks(struct kvm_vcpu *vcpu);
void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu);
int __must_check kvm_s390_inject_vm(struct kvm *kvm,
struct kvm_s390_interrupt *s390int);
int __must_check kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
......@@ -136,6 +137,7 @@ int __must_check kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
int __must_check kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 code);
struct kvm_s390_interrupt_info *kvm_s390_get_io_int(struct kvm *kvm,
u64 cr6, u64 schid);
int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked);
/* implemented in priv.c */
int kvm_s390_handle_b2(struct kvm_vcpu *vcpu);
......@@ -161,4 +163,9 @@ bool kvm_enabled_cmma(void);
/* implemented in diag.c */
int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
/* implemented in interrupt.c */
int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
int psw_extint_disabled(struct kvm_vcpu *vcpu);
void kvm_s390_destroy_adapters(struct kvm *kvm);
#endif
......@@ -396,15 +396,10 @@ static int handle_stidp(struct kvm_vcpu *vcpu)
static void handle_stsi_3_2_2(struct kvm_vcpu *vcpu, struct sysinfo_3_2_2 *mem)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
int cpus = 0;
int n;
spin_lock(&fi->lock);
for (n = 0; n < KVM_MAX_VCPUS; n++)
if (fi->local_int[n])
cpus++;
spin_unlock(&fi->lock);
cpus = atomic_read(&vcpu->kvm->online_vcpus);
/* deal with other level 3 hypervisors */
if (stsi(mem, 3, 2, 2))
......
......@@ -23,29 +23,30 @@
static int __sigp_sense(struct kvm_vcpu *vcpu, u16 cpu_addr,
u64 *reg)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_local_interrupt *li;
struct kvm_vcpu *dst_vcpu = NULL;
int cpuflags;
int rc;
if (cpu_addr >= KVM_MAX_VCPUS)
return SIGP_CC_NOT_OPERATIONAL;
spin_lock(&fi->lock);
if (fi->local_int[cpu_addr] == NULL)
rc = SIGP_CC_NOT_OPERATIONAL;
else if (!(atomic_read(fi->local_int[cpu_addr]->cpuflags)
& (CPUSTAT_ECALL_PEND | CPUSTAT_STOPPED)))
dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
if (!dst_vcpu)
return SIGP_CC_NOT_OPERATIONAL;
li = &dst_vcpu->arch.local_int;
cpuflags = atomic_read(li->cpuflags);
if (!(cpuflags & (CPUSTAT_ECALL_PEND | CPUSTAT_STOPPED)))
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
else {
*reg &= 0xffffffff00000000UL;
if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
& CPUSTAT_ECALL_PEND)
if (cpuflags & CPUSTAT_ECALL_PEND)
*reg |= SIGP_STATUS_EXT_CALL_PENDING;
if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
& CPUSTAT_STOPPED)
if (cpuflags & CPUSTAT_STOPPED)
*reg |= SIGP_STATUS_STOPPED;
rc = SIGP_CC_STATUS_STORED;
}
spin_unlock(&fi->lock);
VCPU_EVENT(vcpu, 4, "sensed status of cpu %x rc %x", cpu_addr, rc);
return rc;
......@@ -53,12 +54,13 @@ static int __sigp_sense(struct kvm_vcpu *vcpu, u16 cpu_addr,
static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_local_interrupt *li;
struct kvm_s390_interrupt_info *inti;
int rc;
struct kvm_vcpu *dst_vcpu = NULL;
if (cpu_addr >= KVM_MAX_VCPUS)
if (cpu_addr < KVM_MAX_VCPUS)
dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
if (!dst_vcpu)
return SIGP_CC_NOT_OPERATIONAL;
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
......@@ -68,13 +70,7 @@ static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
inti->type = KVM_S390_INT_EMERGENCY;
inti->emerg.code = vcpu->vcpu_id;
spin_lock(&fi->lock);
li = fi->local_int[cpu_addr];
if (li == NULL) {
rc = SIGP_CC_NOT_OPERATIONAL;
kfree(inti);
goto unlock;
}
li = &dst_vcpu->arch.local_int;
spin_lock_bh(&li->lock);
list_add_tail(&inti->list, &li->list);
atomic_set(&li->active, 1);
......@@ -82,11 +78,9 @@ static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
if (waitqueue_active(li->wq))
wake_up_interruptible(li->wq);
spin_unlock_bh(&li->lock);
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr);
unlock:
spin_unlock(&fi->lock);
return rc;
return SIGP_CC_ORDER_CODE_ACCEPTED;
}
static int __sigp_conditional_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr,
......@@ -122,12 +116,13 @@ static int __sigp_conditional_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr,
static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_local_interrupt *li;
struct kvm_s390_interrupt_info *inti;
int rc;
struct kvm_vcpu *dst_vcpu = NULL;
if (cpu_addr >= KVM_MAX_VCPUS)
if (cpu_addr < KVM_MAX_VCPUS)
dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
if (!dst_vcpu)
return SIGP_CC_NOT_OPERATIONAL;
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
......@@ -137,13 +132,7 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr)
inti->type = KVM_S390_INT_EXTERNAL_CALL;
inti->extcall.code = vcpu->vcpu_id;
spin_lock(&fi->lock);
li = fi->local_int[cpu_addr];
if (li == NULL) {
rc = SIGP_CC_NOT_OPERATIONAL;
kfree(inti);
goto unlock;
}
li = &dst_vcpu->arch.local_int;
spin_lock_bh(&li->lock);
list_add_tail(&inti->list, &li->list);
atomic_set(&li->active, 1);
......@@ -151,11 +140,9 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr)
if (waitqueue_active(li->wq))
wake_up_interruptible(li->wq);
spin_unlock_bh(&li->lock);
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
VCPU_EVENT(vcpu, 4, "sent sigp ext call to cpu %x", cpu_addr);
unlock:
spin_unlock(&fi->lock);
return rc;
return SIGP_CC_ORDER_CODE_ACCEPTED;
}
static int __inject_sigp_stop(struct kvm_s390_local_interrupt *li, int action)
......@@ -189,31 +176,26 @@ static int __inject_sigp_stop(struct kvm_s390_local_interrupt *li, int action)
static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int action)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_local_interrupt *li;
struct kvm_vcpu *dst_vcpu = NULL;
int rc;
if (cpu_addr >= KVM_MAX_VCPUS)
return SIGP_CC_NOT_OPERATIONAL;
spin_lock(&fi->lock);
li = fi->local_int[cpu_addr];
if (li == NULL) {
rc = SIGP_CC_NOT_OPERATIONAL;
goto unlock;
}
dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
if (!dst_vcpu)
return SIGP_CC_NOT_OPERATIONAL;
li = &dst_vcpu->arch.local_int;
rc = __inject_sigp_stop(li, action);
unlock:
spin_unlock(&fi->lock);
VCPU_EVENT(vcpu, 4, "sent sigp stop to cpu %x", cpu_addr);
if ((action & ACTION_STORE_ON_STOP) != 0 && rc == -ESHUTDOWN) {
/* If the CPU has already been stopped, we still have
* to save the status when doing stop-and-store. This
* has to be done after unlocking all spinlocks. */
struct kvm_vcpu *dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
rc = kvm_s390_store_status_unloaded(dst_vcpu,
KVM_S390_STORE_STATUS_NOADDR);
}
......@@ -224,6 +206,8 @@ static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int action)
static int __sigp_set_arch(struct kvm_vcpu *vcpu, u32 parameter)
{
int rc;
unsigned int i;
struct kvm_vcpu *v;
switch (parameter & 0xff) {
case 0:
......@@ -231,6 +215,11 @@ static int __sigp_set_arch(struct kvm_vcpu *vcpu, u32 parameter)
break;
case 1:
case 2:
kvm_for_each_vcpu(i, v, vcpu->kvm) {
v->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
kvm_clear_async_pf_completion_queue(v);
}
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
break;
default:
......@@ -242,12 +231,18 @@ static int __sigp_set_arch(struct kvm_vcpu *vcpu, u32 parameter)
static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
u64 *reg)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_local_interrupt *li = NULL;
struct kvm_s390_local_interrupt *li;
struct kvm_vcpu *dst_vcpu = NULL;
struct kvm_s390_interrupt_info *inti;
int rc;
u8 tmp;
if (cpu_addr < KVM_MAX_VCPUS)
dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
if (!dst_vcpu)
return SIGP_CC_NOT_OPERATIONAL;
li = &dst_vcpu->arch.local_int;
/* make sure that the new value is valid memory */
address = address & 0x7fffe000u;
if (copy_from_guest_absolute(vcpu, &tmp, address, 1) ||
......@@ -261,18 +256,6 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
if (!inti)
return SIGP_CC_BUSY;
spin_lock(&fi->lock);
if (cpu_addr < KVM_MAX_VCPUS)
li = fi->local_int[cpu_addr];
if (li == NULL) {
*reg &= 0xffffffff00000000UL;
*reg |= SIGP_STATUS_INCORRECT_STATE;
rc = SIGP_CC_STATUS_STORED;
kfree(inti);
goto out_fi;
}
spin_lock_bh(&li->lock);
/* cpu must be in stopped state */
if (!(atomic_read(li->cpuflags) & CPUSTAT_STOPPED)) {
......@@ -295,8 +278,6 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
VCPU_EVENT(vcpu, 4, "set prefix of cpu %02x to %x", cpu_addr, address);
out_li:
spin_unlock_bh(&li->lock);
out_fi:
spin_unlock(&fi->lock);
return rc;
}
......@@ -334,28 +315,26 @@ static int __sigp_store_status_at_addr(struct kvm_vcpu *vcpu, u16 cpu_id,
static int __sigp_sense_running(struct kvm_vcpu *vcpu, u16 cpu_addr,
u64 *reg)
{
struct kvm_s390_local_interrupt *li;
struct kvm_vcpu *dst_vcpu = NULL;
int rc;
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
if (cpu_addr >= KVM_MAX_VCPUS)
return SIGP_CC_NOT_OPERATIONAL;
spin_lock(&fi->lock);
if (fi->local_int[cpu_addr] == NULL)
rc = SIGP_CC_NOT_OPERATIONAL;
else {
if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
& CPUSTAT_RUNNING) {
/* running */
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
} else {
/* not running */
*reg &= 0xffffffff00000000UL;
*reg |= SIGP_STATUS_NOT_RUNNING;
rc = SIGP_CC_STATUS_STORED;
}
dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
if (!dst_vcpu)
return SIGP_CC_NOT_OPERATIONAL;
li = &dst_vcpu->arch.local_int;
if (atomic_read(li->cpuflags) & CPUSTAT_RUNNING) {
/* running */
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
} else {
/* not running */
*reg &= 0xffffffff00000000UL;
*reg |= SIGP_STATUS_NOT_RUNNING;
rc = SIGP_CC_STATUS_STORED;
}
spin_unlock(&fi->lock);
VCPU_EVENT(vcpu, 4, "sensed running status of cpu %x rc %x", cpu_addr,
rc);
......@@ -366,26 +345,22 @@ static int __sigp_sense_running(struct kvm_vcpu *vcpu, u16 cpu_addr,
/* Test whether the destination CPU is available and not busy */
static int sigp_check_callable(struct kvm_vcpu *vcpu, u16 cpu_addr)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_local_interrupt *li;
int rc = SIGP_CC_ORDER_CODE_ACCEPTED;
struct kvm_vcpu *dst_vcpu = NULL;
if (cpu_addr >= KVM_MAX_VCPUS)
return SIGP_CC_NOT_OPERATIONAL;
spin_lock(&fi->lock);
li = fi->local_int[cpu_addr];
if (li == NULL) {
rc = SIGP_CC_NOT_OPERATIONAL;
goto out;
}
dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
if (!dst_vcpu)
return SIGP_CC_NOT_OPERATIONAL;
li = &dst_vcpu->arch.local_int;
spin_lock_bh(&li->lock);
if (li->action_bits & ACTION_STOP_ON_STOP)
rc = SIGP_CC_BUSY;
spin_unlock_bh(&li->lock);
out:
spin_unlock(&fi->lock);
return rc;
}
......
......@@ -30,6 +30,52 @@
TP_printk("%02d[%016lx-%016lx]: " p_str, __entry->id, \
__entry->pswmask, __entry->pswaddr, p_args)
TRACE_EVENT(kvm_s390_major_guest_pfault,
TP_PROTO(VCPU_PROTO_COMMON),
TP_ARGS(VCPU_ARGS_COMMON),
TP_STRUCT__entry(
VCPU_FIELD_COMMON
),
TP_fast_assign(
VCPU_ASSIGN_COMMON
),
VCPU_TP_PRINTK("%s", "major fault, maybe applicable for pfault")
);
TRACE_EVENT(kvm_s390_pfault_init,
TP_PROTO(VCPU_PROTO_COMMON, long pfault_token),
TP_ARGS(VCPU_ARGS_COMMON, pfault_token),
TP_STRUCT__entry(
VCPU_FIELD_COMMON
__field(long, pfault_token)
),
TP_fast_assign(
VCPU_ASSIGN_COMMON
__entry->pfault_token = pfault_token;
),
VCPU_TP_PRINTK("init pfault token %ld", __entry->pfault_token)
);
TRACE_EVENT(kvm_s390_pfault_done,
TP_PROTO(VCPU_PROTO_COMMON, long pfault_token),
TP_ARGS(VCPU_ARGS_COMMON, pfault_token),
TP_STRUCT__entry(
VCPU_FIELD_COMMON
__field(long, pfault_token)
),
TP_fast_assign(
VCPU_ASSIGN_COMMON
__entry->pfault_token = pfault_token;
),
VCPU_TP_PRINTK("done pfault token %ld", __entry->pfault_token)
);
/*
* Tracepoints for SIE entry and exit.
*/
......
......@@ -50,6 +50,7 @@
#define VM_FAULT_BADMAP 0x020000
#define VM_FAULT_BADACCESS 0x040000
#define VM_FAULT_SIGNAL 0x080000
#define VM_FAULT_PFAULT 0x100000
static unsigned long store_indication __read_mostly;
......@@ -227,6 +228,7 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
return;
}
case VM_FAULT_BADCONTEXT:
case VM_FAULT_PFAULT:
do_no_context(regs);
break;
case VM_FAULT_SIGNAL:
......@@ -264,6 +266,9 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
*/
static inline int do_exception(struct pt_regs *regs, int access)
{
#ifdef CONFIG_PGSTE
struct gmap *gmap;
#endif
struct task_struct *tsk;
struct mm_struct *mm;
struct vm_area_struct *vma;
......@@ -304,9 +309,10 @@ static inline int do_exception(struct pt_regs *regs, int access)
down_read(&mm->mmap_sem);
#ifdef CONFIG_PGSTE
if ((current->flags & PF_VCPU) && S390_lowcore.gmap) {
address = __gmap_fault(address,
(struct gmap *) S390_lowcore.gmap);
gmap = (struct gmap *)
((current->flags & PF_VCPU) ? S390_lowcore.gmap : 0);
if (gmap) {
address = __gmap_fault(address, gmap);
if (address == -EFAULT) {
fault = VM_FAULT_BADMAP;
goto out_up;
......@@ -315,6 +321,8 @@ static inline int do_exception(struct pt_regs *regs, int access)
fault = VM_FAULT_OOM;
goto out_up;
}
if (gmap->pfault_enabled)
flags |= FAULT_FLAG_RETRY_NOWAIT;
}
#endif
......@@ -371,9 +379,19 @@ static inline int do_exception(struct pt_regs *regs, int access)
regs, address);
}
if (fault & VM_FAULT_RETRY) {
#ifdef CONFIG_PGSTE
if (gmap && (flags & FAULT_FLAG_RETRY_NOWAIT)) {
/* FAULT_FLAG_RETRY_NOWAIT has been set,
* mmap_sem has not been released */
current->thread.gmap_pfault = 1;
fault = VM_FAULT_PFAULT;
goto out_up;
}
#endif
/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
* of starvation. */
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags &= ~(FAULT_FLAG_ALLOW_RETRY |
FAULT_FLAG_RETRY_NOWAIT);
flags |= FAULT_FLAG_TRIED;
down_read(&mm->mmap_sem);
goto retry;
......
......@@ -337,6 +337,11 @@ struct kvm_pmu {
u64 reprogram_pmi;
};
enum {
KVM_DEBUGREG_BP_ENABLED = 1,
KVM_DEBUGREG_WONT_EXIT = 2,
};
struct kvm_vcpu_arch {
/*
* rip and regs accesses must go through
......@@ -444,7 +449,6 @@ struct kvm_vcpu_arch {
} st;
u64 last_guest_tsc;
u64 last_kernel_ns;
u64 last_host_tsc;
u64 tsc_offset_adjustment;
u64 this_tsc_nsec;
......@@ -464,7 +468,7 @@ struct kvm_vcpu_arch {
struct mtrr_state_type mtrr_state;
u32 pat;
int switch_db_regs;
unsigned switch_db_regs;
unsigned long db[KVM_NR_DB_REGS];
unsigned long dr6;
unsigned long dr7;
......@@ -599,6 +603,8 @@ struct kvm_arch {
bool use_master_clock;
u64 master_kernel_ns;
cycle_t master_cycle_now;
struct delayed_work kvmclock_update_work;
struct delayed_work kvmclock_sync_work;
struct kvm_xen_hvm_config xen_hvm_config;
......@@ -702,6 +708,7 @@ struct kvm_x86_ops {
void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
u64 (*get_dr6)(struct kvm_vcpu *vcpu);
void (*set_dr6)(struct kvm_vcpu *vcpu, unsigned long value);
void (*sync_dirty_debug_regs)(struct kvm_vcpu *vcpu);
void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
......@@ -728,8 +735,8 @@ struct kvm_x86_ops {
int (*nmi_allowed)(struct kvm_vcpu *vcpu);
bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked);
int (*enable_nmi_window)(struct kvm_vcpu *vcpu);
int (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
int (*vm_has_apicv)(struct kvm *kvm);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
......@@ -765,6 +772,9 @@ struct kvm_x86_ops {
struct x86_instruction_info *info,
enum x86_intercept_stage stage);
void (*handle_external_intr)(struct kvm_vcpu *vcpu);
bool (*mpx_supported)(void);
int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
};
struct kvm_arch_async_pf {
......
......@@ -85,6 +85,7 @@
#define VM_EXIT_SAVE_IA32_EFER 0x00100000
#define VM_EXIT_LOAD_IA32_EFER 0x00200000
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER 0x00400000
#define VM_EXIT_CLEAR_BNDCFGS 0x00800000
#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
......@@ -95,6 +96,7 @@
#define VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL 0x00002000
#define VM_ENTRY_LOAD_IA32_PAT 0x00004000
#define VM_ENTRY_LOAD_IA32_EFER 0x00008000
#define VM_ENTRY_LOAD_BNDCFGS 0x00010000
#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
......@@ -174,6 +176,8 @@ enum vmcs_field {
GUEST_PDPTR2_HIGH = 0x0000280f,
GUEST_PDPTR3 = 0x00002810,
GUEST_PDPTR3_HIGH = 0x00002811,
GUEST_BNDCFGS = 0x00002812,
GUEST_BNDCFGS_HIGH = 0x00002813,
HOST_IA32_PAT = 0x00002c00,
HOST_IA32_PAT_HIGH = 0x00002c01,
HOST_IA32_EFER = 0x00002c02,
......
......@@ -16,6 +16,8 @@
#define XSTATE_Hi16_ZMM 0x80
#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)
/* Bit 63 of XCR0 is reserved for future expansion */
#define XSTATE_EXTEND_MASK (~(XSTATE_FPSSE | (1ULL << 63)))
#define FXSAVE_SIZE 512
......
......@@ -295,6 +295,7 @@
#define MSR_SMI_COUNT 0x00000034
#define MSR_IA32_FEATURE_CONTROL 0x0000003a
#define MSR_IA32_TSC_ADJUST 0x0000003b
#define MSR_IA32_BNDCFGS 0x00000d90
#define FEATURE_CONTROL_LOCKED (1<<0)
#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1)
......
......@@ -417,7 +417,6 @@ void kvm_disable_steal_time(void)
#ifdef CONFIG_SMP
static void __init kvm_smp_prepare_boot_cpu(void)
{
WARN_ON(kvm_register_clock("primary cpu clock"));
kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
kvm_spinlock_init();
......
......@@ -242,7 +242,7 @@ void __init kvmclock_init(void)
hv_clock = __va(mem);
memset(hv_clock, 0, size);
if (kvm_register_clock("boot clock")) {
if (kvm_register_clock("primary cpu clock")) {
hv_clock = NULL;
memblock_free(mem, size);
return;
......
......@@ -28,7 +28,7 @@ static u32 xstate_required_size(u64 xstate_bv)
int feature_bit = 0;
u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
xstate_bv &= ~XSTATE_FPSSE;
xstate_bv &= XSTATE_EXTEND_MASK;
while (xstate_bv) {
if (xstate_bv & 0x1) {
u32 eax, ebx, ecx, edx;
......@@ -43,6 +43,16 @@ static u32 xstate_required_size(u64 xstate_bv)
return ret;
}
u64 kvm_supported_xcr0(void)
{
u64 xcr0 = KVM_SUPPORTED_XCR0 & host_xcr0;
if (!kvm_x86_ops->mpx_supported())
xcr0 &= ~(XSTATE_BNDREGS | XSTATE_BNDCSR);
return xcr0;
}
void kvm_update_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
......@@ -73,9 +83,9 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
} else {
vcpu->arch.guest_supported_xcr0 =
(best->eax | ((u64)best->edx << 32)) &
host_xcr0 & KVM_SUPPORTED_XCR0;
vcpu->arch.guest_xstate_size =
xstate_required_size(vcpu->arch.guest_supported_xcr0);
kvm_supported_xcr0();
vcpu->arch.guest_xstate_size = best->ebx =
xstate_required_size(vcpu->arch.xcr0);
}
kvm_pmu_cpuid_update(vcpu);
......@@ -210,13 +220,6 @@ static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
entry->flags = 0;
}
static bool supported_xcr0_bit(unsigned bit)
{
u64 mask = ((u64)1 << bit);
return mask & KVM_SUPPORTED_XCR0 & host_xcr0;
}
#define F(x) bit(X86_FEATURE_##x)
static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
......@@ -256,6 +259,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
#endif
unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0;
unsigned f_invpcid = kvm_x86_ops->invpcid_supported() ? F(INVPCID) : 0;
unsigned f_mpx = kvm_x86_ops->mpx_supported() ? F(MPX) : 0;
/* cpuid 1.edx */
const u32 kvm_supported_word0_x86_features =
......@@ -303,7 +307,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* cpuid 7.0.ebx */
const u32 kvm_supported_word9_x86_features =
F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
F(ADX);
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
......@@ -436,16 +441,18 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
}
case 0xd: {
int idx, i;
u64 supported = kvm_supported_xcr0();
entry->eax &= host_xcr0 & KVM_SUPPORTED_XCR0;
entry->edx &= (host_xcr0 & KVM_SUPPORTED_XCR0) >> 32;
entry->eax &= supported;
entry->edx &= supported >> 32;
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
for (idx = 1, i = 1; idx < 64; ++idx) {
u64 mask = ((u64)1 << idx);
if (*nent >= maxnent)
goto out;
do_cpuid_1_ent(&entry[i], function, idx);
if (entry[i].eax == 0 || !supported_xcr0_bit(idx))
if (entry[i].eax == 0 || !(supported & mask))
continue;
entry[i].flags |=
KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
......
......@@ -3668,6 +3668,10 @@ static const struct gprefix pfx_vmovntpx = {
I(0, em_mov), N, N, N,
};
static const struct gprefix pfx_0f_28_0f_29 = {
I(Aligned, em_mov), I(Aligned, em_mov), N, N,
};
static const struct escape escape_d9 = { {
N, N, N, N, N, N, N, I(DstMem, em_fnstcw),
}, {
......@@ -3870,7 +3874,9 @@ static const struct opcode twobyte_table[256] = {
IIP(ModRM | SrcMem | Priv | Op3264, em_cr_write, cr_write, check_cr_write),
IIP(ModRM | SrcMem | Priv | Op3264, em_dr_write, dr_write, check_dr_write),
N, N, N, N,
N, N, N, GP(ModRM | DstMem | SrcReg | Sse | Mov | Aligned, &pfx_vmovntpx),
GP(ModRM | DstReg | SrcMem | Mov | Sse, &pfx_0f_28_0f_29),
GP(ModRM | DstMem | SrcReg | Mov | Sse, &pfx_0f_28_0f_29),
N, GP(ModRM | DstMem | SrcReg | Sse | Mov | Aligned, &pfx_vmovntpx),
N, N, N, N,
/* 0x30 - 0x3F */
II(ImplicitOps | Priv, em_wrmsr, wrmsr),
......
......@@ -3329,7 +3329,7 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
arch.direct_map = vcpu->arch.mmu.direct_map;
arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu);
return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
return kvm_setup_async_pf(vcpu, gva, gfn_to_hva(vcpu->kvm, gfn), &arch);
}
static bool can_do_async_pf(struct kvm_vcpu *vcpu)
......
......@@ -913,7 +913,8 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
* and kvm_mmu_notifier_invalidate_range_start detect the mapping page isn't
* used by guest then tlbs are not flushed, so guest is allowed to access the
* freed pages.
* And we increase kvm->tlbs_dirty to delay tlbs flush in this case.
* We set tlbs_dirty to let the notifier know this change and delay the flush
* until such a case actually happens.
*/
static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
......@@ -942,7 +943,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
return -EINVAL;
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
vcpu->kvm->tlbs_dirty++;
vcpu->kvm->tlbs_dirty = true;
continue;
}
......@@ -957,7 +958,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
if (gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i]);
vcpu->kvm->tlbs_dirty++;
vcpu->kvm->tlbs_dirty = true;
continue;
}
......
......@@ -34,6 +34,7 @@
#include <asm/perf_event.h>
#include <asm/tlbflush.h>
#include <asm/desc.h>
#include <asm/debugreg.h>
#include <asm/kvm_para.h>
#include <asm/virtext.h>
......@@ -303,20 +304,35 @@ static inline bool is_cr_intercept(struct vcpu_svm *svm, int bit)
return vmcb->control.intercept_cr & (1U << bit);
}
static inline void set_dr_intercept(struct vcpu_svm *svm, int bit)
static inline void set_dr_intercepts(struct vcpu_svm *svm)
{
struct vmcb *vmcb = get_host_vmcb(svm);
vmcb->control.intercept_dr |= (1U << bit);
vmcb->control.intercept_dr = (1 << INTERCEPT_DR0_READ)
| (1 << INTERCEPT_DR1_READ)
| (1 << INTERCEPT_DR2_READ)
| (1 << INTERCEPT_DR3_READ)
| (1 << INTERCEPT_DR4_READ)
| (1 << INTERCEPT_DR5_READ)
| (1 << INTERCEPT_DR6_READ)
| (1 << INTERCEPT_DR7_READ)
| (1 << INTERCEPT_DR0_WRITE)
| (1 << INTERCEPT_DR1_WRITE)
| (1 << INTERCEPT_DR2_WRITE)
| (1 << INTERCEPT_DR3_WRITE)
| (1 << INTERCEPT_DR4_WRITE)
| (1 << INTERCEPT_DR5_WRITE)
| (1 << INTERCEPT_DR6_WRITE)
| (1 << INTERCEPT_DR7_WRITE);
recalc_intercepts(svm);
}
static inline void clr_dr_intercept(struct vcpu_svm *svm, int bit)
static inline void clr_dr_intercepts(struct vcpu_svm *svm)
{
struct vmcb *vmcb = get_host_vmcb(svm);
vmcb->control.intercept_dr &= ~(1U << bit);
vmcb->control.intercept_dr = 0;
recalc_intercepts(svm);
}
......@@ -1080,23 +1096,7 @@ static void init_vmcb(struct vcpu_svm *svm)
set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
set_dr_intercept(svm, INTERCEPT_DR0_READ);
set_dr_intercept(svm, INTERCEPT_DR1_READ);
set_dr_intercept(svm, INTERCEPT_DR2_READ);
set_dr_intercept(svm, INTERCEPT_DR3_READ);
set_dr_intercept(svm, INTERCEPT_DR4_READ);
set_dr_intercept(svm, INTERCEPT_DR5_READ);
set_dr_intercept(svm, INTERCEPT_DR6_READ);
set_dr_intercept(svm, INTERCEPT_DR7_READ);
set_dr_intercept(svm, INTERCEPT_DR0_WRITE);
set_dr_intercept(svm, INTERCEPT_DR1_WRITE);
set_dr_intercept(svm, INTERCEPT_DR2_WRITE);
set_dr_intercept(svm, INTERCEPT_DR3_WRITE);
set_dr_intercept(svm, INTERCEPT_DR4_WRITE);
set_dr_intercept(svm, INTERCEPT_DR5_WRITE);
set_dr_intercept(svm, INTERCEPT_DR6_WRITE);
set_dr_intercept(svm, INTERCEPT_DR7_WRITE);
set_dr_intercepts(svm);
set_exception_intercept(svm, PF_VECTOR);
set_exception_intercept(svm, UD_VECTOR);
......@@ -1684,6 +1684,21 @@ static void svm_set_dr6(struct kvm_vcpu *vcpu, unsigned long value)
mark_dirty(svm->vmcb, VMCB_DR);
}
static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
get_debugreg(vcpu->arch.db[0], 0);
get_debugreg(vcpu->arch.db[1], 1);
get_debugreg(vcpu->arch.db[2], 2);
get_debugreg(vcpu->arch.db[3], 3);
vcpu->arch.dr6 = svm_get_dr6(vcpu);
vcpu->arch.dr7 = svm->vmcb->save.dr7;
vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_WONT_EXIT;
set_dr_intercepts(svm);
}
static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
{
struct vcpu_svm *svm = to_svm(vcpu);
......@@ -2842,6 +2857,7 @@ static int iret_interception(struct vcpu_svm *svm)
clr_intercept(svm, INTERCEPT_IRET);
svm->vcpu.arch.hflags |= HF_IRET_MASK;
svm->nmi_iret_rip = kvm_rip_read(&svm->vcpu);
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
return 1;
}
......@@ -2974,6 +2990,17 @@ static int dr_interception(struct vcpu_svm *svm)
unsigned long val;
int err;
if (svm->vcpu.guest_debug == 0) {
/*
* No more DR vmexits; force a reload of the debug registers
* and reenter on this instruction. The next vmexit will
* retrieve the full state of the debug registers.
*/
clr_dr_intercepts(svm);
svm->vcpu.arch.switch_db_regs |= KVM_DEBUGREG_WONT_EXIT;
return 1;
}
if (!boot_cpu_has(X86_FEATURE_DECODEASSISTS))
return emulate_on_interception(svm);
......@@ -3649,7 +3676,7 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
return ret;
}
static int enable_irq_window(struct kvm_vcpu *vcpu)
static void enable_irq_window(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
......@@ -3663,16 +3690,15 @@ static int enable_irq_window(struct kvm_vcpu *vcpu)
svm_set_vintr(svm);
svm_inject_irq(svm, 0x0);
}
return 0;
}
static int enable_nmi_window(struct kvm_vcpu *vcpu)
static void enable_nmi_window(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
== HF_NMI_MASK)
return 0; /* IRET will cause a vm exit */
return; /* IRET will cause a vm exit */
/*
* Something prevents NMI from been injected. Single step over possible
......@@ -3681,7 +3707,6 @@ static int enable_nmi_window(struct kvm_vcpu *vcpu)
svm->nmi_singlestep = true;
svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
update_db_bp_intercept(vcpu);
return 0;
}
static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
......@@ -4064,6 +4089,11 @@ static bool svm_invpcid_supported(void)
return false;
}
static bool svm_mpx_supported(void)
{
return false;
}
static bool svm_has_wbinvd_exit(void)
{
return true;
......@@ -4302,6 +4332,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.get_dr6 = svm_get_dr6,
.set_dr6 = svm_set_dr6,
.set_dr7 = svm_set_dr7,
.sync_dirty_debug_regs = svm_sync_dirty_debug_regs,
.cache_reg = svm_cache_reg,
.get_rflags = svm_get_rflags,
.set_rflags = svm_set_rflags,
......@@ -4345,6 +4376,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.rdtscp_supported = svm_rdtscp_supported,
.invpcid_supported = svm_invpcid_supported,
.mpx_supported = svm_mpx_supported,
.set_supported_cpuid = svm_set_supported_cpuid,
......
此差异已折叠。
此差异已折叠。
......@@ -122,9 +122,12 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
gva_t addr, void *val, unsigned int bytes,
struct x86_exception *exception);
#define KVM_SUPPORTED_XCR0 (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
#define KVM_SUPPORTED_XCR0 (XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
| XSTATE_BNDREGS | XSTATE_BNDCSR)
extern u64 host_xcr0;
extern u64 kvm_supported_xcr0(void);
extern unsigned int min_timer_period_us;
extern struct static_key kvm_no_apic_vcpu;
......
此差异已折叠。
......@@ -192,7 +192,7 @@ struct kvm_async_pf {
void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
struct kvm_arch_async_pf *arch);
int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
#endif
......@@ -297,6 +297,14 @@ static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memsl
return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
}
struct kvm_s390_adapter_int {
u64 ind_addr;
u64 summary_addr;
u64 ind_offset;
u32 summary_offset;
u32 adapter_id;
};
struct kvm_kernel_irq_routing_entry {
u32 gsi;
u32 type;
......@@ -309,6 +317,7 @@ struct kvm_kernel_irq_routing_entry {
unsigned pin;
} irqchip;
struct msi_msg msi;
struct kvm_s390_adapter_int adapter;
};
struct hlist_node link;
};
......@@ -401,7 +410,9 @@ struct kvm {
unsigned long mmu_notifier_seq;
long mmu_notifier_count;
#endif
long tlbs_dirty;
/* Protected by mmu_lock */
bool tlbs_dirty;
struct list_head devices;
};
......@@ -911,7 +922,11 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
#ifdef CONFIG_S390
#define KVM_MAX_IRQ_ROUTES 4096 //FIXME: we can have more than that...
#else
#define KVM_MAX_IRQ_ROUTES 1024
#endif
int kvm_setup_default_irq_routing(struct kvm *kvm);
int kvm_set_irq_routing(struct kvm *kvm,
......@@ -1064,6 +1079,7 @@ extern struct kvm_device_ops kvm_mpic_ops;
extern struct kvm_device_ops kvm_xics_ops;
extern struct kvm_device_ops kvm_vfio_ops;
extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
extern struct kvm_device_ops kvm_flic_ops;
#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
......
......@@ -413,6 +413,8 @@ struct kvm_s390_psw {
#define KVM_S390_PROGRAM_INT 0xfffe0001u
#define KVM_S390_SIGP_SET_PREFIX 0xfffe0002u
#define KVM_S390_RESTART 0xfffe0003u
#define KVM_S390_INT_PFAULT_INIT 0xfffe0004u
#define KVM_S390_INT_PFAULT_DONE 0xfffe0005u
#define KVM_S390_MCHK 0xfffe1000u
#define KVM_S390_INT_VIRTIO 0xffff2603u
#define KVM_S390_INT_SERVICE 0xffff2401u
......@@ -434,6 +436,69 @@ struct kvm_s390_interrupt {
__u64 parm64;
};
struct kvm_s390_io_info {
__u16 subchannel_id;
__u16 subchannel_nr;
__u32 io_int_parm;
__u32 io_int_word;
};
struct kvm_s390_ext_info {
__u32 ext_params;
__u32 pad;
__u64 ext_params2;
};
struct kvm_s390_pgm_info {
__u64 trans_exc_code;
__u64 mon_code;
__u64 per_address;
__u32 data_exc_code;
__u16 code;
__u16 mon_class_nr;
__u8 per_code;
__u8 per_atmid;
__u8 exc_access_id;
__u8 per_access_id;
__u8 op_access_id;
__u8 pad[3];
};
struct kvm_s390_prefix_info {
__u32 address;
};
struct kvm_s390_extcall_info {
__u16 code;
};
struct kvm_s390_emerg_info {
__u16 code;
};
struct kvm_s390_mchk_info {
__u64 cr14;
__u64 mcic;
__u64 failing_storage_address;
__u32 ext_damage_code;
__u32 pad;
__u8 fixed_logout[16];
};
struct kvm_s390_irq {
__u64 type;
union {
struct kvm_s390_io_info io;
struct kvm_s390_ext_info ext;
struct kvm_s390_pgm_info pgm;
struct kvm_s390_emerg_info emerg;
struct kvm_s390_extcall_info extcall;
struct kvm_s390_prefix_info prefix;
struct kvm_s390_mchk_info mchk;
char reserved[64];
} u;
};
/* for KVM_SET_GUEST_DEBUG */
#define KVM_GUESTDBG_ENABLE 0x00000001
......@@ -675,6 +740,9 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_SPAPR_MULTITCE 94
#define KVM_CAP_EXT_EMUL_CPUID 95
#define KVM_CAP_HYPERV_TIME 96
#define KVM_CAP_IOAPIC_POLARITY_IGNORED 97
#define KVM_CAP_ENABLE_CAP_VM 98
#define KVM_CAP_S390_IRQCHIP 99
#ifdef KVM_CAP_IRQ_ROUTING
......@@ -690,9 +758,18 @@ struct kvm_irq_routing_msi {
__u32 pad;
};
struct kvm_irq_routing_s390_adapter {
__u64 ind_addr;
__u64 summary_addr;
__u64 ind_offset;
__u32 summary_offset;
__u32 adapter_id;
};
/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3
struct kvm_irq_routing_entry {
__u32 gsi;
......@@ -702,6 +779,7 @@ struct kvm_irq_routing_entry {
union {
struct kvm_irq_routing_irqchip irqchip;
struct kvm_irq_routing_msi msi;
struct kvm_irq_routing_s390_adapter adapter;
__u32 pad[8];
} u;
};
......@@ -855,6 +933,7 @@ struct kvm_device_attr {
#define KVM_DEV_VFIO_GROUP_ADD 1
#define KVM_DEV_VFIO_GROUP_DEL 2
#define KVM_DEV_TYPE_ARM_VGIC_V2 5
#define KVM_DEV_TYPE_FLIC 6
/*
* ioctls for VM fds
......@@ -1009,6 +1088,10 @@ struct kvm_s390_ucas_mapping {
/* Available with KVM_CAP_DEBUGREGS */
#define KVM_GET_DEBUGREGS _IOR(KVMIO, 0xa1, struct kvm_debugregs)
#define KVM_SET_DEBUGREGS _IOW(KVMIO, 0xa2, struct kvm_debugregs)
/*
* vcpu version available with KVM_ENABLE_CAP
* vm version available with KVM_CAP_ENABLE_CAP_VM
*/
#define KVM_ENABLE_CAP _IOW(KVMIO, 0xa3, struct kvm_enable_cap)
/* Available with KVM_CAP_XSAVE */
#define KVM_GET_XSAVE _IOR(KVMIO, 0xa4, struct kvm_xsave)
......
......@@ -22,6 +22,10 @@ config KVM_MMIO
config KVM_ASYNC_PF
bool
# Toggle to switch between direct notification and batch job
config KVM_ASYNC_PF_SYNC
bool
config HAVE_KVM_MSI
bool
......
......@@ -28,6 +28,21 @@
#include "async_pf.h"
#include <trace/events/kvm.h>
static inline void kvm_async_page_present_sync(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work)
{
#ifdef CONFIG_KVM_ASYNC_PF_SYNC
kvm_arch_async_page_present(vcpu, work);
#endif
}
static inline void kvm_async_page_present_async(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work)
{
#ifndef CONFIG_KVM_ASYNC_PF_SYNC
kvm_arch_async_page_present(vcpu, work);
#endif
}
static struct kmem_cache *async_pf_cache;
int kvm_async_pf_init(void)
......@@ -69,6 +84,7 @@ static void async_pf_execute(struct work_struct *work)
down_read(&mm->mmap_sem);
get_user_pages(current, mm, addr, 1, 1, 0, NULL, NULL);
up_read(&mm->mmap_sem);
kvm_async_page_present_sync(vcpu, apf);
unuse_mm(mm);
spin_lock(&vcpu->async_pf.lock);
......@@ -97,11 +113,16 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
list_entry(vcpu->async_pf.queue.next,
typeof(*work), queue);
list_del(&work->queue);
#ifdef CONFIG_KVM_ASYNC_PF_SYNC
flush_work(&work->work);
#else
if (cancel_work_sync(&work->work)) {
mmdrop(work->mm);
kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
kmem_cache_free(async_pf_cache, work);
}
#endif
}
spin_lock(&vcpu->async_pf.lock);
......@@ -130,7 +151,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
spin_unlock(&vcpu->async_pf.lock);
kvm_arch_async_page_ready(vcpu, work);
kvm_arch_async_page_present(vcpu, work);
kvm_async_page_present_async(vcpu, work);
list_del(&work->queue);
vcpu->async_pf.queued--;
......@@ -138,7 +159,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
}
}
int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
struct kvm_arch_async_pf *arch)
{
struct kvm_async_pf *work;
......@@ -159,7 +180,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
work->wakeup_all = false;
work->vcpu = vcpu;
work->gva = gva;
work->addr = gfn_to_hva(vcpu->kvm, gfn);
work->addr = hva;
work->arch = *arch;
work->mm = current->mm;
atomic_inc(&work->mm->mm_count);
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册