Commit 1ebdbeb0 authored by Linus Torvalds

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - Proper emulation of the OSLock feature of the debug architecture

   - Scalability improvements for the MMU lock when dirty logging is on

   - New VMID allocator, which will eventually help with SVA in VMs

   - Better support for PMUs in heterogeneous systems

   - PSCI 1.1 support, enabling support for SYSTEM_RESET2

   - Implement CONFIG_DEBUG_LIST at EL2

   - Make CONFIG_ARM64_ERRATUM_2077057 default y

   - Reduce the overhead of VM exit when no interrupt is pending

   - Remove traces of 32bit ARM host support from the documentation

   - Updated vgic selftests

   - Various cleanups, doc updates and spelling fixes

  RISC-V:
   - Prevent KVM_COMPAT from being selected

   - Optimize __kvm_riscv_switch_to() implementation

   - RISC-V SBI v0.3 support

  s390:
   - memop selftest

   - fix SCK locking

   - adapter interruptions virtualization for secure guests

   - add Claudio Imbrenda as maintainer

   - first step to do proper storage key checking

  x86:
   - Continue switching kvm_x86_ops to static_call(); introduce
     static_call_cond() and __static_call_ret0 when applicable.

   - Cleanup unused arguments in several functions

   - Synthesize AMD 0x80000021 leaf

   - Fixes and optimization for Hyper-V sparse-bank hypercalls

   - Implement Hyper-V's enlightened MSR bitmap for nested SVM

   - Remove MMU auditing

   - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
     page tracking is enabled

   - Cleanup the implementation of the guest PGD cache

   - Preparation for the implementation of Intel IPI virtualization

   - Fix some segment descriptor checks in the emulator

   - Allow AMD AVIC support on systems with physical APIC ID above 255

   - Better API to disable virtualization quirks

   - Fixes and optimizations for the zapping of page tables:

      - Zap roots in two passes, avoiding RCU read-side critical
        sections that last too long for very large guests backed by 4
        KiB SPTEs.

      - Zap invalid and defunct roots asynchronously via
        concurrency-managed work queue.

      - Allow yielding when zapping TDP MMU roots in response to the
        root's last reference being put.

      - Batch more TLB flushes with an RCU trick. Whoever frees the
        paging structure now holds RCU as a proxy for all vCPUs running
        in the guest, i.e. it prolongs the grace period on their behalf.
        It then kicks the vCPUs out of guest mode before doing
        rcu_read_unlock().

  Generic:
   - Introduce __vcalloc and use it for very large allocations that need
     memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  KVM: use kvcalloc for array allocations
  KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
  kvm: x86: Require const tsc for RT
  KVM: x86: synthesize CPUID leaf 0x80000021h if useful
  KVM: x86: add support for CPUID leaf 0x80000021
  KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
  Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
  kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
  KVM: arm64: fix typos in comments
  KVM: arm64: Generalise VM features into a set of flags
  KVM: s390: selftests: Add error memop tests
  KVM: s390: selftests: Add more copy memop tests
  KVM: s390: selftests: Add named stages for memop test
  KVM: s390: selftests: Add macro as abstraction for MEM_OP
  KVM: s390: selftests: Split memop tests
  KVM: s390x: fix SCK locking
  RISC-V: KVM: Implement SBI HSM suspend call
  RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
  RISC-V: Add SBI HSM suspend related defines
  RISC-V: KVM: Implement SBI v0.3 SRST extension
  ...
......@@ -2366,13 +2366,35 @@
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
kvm.eager_page_split=
[KVM,X86] Controls whether or not KVM will try to
proactively split all huge pages during dirty logging.
Eager page splitting reduces interruptions to vCPU
execution by eliminating the write-protection faults
and MMU lock contention that would otherwise be
required to split huge pages lazily.
VM workloads that rarely perform writes or that write
only to a small region of VM memory may benefit from
disabling eager page splitting to allow huge pages to
still be used for reads.
The behavior of eager page splitting depends on whether
KVM_DIRTY_LOG_INITIALLY_SET is enabled or disabled. If
disabled, all huge pages in a memslot will be eagerly
split when dirty logging is enabled on that memslot. If
enabled, eager page splitting will be performed during
the KVM_CLEAR_DIRTY_LOG ioctl, and only for the pages being
cleared.
Eager page splitting currently only supports splitting
huge pages mapped by the TDP MMU.
Default is Y (on).
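For example, a host running read-mostly guests can keep huge pages
intact for reads by booting with:

kvm.eager_page_split=N

Whether this helps a particular workload is best confirmed by
measurement.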
kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
Default is false (don't support).
kvm.mmu_audit= [KVM] This is a R/W parameter which allows audit
KVM MMU at runtime.
Default is 0 (off)
kvm.nx_huge_pages=
[KVM] Controls the software workaround for the
X86_BUG_ITLB_MULTIHIT bug.
......
This diff is collapsed.
......@@ -70,7 +70,7 @@ irqchip.
-ENODEV PMUv3 not supported or GIC not initialized
-ENXIO PMUv3 not properly configured or in-kernel irqchip not
configured as required prior to calling this attribute
-EBUSY PMUv3 already initialized
-EBUSY PMUv3 already initialized or a VCPU has already run
-EINVAL Invalid filter range
======= ======================================================
......@@ -104,11 +104,43 @@ hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
isn't strictly speaking an event. Filtering the cycle counter is possible
using event 0x11 (CPU_CYCLES).
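As a minimal userspace sketch of the filter interface (assuming an arm64
host, <linux/kvm.h> from a kernel with this support, and an already
created vCPU file descriptor vcpu_fd), the fragment below installs an
allow-list that exposes only CPU_CYCLES (0x11) to the guest:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Allow only CPU_CYCLES (0x11) in the guest PMU. The first ALLOW filter
 * starts from an all-deny bitmap, so a single entry is enough. Filters
 * must be set before KVM_ARM_VCPU_PMU_V3_INIT and before any vCPU has run.
 */
static int pmu_allow_only_cpu_cycles(int vcpu_fd)
{
	struct kvm_pmu_event_filter filter = {
		.base_event = 0x11,	/* CPU_CYCLES */
		.nevents    = 1,
		.action     = KVM_PMU_EVENT_ALLOW,
	};
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_PMU_V3_CTRL,
		.attr  = KVM_ARM_VCPU_PMU_V3_FILTER,
		.addr  = (__u64)(unsigned long)&filter,
	};

	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}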
1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
------------------------------------------
:Parameters: in kvm_device_attr.addr the address to an int representing the PMU
identifier.
:Returns:
======= ====================================================
-EBUSY PMUv3 already initialized, a VCPU has already run or
an event filter has already been set
-EFAULT Error accessing the PMU identifier
-ENXIO PMU not found
-ENODEV PMUv3 not supported or GIC not initialized
-ENOMEM Could not allocate memory
======= ====================================================
Request that the VCPU uses the specified hardware PMU when creating guest events
for the purpose of PMU emulation. The PMU identifier can be read from the "type"
file for the desired PMU instance under /sys/devices (or, equivalently,
/sys/bus/event_source). This attribute is particularly useful on heterogeneous
systems where there are at least two CPU PMUs on the system. The PMU that is set
for one VCPU will be used by all the other VCPUs. It isn't possible to set a PMU
if a PMU event filter is already present.
Note that KVM will not make any attempts to run the VCPU on the physical CPUs
associated with the PMU specified by this attribute. This is entirely left to
userspace. However, attempting to run the VCPU on a physical CPU not supported
by the PMU will fail and KVM_RUN will return with
exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting the
hardware_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED and
the cpu field to the processor id.
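A hedged sketch of that flow in userspace follows; the PMU instance name
("armv8_pmuv3_0") and the vcpu_fd variable are illustrative assumptions,
and the attribute has to be set before the vCPU first runs and before any
event filter is installed:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Read the perf "type" identifier of a host PMU instance from sysfs. */
static int read_pmu_type(const char *pmu_name)
{
	char path[128];
	int type = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/type", pmu_name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/* Bind the vCPU to that PMU; KVM copies pmu_id in from the given address. */
static int vcpu_set_pmu(int vcpu_fd, int pmu_id)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_PMU_V3_CTRL,
		.attr  = KVM_ARM_VCPU_PMU_V3_SET_PMU,
		.addr  = (__u64)(unsigned long)&pmu_id,
	};

	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}

/* Example: vcpu_set_pmu(vcpu_fd, read_pmu_type("armv8_pmuv3_0")); */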
2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
=================================
:Architectures: ARM, ARM64
:Architectures: ARM64
2.1. ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_VTIMER, KVM_ARM_VCPU_TIMER_IRQ_PTIMER
-----------------------------------------------------------------------------
......
......@@ -112,11 +112,10 @@ KVM_REQ_TLB_FLUSH
choose to use the common kvm_flush_remote_tlbs() implementation will
need to handle this VCPU request.
KVM_REQ_MMU_RELOAD
KVM_REQ_VM_DEAD
When shadow page tables are used and memory slots are removed it's
necessary to inform each VCPU to completely refresh the tables. This
request is used for that.
This request informs all VCPUs that the VM is dead and unusable, e.g. due to
fatal error or because the VM's state has been intentionally destroyed.
KVM_REQ_UNBLOCK
......
......@@ -10598,8 +10598,8 @@ F: arch/riscv/kvm/
KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
M: Christian Borntraeger <borntraeger@linux.ibm.com>
M: Janosch Frank <frankja@linux.ibm.com>
M: Claudio Imbrenda <imbrenda@linux.ibm.com>
R: David Hildenbrand <david@redhat.com>
R: Claudio Imbrenda <imbrenda@linux.ibm.com>
L: kvm@vger.kernel.org
S: Supported
W: http://www.ibm.com/developerworks/linux/linux390/
......
......@@ -686,6 +686,7 @@ config ARM64_ERRATUM_2051678
config ARM64_ERRATUM_2077057
bool "Cortex-A510: 2077057: workaround software-step corrupting SPSR_EL2"
default y
help
This option adds the workaround for ARM Cortex-A510 erratum 2077057.
Affected Cortex-A510 may corrupt SPSR_EL2 when a step exception is
......
......@@ -50,6 +50,8 @@
#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
KVM_DIRTY_LOG_INITIALLY_SET)
#define KVM_HAVE_MMU_RWLOCK
/*
* Mode of operation configurable with kvm-arm.mode early param.
* See Documentation/admin-guide/kernel-parameters.txt for more information.
......@@ -71,9 +73,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
struct kvm_vmid {
/* The VMID generation used for the virt. memory system */
u64 vmid_gen;
u32 vmid;
atomic64_t id;
};
struct kvm_s2_mmu {
......@@ -122,20 +122,24 @@ struct kvm_arch {
* should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
* supported.
*/
bool return_nisv_io_abort_to_user;
#define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER 0
/* Memory Tagging Extension enabled for the guest */
#define KVM_ARCH_FLAG_MTE_ENABLED 1
/* At least one vCPU has ran in the VM */
#define KVM_ARCH_FLAG_HAS_RAN_ONCE 2
unsigned long flags;
/*
* VM-wide PMU filter, implemented as a bitmap and big enough for
* up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
*/
unsigned long *pmu_filter;
unsigned int pmuver;
struct arm_pmu *arm_pmu;
cpumask_var_t supported_cpus;
u8 pfr0_csv2;
u8 pfr0_csv3;
/* Memory Tagging Extension enabled for the guest */
bool mte_enabled;
};
struct kvm_vcpu_fault_info {
......@@ -171,6 +175,7 @@ enum vcpu_sysreg {
PAR_EL1, /* Physical Address Register */
MDSCR_EL1, /* Monitor Debug System Control Register */
MDCCINT_EL1, /* Monitor Debug Comms Channel Interrupt Enable Reg */
OSLSR_EL1, /* OS Lock Status Register */
DISR_EL1, /* Deferred Interrupt Status Register */
/* Performance Monitors Registers */
......@@ -435,6 +440,7 @@ struct kvm_vcpu_arch {
#define KVM_ARM64_DEBUG_STATE_SAVE_SPE (1 << 12) /* Save SPE context if active */
#define KVM_ARM64_DEBUG_STATE_SAVE_TRBE (1 << 13) /* Save TRBE context if active */
#define KVM_ARM64_FP_FOREIGN_FPSTATE (1 << 14)
#define KVM_ARM64_ON_UNSUPPORTED_CPU (1 << 15) /* Physical CPU not in supported_cpus */
#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \
KVM_GUESTDBG_USE_SW_BP | \
......@@ -453,6 +459,15 @@ struct kvm_vcpu_arch {
#define vcpu_has_ptrauth(vcpu) false
#endif
#define vcpu_on_unsupported_cpu(vcpu) \
((vcpu)->arch.flags & KVM_ARM64_ON_UNSUPPORTED_CPU)
#define vcpu_set_on_unsupported_cpu(vcpu) \
((vcpu)->arch.flags |= KVM_ARM64_ON_UNSUPPORTED_CPU)
#define vcpu_clear_on_unsupported_cpu(vcpu) \
((vcpu)->arch.flags &= ~KVM_ARM64_ON_UNSUPPORTED_CPU)
#define vcpu_gp_regs(v) (&(v)->arch.ctxt.regs)
/*
......@@ -692,6 +707,12 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
extern unsigned int kvm_arm_vmid_bits;
int kvm_arm_vmid_alloc_init(void);
void kvm_arm_vmid_alloc_free(void);
void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
void kvm_arm_vmid_clear_active(void);
static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
{
vcpu_arch->steal.base = GPA_INVALID;
......@@ -730,6 +751,10 @@ void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
#define kvm_vcpu_os_lock_enabled(vcpu) \
(!!(__vcpu_sys_reg(vcpu, OSLSR_EL1) & SYS_OSLSR_OSLK))
int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
......@@ -791,7 +816,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
#define kvm_arm_vcpu_sve_finalized(vcpu) \
((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
#define kvm_has_mte(kvm) \
(system_supports_mte() && \
test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &(kvm)->arch.flags))
#define kvm_vcpu_has_pmu(vcpu) \
(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
......
......@@ -115,6 +115,7 @@ alternative_cb_end
#include <asm/cache.h>
#include <asm/cacheflush.h>
#include <asm/mmu_context.h>
#include <asm/kvm_host.h>
void kvm_update_va_mask(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
......@@ -266,7 +267,8 @@ static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0;
baddr = mmu->pgd_phys;
vmid_field = (u64)READ_ONCE(vmid->vmid) << VTTBR_VMID_SHIFT;
vmid_field = atomic64_read(&vmid->id) << VTTBR_VMID_SHIFT;
vmid_field &= VTTBR_VMID_MASK(kvm_arm_vmid_bits);
return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
}
......
......@@ -128,8 +128,16 @@
#define SYS_DBGWVRn_EL1(n) sys_reg(2, 0, 0, n, 6)
#define SYS_DBGWCRn_EL1(n) sys_reg(2, 0, 0, n, 7)
#define SYS_MDRAR_EL1 sys_reg(2, 0, 1, 0, 0)
#define SYS_OSLAR_EL1 sys_reg(2, 0, 1, 0, 4)
#define SYS_OSLAR_OSLK BIT(0)
#define SYS_OSLSR_EL1 sys_reg(2, 0, 1, 1, 4)
#define SYS_OSLSR_OSLM_MASK (BIT(3) | BIT(0))
#define SYS_OSLSR_OSLM_NI 0
#define SYS_OSLSR_OSLM_IMPLEMENTED BIT(3)
#define SYS_OSLSR_OSLK BIT(1)
#define SYS_OSDLR_EL1 sys_reg(2, 0, 1, 3, 4)
#define SYS_DBGPRCR_EL1 sys_reg(2, 0, 1, 4, 4)
#define SYS_DBGCLAIMSET_EL1 sys_reg(2, 0, 7, 8, 6)
......
......@@ -367,6 +367,7 @@ struct kvm_arm_copy_mte_tags {
#define KVM_ARM_VCPU_PMU_V3_IRQ 0
#define KVM_ARM_VCPU_PMU_V3_INIT 1
#define KVM_ARM_VCPU_PMU_V3_FILTER 2
#define KVM_ARM_VCPU_PMU_V3_SET_PMU 3
#define KVM_ARM_VCPU_TIMER_CTRL 1
#define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
......@@ -418,6 +419,16 @@ struct kvm_arm_copy_mte_tags {
#define KVM_PSCI_RET_INVAL PSCI_RET_INVALID_PARAMS
#define KVM_PSCI_RET_DENIED PSCI_RET_DENIED
/* arm64-specific kvm_run::system_event flags */
/*
* Reset caused by a PSCI v1.1 SYSTEM_RESET2 call.
* Valid only when the system event has a type of KVM_SYSTEM_EVENT_RESET.
*/
#define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 (1ULL << 0)
/* run->fail_entry.hardware_entry_failure_reason codes. */
#define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED (1ULL << 0)
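A rough userspace sketch of how these definitions might be consumed once
KVM_RUN returns; run is assumed to be the mmap'd struct kvm_run of the
vCPU, the fail_entry.cpu field is the one added alongside this series,
and the printf calls are placeholders for real VMM handling:

#include <stdio.h>
#include <linux/kvm.h>

/* Inspect the exit information left behind by KVM_RUN. */
static void report_exit(const struct kvm_run *run)
{
	if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
	    run->system_event.type == KVM_SYSTEM_EVENT_RESET) {
		if (run->system_event.flags &
		    KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2)
			printf("guest requested PSCI SYSTEM_RESET2\n");
		else
			printf("guest requested a plain SYSTEM_RESET\n");
	} else if (run->exit_reason == KVM_EXIT_FAIL_ENTRY &&
		   run->fail_entry.hardware_entry_failure_reason ==
		   KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED) {
		printf("vCPU ran on a CPU its PMU does not support (cpu %u)\n",
		       (unsigned int)run->fail_entry.cpu);
	}
}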
#endif
#endif /* __ARM_KVM_H__ */
......@@ -348,7 +348,13 @@ static void task_fpsimd_load(void)
/*
* Ensure FPSIMD/SVE storage in memory for the loaded context is up to
* date with respect to the CPU registers.
* date with respect to the CPU registers. Note carefully that the
* current context is the context last bound to the CPU stored in
* last; if KVM is involved this may be the guest VM context rather
* than the host thread for the VM pointed to by current. This means
* that we must always reference the state storage via last rather
* than via current, other than the TIF_ flags which KVM will
* carefully maintain for us.
*/
static void fpsimd_save(void)
{
......
......@@ -83,6 +83,9 @@ KVM_NVHE_ALIAS(__hyp_stub_vectors);
/* Kernel symbol used by icache_is_vpipt(). */
KVM_NVHE_ALIAS(__icache_flags);
/* VMID bits set by the KVM VMID allocator */
KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
/* Kernel symbols needed for cpus_have_final/const_caps checks. */
KVM_NVHE_ALIAS(arm64_const_caps_ready);
KVM_NVHE_ALIAS(cpu_hwcap_keys);
......
......@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
inject_fault.o va_layout.o handle_exit.o \
guest.o debug.o reset.o sys_regs.o \
vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
arch_timer.o trng.o\
arch_timer.o trng.o vmid.o \
vgic/vgic.o vgic/vgic-init.o \
vgic/vgic-irqfd.o vgic/vgic-v2.o \
vgic/vgic-v3.o vgic/vgic-v4.o \
......
......@@ -53,11 +53,6 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
/* The VMID used in the VTTBR */
static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
static u32 kvm_next_vmid;
static DEFINE_SPINLOCK(kvm_vmid_lock);
static bool vgic_present;
static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
......@@ -89,7 +84,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
switch (cap->cap) {
case KVM_CAP_ARM_NISV_TO_USER:
r = 0;
kvm->arch.return_nisv_io_abort_to_user = true;
set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
&kvm->arch.flags);
break;
case KVM_CAP_ARM_MTE:
mutex_lock(&kvm->lock);
......@@ -97,7 +93,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
r = -EINVAL;
} else {
r = 0;
kvm->arch.mte_enabled = true;
set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags);
}
mutex_unlock(&kvm->lock);
break;
......@@ -150,6 +146,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret)
goto out_free_stage2_pgd;
if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL))
goto out_free_stage2_pgd;
cpumask_copy(kvm->arch.supported_cpus, cpu_possible_mask);
kvm_vgic_early_init(kvm);
/* The maximum number of VCPUs is limited by the host's GIC model */
......@@ -176,6 +176,7 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
void kvm_arch_destroy_vm(struct kvm *kvm)
{
bitmap_free(kvm->arch.pmu_filter);
free_cpumask_var(kvm->arch.supported_cpus);
kvm_vgic_destroy(kvm);
......@@ -411,6 +412,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (vcpu_has_ptrauth(vcpu))
vcpu_ptrauth_disable(vcpu);
kvm_arch_vcpu_load_debug_state_flags(vcpu);
if (!cpumask_test_cpu(smp_processor_id(), vcpu->kvm->arch.supported_cpus))
vcpu_set_on_unsupported_cpu(vcpu);
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
......@@ -422,7 +426,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_timer_vcpu_put(vcpu);
kvm_vgic_put(vcpu);
kvm_vcpu_pmu_restore_host(vcpu);
kvm_arm_vmid_clear_active();
vcpu_clear_on_unsupported_cpu(vcpu);
vcpu->cpu = -1;
}
......@@ -489,87 +495,6 @@ unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
}
#endif
/* Just ensure a guest exit from a particular CPU */
static void exit_vm_noop(void *info)
{
}
void force_vm_exit(const cpumask_t *mask)
{
preempt_disable();
smp_call_function_many(mask, exit_vm_noop, NULL, true);
preempt_enable();
}
/**
* need_new_vmid_gen - check that the VMID is still valid
* @vmid: The VMID to check
*
* return true if there is a new generation of VMIDs being used
*
* The hardware supports a limited set of values with the value zero reserved
* for the host, so we check if an assigned value belongs to a previous
* generation, which requires us to assign a new value. If we're the first to
* use a VMID for the new generation, we must flush necessary caches and TLBs
* on all CPUs.
*/
static bool need_new_vmid_gen(struct kvm_vmid *vmid)
{
u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
return unlikely(READ_ONCE(vmid->vmid_gen) != current_vmid_gen);
}
/**
* update_vmid - Update the vmid with a valid VMID for the current generation
* @vmid: The stage-2 VMID information struct
*/
static void update_vmid(struct kvm_vmid *vmid)
{
if (!need_new_vmid_gen(vmid))
return;
spin_lock(&kvm_vmid_lock);
/*
* We need to re-check the vmid_gen here to ensure that if another vcpu
* already allocated a valid vmid for this vm, then this vcpu should
* use the same vmid.
*/
if (!need_new_vmid_gen(vmid)) {
spin_unlock(&kvm_vmid_lock);
return;
}
/* First user of a new VMID generation? */
if (unlikely(kvm_next_vmid == 0)) {
atomic64_inc(&kvm_vmid_gen);
kvm_next_vmid = 1;
/*
* On SMP we know no other CPUs can use this CPU's or each
* other's VMID after force_vm_exit returns since the
* kvm_vmid_lock blocks them from reentry to the guest.
*/
force_vm_exit(cpu_all_mask);
/*
* Now broadcast TLB + ICACHE invalidation over the inner
* shareable domain to make sure all data structures are
* clean.
*/
kvm_call_hyp(__kvm_flush_vm_context);
}
WRITE_ONCE(vmid->vmid, kvm_next_vmid);
kvm_next_vmid++;
kvm_next_vmid &= (1 << kvm_get_vmid_bits()) - 1;
smp_wmb();
WRITE_ONCE(vmid->vmid_gen, atomic64_read(&kvm_vmid_gen));
spin_unlock(&kvm_vmid_lock);
}
static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
{
return vcpu->arch.target >= 0;
......@@ -634,6 +559,10 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
if (kvm_vm_is_protected(kvm))
kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
mutex_lock(&kvm->lock);
set_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags);
mutex_unlock(&kvm->lock);
return ret;
}
......@@ -792,8 +721,15 @@ static bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu, int *ret)
}
}
if (unlikely(vcpu_on_unsupported_cpu(vcpu))) {
run->exit_reason = KVM_EXIT_FAIL_ENTRY;
run->fail_entry.hardware_entry_failure_reason = KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED;
run->fail_entry.cpu = smp_processor_id();
*ret = 0;
return true;
}
return kvm_request_pending(vcpu) ||
need_new_vmid_gen(&vcpu->arch.hw_mmu->vmid) ||
xfer_to_guest_mode_work_pending();
}
......@@ -855,8 +791,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
if (!ret)
ret = 1;
update_vmid(&vcpu->arch.hw_mmu->vmid);
check_vcpu_requests(vcpu);
/*
......@@ -866,6 +800,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
*/
preempt_disable();
/*
* The VMID allocator only tracks active VMIDs per
* physical CPU, and therefore the VMID allocated may not be
* preserved on VMID roll-over if the task was preempted,
* making a thread's VMID inactive. So we need to call
* kvm_arm_vmid_update() in non-premptible context.
*/
kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid);
kvm_pmu_flush_hwstate(vcpu);
local_irq_disable();
......@@ -945,9 +888,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
* context synchronization event) is necessary to ensure that
* pending interrupts are taken.
*/
local_irq_enable();
isb();
local_irq_disable();
if (ARM_EXCEPTION_CODE(ret) == ARM_EXCEPTION_IRQ) {
local_irq_enable();
isb();
local_irq_disable();
}
guest_timing_exit_irqoff();
......@@ -1742,7 +1687,7 @@ static void init_cpu_logical_map(void)
/*
* Copy the MPIDR <-> logical CPU ID mapping to hyp.
* Only copy the set of online CPUs whose features have been chacked
* Only copy the set of online CPUs whose features have been checked
* against the finalized system capabilities. The hypervisor will not
* allow any other CPUs from the `possible` set to boot.
*/
......@@ -2159,6 +2104,12 @@ int kvm_arch_init(void *opaque)
if (err)
return err;
err = kvm_arm_vmid_alloc_init();
if (err) {
kvm_err("Failed to initialize VMID allocator.\n");
return err;
}
if (!in_hyp_mode) {
err = init_hyp_mode();
if (err)
......@@ -2198,6 +2149,7 @@ int kvm_arch_init(void *opaque)
if (!in_hyp_mode)
teardown_hyp_mode();
out_err:
kvm_arm_vmid_alloc_free();
return err;
}
......
......@@ -105,9 +105,11 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
* - Userspace is using the hardware to debug the guest
* (KVM_GUESTDBG_USE_HW is set).
* - The guest is not using debug (KVM_ARM64_DEBUG_DIRTY is clear).
* - The guest has enabled the OS Lock (debug exceptions are blocked).
*/
if ((vcpu->guest_debug & KVM_GUESTDBG_USE_HW) ||
!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY) ||
kvm_vcpu_os_lock_enabled(vcpu))
vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
......@@ -160,8 +162,8 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
kvm_arm_setup_mdcr_el2(vcpu);
/* Is Guest debugging in effect? */
if (vcpu->guest_debug) {
/* Check if we need to use the debug registers. */
if (vcpu->guest_debug || kvm_vcpu_os_lock_enabled(vcpu)) {
/* Save guest debug state */
save_guest_debug_regs(vcpu);
......@@ -223,6 +225,19 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
trace_kvm_arm_set_regset("WAPTS", get_num_wrps(),
&vcpu->arch.debug_ptr->dbg_wcr[0],
&vcpu->arch.debug_ptr->dbg_wvr[0]);
/*
* The OS Lock blocks debug exceptions in all ELs when it is
* enabled. If the guest has enabled the OS Lock, constrain its
* effects to the guest. Emulate the behavior by clearing
* MDSCR_EL1.MDE. In so doing, we ensure that host debug
* exceptions are unaffected by guest configuration of the OS
* Lock.
*/
} else if (kvm_vcpu_os_lock_enabled(vcpu)) {
mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
mdscr &= ~DBG_MDSCR_MDE;
vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1);
}
}
......@@ -244,7 +259,10 @@ void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)
{
trace_kvm_arm_clear_debug(vcpu->guest_debug);
if (vcpu->guest_debug) {
/*
* Restore the guest's debug registers if we were using them.
*/
if (vcpu->guest_debug || kvm_vcpu_os_lock_enabled(vcpu)) {
restore_guest_debug_regs(vcpu);
/*
......
......@@ -84,6 +84,11 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED;
}
/*
* Called just before entering the guest once we are no longer
* preemptable. Syncs the host's TIF_FOREIGN_FPSTATE with the KVM
* mirror of the flag used by the hypervisor.
*/
void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu)
{
if (test_thread_flag(TIF_FOREIGN_FPSTATE))
......@@ -93,10 +98,11 @@ void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu)
}
/*
* If the guest FPSIMD state was loaded, update the host's context
* tracking data mark the CPU FPSIMD regs as dirty and belonging to vcpu
* so that they will be written back if the kernel clobbers them due to
* kernel-mode NEON before re-entry into the guest.
* Called just after exiting the guest. If the guest FPSIMD state
* was loaded, update the host's context tracking data mark the CPU
* FPSIMD regs as dirty and belonging to vcpu so that they will be
* written back if the kernel clobbers them due to kernel-mode NEON
* before re-entry into the guest.
*/
void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
{
......
......@@ -282,7 +282,7 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
break;
/*
* Otherwide, this is a priviledged mode, and *all* the
* Otherwise, this is a privileged mode, and *all* the
* registers must be narrowed to 32bit.
*/
default:
......
......@@ -248,7 +248,7 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
case ARM_EXCEPTION_HYP_GONE:
/*
* EL2 has been reset to the hyp-stub. This happens when a guest
* is pre-empted by kvm_reboot()'s shutdown call.
* is pre-emptied by kvm_reboot()'s shutdown call.
*/
run->exit_reason = KVM_EXIT_FAIL_ENTRY;
return 0;
......
......@@ -173,6 +173,8 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
return false;
/* Valid trap. Switch the context: */
/* First disable enough traps to allow us to update the registers */
if (has_vhe()) {
reg = CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN;
if (sve_guest)
......@@ -188,11 +190,13 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
}
isb();
/* Write out the host state if it's in the registers */
if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
}
/* Restore the guest state */
if (sve_guest)
__hyp_sve_restore_guest(vcpu);
else
......
......@@ -13,10 +13,11 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
lib-objs := $(addprefix ../../../lib/, $(lib-objs))
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
hyp-main.o hyp-smp.o psci-relay.o early_alloc.o page_alloc.o \
cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
obj-$(CONFIG_DEBUG_LIST) += list_debug.o
obj-y += $(lib-objs)
##
......
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2022 - Google LLC
* Author: Keir Fraser <keirf@google.com>
*/
#include <linux/list.h>
#include <linux/bug.h>
static inline __must_check bool nvhe_check_data_corruption(bool v)
{
return v;
}
#define NVHE_CHECK_DATA_CORRUPTION(condition) \
nvhe_check_data_corruption(({ \
bool corruption = unlikely(condition); \
if (corruption) { \
if (IS_ENABLED(CONFIG_BUG_ON_DATA_CORRUPTION)) { \
BUG_ON(1); \
} else \
WARN_ON(1); \
} \
corruption; \
}))
/* The predicates checked here are taken from lib/list_debug.c. */
bool __list_add_valid(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
if (NVHE_CHECK_DATA_CORRUPTION(next->prev != prev) ||
NVHE_CHECK_DATA_CORRUPTION(prev->next != next) ||
NVHE_CHECK_DATA_CORRUPTION(new == prev || new == next))
return false;
return true;
}
bool __list_del_entry_valid(struct list_head *entry)
{
struct list_head *prev, *next;
prev = entry->prev;
next = entry->next;
if (NVHE_CHECK_DATA_CORRUPTION(next == LIST_POISON1) ||
NVHE_CHECK_DATA_CORRUPTION(prev == LIST_POISON2) ||
NVHE_CHECK_DATA_CORRUPTION(prev->next != entry) ||
NVHE_CHECK_DATA_CORRUPTION(next->prev != entry))
return false;
return true;
}
......@@ -138,8 +138,7 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
mmu->pgt = &host_kvm.pgt;
WRITE_ONCE(mmu->vmid.vmid_gen, 0);
WRITE_ONCE(mmu->vmid.vmid, 0);
atomic64_set(&mmu->vmid.id, 0);
return 0;
}
......
......@@ -102,7 +102,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
* Only the first struct hyp_page of a high-order page (otherwise known
* as the 'head') should have p->order set. The non-head pages should
* have p->order = HYP_NO_ORDER. Here @p may no longer be the head
* after coallescing, so make sure to mark it HYP_NO_ORDER proactively.
* after coalescing, so make sure to mark it HYP_NO_ORDER proactively.
*/
p->order = HYP_NO_ORDER;
for (; (order + 1) < pool->max_order; order++) {
......@@ -110,7 +110,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
if (!buddy)
break;
/* Take the buddy out of its list, and coallesce with @p */
/* Take the buddy out of its list, and coalesce with @p */
page_remove_from_list(buddy);
buddy->order = HYP_NO_ORDER;
p = min(p, buddy);
......
// SPDX-License-Identifier: GPL-2.0-only
/*
* Stubs for out-of-line function calls caused by re-using kernel
* infrastructure at EL2.
*
* Copyright (C) 2020 - Google LLC
*/
#include <linux/list.h>
#ifdef CONFIG_DEBUG_LIST
bool __list_add_valid(struct list_head *new, struct list_head *prev,
struct list_head *next)
{
return true;
}
bool __list_del_entry_valid(struct list_head *entry)
{
return true;
}
#endif
......@@ -135,7 +135,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
* volunteered to do so, and bail out otherwise.
*/
if (!kvm_vcpu_dabt_isvalid(vcpu)) {
if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
&vcpu->kvm->arch.flags)) {
run->exit_reason = KVM_EXIT_ARM_NISV;
run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu);
run->arm_nisv.fault_ipa = fault_ipa;
......
......@@ -58,7 +58,7 @@ static int stage2_apply_range(struct kvm *kvm, phys_addr_t addr,
break;
if (resched && next != end)
cond_resched_lock(&kvm->mmu_lock);
cond_resched_rwlock_write(&kvm->mmu_lock);
} while (addr = next, addr != end);
return ret;
......@@ -179,7 +179,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
phys_addr_t end = start + size;
assert_spin_locked(&kvm->mmu_lock);
lockdep_assert_held_write(&kvm->mmu_lock);
WARN_ON(size & ~PAGE_MASK);
WARN_ON(stage2_apply_range(kvm, start, end, kvm_pgtable_stage2_unmap,
may_block));
......@@ -213,13 +213,13 @@ static void stage2_flush_vm(struct kvm *kvm)
int idx, bkt;
idx = srcu_read_lock(&kvm->srcu);
spin_lock(&kvm->mmu_lock);
write_lock(&kvm->mmu_lock);
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, bkt, slots)
stage2_flush_memslot(kvm, memslot);
spin_unlock(&kvm->mmu_lock);
write_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, idx);
}
......@@ -615,7 +615,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
};
/**
* kvm_init_stage2_mmu - Initialise a S2 MMU strucrure
* kvm_init_stage2_mmu - Initialise a S2 MMU structure
* @kvm: The pointer to the KVM structure
* @mmu: The pointer to the s2 MMU structure
*
......@@ -653,7 +653,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
mmu->pgt = pgt;
mmu->pgd_phys = __pa(pgt->pgd);
WRITE_ONCE(mmu->vmid.vmid_gen, 0);
return 0;
out_destroy_pgtable:
......@@ -720,13 +719,13 @@ void stage2_unmap_vm(struct kvm *kvm)
idx = srcu_read_lock(&kvm->srcu);
mmap_read_lock(current->mm);
spin_lock(&kvm->mmu_lock);
write_lock(&kvm->mmu_lock);
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, bkt, slots)
stage2_unmap_memslot(kvm, memslot);
spin_unlock(&kvm->mmu_lock);
write_unlock(&kvm->mmu_lock);
mmap_read_unlock(current->mm);
srcu_read_unlock(&kvm->srcu, idx);
}
......@@ -736,14 +735,14 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
struct kvm_pgtable *pgt = NULL;
spin_lock(&kvm->mmu_lock);
write_lock(&kvm->mmu_lock);
pgt = mmu->pgt;
if (pgt) {
mmu->pgd_phys = 0;
mmu->pgt = NULL;
free_percpu(mmu->last_vcpu_ran);
}
spin_unlock(&kvm->mmu_lock);
write_unlock(&kvm->mmu_lock);
if (pgt) {
kvm_pgtable_stage2_destroy(pgt);
......@@ -783,10 +782,10 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
if (ret)
break;
spin_lock(&kvm->mmu_lock);
write_lock(&kvm->mmu_lock);
ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
&cache);
spin_unlock(&kvm->mmu_lock);
write_unlock(&kvm->mmu_lock);
if (ret)
break;
......@@ -834,9 +833,9 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
start = memslot->base_gfn << PAGE_SHIFT;
end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
spin_lock(&kvm->mmu_lock);
write_lock(&kvm->mmu_lock);
stage2_wp_range(&kvm->arch.mmu, start, end);
spin_unlock(&kvm->mmu_lock);
write_unlock(&kvm->mmu_lock);
kvm_flush_remote_tlbs(kvm);
}
......@@ -1080,6 +1079,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
gfn_t gfn;
kvm_pfn_t pfn;
bool logging_active = memslot_is_logging(memslot);
bool logging_perm_fault = false;
unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
unsigned long vma_pagesize, fault_granule;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
......@@ -1114,6 +1114,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (logging_active) {
force_pte = true;
vma_shift = PAGE_SHIFT;
logging_perm_fault = (fault_status == FSC_PERM && write_fault);
} else {
vma_shift = get_vma_page_shift(vma, hva);
}
......@@ -1212,7 +1213,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (exec_fault && device)
return -ENOEXEC;
spin_lock(&kvm->mmu_lock);
/*
* To reduce MMU contentions and enhance concurrency during dirty
* logging, only acquire read lock for permission
* relaxation.
*/
if (logging_perm_fault)
read_lock(&kvm->mmu_lock);
else
write_lock(&kvm->mmu_lock);
pgt = vcpu->arch.hw_mmu->pgt;
if (mmu_notifier_retry(kvm, mmu_seq))
goto out_unlock;
......@@ -1271,7 +1280,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}
out_unlock:
spin_unlock(&kvm->mmu_lock);
if (logging_perm_fault)
read_unlock(&kvm->mmu_lock);
else
write_unlock(&kvm->mmu_lock);
kvm_set_pfn_accessed(pfn);
kvm_release_pfn_clean(pfn);
return ret != -EAGAIN ? ret : 0;
......@@ -1286,10 +1298,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
trace_kvm_access_fault(fault_ipa);
spin_lock(&vcpu->kvm->mmu_lock);
write_lock(&vcpu->kvm->mmu_lock);
mmu = vcpu->arch.hw_mmu;
kpte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
spin_unlock(&vcpu->kvm->mmu_lock);
write_unlock(&vcpu->kvm->mmu_lock);
pte = __pte(kpte);
if (pte_valid(pte))
......@@ -1692,9 +1704,9 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
gpa_t gpa = slot->base_gfn << PAGE_SHIFT;
phys_addr_t size = slot->npages << PAGE_SHIFT;
spin_lock(&kvm->mmu_lock);
write_lock(&kvm->mmu_lock);
unmap_stage2_range(&kvm->arch.mmu, gpa, size);
spin_unlock(&kvm->mmu_lock);
write_unlock(&kvm->mmu_lock);
}
/*
......
......@@ -7,6 +7,7 @@
#include <linux/cpu.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <linux/list.h>
#include <linux/perf_event.h>
#include <linux/perf/arm_pmu.h>
#include <linux/uaccess.h>
......@@ -16,6 +17,9 @@
DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
static LIST_HEAD(arm_pmus);
static DEFINE_MUTEX(arm_pmus_lock);
static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
static void kvm_pmu_update_pmc_chained(struct kvm_vcpu *vcpu, u64 select_idx);
static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc);
......@@ -24,7 +28,11 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc);
static u32 kvm_pmu_event_mask(struct kvm *kvm)
{
switch (kvm->arch.pmuver) {
unsigned int pmuver;
pmuver = kvm->arch.arm_pmu->pmuver;
switch (pmuver) {
case ID_AA64DFR0_PMUVER_8_0:
return GENMASK(9, 0);
case ID_AA64DFR0_PMUVER_8_1:
......@@ -33,7 +41,7 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
case ID_AA64DFR0_PMUVER_8_7:
return GENMASK(15, 0);
default: /* Shouldn't be here, just for sanity */
WARN_ONCE(1, "Unknown PMU version %d\n", kvm->arch.pmuver);
WARN_ONCE(1, "Unknown PMU version %d\n", pmuver);
return 0;
}
}
......@@ -600,6 +608,7 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx)
*/
static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
{
struct arm_pmu *arm_pmu = vcpu->kvm->arch.arm_pmu;
struct kvm_pmu *pmu = &vcpu->arch.pmu;
struct kvm_pmc *pmc;
struct perf_event *event;
......@@ -636,7 +645,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
return;
memset(&attr, 0, sizeof(struct perf_event_attr));
attr.type = PERF_TYPE_RAW;
attr.type = arm_pmu->pmu.type;
attr.size = sizeof(attr);
attr.pinned = 1;
attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, pmc->idx);
......@@ -745,17 +754,33 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
void kvm_host_pmu_init(struct arm_pmu *pmu)
{
if (pmu->pmuver != 0 && pmu->pmuver != ID_AA64DFR0_PMUVER_IMP_DEF &&
!kvm_arm_support_pmu_v3() && !is_protected_kvm_enabled())
struct arm_pmu_entry *entry;
if (pmu->pmuver == 0 || pmu->pmuver == ID_AA64DFR0_PMUVER_IMP_DEF ||
is_protected_kvm_enabled())
return;
mutex_lock(&arm_pmus_lock);
entry = kmalloc(sizeof(*entry), GFP_KERNEL);
if (!entry)
goto out_unlock;
entry->arm_pmu = pmu;
list_add_tail(&entry->entry, &arm_pmus);
if (list_is_singular(&arm_pmus))
static_branch_enable(&kvm_arm_pmu_available);
out_unlock:
mutex_unlock(&arm_pmus_lock);
}
static int kvm_pmu_probe_pmuver(void)
static struct arm_pmu *kvm_pmu_probe_armpmu(void)
{
struct perf_event_attr attr = { };
struct perf_event *event;
struct arm_pmu *pmu;
int pmuver = ID_AA64DFR0_PMUVER_IMP_DEF;
struct arm_pmu *pmu = NULL;
/*
* Create a dummy event that only counts user cycles. As we'll never
......@@ -780,19 +805,20 @@ static int kvm_pmu_probe_pmuver(void)
if (IS_ERR(event)) {
pr_err_once("kvm: pmu event creation failed %ld\n",
PTR_ERR(event));
return ID_AA64DFR0_PMUVER_IMP_DEF;
return NULL;
}
if (event->pmu) {
pmu = to_arm_pmu(event->pmu);
if (pmu->pmuver)
pmuver = pmu->pmuver;
if (pmu->pmuver == 0 ||
pmu->pmuver == ID_AA64DFR0_PMUVER_IMP_DEF)
pmu = NULL;
}
perf_event_disable(event);
perf_event_release_kernel(event);
return pmuver;
return pmu;
}
u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
......@@ -810,7 +836,7 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
* Don't advertise STALL_SLOT, as PMMIR_EL0 is handled
* as RAZ
*/
if (vcpu->kvm->arch.pmuver >= ID_AA64DFR0_PMUVER_8_4)
if (vcpu->kvm->arch.arm_pmu->pmuver >= ID_AA64DFR0_PMUVER_8_4)
val &= ~BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT - 32);
base = 32;
}
......@@ -922,26 +948,64 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
return true;
}
static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
{
struct kvm *kvm = vcpu->kvm;
struct arm_pmu_entry *entry;
struct arm_pmu *arm_pmu;
int ret = -ENXIO;
mutex_lock(&kvm->lock);
mutex_lock(&arm_pmus_lock);
list_for_each_entry(entry, &arm_pmus, entry) {
arm_pmu = entry->arm_pmu;
if (arm_pmu->pmu.type == pmu_id) {
if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags) ||
(kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) {
ret = -EBUSY;
break;
}
kvm->arch.arm_pmu = arm_pmu;
cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus);
ret = 0;
break;
}
}
mutex_unlock(&arm_pmus_lock);
mutex_unlock(&kvm->lock);
return ret;
}
int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
{
struct kvm *kvm = vcpu->kvm;
if (!kvm_vcpu_has_pmu(vcpu))
return -ENODEV;
if (vcpu->arch.pmu.created)
return -EBUSY;
if (!vcpu->kvm->arch.pmuver)
vcpu->kvm->arch.pmuver = kvm_pmu_probe_pmuver();
if (vcpu->kvm->arch.pmuver == ID_AA64DFR0_PMUVER_IMP_DEF)
return -ENODEV;
mutex_lock(&kvm->lock);
if (!kvm->arch.arm_pmu) {
/* No PMU set, get the default one */
kvm->arch.arm_pmu = kvm_pmu_probe_armpmu();
if (!kvm->arch.arm_pmu) {
mutex_unlock(&kvm->lock);
return -ENODEV;
}
}
mutex_unlock(&kvm->lock);
switch (attr->attr) {
case KVM_ARM_VCPU_PMU_V3_IRQ: {
int __user *uaddr = (int __user *)(long)attr->addr;
int irq;
if (!irqchip_in_kernel(vcpu->kvm))
if (!irqchip_in_kernel(kvm))
return -EINVAL;
if (get_user(irq, uaddr))
......@@ -951,7 +1015,7 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
if (!(irq_is_ppi(irq) || irq_is_spi(irq)))
return -EINVAL;
if (!pmu_irq_is_valid(vcpu->kvm, irq))
if (!pmu_irq_is_valid(kvm, irq))
return -EINVAL;
if (kvm_arm_pmu_irq_initialized(vcpu))
......@@ -966,7 +1030,7 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
struct kvm_pmu_event_filter filter;
int nr_events;
nr_events = kvm_pmu_event_mask(vcpu->kvm) + 1;
nr_events = kvm_pmu_event_mask(kvm) + 1;
uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr;
......@@ -978,12 +1042,17 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
filter.action != KVM_PMU_EVENT_DENY))
return -EINVAL;
mutex_lock(&vcpu->kvm->lock);
mutex_lock(&kvm->lock);
if (!vcpu->kvm->arch.pmu_filter) {
vcpu->kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT);
if (!vcpu->kvm->arch.pmu_filter) {
mutex_unlock(&vcpu->kvm->lock);
if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
mutex_unlock(&kvm->lock);
return -EBUSY;
}
if (!kvm->arch.pmu_filter) {
kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT);
if (!kvm->arch.pmu_filter) {
mutex_unlock(&kvm->lock);
return -ENOMEM;
}
......@@ -994,20 +1063,29 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
* events, the default is to allow.
*/
if (filter.action == KVM_PMU_EVENT_ALLOW)
bitmap_zero(vcpu->kvm->arch.pmu_filter, nr_events);
bitmap_zero(kvm->arch.pmu_filter, nr_events);
else
bitmap_fill(vcpu->kvm->arch.pmu_filter, nr_events);
bitmap_fill(kvm->arch.pmu_filter, nr_events);
}
if (filter.action == KVM_PMU_EVENT_ALLOW)
bitmap_set(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents);
bitmap_set(kvm->arch.pmu_filter, filter.base_event, filter.nevents);
else
bitmap_clear(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents);
bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents);
mutex_unlock(&vcpu->kvm->lock);
mutex_unlock(&kvm->lock);
return 0;
}
case KVM_ARM_VCPU_PMU_V3_SET_PMU: {
int __user *uaddr = (int __user *)(long)attr->addr;
int pmu_id;
if (get_user(pmu_id, uaddr))
return -EFAULT;
return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id);
}
case KVM_ARM_VCPU_PMU_V3_INIT:
return kvm_arm_pmu_v3_init(vcpu);
}
......@@ -1045,6 +1123,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
case KVM_ARM_VCPU_PMU_V3_IRQ:
case KVM_ARM_VCPU_PMU_V3_INIT:
case KVM_ARM_VCPU_PMU_V3_FILTER:
case KVM_ARM_VCPU_PMU_V3_SET_PMU:
if (kvm_vcpu_has_pmu(vcpu))
return 0;
}
......
......@@ -84,7 +84,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
if (!vcpu)
return PSCI_RET_INVALID_PARAMS;
if (!vcpu->arch.power_off) {
if (kvm_psci_version(source_vcpu, kvm) != KVM_ARM_PSCI_0_1)
if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
return PSCI_RET_ALREADY_ON;
else
return PSCI_RET_INVALID_PARAMS;
......@@ -161,7 +161,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
return PSCI_0_2_AFFINITY_LEVEL_OFF;
}
static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type, u64 flags)
{
unsigned long i;
struct kvm_vcpu *tmp;
......@@ -181,17 +181,24 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
vcpu->run->system_event.type = type;
vcpu->run->system_event.flags = flags;
vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
}
static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
{
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0);
}
static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
{
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0);
}
static void kvm_psci_system_reset2(struct kvm_vcpu *vcpu)
{
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET,
KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2);
}
static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
......@@ -304,24 +311,27 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
return ret;
}
static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
{
u32 psci_fn = smccc_get_function(vcpu);
u32 feature;
u32 arg;
unsigned long val;
int ret = 1;
if (minor > 1)
return -EINVAL;
switch(psci_fn) {
case PSCI_0_2_FN_PSCI_VERSION:
val = KVM_ARM_PSCI_1_0;
val = minor == 0 ? KVM_ARM_PSCI_1_0 : KVM_ARM_PSCI_1_1;
break;
case PSCI_1_0_FN_PSCI_FEATURES:
feature = smccc_get_arg1(vcpu);
val = kvm_psci_check_allowed_function(vcpu, feature);
arg = smccc_get_arg1(vcpu);
val = kvm_psci_check_allowed_function(vcpu, arg);
if (val)
break;
switch(feature) {
switch(arg) {
case PSCI_0_2_FN_PSCI_VERSION:
case PSCI_0_2_FN_CPU_SUSPEND:
case PSCI_0_2_FN64_CPU_SUSPEND:
......@@ -337,11 +347,36 @@ static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
case ARM_SMCCC_VERSION_FUNC_ID:
val = 0;
break;
case PSCI_1_1_FN_SYSTEM_RESET2:
case PSCI_1_1_FN64_SYSTEM_RESET2:
if (minor >= 1) {
val = 0;
break;
}
fallthrough;
default:
val = PSCI_RET_NOT_SUPPORTED;
break;
}
break;
case PSCI_1_1_FN_SYSTEM_RESET2:
kvm_psci_narrow_to_32bit(vcpu);
fallthrough;
case PSCI_1_1_FN64_SYSTEM_RESET2:
if (minor >= 1) {
arg = smccc_get_arg1(vcpu);
if (arg <= PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET ||
arg >= PSCI_1_1_RESET_TYPE_VENDOR_START) {
kvm_psci_system_reset2(vcpu);
vcpu_set_reg(vcpu, 0, PSCI_RET_INTERNAL_FAILURE);
return 0;
}
val = PSCI_RET_INVALID_PARAMS;
break;
}
fallthrough;
default:
return kvm_psci_0_2_call(vcpu);
}
......@@ -391,16 +426,18 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
*/
int kvm_psci_call(struct kvm_vcpu *vcpu)
{
switch (kvm_psci_version(vcpu, vcpu->kvm)) {
switch (kvm_psci_version(vcpu)) {
case KVM_ARM_PSCI_1_1:
return kvm_psci_1_x_call(vcpu, 1);
case KVM_ARM_PSCI_1_0:
return kvm_psci_1_0_call(vcpu);
return kvm_psci_1_x_call(vcpu, 0);
case KVM_ARM_PSCI_0_2:
return kvm_psci_0_2_call(vcpu);
case KVM_ARM_PSCI_0_1:
return kvm_psci_0_1_call(vcpu);
default:
return -EINVAL;
};
}
}
int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
......@@ -484,7 +521,7 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
switch (reg->id) {
case KVM_REG_ARM_PSCI_VERSION:
val = kvm_psci_version(vcpu, vcpu->kvm);
val = kvm_psci_version(vcpu);
break;
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
......@@ -525,6 +562,7 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return 0;
case KVM_ARM_PSCI_0_2:
case KVM_ARM_PSCI_1_0:
case KVM_ARM_PSCI_1_1:
if (!wants_02)
return -EINVAL;
vcpu->kvm->arch.psci_version = val;
......
......@@ -44,6 +44,10 @@
* 64bit interface.
*/
static int reg_from_user(u64 *val, const void __user *uaddr, u64 id);
static int reg_to_user(void __user *uaddr, const u64 *val, u64 id);
static u64 sys_reg_to_index(const struct sys_reg_desc *reg);
static bool read_from_write_only(struct kvm_vcpu *vcpu,
struct sys_reg_params *params,
const struct sys_reg_desc *r)
......@@ -287,16 +291,55 @@ static bool trap_loregion(struct kvm_vcpu *vcpu,
return trap_raz_wi(vcpu, p, r);
}
static bool trap_oslar_el1(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
u64 oslsr;
if (!p->is_write)
return read_from_write_only(vcpu, p, r);
/* Forward the OSLK bit to OSLSR */
oslsr = __vcpu_sys_reg(vcpu, OSLSR_EL1) & ~SYS_OSLSR_OSLK;
if (p->regval & SYS_OSLAR_OSLK)
oslsr |= SYS_OSLSR_OSLK;
__vcpu_sys_reg(vcpu, OSLSR_EL1) = oslsr;
return true;
}
static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
if (p->is_write) {
return ignore_write(vcpu, p);
} else {
p->regval = (1 << 3);
return true;
}
if (p->is_write)
return write_to_read_only(vcpu, p, r);
p->regval = __vcpu_sys_reg(vcpu, r->reg);
return true;
}
static int set_oslsr_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
const struct kvm_one_reg *reg, void __user *uaddr)
{
u64 id = sys_reg_to_index(rd);
u64 val;
int err;
err = reg_from_user(&val, uaddr, id);
if (err)
return err;
/*
* The only modifiable bit is the OSLK bit. Refuse the write if
* userspace attempts to change any other bit in the register.
*/
if ((val ^ rd->val) & ~SYS_OSLSR_OSLK)
return -EINVAL;
__vcpu_sys_reg(vcpu, rd->reg) = val;
return 0;
}
static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
......@@ -1169,10 +1212,6 @@ static bool access_raz_id_reg(struct kvm_vcpu *vcpu,
return __access_id_reg(vcpu, p, r, true);
}
static int reg_from_user(u64 *val, const void __user *uaddr, u64 id);
static int reg_to_user(void __user *uaddr, const u64 *val, u64 id);
static u64 sys_reg_to_index(const struct sys_reg_desc *reg);
/* Visibility overrides for SVE-specific control registers */
static unsigned int sve_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
......@@ -1423,9 +1462,9 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
* Debug handling: We do trap most, if not all debug related system
* registers. The implementation is good enough to ensure that a guest
* can use these with minimal performance degradation. The drawback is
* that we don't implement any of the external debug, none of the
* OSlock protocol. This should be revisited if we ever encounter a
* more demanding guest...
* that we don't implement any of the external debug architecture.
* This should be revisited if we ever encounter a more demanding
* guest...
*/
static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_DC_ISW), access_dcsw },
......@@ -1452,8 +1491,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
DBG_BCR_BVR_WCR_WVR_EL1(15),
{ SYS_DESC(SYS_MDRAR_EL1), trap_raz_wi },
{ SYS_DESC(SYS_OSLAR_EL1), trap_raz_wi },
{ SYS_DESC(SYS_OSLSR_EL1), trap_oslsr_el1 },
{ SYS_DESC(SYS_OSLAR_EL1), trap_oslar_el1 },
{ SYS_DESC(SYS_OSLSR_EL1), trap_oslsr_el1, reset_val, OSLSR_EL1,
SYS_OSLSR_OSLM_IMPLEMENTED, .set_user = set_oslsr_el1, },
{ SYS_DESC(SYS_OSDLR_EL1), trap_raz_wi },
{ SYS_DESC(SYS_DBGPRCR_EL1), trap_raz_wi },
{ SYS_DESC(SYS_DBGCLAIMSET_EL1), trap_raz_wi },
......@@ -1925,10 +1965,10 @@ static const struct sys_reg_desc cp14_regs[] = {
DBGBXVR(0),
/* DBGOSLAR */
{ Op1( 0), CRn( 1), CRm( 0), Op2( 4), trap_raz_wi },
{ Op1( 0), CRn( 1), CRm( 0), Op2( 4), trap_oslar_el1 },
DBGBXVR(1),
/* DBGOSLSR */
{ Op1( 0), CRn( 1), CRm( 1), Op2( 4), trap_oslsr_el1 },
{ Op1( 0), CRn( 1), CRm( 1), Op2( 4), trap_oslsr_el1, NULL, OSLSR_EL1 },
DBGBXVR(2),
DBGBXVR(3),
/* DBGOSDLR */
......
......@@ -37,7 +37,7 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
* If you need to take multiple locks, always take the upper lock first,
* then the lower ones, e.g. first take the its_lock, then the irq_lock.
* If you are already holding a lock and need to take a higher one, you
* have to drop the lower ranking lock first and re-aquire it after having
* have to drop the lower ranking lock first and re-acquire it after having
* taken the upper one.
*
* When taking more than one ap_list_lock at the same time, always take the
......
// SPDX-License-Identifier: GPL-2.0
/*
* VMID allocator.
*
* Based on Arm64 ASID allocator algorithm.
* Please refer arch/arm64/mm/context.c for detailed
* comments on algorithm.
*
* Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
* Copyright (C) 2012 ARM Ltd.
*/
#include <linux/bitfield.h>
#include <linux/bitops.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_mmu.h>
unsigned int kvm_arm_vmid_bits;
static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);
static atomic64_t vmid_generation;
static unsigned long *vmid_map;
static DEFINE_PER_CPU(atomic64_t, active_vmids);
static DEFINE_PER_CPU(u64, reserved_vmids);
#define VMID_MASK (~GENMASK(kvm_arm_vmid_bits - 1, 0))
#define VMID_FIRST_VERSION (1UL << kvm_arm_vmid_bits)
#define NUM_USER_VMIDS VMID_FIRST_VERSION
#define vmid2idx(vmid) ((vmid) & ~VMID_MASK)
#define idx2vmid(idx) vmid2idx(idx)
/*
* As vmid #0 is always reserved, we will never allocate one
* as below and can be treated as invalid. This is used to
* set the active_vmids on vCPU schedule out.
*/
#define VMID_ACTIVE_INVALID VMID_FIRST_VERSION
#define vmid_gen_match(vmid) \
(!(((vmid) ^ atomic64_read(&vmid_generation)) >> kvm_arm_vmid_bits))
static void flush_context(void)
{
int cpu;
u64 vmid;
bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
for_each_possible_cpu(cpu) {
vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);
/* Preserve reserved VMID */
if (vmid == 0)
vmid = per_cpu(reserved_vmids, cpu);
__set_bit(vmid2idx(vmid), vmid_map);
per_cpu(reserved_vmids, cpu) = vmid;
}
/*
* Unlike ASID allocator, we expect less frequent rollover in
* case of VMIDs. Hence, instead of marking the CPU as
* flush_pending and issuing a local context invalidation on
* the next context-switch, we broadcast TLB flush + I-cache
* invalidation over the inner shareable domain on rollover.
*/
kvm_call_hyp(__kvm_flush_vm_context);
}
static bool check_update_reserved_vmid(u64 vmid, u64 newvmid)
{
int cpu;
bool hit = false;
/*
* Iterate over the set of reserved VMIDs looking for a match
* and update to use newvmid (i.e. the same VMID in the current
* generation).
*/
for_each_possible_cpu(cpu) {
if (per_cpu(reserved_vmids, cpu) == vmid) {
hit = true;
per_cpu(reserved_vmids, cpu) = newvmid;
}
}
return hit;
}
static u64 new_vmid(struct kvm_vmid *kvm_vmid)
{
static u32 cur_idx = 1;
u64 vmid = atomic64_read(&kvm_vmid->id);
u64 generation = atomic64_read(&vmid_generation);
if (vmid != 0) {
u64 newvmid = generation | (vmid & ~VMID_MASK);
if (check_update_reserved_vmid(vmid, newvmid)) {
atomic64_set(&kvm_vmid->id, newvmid);
return newvmid;
}
if (!__test_and_set_bit(vmid2idx(vmid), vmid_map)) {
atomic64_set(&kvm_vmid->id, newvmid);
return newvmid;
}
}
vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, cur_idx);
if (vmid != NUM_USER_VMIDS)
goto set_vmid;
/* We're out of VMIDs, so increment the global generation count */
generation = atomic64_add_return_relaxed(VMID_FIRST_VERSION,
&vmid_generation);
flush_context();
/* We have more VMIDs than CPUs, so this will always succeed */
vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, 1);
set_vmid:
__set_bit(vmid, vmid_map);
cur_idx = vmid;
vmid = idx2vmid(vmid) | generation;
atomic64_set(&kvm_vmid->id, vmid);
return vmid;
}
/* Called from vCPU sched out with preemption disabled */
void kvm_arm_vmid_clear_active(void)
{
atomic64_set(this_cpu_ptr(&active_vmids), VMID_ACTIVE_INVALID);
}
void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
{
unsigned long flags;
u64 vmid, old_active_vmid;
vmid = atomic64_read(&kvm_vmid->id);
/*
* Please refer to the comments in check_and_switch_context() in
* arch/arm64/mm/context.c.
*
* Unlike ASID allocator, we set the active_vmids to
* VMID_ACTIVE_INVALID on vCPU schedule out to avoid
* reserving the VMID space needlessly on rollover.
* Hence explicitly check here for a "!= 0" to
* handle the sync with a concurrent rollover.
*/
old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
if (old_active_vmid != 0 && vmid_gen_match(vmid) &&
0 != atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
old_active_vmid, vmid))
return;
raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
/* Check that our VMID belongs to the current generation. */
vmid = atomic64_read(&kvm_vmid->id);
if (!vmid_gen_match(vmid))
vmid = new_vmid(kvm_vmid);
atomic64_set(this_cpu_ptr(&active_vmids), vmid);
raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
}
/*
* Initialize the VMID allocator
*/
int kvm_arm_vmid_alloc_init(void)
{
kvm_arm_vmid_bits = kvm_get_vmid_bits();
/*
* Expect allocation after rollover to fail if we don't have
* at least one more VMID than CPUs. VMID #0 is always reserved.
*/
WARN_ON(NUM_USER_VMIDS - 1 <= num_possible_cpus());
atomic64_set(&vmid_generation, VMID_FIRST_VERSION);
vmid_map = kcalloc(BITS_TO_LONGS(NUM_USER_VMIDS),
sizeof(*vmid_map), GFP_KERNEL);
if (!vmid_map)
return -ENOMEM;
return 0;
}
void kvm_arm_vmid_alloc_free(void)
{
kfree(vmid_map);
}
......@@ -252,7 +252,7 @@ int kvmppc_uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot *slot)
p = kzalloc(sizeof(*p), GFP_KERNEL);
if (!p)
return -ENOMEM;
p->pfns = vzalloc(array_size(slot->npages, sizeof(*p->pfns)));
p->pfns = vcalloc(slot->npages, sizeof(*p->pfns));
if (!p->pfns) {
kfree(p);
return -ENOMEM;
......
......@@ -228,6 +228,7 @@ void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu);
void __kvm_riscv_unpriv_trap(void);
void kvm_riscv_vcpu_wfi(struct kvm_vcpu *vcpu);
unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu,
bool read_insn,
unsigned long guest_addr,
......
......@@ -12,7 +12,7 @@
#define KVM_SBI_IMPID 3
#define KVM_SBI_VERSION_MAJOR 0
#define KVM_SBI_VERSION_MINOR 2
#define KVM_SBI_VERSION_MINOR 3
struct kvm_vcpu_sbi_extension {
unsigned long extid_start;
......@@ -28,6 +28,9 @@ struct kvm_vcpu_sbi_extension {
};
void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run);
void kvm_riscv_vcpu_sbi_system_reset(struct kvm_vcpu *vcpu,
struct kvm_run *run,
u32 type, u64 flags);
const struct kvm_vcpu_sbi_extension *kvm_vcpu_sbi_find_ext(unsigned long extid);
#endif /* __RISCV_KVM_VCPU_SBI_H__ */
......@@ -71,15 +71,32 @@ enum sbi_ext_hsm_fid {
SBI_EXT_HSM_HART_START = 0,
SBI_EXT_HSM_HART_STOP,
SBI_EXT_HSM_HART_STATUS,
SBI_EXT_HSM_HART_SUSPEND,
};
enum sbi_hsm_hart_status {
SBI_HSM_HART_STATUS_STARTED = 0,
SBI_HSM_HART_STATUS_STOPPED,
SBI_HSM_HART_STATUS_START_PENDING,
SBI_HSM_HART_STATUS_STOP_PENDING,
enum sbi_hsm_hart_state {
SBI_HSM_STATE_STARTED = 0,
SBI_HSM_STATE_STOPPED,
SBI_HSM_STATE_START_PENDING,
SBI_HSM_STATE_STOP_PENDING,
SBI_HSM_STATE_SUSPENDED,
SBI_HSM_STATE_SUSPEND_PENDING,
SBI_HSM_STATE_RESUME_PENDING,
};
#define SBI_HSM_SUSP_BASE_MASK 0x7fffffff
#define SBI_HSM_SUSP_NON_RET_BIT 0x80000000
#define SBI_HSM_SUSP_PLAT_BASE 0x10000000
#define SBI_HSM_SUSPEND_RET_DEFAULT 0x00000000
#define SBI_HSM_SUSPEND_RET_PLATFORM SBI_HSM_SUSP_PLAT_BASE
#define SBI_HSM_SUSPEND_RET_LAST SBI_HSM_SUSP_BASE_MASK
#define SBI_HSM_SUSPEND_NON_RET_DEFAULT SBI_HSM_SUSP_NON_RET_BIT
#define SBI_HSM_SUSPEND_NON_RET_PLATFORM (SBI_HSM_SUSP_NON_RET_BIT | \
SBI_HSM_SUSP_PLAT_BASE)
#define SBI_HSM_SUSPEND_NON_RET_LAST (SBI_HSM_SUSP_NON_RET_BIT | \
SBI_HSM_SUSP_BASE_MASK)
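To make the encoding above easier to read, here is a hedged decode sketch with the masks mirrored locally (per the SBI spec, bit 31 selects a non-retentive suspend and the platform-defined types start at 0x10000000):

#include <stdbool.h>
#include <stdint.h>

#define SUSP_NON_RET_BIT 0x80000000u   /* mirrors SBI_HSM_SUSP_NON_RET_BIT */
#define SUSP_BASE_MASK   0x7fffffffu   /* mirrors SBI_HSM_SUSP_BASE_MASK */
#define SUSP_PLAT_BASE   0x10000000u   /* mirrors SBI_HSM_SUSP_PLAT_BASE */

/* Retentive suspend types keep bit 31 clear; non-retentive types set it. */
static bool suspend_is_retentive(uint32_t type)
{
	return !(type & SUSP_NON_RET_BIT);
}

/* Types at or above the platform base (within the 31-bit space) are platform defined. */
static bool suspend_is_platform_specific(uint32_t type)
{
	return (type & SUSP_BASE_MASK) >= SUSP_PLAT_BASE;
}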
enum sbi_ext_srst_fid {
SBI_EXT_SRST_RESET = 0,
};
......
......@@ -111,7 +111,7 @@ static int sbi_cpu_is_stopped(unsigned int cpuid)
rc = sbi_hsm_hart_get_status(hartid);
if (rc == SBI_HSM_HART_STATUS_STOPPED)
if (rc == SBI_HSM_STATE_STOPPED)
return 0;
return rc;
}
......
......@@ -144,12 +144,7 @@ static int system_opcode_insn(struct kvm_vcpu *vcpu,
{
if ((insn & INSN_MASK_WFI) == INSN_MATCH_WFI) {
vcpu->stat.wfi_exit_stat++;
if (!kvm_arch_vcpu_runnable(vcpu)) {
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
kvm_vcpu_halt(vcpu);
vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
}
kvm_riscv_vcpu_wfi(vcpu);
vcpu->arch.guest_context.sepc += INSN_LEN(insn);
return 1;
}
......@@ -453,6 +448,21 @@ static int stage2_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
return 1;
}
/**
* kvm_riscv_vcpu_wfi -- Emulate wait for interrupt (WFI) behaviour
*
* @vcpu: The VCPU pointer
*/
void kvm_riscv_vcpu_wfi(struct kvm_vcpu *vcpu)
{
if (!kvm_arch_vcpu_runnable(vcpu)) {
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
kvm_vcpu_halt(vcpu);
vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
}
}
/**
* kvm_riscv_vcpu_unpriv_read -- Read machine word from Guest memory
*
......
......@@ -45,6 +45,7 @@ extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_base;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_time;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_ipi;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_rfence;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_srst;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_hsm;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_experimental;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_vendor;
......@@ -55,6 +56,7 @@ static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
&vcpu_sbi_ext_time,
&vcpu_sbi_ext_ipi,
&vcpu_sbi_ext_rfence,
&vcpu_sbi_ext_srst,
&vcpu_sbi_ext_hsm,
&vcpu_sbi_ext_experimental,
&vcpu_sbi_ext_vendor,
......@@ -79,6 +81,23 @@ void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run)
run->riscv_sbi.ret[1] = cp->a1;
}
void kvm_riscv_vcpu_sbi_system_reset(struct kvm_vcpu *vcpu,
struct kvm_run *run,
u32 type, u64 flags)
{
unsigned long i;
struct kvm_vcpu *tmp;
kvm_for_each_vcpu(i, tmp, vcpu->kvm)
tmp->arch.power_off = true;
kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
memset(&run->system_event, 0, sizeof(run->system_event));
run->system_event.type = type;
run->system_event.flags = flags;
run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
}
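On the other side of this exit, a VMM consumes the KVM_EXIT_SYSTEM_EVENT produced above; a hedged userspace sketch, using only fields that are part of the existing kvm_run ABI:

#include <linux/kvm.h>

/* Returns 0 to power the VM off, 1 to reset it, -1 for anything else. */
static int handle_system_event(struct kvm_run *run)
{
	if (run->exit_reason != KVM_EXIT_SYSTEM_EVENT)
		return -1;

	switch (run->system_event.type) {
	case KVM_SYSTEM_EVENT_SHUTDOWN:
		return 0;
	case KVM_SYSTEM_EVENT_RESET:
		return 1;
	default:
		return -1;
	}
}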
int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
......
......@@ -60,9 +60,11 @@ static int kvm_sbi_hsm_vcpu_get_status(struct kvm_vcpu *vcpu)
if (!target_vcpu)
return -EINVAL;
if (!target_vcpu->arch.power_off)
return SBI_HSM_HART_STATUS_STARTED;
return SBI_HSM_STATE_STARTED;
else if (vcpu->stat.generic.blocking)
return SBI_HSM_STATE_SUSPENDED;
else
return SBI_HSM_HART_STATUS_STOPPED;
return SBI_HSM_STATE_STOPPED;
}
static int kvm_sbi_ext_hsm_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
......@@ -91,6 +93,18 @@ static int kvm_sbi_ext_hsm_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
ret = 0;
}
break;
case SBI_EXT_HSM_HART_SUSPEND:
switch (cp->a0) {
case SBI_HSM_SUSPEND_RET_DEFAULT:
kvm_riscv_vcpu_wfi(vcpu);
break;
case SBI_HSM_SUSPEND_NON_RET_DEFAULT:
ret = -EOPNOTSUPP;
break;
default:
ret = -EINVAL;
}
break;
default:
ret = -EOPNOTSUPP;
}
......
......@@ -130,3 +130,47 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_rfence = {
.extid_end = SBI_EXT_RFENCE,
.handler = kvm_sbi_ext_rfence_handler,
};
static int kvm_sbi_ext_srst_handler(struct kvm_vcpu *vcpu,
struct kvm_run *run,
unsigned long *out_val,
struct kvm_cpu_trap *utrap, bool *exit)
{
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
unsigned long funcid = cp->a6;
u32 reason = cp->a1;
u32 type = cp->a0;
int ret = 0;
switch (funcid) {
case SBI_EXT_SRST_RESET:
switch (type) {
case SBI_SRST_RESET_TYPE_SHUTDOWN:
kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
KVM_SYSTEM_EVENT_SHUTDOWN,
reason);
*exit = true;
break;
case SBI_SRST_RESET_TYPE_COLD_REBOOT:
case SBI_SRST_RESET_TYPE_WARM_REBOOT:
kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
KVM_SYSTEM_EVENT_RESET,
reason);
*exit = true;
break;
default:
ret = -EOPNOTSUPP;
}
break;
default:
ret = -EOPNOTSUPP;
}
return ret;
}
const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_srst = {
.extid_start = SBI_EXT_SRST,
.extid_end = SBI_EXT_SRST,
.handler = kvm_sbi_ext_srst_handler,
};
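For context, a guest reaches this handler with an SBI SRST ecall; a hedged RISC-V guest-side sketch (register usage per the SBI calling convention, constants per the SBI v0.3 spec rather than this diff):

/* Request a shutdown: EID 0x53525354 ("SRST"), FID 0, type 0, reason 0. */
static void sbi_srst_shutdown(void)
{
	register unsigned long a0 asm("a0") = 0;          /* reset type: shutdown */
	register unsigned long a1 asm("a1") = 0;          /* reset reason: none */
	register unsigned long a6 asm("a6") = 0;          /* function: RESET */
	register unsigned long a7 asm("a7") = 0x53525354; /* extension: SRST */

	asm volatile("ecall"
		     : "+r" (a0), "+r" (a1)
		     : "r" (a6), "r" (a7)
		     : "memory");
}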
......@@ -14,21 +14,6 @@
#include <asm/kvm_vcpu_timer.h>
#include <asm/kvm_vcpu_sbi.h>
static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu,
struct kvm_run *run, u32 type)
{
unsigned long i;
struct kvm_vcpu *tmp;
kvm_for_each_vcpu(i, tmp, vcpu->kvm)
tmp->arch.power_off = true;
kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
memset(&run->system_event, 0, sizeof(run->system_event));
run->system_event.type = type;
run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
}
static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long *out_val,
struct kvm_cpu_trap *utrap,
......@@ -80,7 +65,8 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
}
break;
case SBI_EXT_0_1_SHUTDOWN:
kvm_sbi_system_shutdown(vcpu, run, KVM_SYSTEM_EVENT_SHUTDOWN);
kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
KVM_SYSTEM_EVENT_SHUTDOWN, 0);
*exit = true;
break;
case SBI_EXT_0_1_REMOTE_FENCE_I:
......@@ -111,7 +97,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
default:
ret = -EINVAL;
break;
};
}
return ret;
}
......
......@@ -41,33 +41,37 @@ ENTRY(__kvm_riscv_switch_to)
REG_S s10, (KVM_ARCH_HOST_S10)(a0)
REG_S s11, (KVM_ARCH_HOST_S11)(a0)
/* Save Host and Restore Guest SSTATUS */
/* Load Guest CSR values */
REG_L t0, (KVM_ARCH_GUEST_SSTATUS)(a0)
REG_L t1, (KVM_ARCH_GUEST_HSTATUS)(a0)
REG_L t2, (KVM_ARCH_GUEST_SCOUNTEREN)(a0)
la t4, __kvm_switch_return
REG_L t5, (KVM_ARCH_GUEST_SEPC)(a0)
/* Save Host and Restore Guest SSTATUS */
csrrw t0, CSR_SSTATUS, t0
REG_S t0, (KVM_ARCH_HOST_SSTATUS)(a0)
/* Save Host and Restore Guest HSTATUS */
REG_L t1, (KVM_ARCH_GUEST_HSTATUS)(a0)
csrrw t1, CSR_HSTATUS, t1
REG_S t1, (KVM_ARCH_HOST_HSTATUS)(a0)
/* Save Host and Restore Guest SCOUNTEREN */
REG_L t2, (KVM_ARCH_GUEST_SCOUNTEREN)(a0)
csrrw t2, CSR_SCOUNTEREN, t2
REG_S t2, (KVM_ARCH_HOST_SCOUNTEREN)(a0)
/* Save Host SSCRATCH and change it to struct kvm_vcpu_arch pointer */
csrrw t3, CSR_SSCRATCH, a0
REG_S t3, (KVM_ARCH_HOST_SSCRATCH)(a0)
/* Save Host STVEC and change it to return path */
la t4, __kvm_switch_return
csrrw t4, CSR_STVEC, t4
REG_S t4, (KVM_ARCH_HOST_STVEC)(a0)
/* Save Host SSCRATCH and change it to struct kvm_vcpu_arch pointer */
csrrw t3, CSR_SSCRATCH, a0
/* Restore Guest SEPC */
REG_L t0, (KVM_ARCH_GUEST_SEPC)(a0)
csrw CSR_SEPC, t0
csrw CSR_SEPC, t5
/* Store Host CSR values */
REG_S t0, (KVM_ARCH_HOST_SSTATUS)(a0)
REG_S t1, (KVM_ARCH_HOST_HSTATUS)(a0)
REG_S t2, (KVM_ARCH_HOST_SCOUNTEREN)(a0)
REG_S t3, (KVM_ARCH_HOST_SSCRATCH)(a0)
REG_S t4, (KVM_ARCH_HOST_STVEC)(a0)
/* Restore Guest GPRs (except A0) */
REG_L ra, (KVM_ARCH_GUEST_RA)(a0)
......@@ -145,32 +149,36 @@ __kvm_switch_return:
REG_S t5, (KVM_ARCH_GUEST_T5)(a0)
REG_S t6, (KVM_ARCH_GUEST_T6)(a0)
/* Load Host CSR values */
REG_L t1, (KVM_ARCH_HOST_STVEC)(a0)
REG_L t2, (KVM_ARCH_HOST_SSCRATCH)(a0)
REG_L t3, (KVM_ARCH_HOST_SCOUNTEREN)(a0)
REG_L t4, (KVM_ARCH_HOST_HSTATUS)(a0)
REG_L t5, (KVM_ARCH_HOST_SSTATUS)(a0)
/* Save Guest SEPC */
csrr t0, CSR_SEPC
REG_S t0, (KVM_ARCH_GUEST_SEPC)(a0)
/* Restore Host STVEC */
REG_L t1, (KVM_ARCH_HOST_STVEC)(a0)
csrw CSR_STVEC, t1
/* Save Guest A0 and Restore Host SSCRATCH */
REG_L t2, (KVM_ARCH_HOST_SSCRATCH)(a0)
csrrw t2, CSR_SSCRATCH, t2
REG_S t2, (KVM_ARCH_GUEST_A0)(a0)
/* Restore Host STVEC */
csrw CSR_STVEC, t1
/* Save Guest and Restore Host SCOUNTEREN */
REG_L t3, (KVM_ARCH_HOST_SCOUNTEREN)(a0)
csrrw t3, CSR_SCOUNTEREN, t3
REG_S t3, (KVM_ARCH_GUEST_SCOUNTEREN)(a0)
/* Save Guest and Restore Host HSTATUS */
REG_L t4, (KVM_ARCH_HOST_HSTATUS)(a0)
csrrw t4, CSR_HSTATUS, t4
REG_S t4, (KVM_ARCH_GUEST_HSTATUS)(a0)
/* Save Guest and Restore Host SSTATUS */
REG_L t5, (KVM_ARCH_HOST_SSTATUS)(a0)
csrrw t5, CSR_SSTATUS, t5
/* Store Guest CSR values */
REG_S t0, (KVM_ARCH_GUEST_SEPC)(a0)
REG_S t2, (KVM_ARCH_GUEST_A0)(a0)
REG_S t3, (KVM_ARCH_GUEST_SCOUNTEREN)(a0)
REG_S t4, (KVM_ARCH_GUEST_HSTATUS)(a0)
REG_S t5, (KVM_ARCH_GUEST_SSTATUS)(a0)
/* Restore Host GPRs (except A0 and T0-T6) */
......
......@@ -12,6 +12,8 @@
#define CR0_CLOCK_COMPARATOR_SIGN BIT(63 - 10)
#define CR0_LOW_ADDRESS_PROTECTION BIT(63 - 35)
#define CR0_FETCH_PROTECTION_OVERRIDE BIT(63 - 38)
#define CR0_STORAGE_PROTECTION_OVERRIDE BIT(63 - 39)
#define CR0_EMERGENCY_SIGNAL_SUBMASK BIT(63 - 49)
#define CR0_EXTERNAL_CALL_SUBMASK BIT(63 - 50)
#define CR0_CLOCK_COMPARATOR_SUBMASK BIT(63 - 52)
......
......@@ -45,6 +45,8 @@
#define KVM_REQ_START_MIGRATION KVM_ARCH_REQ(3)
#define KVM_REQ_STOP_MIGRATION KVM_ARCH_REQ(4)
#define KVM_REQ_VSIE_RESTART KVM_ARCH_REQ(5)
#define KVM_REQ_REFRESH_GUEST_PREFIX \
KVM_ARCH_REQ_FLAGS(6, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define SIGP_CTRL_C 0x80
#define SIGP_CTRL_SCN_MASK 0x3f
......
......@@ -20,6 +20,8 @@
#define PAGE_SIZE _PAGE_SIZE
#define PAGE_MASK _PAGE_MASK
#define PAGE_DEFAULT_ACC 0
/* storage-protection override */
#define PAGE_SPO_ACC 9
#define PAGE_DEFAULT_KEY (PAGE_DEFAULT_ACC << 4)
#define HPAGE_SHIFT 20
......
......@@ -32,6 +32,28 @@ raw_copy_to_user(void __user *to, const void *from, unsigned long n);
#define INLINE_COPY_TO_USER
#endif
unsigned long __must_check
_copy_from_user_key(void *to, const void __user *from, unsigned long n, unsigned long key);
static __always_inline unsigned long __must_check
copy_from_user_key(void *to, const void __user *from, unsigned long n, unsigned long key)
{
if (likely(check_copy_size(to, n, false)))
n = _copy_from_user_key(to, from, n, key);
return n;
}
unsigned long __must_check
_copy_to_user_key(void __user *to, const void *from, unsigned long n, unsigned long key);
static __always_inline unsigned long __must_check
copy_to_user_key(void __user *to, const void *from, unsigned long n, unsigned long key)
{
if (likely(check_copy_size(from, n, true)))
n = _copy_to_user_key(to, from, n, key);
return n;
}
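A minimal usage sketch of the helpers declared above; the wrapper name and the error mapping are made up for illustration, while the real KVM callers later in this series translate a short copy into PGM_PROTECTION:

static int read_guest_hva_with_key(void *dst, const void __user *hva,
				   unsigned long len, unsigned long access_key)
{
	unsigned long uncopied = copy_from_user_key(dst, hva, len, access_key);

	return uncopied ? -EFAULT : 0;
}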
int __put_user_bad(void) __attribute__((noreturn));
int __get_user_bad(void) __attribute__((noreturn));
......
......@@ -80,6 +80,7 @@ enum uv_cmds_inst {
enum uv_feat_ind {
BIT_UV_FEAT_MISC = 0,
BIT_UV_FEAT_AIV = 1,
};
struct uv_cb_header {
......
......@@ -10,6 +10,7 @@
#include <linux/mm_types.h>
#include <linux/err.h>
#include <linux/pgtable.h>
#include <linux/bitfield.h>
#include <asm/gmap.h>
#include "kvm-s390.h"
......@@ -794,6 +795,108 @@ static int low_address_protection_enabled(struct kvm_vcpu *vcpu,
return 1;
}
static int vm_check_access_key(struct kvm *kvm, u8 access_key,
enum gacc_mode mode, gpa_t gpa)
{
u8 storage_key, access_control;
bool fetch_protected;
unsigned long hva;
int r;
if (access_key == 0)
return 0;
hva = gfn_to_hva(kvm, gpa_to_gfn(gpa));
if (kvm_is_error_hva(hva))
return PGM_ADDRESSING;
mmap_read_lock(current->mm);
r = get_guest_storage_key(current->mm, hva, &storage_key);
mmap_read_unlock(current->mm);
if (r)
return r;
access_control = FIELD_GET(_PAGE_ACC_BITS, storage_key);
if (access_control == access_key)
return 0;
fetch_protected = storage_key & _PAGE_FP_BIT;
if ((mode == GACC_FETCH || mode == GACC_IFETCH) && !fetch_protected)
return 0;
return PGM_PROTECTION;
}
static bool fetch_prot_override_applicable(struct kvm_vcpu *vcpu, enum gacc_mode mode,
union asce asce)
{
psw_t *psw = &vcpu->arch.sie_block->gpsw;
unsigned long override;
if (mode == GACC_FETCH || mode == GACC_IFETCH) {
/* check if fetch protection override enabled */
override = vcpu->arch.sie_block->gcr[0];
override &= CR0_FETCH_PROTECTION_OVERRIDE;
/* not applicable if subject to DAT && private space */
override = override && !(psw_bits(*psw).dat && asce.p);
return override;
}
return false;
}
static bool fetch_prot_override_applies(unsigned long ga, unsigned int len)
{
return ga < 2048 && ga + len <= 2048;
}
static bool storage_prot_override_applicable(struct kvm_vcpu *vcpu)
{
/* check if storage protection override enabled */
return vcpu->arch.sie_block->gcr[0] & CR0_STORAGE_PROTECTION_OVERRIDE;
}
static bool storage_prot_override_applies(u8 access_control)
{
/* matches special storage protection override key (9) -> allow */
return access_control == PAGE_SPO_ACC;
}
static int vcpu_check_access_key(struct kvm_vcpu *vcpu, u8 access_key,
enum gacc_mode mode, union asce asce, gpa_t gpa,
unsigned long ga, unsigned int len)
{
u8 storage_key, access_control;
unsigned long hva;
int r;
/* access key 0 matches any storage key -> allow */
if (access_key == 0)
return 0;
/*
* caller needs to ensure that gfn is accessible, so we can
* assume that this cannot fail
*/
hva = gfn_to_hva(vcpu->kvm, gpa_to_gfn(gpa));
mmap_read_lock(current->mm);
r = get_guest_storage_key(current->mm, hva, &storage_key);
mmap_read_unlock(current->mm);
if (r)
return r;
access_control = FIELD_GET(_PAGE_ACC_BITS, storage_key);
/* access key matches storage key -> allow */
if (access_control == access_key)
return 0;
if (mode == GACC_FETCH || mode == GACC_IFETCH) {
/* it is a fetch and fetch protection is off -> allow */
if (!(storage_key & _PAGE_FP_BIT))
return 0;
if (fetch_prot_override_applicable(vcpu, mode, asce) &&
fetch_prot_override_applies(ga, len))
return 0;
}
if (storage_prot_override_applicable(vcpu) &&
storage_prot_override_applies(access_control))
return 0;
return PGM_PROTECTION;
}
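Outside of kernel context, the core of the key check (shared by vm_check_access_key() and vcpu_check_access_key()) can be sketched as below, with the storage-key layout mirrored locally (ACC in the top nibble, fetch-protection bit 0x08) and the fetch/storage protection overrides intentionally omitted:

#include <stdbool.h>
#include <stdint.h>

#define ACC_BITS 0xf0u   /* stand-in for _PAGE_ACC_BITS */
#define FP_BIT   0x08u   /* stand-in for _PAGE_FP_BIT */

static bool key_allows_access(uint8_t storage_key, uint8_t access_key, bool is_fetch)
{
	if (access_key == 0)
		return true;                      /* key 0 matches any storage key */
	if (((storage_key & ACC_BITS) >> 4) == access_key)
		return true;                      /* access-control bits match */
	if (is_fetch && !(storage_key & FP_BIT))
		return true;                      /* fetch allowed when FP is off */
	return false;                             /* would raise PGM_PROTECTION */
}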
/**
* guest_range_to_gpas() - Calculate guest physical addresses of page fragments
* covering a logical range
......@@ -804,6 +907,7 @@ static int low_address_protection_enabled(struct kvm_vcpu *vcpu,
* @len: length of range in bytes
* @asce: address-space-control element to use for translation
* @mode: access mode
* @access_key: access key to match the range's storage keys against
*
* Translate a logical range to a series of guest absolute addresses,
* such that the concatenation of page fragments starting at each gpa make up
......@@ -830,7 +934,8 @@ static int low_address_protection_enabled(struct kvm_vcpu *vcpu,
*/
static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
unsigned long *gpas, unsigned long len,
const union asce asce, enum gacc_mode mode)
const union asce asce, enum gacc_mode mode,
u8 access_key)
{
psw_t *psw = &vcpu->arch.sie_block->gpsw;
unsigned int offset = offset_in_page(ga);
......@@ -857,6 +962,10 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
}
if (rc)
return trans_exc(vcpu, rc, ga, ar, mode, prot);
rc = vcpu_check_access_key(vcpu, access_key, mode, asce, gpa, ga,
fragment_len);
if (rc)
return trans_exc(vcpu, rc, ga, ar, mode, PROT_TYPE_KEYC);
if (gpas)
*gpas++ = gpa;
offset = 0;
......@@ -880,16 +989,74 @@ static int access_guest_page(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
return rc;
}
int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, void *data,
unsigned long len, enum gacc_mode mode)
static int
access_guest_page_with_key(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
void *data, unsigned int len, u8 access_key)
{
struct kvm_memory_slot *slot;
bool writable;
gfn_t gfn;
hva_t hva;
int rc;
gfn = gpa >> PAGE_SHIFT;
slot = gfn_to_memslot(kvm, gfn);
hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
if (kvm_is_error_hva(hva))
return PGM_ADDRESSING;
/*
* Check if it's a read-only memslot, even though that can't occur
* (they're unsupported). Don't try to actually handle that case.
*/
if (!writable && mode == GACC_STORE)
return -EOPNOTSUPP;
hva += offset_in_page(gpa);
if (mode == GACC_STORE)
rc = copy_to_user_key((void __user *)hva, data, len, access_key);
else
rc = copy_from_user_key(data, (void __user *)hva, len, access_key);
if (rc)
return PGM_PROTECTION;
if (mode == GACC_STORE)
mark_page_dirty_in_slot(kvm, slot, gfn);
return 0;
}
int access_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, void *data,
unsigned long len, enum gacc_mode mode, u8 access_key)
{
int offset = offset_in_page(gpa);
int fragment_len;
int rc;
while (min(PAGE_SIZE - offset, len) > 0) {
fragment_len = min(PAGE_SIZE - offset, len);
rc = access_guest_page_with_key(kvm, mode, gpa, data, fragment_len, access_key);
if (rc)
return rc;
offset = 0;
len -= fragment_len;
data += fragment_len;
gpa += fragment_len;
}
return 0;
}
int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
void *data, unsigned long len, enum gacc_mode mode,
u8 access_key)
{
psw_t *psw = &vcpu->arch.sie_block->gpsw;
unsigned long nr_pages, idx;
unsigned long gpa_array[2];
unsigned int fragment_len;
unsigned long *gpas;
enum prot_type prot;
int need_ipte_lock;
union asce asce;
bool try_storage_prot_override;
bool try_fetch_prot_override;
int rc;
if (!len)
......@@ -904,16 +1071,47 @@ int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, void *data,
gpas = vmalloc(array_size(nr_pages, sizeof(unsigned long)));
if (!gpas)
return -ENOMEM;
try_fetch_prot_override = fetch_prot_override_applicable(vcpu, mode, asce);
try_storage_prot_override = storage_prot_override_applicable(vcpu);
need_ipte_lock = psw_bits(*psw).dat && !asce.r;
if (need_ipte_lock)
ipte_lock(vcpu);
rc = guest_range_to_gpas(vcpu, ga, ar, gpas, len, asce, mode);
for (idx = 0; idx < nr_pages && !rc; idx++) {
/*
* Since we do the access further down ultimately via a move instruction
* that does key checking and returns an error in case of a protection
* violation, we don't need to do the check during address translation.
* Skip it by passing access key 0, which matches any storage key,
* obviating the need for any further checks. As a result, the check is
* handled entirely in hardware on access; we only need to take care to
* forego key protection checking if fetch protection override applies, or
* to retry with the special key 9 in case of storage protection override.
*/
rc = guest_range_to_gpas(vcpu, ga, ar, gpas, len, asce, mode, 0);
if (rc)
goto out_unlock;
for (idx = 0; idx < nr_pages; idx++) {
fragment_len = min(PAGE_SIZE - offset_in_page(gpas[idx]), len);
rc = access_guest_page(vcpu->kvm, mode, gpas[idx], data, fragment_len);
if (try_fetch_prot_override && fetch_prot_override_applies(ga, fragment_len)) {
rc = access_guest_page(vcpu->kvm, mode, gpas[idx],
data, fragment_len);
} else {
rc = access_guest_page_with_key(vcpu->kvm, mode, gpas[idx],
data, fragment_len, access_key);
}
if (rc == PGM_PROTECTION && try_storage_prot_override)
rc = access_guest_page_with_key(vcpu->kvm, mode, gpas[idx],
data, fragment_len, PAGE_SPO_ACC);
if (rc == PGM_PROTECTION)
prot = PROT_TYPE_KEYC;
if (rc)
break;
len -= fragment_len;
data += fragment_len;
ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
}
if (rc > 0)
rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
out_unlock:
if (need_ipte_lock)
ipte_unlock(vcpu);
if (nr_pages > ARRAY_SIZE(gpa_array))
......@@ -940,12 +1138,13 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
}
/**
* guest_translate_address - translate guest logical into guest absolute address
* guest_translate_address_with_key - translate guest logical into guest absolute address
* @vcpu: virtual cpu
* @gva: Guest virtual address
* @ar: Access register
* @gpa: Guest physical address
* @mode: Translation access mode
* @access_key: access key to match the storage key with
*
* Parameter semantics are the same as the ones from guest_translate.
* The memory contents at the guest address are not changed.
......@@ -953,8 +1152,9 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
* Note: The IPTE lock is not taken during this function, so the caller
* has to take care of this.
*/
int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
unsigned long *gpa, enum gacc_mode mode)
int guest_translate_address_with_key(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
unsigned long *gpa, enum gacc_mode mode,
u8 access_key)
{
union asce asce;
int rc;
......@@ -963,7 +1163,8 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
rc = get_vcpu_asce(vcpu, &asce, gva, ar, mode);
if (rc)
return rc;
return guest_range_to_gpas(vcpu, gva, ar, gpa, 1, asce, mode);
return guest_range_to_gpas(vcpu, gva, ar, gpa, 1, asce, mode,
access_key);
}
/**
......@@ -973,9 +1174,10 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
* @ar: Access register
* @length: Length of test range
* @mode: Translation access mode
* @access_key: access key to match the storage keys with
*/
int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
unsigned long length, enum gacc_mode mode)
unsigned long length, enum gacc_mode mode, u8 access_key)
{
union asce asce;
int rc = 0;
......@@ -984,12 +1186,36 @@ int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
if (rc)
return rc;
ipte_lock(vcpu);
rc = guest_range_to_gpas(vcpu, gva, ar, NULL, length, asce, mode);
rc = guest_range_to_gpas(vcpu, gva, ar, NULL, length, asce, mode,
access_key);
ipte_unlock(vcpu);
return rc;
}
/**
* check_gpa_range - test a range of guest physical addresses for accessibility
* @kvm: virtual machine instance
* @gpa: guest physical address
* @length: length of test range
* @mode: access mode to test, relevant for storage keys
* @access_key: access key to match the storage keys with
*/
int check_gpa_range(struct kvm *kvm, unsigned long gpa, unsigned long length,
enum gacc_mode mode, u8 access_key)
{
unsigned int fragment_len;
int rc = 0;
while (length && !rc) {
fragment_len = min(PAGE_SIZE - offset_in_page(gpa), length);
rc = vm_check_access_key(kvm, access_key, mode, gpa);
length -= fragment_len;
gpa += fragment_len;
}
return rc;
}
/**
* kvm_s390_check_low_addr_prot_real - check for low-address protection
* @vcpu: virtual cpu
......
......@@ -186,24 +186,34 @@ enum gacc_mode {
GACC_IFETCH,
};
int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva,
u8 ar, unsigned long *gpa, enum gacc_mode mode);
int guest_translate_address_with_key(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
unsigned long *gpa, enum gacc_mode mode,
u8 access_key);
int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
unsigned long length, enum gacc_mode mode);
unsigned long length, enum gacc_mode mode, u8 access_key);
int check_gpa_range(struct kvm *kvm, unsigned long gpa, unsigned long length,
enum gacc_mode mode, u8 access_key);
int access_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, void *data,
unsigned long len, enum gacc_mode mode, u8 access_key);
int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, void *data,
unsigned long len, enum gacc_mode mode);
int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
void *data, unsigned long len, enum gacc_mode mode,
u8 access_key);
int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
void *data, unsigned long len, enum gacc_mode mode);
/**
* write_guest - copy data from kernel space to guest space
* write_guest_with_key - copy data from kernel space to guest space
* @vcpu: virtual cpu
* @ga: guest address
* @ar: access register
* @data: source address in kernel space
* @len: number of bytes to copy
* @access_key: access key the storage key needs to match
*
* Copy @len bytes from @data (kernel space) to @ga (guest address).
* In order to copy data to guest space the PSW of the vcpu is inspected:
......@@ -214,8 +224,8 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
* The addressing mode of the PSW is also inspected, so that address wrap
* around is taken into account for 24-, 31- and 64-bit addressing mode,
* if the to be copied data crosses page boundaries in guest address space.
* In addition also low address and DAT protection are inspected before
* copying any data (key protection is currently not implemented).
* In addition low address, DAT and key protection checks are performed before
* copying any data.
*
* This function modifies the 'struct kvm_s390_pgm_info pgm' member of @vcpu.
* In case of an access exception (e.g. protection exception) pgm will contain
......@@ -243,10 +253,53 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
* if data has been changed in guest space in case of an exception.
*/
static inline __must_check
int write_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
void *data, unsigned long len, u8 access_key)
{
return access_guest_with_key(vcpu, ga, ar, data, len, GACC_STORE,
access_key);
}
/**
* write_guest - copy data from kernel space to guest space
* @vcpu: virtual cpu
* @ga: guest address
* @ar: access register
* @data: source address in kernel space
* @len: number of bytes to copy
*
* The behaviour of write_guest is identical to write_guest_with_key, except
* that the PSW access key is used instead of an explicit argument.
*/
static inline __must_check
int write_guest(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, void *data,
unsigned long len)
{
return access_guest(vcpu, ga, ar, data, len, GACC_STORE);
u8 access_key = psw_bits(vcpu->arch.sie_block->gpsw).key;
return write_guest_with_key(vcpu, ga, ar, data, len, access_key);
}
/**
* read_guest_with_key - copy data from guest space to kernel space
* @vcpu: virtual cpu
* @ga: guest address
* @ar: access register
* @data: destination address in kernel space
* @len: number of bytes to copy
* @access_key: access key the storage key needs to match
*
* Copy @len bytes from @ga (guest address) to @data (kernel space).
*
* The behaviour of read_guest_with_key is identical to write_guest_with_key,
* except that data will be copied from guest space to kernel space.
*/
static inline __must_check
int read_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
void *data, unsigned long len, u8 access_key)
{
return access_guest_with_key(vcpu, ga, ar, data, len, GACC_FETCH,
access_key);
}
/**
......@@ -259,14 +312,16 @@ int write_guest(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, void *data,
*
* Copy @len bytes from @ga (guest address) to @data (kernel space).
*
* The behaviour of read_guest is identical to write_guest, except that
* data will be copied from guest space to kernel space.
* The behaviour of read_guest is identical to read_guest_with_key, except
* that the PSW access key is used instead of an explicit argument.
*/
static inline __must_check
int read_guest(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, void *data,
unsigned long len)
{
return access_guest(vcpu, ga, ar, data, len, GACC_FETCH);
u8 access_key = psw_bits(vcpu->arch.sie_block->gpsw).key;
return read_guest_with_key(vcpu, ga, ar, data, len, access_key);
}
/**
......@@ -287,7 +342,10 @@ static inline __must_check
int read_guest_instr(struct kvm_vcpu *vcpu, unsigned long ga, void *data,
unsigned long len)
{
return access_guest(vcpu, ga, 0, data, len, GACC_IFETCH);
u8 access_key = psw_bits(vcpu->arch.sie_block->gpsw).key;
return access_guest_with_key(vcpu, ga, 0, data, len, GACC_IFETCH,
access_key);
}
/**
......
......@@ -331,18 +331,18 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu)
kvm_s390_get_regs_rre(vcpu, &reg1, &reg2);
/* Make sure that the source is paged-in */
rc = guest_translate_address(vcpu, vcpu->run->s.regs.gprs[reg2],
reg2, &srcaddr, GACC_FETCH);
/* Ensure that the source is paged-in, no actual access -> no key checking */
rc = guest_translate_address_with_key(vcpu, vcpu->run->s.regs.gprs[reg2],
reg2, &srcaddr, GACC_FETCH, 0);
if (rc)
return kvm_s390_inject_prog_cond(vcpu, rc);
rc = kvm_arch_fault_in_page(vcpu, srcaddr, 0);
if (rc != 0)
return rc;
/* Make sure that the destination is paged-in */
rc = guest_translate_address(vcpu, vcpu->run->s.regs.gprs[reg1],
reg1, &dstaddr, GACC_STORE);
/* Ensure that the destination is paged-in, no actual access -> no key checking */
rc = guest_translate_address_with_key(vcpu, vcpu->run->s.regs.gprs[reg1],
reg1, &dstaddr, GACC_STORE, 0);
if (rc)
return kvm_s390_inject_prog_cond(vcpu, rc);
rc = kvm_arch_fault_in_page(vcpu, dstaddr, 1);
......
......@@ -1901,13 +1901,12 @@ static int __inject_io(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
isc = int_word_to_isc(inti->io.io_int_word);
/*
* Do not make use of gisa in protected mode. We do not use the lock
* checking variant as this is just a performance optimization and we
* do not hold the lock here. This is ok as the code will pick
* interrupts from both "lists" for delivery.
* We do not use the lock checking variant as this is just a
* performance optimization and we do not hold the lock here.
* This is ok as the code will pick interrupts from both "lists"
* for delivery.
*/
if (!kvm_s390_pv_get_handle(kvm) &&
gi->origin && inti->type & KVM_S390_INT_IO_AI_MASK) {
if (gi->origin && inti->type & KVM_S390_INT_IO_AI_MASK) {
VM_EVENT(kvm, 4, "%s isc %1u", "inject: I/O (AI/gisa)", isc);
gisa_set_ipm_gisc(gi->origin, isc);
kfree(inti);
......@@ -3171,9 +3170,33 @@ void kvm_s390_gisa_init(struct kvm *kvm)
VM_EVENT(kvm, 3, "gisa 0x%pK initialized", gi->origin);
}
void kvm_s390_gisa_enable(struct kvm *kvm)
{
struct kvm_s390_gisa_interrupt *gi = &kvm->arch.gisa_int;
struct kvm_vcpu *vcpu;
unsigned long i;
u32 gisa_desc;
if (gi->origin)
return;
kvm_s390_gisa_init(kvm);
gisa_desc = kvm_s390_get_gisa_desc(kvm);
if (!gisa_desc)
return;
kvm_for_each_vcpu(i, vcpu, kvm) {
mutex_lock(&vcpu->mutex);
vcpu->arch.sie_block->gd = gisa_desc;
vcpu->arch.sie_block->eca |= ECA_AIV;
VCPU_EVENT(vcpu, 3, "AIV gisa format-%u enabled for cpu %03u",
vcpu->arch.sie_block->gd & 0x3, vcpu->vcpu_id);
mutex_unlock(&vcpu->mutex);
}
}
void kvm_s390_gisa_destroy(struct kvm *kvm)
{
struct kvm_s390_gisa_interrupt *gi = &kvm->arch.gisa_int;
struct kvm_s390_gisa *gisa = gi->origin;
if (!gi->origin)
return;
......@@ -3184,6 +3207,25 @@ void kvm_s390_gisa_destroy(struct kvm *kvm)
cpu_relax();
hrtimer_cancel(&gi->timer);
gi->origin = NULL;
VM_EVENT(kvm, 3, "gisa 0x%pK destroyed", gisa);
}
void kvm_s390_gisa_disable(struct kvm *kvm)
{
struct kvm_s390_gisa_interrupt *gi = &kvm->arch.gisa_int;
struct kvm_vcpu *vcpu;
unsigned long i;
if (!gi->origin)
return;
kvm_for_each_vcpu(i, vcpu, kvm) {
mutex_lock(&vcpu->mutex);
vcpu->arch.sie_block->eca &= ~ECA_AIV;
vcpu->arch.sie_block->gd = 0U;
mutex_unlock(&vcpu->mutex);
VCPU_EVENT(vcpu, 3, "AIV disabled for cpu %03u", vcpu->vcpu_id);
}
kvm_s390_gisa_destroy(kvm);
}
/**
......
......@@ -564,6 +564,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_VCPU_RESETS:
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_S390_DIAG318:
case KVM_CAP_S390_MEM_OP_EXTENSION:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
......@@ -2194,6 +2195,9 @@ static int kvm_s390_cpus_from_pv(struct kvm *kvm, u16 *rcp, u16 *rrcp)
}
mutex_unlock(&vcpu->mutex);
}
/* Ensure that we re-enable gisa if the non-PV guest used it but the PV guest did not. */
if (use_gisa)
kvm_s390_gisa_enable(kvm);
return ret;
}
......@@ -2205,6 +2209,10 @@ static int kvm_s390_cpus_to_pv(struct kvm *kvm, u16 *rc, u16 *rrc)
struct kvm_vcpu *vcpu;
/* Disable the GISA if the ultravisor does not support AIV. */
if (!test_bit_inv(BIT_UV_FEAT_AIV, &uv_info.uv_feature_indications))
kvm_s390_gisa_disable(kvm);
kvm_for_each_vcpu(i, vcpu, kvm) {
mutex_lock(&vcpu->mutex);
r = kvm_s390_pv_create_cpu(vcpu, rc, rrc);
......@@ -2359,6 +2367,83 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
return r;
}
static bool access_key_invalid(u8 access_key)
{
return access_key > 0xf;
}
static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
{
void __user *uaddr = (void __user *)mop->buf;
u64 supported_flags;
void *tmpbuf = NULL;
int r, srcu_idx;
supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION
| KVM_S390_MEMOP_F_CHECK_ONLY;
if (mop->flags & ~supported_flags || !mop->size)
return -EINVAL;
if (mop->size > MEM_OP_MAX_SIZE)
return -E2BIG;
if (kvm_s390_pv_is_protected(kvm))
return -EINVAL;
if (mop->flags & KVM_S390_MEMOP_F_SKEY_PROTECTION) {
if (access_key_invalid(mop->key))
return -EINVAL;
} else {
mop->key = 0;
}
if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
tmpbuf = vmalloc(mop->size);
if (!tmpbuf)
return -ENOMEM;
}
srcu_idx = srcu_read_lock(&kvm->srcu);
if (kvm_is_error_gpa(kvm, mop->gaddr)) {
r = PGM_ADDRESSING;
goto out_unlock;
}
switch (mop->op) {
case KVM_S390_MEMOP_ABSOLUTE_READ: {
if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_FETCH, mop->key);
} else {
r = access_guest_abs_with_key(kvm, mop->gaddr, tmpbuf,
mop->size, GACC_FETCH, mop->key);
if (r == 0) {
if (copy_to_user(uaddr, tmpbuf, mop->size))
r = -EFAULT;
}
}
break;
}
case KVM_S390_MEMOP_ABSOLUTE_WRITE: {
if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key);
} else {
if (copy_from_user(tmpbuf, uaddr, mop->size)) {
r = -EFAULT;
break;
}
r = access_guest_abs_with_key(kvm, mop->gaddr, tmpbuf,
mop->size, GACC_STORE, mop->key);
}
break;
}
default:
r = -EINVAL;
}
out_unlock:
srcu_read_unlock(&kvm->srcu, srcu_idx);
vfree(tmpbuf);
return r;
}
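A hedged userspace sketch of driving this new VM-scoped memop, assuming the uapi additions from this series (the ABSOLUTE ops, KVM_S390_MEMOP_F_SKEY_PROTECTION and the key field of struct kvm_s390_mem_op):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Read guest absolute memory, letting KVM check the storage keys against 'key'. */
static int read_guest_absolute(int vm_fd, unsigned long gaddr, void *buf,
			       unsigned int len, unsigned char key)
{
	struct kvm_s390_mem_op op;

	memset(&op, 0, sizeof(op));
	op.gaddr = gaddr;
	op.size  = len;
	op.buf   = (uintptr_t)buf;
	op.op    = KVM_S390_MEMOP_ABSOLUTE_READ;
	op.flags = KVM_S390_MEMOP_F_SKEY_PROTECTION;
	op.key   = key;

	return ioctl(vm_fd, KVM_S390_MEM_OP, &op);
}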
long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
......@@ -2483,6 +2568,15 @@ long kvm_arch_vm_ioctl(struct file *filp,
}
break;
}
case KVM_S390_MEM_OP: {
struct kvm_s390_mem_op mem_op;
if (copy_from_user(&mem_op, argp, sizeof(mem_op)) == 0)
r = kvm_s390_vm_mem_op(kvm, &mem_op);
else
r = -EFAULT;
break;
}
default:
r = -ENOTTY;
}
......@@ -3263,9 +3357,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->icpua = vcpu->vcpu_id;
spin_lock_init(&vcpu->arch.local_int.lock);
vcpu->arch.sie_block->gd = (u32)(u64)vcpu->kvm->arch.gisa_int.origin;
if (vcpu->arch.sie_block->gd && sclp.has_gisaf)
vcpu->arch.sie_block->gd |= GISA_FORMAT1;
vcpu->arch.sie_block->gd = kvm_s390_get_gisa_desc(vcpu->kvm);
seqcount_init(&vcpu->arch.cputm_seqcount);
vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
......@@ -3394,7 +3486,7 @@ static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
if (prefix <= end && start <= prefix + 2*PAGE_SIZE - 1) {
VCPU_EVENT(vcpu, 2, "gmap notifier for %lx-%lx",
start, end);
kvm_s390_sync_request(KVM_REQ_MMU_RELOAD, vcpu);
kvm_s390_sync_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);
}
}
}
......@@ -3796,19 +3888,19 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
if (!kvm_request_pending(vcpu))
return 0;
/*
* We use MMU_RELOAD just to re-arm the ipte notifier for the
* If the guest prefix changed, re-arm the ipte notifier for the
* guest prefix page. gmap_mprotect_notify will wait on the ptl lock.
* This ensures that the ipte instruction for this request has
* already finished. We might race against a second unmapper that
* wants to set the blocking bit. Lets just retry the request loop.
*/
if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu)) {
if (kvm_check_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu)) {
int rc;
rc = gmap_mprotect_notify(vcpu->arch.gmap,
kvm_s390_get_prefix(vcpu),
PAGE_SIZE * 2, PROT_WRITE);
if (rc) {
kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
kvm_make_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);
return rc;
}
goto retry;
......@@ -3869,14 +3961,12 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
return 0;
}
void kvm_s390_set_tod_clock(struct kvm *kvm,
const struct kvm_s390_vm_tod_clock *gtod)
static void __kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod)
{
struct kvm_vcpu *vcpu;
union tod_clock clk;
unsigned long i;
mutex_lock(&kvm->lock);
preempt_disable();
store_tod_clock_ext(&clk);
......@@ -3897,7 +3987,22 @@ void kvm_s390_set_tod_clock(struct kvm *kvm,
kvm_s390_vcpu_unblock_all(kvm);
preempt_enable();
}
void kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod)
{
mutex_lock(&kvm->lock);
__kvm_s390_set_tod_clock(kvm, gtod);
mutex_unlock(&kvm->lock);
}
int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod)
{
if (!mutex_trylock(&kvm->lock))
return 0;
__kvm_s390_set_tod_clock(kvm, gtod);
mutex_unlock(&kvm->lock);
return 1;
}
/**
......@@ -4655,8 +4760,8 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
return r;
}
static long kvm_s390_guest_sida_op(struct kvm_vcpu *vcpu,
struct kvm_s390_mem_op *mop)
static long kvm_s390_vcpu_sida_op(struct kvm_vcpu *vcpu,
struct kvm_s390_mem_op *mop)
{
void __user *uaddr = (void __user *)mop->buf;
int r = 0;
......@@ -4685,24 +4790,29 @@ static long kvm_s390_guest_sida_op(struct kvm_vcpu *vcpu,
}
return r;
}
static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
struct kvm_s390_mem_op *mop)
static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *vcpu,
struct kvm_s390_mem_op *mop)
{
void __user *uaddr = (void __user *)mop->buf;
void *tmpbuf = NULL;
int r = 0;
const u64 supported_flags = KVM_S390_MEMOP_F_INJECT_EXCEPTION
| KVM_S390_MEMOP_F_CHECK_ONLY;
| KVM_S390_MEMOP_F_CHECK_ONLY
| KVM_S390_MEMOP_F_SKEY_PROTECTION;
if (mop->flags & ~supported_flags || mop->ar >= NUM_ACRS || !mop->size)
return -EINVAL;
if (mop->size > MEM_OP_MAX_SIZE)
return -E2BIG;
if (kvm_s390_pv_cpu_is_protected(vcpu))
return -EINVAL;
if (mop->flags & KVM_S390_MEMOP_F_SKEY_PROTECTION) {
if (access_key_invalid(mop->key))
return -EINVAL;
} else {
mop->key = 0;
}
if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
tmpbuf = vmalloc(mop->size);
if (!tmpbuf)
......@@ -4712,11 +4822,12 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
switch (mop->op) {
case KVM_S390_MEMOP_LOGICAL_READ:
if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
r = check_gva_range(vcpu, mop->gaddr, mop->ar,
mop->size, GACC_FETCH);
r = check_gva_range(vcpu, mop->gaddr, mop->ar, mop->size,
GACC_FETCH, mop->key);
break;
}
r = read_guest(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size);
r = read_guest_with_key(vcpu, mop->gaddr, mop->ar, tmpbuf,
mop->size, mop->key);
if (r == 0) {
if (copy_to_user(uaddr, tmpbuf, mop->size))
r = -EFAULT;
......@@ -4724,15 +4835,16 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
break;
case KVM_S390_MEMOP_LOGICAL_WRITE:
if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
r = check_gva_range(vcpu, mop->gaddr, mop->ar,
mop->size, GACC_STORE);
r = check_gva_range(vcpu, mop->gaddr, mop->ar, mop->size,
GACC_STORE, mop->key);
break;
}
if (copy_from_user(tmpbuf, uaddr, mop->size)) {
r = -EFAULT;
break;
}
r = write_guest(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size);
r = write_guest_with_key(vcpu, mop->gaddr, mop->ar, tmpbuf,
mop->size, mop->key);
break;
}
......@@ -4743,8 +4855,8 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
return r;
}
static long kvm_s390_guest_memsida_op(struct kvm_vcpu *vcpu,
struct kvm_s390_mem_op *mop)
static long kvm_s390_vcpu_memsida_op(struct kvm_vcpu *vcpu,
struct kvm_s390_mem_op *mop)
{
int r, srcu_idx;
......@@ -4753,12 +4865,12 @@ static long kvm_s390_guest_memsida_op(struct kvm_vcpu *vcpu,
switch (mop->op) {
case KVM_S390_MEMOP_LOGICAL_READ:
case KVM_S390_MEMOP_LOGICAL_WRITE:
r = kvm_s390_guest_mem_op(vcpu, mop);
r = kvm_s390_vcpu_mem_op(vcpu, mop);
break;
case KVM_S390_MEMOP_SIDA_READ:
case KVM_S390_MEMOP_SIDA_WRITE:
/* we are locked against sida going away by the vcpu->mutex */
r = kvm_s390_guest_sida_op(vcpu, mop);
r = kvm_s390_vcpu_sida_op(vcpu, mop);
break;
default:
r = -EINVAL;
......@@ -4921,7 +5033,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
struct kvm_s390_mem_op mem_op;
if (copy_from_user(&mem_op, argp, sizeof(mem_op)) == 0)
r = kvm_s390_guest_memsida_op(vcpu, &mem_op);
r = kvm_s390_vcpu_memsida_op(vcpu, &mem_op);
else
r = -EFAULT;
break;
......
......@@ -105,7 +105,7 @@ static inline void kvm_s390_set_prefix(struct kvm_vcpu *vcpu, u32 prefix)
prefix);
vcpu->arch.sie_block->prefix = prefix >> GUEST_PREFIX_SHIFT;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
kvm_make_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);
}
static inline u64 kvm_s390_get_base_disp_s(struct kvm_vcpu *vcpu, u8 *ar)
......@@ -231,6 +231,15 @@ static inline unsigned long kvm_s390_get_gfn_end(struct kvm_memslots *slots)
return ms->base_gfn + ms->npages;
}
static inline u32 kvm_s390_get_gisa_desc(struct kvm *kvm)
{
u32 gd = (u32)(u64)kvm->arch.gisa_int.origin;
if (gd && sclp.has_gisaf)
gd |= GISA_FORMAT1;
return gd;
}
/* implemented in pv.c */
int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
......@@ -349,8 +358,8 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu);
int kvm_s390_handle_sigp_pei(struct kvm_vcpu *vcpu);
/* implemented in kvm-s390.c */
void kvm_s390_set_tod_clock(struct kvm *kvm,
const struct kvm_s390_vm_tod_clock *gtod);
void kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
long kvm_arch_fault_in_page(struct kvm_vcpu *vcpu, gpa_t gpa, int writable);
int kvm_s390_store_status_unloaded(struct kvm_vcpu *vcpu, unsigned long addr);
int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr);
......@@ -450,6 +459,8 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu,
void kvm_s390_gisa_init(struct kvm *kvm);
void kvm_s390_gisa_clear(struct kvm *kvm);
void kvm_s390_gisa_destroy(struct kvm *kvm);
void kvm_s390_gisa_disable(struct kvm *kvm);
void kvm_s390_gisa_enable(struct kvm *kvm);
int kvm_s390_gib_init(u8 nisc);
void kvm_s390_gib_destroy(void);
......
......@@ -102,7 +102,20 @@ static int handle_set_clock(struct kvm_vcpu *vcpu)
return kvm_s390_inject_prog_cond(vcpu, rc);
VCPU_EVENT(vcpu, 3, "SCK: setting guest TOD to 0x%llx", gtod.tod);
kvm_s390_set_tod_clock(vcpu->kvm, &gtod);
/*
* To set the TOD clock the kvm lock must be taken, but the vcpu lock
* is already held in handle_set_clock. The usual lock order is the
* opposite. As SCK is deprecated and should not be used in several
* cases, for example when the multiple epoch facility or TOD clock
* steering facility is installed (see Principles of Operation), a
* slow path can be used. If the lock cannot be taken via try_lock,
* the instruction will be retried via -EAGAIN at a later point in
* time.
*/
if (!kvm_s390_try_set_tod_clock(vcpu->kvm, &gtod)) {
kvm_s390_retry_instr(vcpu);
return -EAGAIN;
}
kvm_s390_set_psw_cc(vcpu, 0);
return 0;
......@@ -1443,10 +1456,11 @@ int kvm_s390_handle_eb(struct kvm_vcpu *vcpu)
static int handle_tprot(struct kvm_vcpu *vcpu)
{
u64 address1, address2;
unsigned long hva, gpa;
int ret = 0, cc = 0;
u64 address, operand2;
unsigned long gpa;
u8 access_key;
bool writable;
int ret, cc;
u8 ar;
vcpu->stat.instruction_tprot++;
......@@ -1454,43 +1468,46 @@ static int handle_tprot(struct kvm_vcpu *vcpu)
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
kvm_s390_get_base_disp_sse(vcpu, &address1, &address2, &ar, NULL);
kvm_s390_get_base_disp_sse(vcpu, &address, &operand2, &ar, NULL);
access_key = (operand2 & 0xf0) >> 4;
/* we only handle the Linux memory detection case:
* access key == 0
* everything else goes to userspace. */
if (address2 & 0xf0)
return -EOPNOTSUPP;
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_DAT)
ipte_lock(vcpu);
ret = guest_translate_address(vcpu, address1, ar, &gpa, GACC_STORE);
if (ret == PGM_PROTECTION) {
ret = guest_translate_address_with_key(vcpu, address, ar, &gpa,
GACC_STORE, access_key);
if (ret == 0) {
gfn_to_hva_prot(vcpu->kvm, gpa_to_gfn(gpa), &writable);
} else if (ret == PGM_PROTECTION) {
writable = false;
/* Write protected? Try again with read-only... */
cc = 1;
ret = guest_translate_address(vcpu, address1, ar, &gpa,
GACC_FETCH);
ret = guest_translate_address_with_key(vcpu, address, ar, &gpa,
GACC_FETCH, access_key);
}
if (ret) {
if (ret == PGM_ADDRESSING || ret == PGM_TRANSLATION_SPEC) {
ret = kvm_s390_inject_program_int(vcpu, ret);
} else if (ret > 0) {
/* Translation not available */
kvm_s390_set_psw_cc(vcpu, 3);
if (ret >= 0) {
cc = -1;
/* Fetching permitted; storing permitted */
if (ret == 0 && writable)
cc = 0;
/* Fetching permitted; storing not permitted */
else if (ret == 0 && !writable)
cc = 1;
/* Fetching not permitted; storing not permitted */
else if (ret == PGM_PROTECTION)
cc = 2;
/* Translation not available */
else if (ret != PGM_ADDRESSING && ret != PGM_TRANSLATION_SPEC)
cc = 3;
if (cc != -1) {
kvm_s390_set_psw_cc(vcpu, cc);
ret = 0;
} else {
ret = kvm_s390_inject_program_int(vcpu, ret);
}
goto out_unlock;
}
hva = gfn_to_hva_prot(vcpu->kvm, gpa_to_gfn(gpa), &writable);
if (kvm_is_error_hva(hva)) {
ret = kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
} else {
if (!writable)
cc = 1; /* Write not permitted ==> read-only */
kvm_s390_set_psw_cc(vcpu, cc);
/* Note: CC2 only occurs for storage keys (not supported yet) */
}
out_unlock:
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_DAT)
ipte_unlock(vcpu);
return ret;
......
......@@ -59,11 +59,13 @@ static inline int copy_with_mvcos(void)
#endif
static inline unsigned long copy_from_user_mvcos(void *x, const void __user *ptr,
unsigned long size)
unsigned long size, unsigned long key)
{
unsigned long tmp1, tmp2;
union oac spec = {
.oac2.key = key,
.oac2.as = PSW_BITS_AS_SECONDARY,
.oac2.k = 1,
.oac2.a = 1,
};
......@@ -94,19 +96,19 @@ static inline unsigned long copy_from_user_mvcos(void *x, const void __user *ptr
}
static inline unsigned long copy_from_user_mvcp(void *x, const void __user *ptr,
unsigned long size)
unsigned long size, unsigned long key)
{
unsigned long tmp1, tmp2;
tmp1 = -256UL;
asm volatile(
" sacf 0\n"
"0: mvcp 0(%0,%2),0(%1),%3\n"
"0: mvcp 0(%0,%2),0(%1),%[key]\n"
"7: jz 5f\n"
"1: algr %0,%3\n"
" la %1,256(%1)\n"
" la %2,256(%2)\n"
"2: mvcp 0(%0,%2),0(%1),%3\n"
"2: mvcp 0(%0,%2),0(%1),%[key]\n"
"8: jnz 1b\n"
" j 5f\n"
"3: la %4,255(%1)\n" /* %4 = ptr + 255 */
......@@ -115,7 +117,7 @@ static inline unsigned long copy_from_user_mvcp(void *x, const void __user *ptr,
" slgr %4,%1\n"
" clgr %0,%4\n" /* copy crosses next page boundary? */
" jnh 6f\n"
"4: mvcp 0(%4,%2),0(%1),%3\n"
"4: mvcp 0(%4,%2),0(%1),%[key]\n"
"9: slgr %0,%4\n"
" j 6f\n"
"5: slgr %0,%0\n"
......@@ -123,24 +125,49 @@ static inline unsigned long copy_from_user_mvcp(void *x, const void __user *ptr,
EX_TABLE(0b,3b) EX_TABLE(2b,3b) EX_TABLE(4b,6b)
EX_TABLE(7b,3b) EX_TABLE(8b,3b) EX_TABLE(9b,6b)
: "+a" (size), "+a" (ptr), "+a" (x), "+a" (tmp1), "=a" (tmp2)
: : "cc", "memory");
: [key] "d" (key << 4)
: "cc", "memory");
return size;
}
unsigned long raw_copy_from_user(void *to, const void __user *from, unsigned long n)
static unsigned long raw_copy_from_user_key(void *to, const void __user *from,
unsigned long n, unsigned long key)
{
if (copy_with_mvcos())
return copy_from_user_mvcos(to, from, n);
return copy_from_user_mvcp(to, from, n);
return copy_from_user_mvcos(to, from, n, key);
return copy_from_user_mvcp(to, from, n, key);
}
unsigned long raw_copy_from_user(void *to, const void __user *from, unsigned long n)
{
return raw_copy_from_user_key(to, from, n, 0);
}
EXPORT_SYMBOL(raw_copy_from_user);
unsigned long _copy_from_user_key(void *to, const void __user *from,
unsigned long n, unsigned long key)
{
unsigned long res = n;
might_fault();
if (!should_fail_usercopy()) {
instrument_copy_from_user(to, from, n);
res = raw_copy_from_user_key(to, from, n, key);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
return res;
}
EXPORT_SYMBOL(_copy_from_user_key);
static inline unsigned long copy_to_user_mvcos(void __user *ptr, const void *x,
unsigned long size)
unsigned long size, unsigned long key)
{
unsigned long tmp1, tmp2;
union oac spec = {
.oac1.key = key,
.oac1.as = PSW_BITS_AS_SECONDARY,
.oac1.k = 1,
.oac1.a = 1,
};
......@@ -171,19 +198,19 @@ static inline unsigned long copy_to_user_mvcos(void __user *ptr, const void *x,
}
static inline unsigned long copy_to_user_mvcs(void __user *ptr, const void *x,
unsigned long size)
unsigned long size, unsigned long key)
{
unsigned long tmp1, tmp2;
tmp1 = -256UL;
asm volatile(
" sacf 0\n"
"0: mvcs 0(%0,%1),0(%2),%3\n"
"0: mvcs 0(%0,%1),0(%2),%[key]\n"
"7: jz 5f\n"
"1: algr %0,%3\n"
" la %1,256(%1)\n"
" la %2,256(%2)\n"
"2: mvcs 0(%0,%1),0(%2),%3\n"
"2: mvcs 0(%0,%1),0(%2),%[key]\n"
"8: jnz 1b\n"
" j 5f\n"
"3: la %4,255(%1)\n" /* %4 = ptr + 255 */
......@@ -192,7 +219,7 @@ static inline unsigned long copy_to_user_mvcs(void __user *ptr, const void *x,
" slgr %4,%1\n"
" clgr %0,%4\n" /* copy crosses next page boundary? */
" jnh 6f\n"
"4: mvcs 0(%4,%1),0(%2),%3\n"
"4: mvcs 0(%4,%1),0(%2),%[key]\n"
"9: slgr %0,%4\n"
" j 6f\n"
"5: slgr %0,%0\n"
......@@ -200,18 +227,36 @@ static inline unsigned long copy_to_user_mvcs(void __user *ptr, const void *x,
EX_TABLE(0b,3b) EX_TABLE(2b,3b) EX_TABLE(4b,6b)
EX_TABLE(7b,3b) EX_TABLE(8b,3b) EX_TABLE(9b,6b)
: "+a" (size), "+a" (ptr), "+a" (x), "+a" (tmp1), "=a" (tmp2)
: : "cc", "memory");
: [key] "d" (key << 4)
: "cc", "memory");
return size;
}
unsigned long raw_copy_to_user(void __user *to, const void *from, unsigned long n)
static unsigned long raw_copy_to_user_key(void __user *to, const void *from,
unsigned long n, unsigned long key)
{
if (copy_with_mvcos())
return copy_to_user_mvcos(to, from, n);
return copy_to_user_mvcs(to, from, n);
return copy_to_user_mvcos(to, from, n, key);
return copy_to_user_mvcs(to, from, n, key);
}
unsigned long raw_copy_to_user(void __user *to, const void *from, unsigned long n)
{
return raw_copy_to_user_key(to, from, n, 0);
}
EXPORT_SYMBOL(raw_copy_to_user);
unsigned long _copy_to_user_key(void __user *to, const void *from,
unsigned long n, unsigned long key)
{
might_fault();
if (should_fail_usercopy())
return n;
instrument_copy_to_user(to, from, n);
return raw_copy_to_user_key(to, from, n, key);
}
EXPORT_SYMBOL(_copy_to_user_key);
static inline unsigned long clear_user_mvcos(void __user *to, unsigned long size)
{
unsigned long tmp1, tmp2;
......
/* SPDX-License-Identifier: GPL-2.0 */
#if !defined(KVM_X86_OP) || !defined(KVM_X86_OP_NULL)
#if !defined(KVM_X86_OP) || !defined(KVM_X86_OP_OPTIONAL)
BUILD_BUG_ON(1)
#endif
/*
* KVM_X86_OP() and KVM_X86_OP_NULL() are used to help generate
* "static_call()"s. They are also intended for use when defining
* the vmx/svm kvm_x86_ops. KVM_X86_OP() can be used for those
* functions that follow the [svm|vmx]_func_name convention.
* KVM_X86_OP_NULL() can leave a NULL definition for the
* case where there is no definition or a function name that
* doesn't match the typical naming convention is supplied.
* KVM_X86_OP() and KVM_X86_OP_OPTIONAL() are used to help generate
* both DECLARE/DEFINE_STATIC_CALL() invocations and
* "static_call_update()" calls.
*
* KVM_X86_OP_OPTIONAL() can be used for those functions that can have
* a NULL definition, for example if "static_call_cond()" will be used
* at the call sites. KVM_X86_OP_OPTIONAL_RET0() can be used likewise
* to make a definition optional, but in this case the default will
* be __static_call_return0.
*/
KVM_X86_OP_NULL(hardware_enable)
KVM_X86_OP_NULL(hardware_disable)
KVM_X86_OP_NULL(hardware_unsetup)
KVM_X86_OP_NULL(cpu_has_accelerated_tpr)
KVM_X86_OP(hardware_enable)
KVM_X86_OP(hardware_disable)
KVM_X86_OP(hardware_unsetup)
KVM_X86_OP(has_emulated_msr)
KVM_X86_OP(vcpu_after_set_cpuid)
KVM_X86_OP(vm_init)
KVM_X86_OP_NULL(vm_destroy)
KVM_X86_OP_OPTIONAL(vm_destroy)
KVM_X86_OP(vcpu_create)
KVM_X86_OP(vcpu_free)
KVM_X86_OP(vcpu_reset)
KVM_X86_OP(prepare_guest_switch)
KVM_X86_OP(prepare_switch_to_guest)
KVM_X86_OP(vcpu_load)
KVM_X86_OP(vcpu_put)
KVM_X86_OP(update_exception_bitmap)
......@@ -33,9 +34,9 @@ KVM_X86_OP(get_segment_base)
KVM_X86_OP(get_segment)
KVM_X86_OP(get_cpl)
KVM_X86_OP(set_segment)
KVM_X86_OP_NULL(get_cs_db_l_bits)
KVM_X86_OP(get_cs_db_l_bits)
KVM_X86_OP(set_cr0)
KVM_X86_OP_NULL(post_set_cr3)
KVM_X86_OP_OPTIONAL(post_set_cr3)
KVM_X86_OP(is_valid_cr4)
KVM_X86_OP(set_cr4)
KVM_X86_OP(set_efer)
......@@ -49,22 +50,22 @@ KVM_X86_OP(cache_reg)
KVM_X86_OP(get_rflags)
KVM_X86_OP(set_rflags)
KVM_X86_OP(get_if_flag)
KVM_X86_OP(tlb_flush_all)
KVM_X86_OP(tlb_flush_current)
KVM_X86_OP_NULL(tlb_remote_flush)
KVM_X86_OP_NULL(tlb_remote_flush_with_range)
KVM_X86_OP(tlb_flush_gva)
KVM_X86_OP(tlb_flush_guest)
KVM_X86_OP(flush_tlb_all)
KVM_X86_OP(flush_tlb_current)
KVM_X86_OP_OPTIONAL(tlb_remote_flush)
KVM_X86_OP_OPTIONAL(tlb_remote_flush_with_range)
KVM_X86_OP(flush_tlb_gva)
KVM_X86_OP(flush_tlb_guest)
KVM_X86_OP(vcpu_pre_run)
KVM_X86_OP(run)
KVM_X86_OP_NULL(handle_exit)
KVM_X86_OP_NULL(skip_emulated_instruction)
KVM_X86_OP_NULL(update_emulated_instruction)
KVM_X86_OP(vcpu_run)
KVM_X86_OP(handle_exit)
KVM_X86_OP(skip_emulated_instruction)
KVM_X86_OP_OPTIONAL(update_emulated_instruction)
KVM_X86_OP(set_interrupt_shadow)
KVM_X86_OP(get_interrupt_shadow)
KVM_X86_OP(patch_hypercall)
KVM_X86_OP(set_irq)
KVM_X86_OP(set_nmi)
KVM_X86_OP(inject_irq)
KVM_X86_OP(inject_nmi)
KVM_X86_OP(queue_exception)
KVM_X86_OP(cancel_injection)
KVM_X86_OP(interrupt_allowed)
......@@ -73,22 +74,22 @@ KVM_X86_OP(get_nmi_mask)
KVM_X86_OP(set_nmi_mask)
KVM_X86_OP(enable_nmi_window)
KVM_X86_OP(enable_irq_window)
KVM_X86_OP(update_cr8_intercept)
KVM_X86_OP_OPTIONAL(update_cr8_intercept)
KVM_X86_OP(check_apicv_inhibit_reasons)
KVM_X86_OP(refresh_apicv_exec_ctrl)
KVM_X86_OP(hwapic_irr_update)
KVM_X86_OP(hwapic_isr_update)
KVM_X86_OP_NULL(guest_apic_has_interrupt)
KVM_X86_OP(load_eoi_exitmap)
KVM_X86_OP(set_virtual_apic_mode)
KVM_X86_OP_NULL(set_apic_access_page_addr)
KVM_X86_OP_OPTIONAL(hwapic_irr_update)
KVM_X86_OP_OPTIONAL(hwapic_isr_update)
KVM_X86_OP_OPTIONAL_RET0(guest_apic_has_interrupt)
KVM_X86_OP_OPTIONAL(load_eoi_exitmap)
KVM_X86_OP_OPTIONAL(set_virtual_apic_mode)
KVM_X86_OP_OPTIONAL(set_apic_access_page_addr)
KVM_X86_OP(deliver_interrupt)
KVM_X86_OP_NULL(sync_pir_to_irr)
KVM_X86_OP(set_tss_addr)
KVM_X86_OP(set_identity_map_addr)
KVM_X86_OP_OPTIONAL(sync_pir_to_irr)
KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
KVM_X86_OP(get_mt_mask)
KVM_X86_OP(load_mmu_pgd)
KVM_X86_OP_NULL(has_wbinvd_exit)
KVM_X86_OP(has_wbinvd_exit)
KVM_X86_OP(get_l2_tsc_offset)
KVM_X86_OP(get_l2_tsc_multiplier)
KVM_X86_OP(write_tsc_offset)
......@@ -96,32 +97,36 @@ KVM_X86_OP(write_tsc_multiplier)
KVM_X86_OP(get_exit_info)
KVM_X86_OP(check_intercept)
KVM_X86_OP(handle_exit_irqoff)
KVM_X86_OP_NULL(request_immediate_exit)
KVM_X86_OP(request_immediate_exit)
KVM_X86_OP(sched_in)
KVM_X86_OP_NULL(update_cpu_dirty_logging)
KVM_X86_OP_NULL(vcpu_blocking)
KVM_X86_OP_NULL(vcpu_unblocking)
KVM_X86_OP_NULL(update_pi_irte)
KVM_X86_OP_NULL(start_assignment)
KVM_X86_OP_NULL(apicv_post_state_restore)
KVM_X86_OP_NULL(dy_apicv_has_pending_interrupt)
KVM_X86_OP_NULL(set_hv_timer)
KVM_X86_OP_NULL(cancel_hv_timer)
KVM_X86_OP_OPTIONAL(update_cpu_dirty_logging)
KVM_X86_OP_OPTIONAL(vcpu_blocking)
KVM_X86_OP_OPTIONAL(vcpu_unblocking)
KVM_X86_OP_OPTIONAL(pi_update_irte)
KVM_X86_OP_OPTIONAL(pi_start_assignment)
KVM_X86_OP_OPTIONAL(apicv_post_state_restore)
KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt)
KVM_X86_OP_OPTIONAL(set_hv_timer)
KVM_X86_OP_OPTIONAL(cancel_hv_timer)
KVM_X86_OP(setup_mce)
KVM_X86_OP(smi_allowed)
KVM_X86_OP(enter_smm)
KVM_X86_OP(leave_smm)
KVM_X86_OP(enable_smi_window)
KVM_X86_OP_NULL(mem_enc_op)
KVM_X86_OP_NULL(mem_enc_reg_region)
KVM_X86_OP_NULL(mem_enc_unreg_region)
KVM_X86_OP_OPTIONAL(mem_enc_ioctl)
KVM_X86_OP_OPTIONAL(mem_enc_register_region)
KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
KVM_X86_OP_OPTIONAL(vm_move_enc_context_from)
KVM_X86_OP(get_msr_feature)
KVM_X86_OP(can_emulate_instruction)
KVM_X86_OP(apic_init_signal_blocked)
KVM_X86_OP_NULL(enable_direct_tlbflush)
KVM_X86_OP_NULL(migrate_timers)
KVM_X86_OP_OPTIONAL(enable_direct_tlbflush)
KVM_X86_OP_OPTIONAL(migrate_timers)
KVM_X86_OP(msr_filter_changed)
KVM_X86_OP_NULL(complete_emulated_msr)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
#undef KVM_X86_OP
#undef KVM_X86_OP_NULL
#undef KVM_X86_OP_OPTIONAL
#undef KVM_X86_OP_OPTIONAL_RET0
......@@ -15,6 +15,7 @@
#include <linux/cpumask.h>
#include <linux/irq_work.h>
#include <linux/irq.h>
#include <linux/workqueue.h>
#include <linux/kvm.h>
#include <linux/kvm_para.h>
......@@ -102,6 +103,8 @@
#define KVM_REQ_MSR_FILTER_CHANGED KVM_ARCH_REQ(29)
#define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
......@@ -432,8 +435,7 @@ struct kvm_mmu {
int (*sync_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp);
void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
hpa_t root_hpa;
gpa_t root_pgd;
struct kvm_mmu_root_info root;
union kvm_mmu_role mmu_role;
u8 root_level;
u8 shadow_root_level;
......@@ -1128,10 +1130,6 @@ struct kvm_arch {
struct kvm_hv hyperv;
struct kvm_xen xen;
#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
#endif
bool backwards_tsc_observed;
bool boot_vcpu_runs_old_kvmclock;
u32 bsp_vcpu_id;
......@@ -1151,6 +1149,7 @@ struct kvm_arch {
bool exception_payload_enabled;
bool bus_lock_detection_enabled;
bool enable_pmu;
/*
* If exit_on_emulation_error is set, and the in-kernel instruction
* emulator fails to emulate an instruction, allow userspace
......@@ -1220,6 +1219,7 @@ struct kvm_arch {
* the thread holds the MMU lock in write mode.
*/
spinlock_t tdp_mmu_pages_lock;
struct workqueue_struct *tdp_mmu_zap_wq;
#endif /* CONFIG_X86_64 */
/*
......@@ -1318,7 +1318,6 @@ struct kvm_x86_ops {
int (*hardware_enable)(void);
void (*hardware_disable)(void);
void (*hardware_unsetup)(void);
bool (*cpu_has_accelerated_tpr)(void);
bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
......@@ -1331,7 +1330,7 @@ struct kvm_x86_ops {
void (*vcpu_free)(struct kvm_vcpu *vcpu);
void (*vcpu_reset)(struct kvm_vcpu *vcpu, bool init_event);
void (*prepare_guest_switch)(struct kvm_vcpu *vcpu);
void (*prepare_switch_to_guest)(struct kvm_vcpu *vcpu);
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
......@@ -1361,8 +1360,8 @@ struct kvm_x86_ops {
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
bool (*get_if_flag)(struct kvm_vcpu *vcpu);
void (*tlb_flush_all)(struct kvm_vcpu *vcpu);
void (*tlb_flush_current)(struct kvm_vcpu *vcpu);
void (*flush_tlb_all)(struct kvm_vcpu *vcpu);
void (*flush_tlb_current)(struct kvm_vcpu *vcpu);
int (*tlb_remote_flush)(struct kvm *kvm);
int (*tlb_remote_flush_with_range)(struct kvm *kvm,
struct kvm_tlb_range *range);
......@@ -1373,16 +1372,16 @@ struct kvm_x86_ops {
* Can potentially get non-canonical addresses through INVLPGs, which
* the implementation may choose to ignore if appropriate.
*/
void (*tlb_flush_gva)(struct kvm_vcpu *vcpu, gva_t addr);
void (*flush_tlb_gva)(struct kvm_vcpu *vcpu, gva_t addr);
/*
* Flush any TLB entries created by the guest. Like tlb_flush_gva(),
* does not need to flush GPA->HPA mappings.
*/
void (*tlb_flush_guest)(struct kvm_vcpu *vcpu);
void (*flush_tlb_guest)(struct kvm_vcpu *vcpu);
int (*vcpu_pre_run)(struct kvm_vcpu *vcpu);
enum exit_fastpath_completion (*run)(struct kvm_vcpu *vcpu);
enum exit_fastpath_completion (*vcpu_run)(struct kvm_vcpu *vcpu);
int (*handle_exit)(struct kvm_vcpu *vcpu,
enum exit_fastpath_completion exit_fastpath);
int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
......@@ -1391,8 +1390,8 @@ struct kvm_x86_ops {
u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
void (*patch_hypercall)(struct kvm_vcpu *vcpu,
unsigned char *hypercall_addr);
void (*set_irq)(struct kvm_vcpu *vcpu);
void (*set_nmi)(struct kvm_vcpu *vcpu);
void (*inject_irq)(struct kvm_vcpu *vcpu);
void (*inject_nmi)(struct kvm_vcpu *vcpu);
void (*queue_exception)(struct kvm_vcpu *vcpu);
void (*cancel_injection)(struct kvm_vcpu *vcpu);
int (*interrupt_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
......@@ -1459,9 +1458,9 @@ struct kvm_x86_ops {
void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
void (*vcpu_unblocking)(struct kvm_vcpu *vcpu);
int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
int (*pi_update_irte)(struct kvm *kvm, unsigned int host_irq,
uint32_t guest_irq, bool set);
void (*start_assignment)(struct kvm *kvm);
void (*pi_start_assignment)(struct kvm *kvm);
void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
......@@ -1476,9 +1475,9 @@ struct kvm_x86_ops {
int (*leave_smm)(struct kvm_vcpu *vcpu, const char *smstate);
void (*enable_smi_window)(struct kvm_vcpu *vcpu);
int (*mem_enc_op)(struct kvm *kvm, void __user *argp);
int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp);
int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp);
int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp);
int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
int (*vm_move_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
......@@ -1541,15 +1540,22 @@ extern struct kvm_x86_ops kvm_x86_ops;
#define KVM_X86_OP(func) \
DECLARE_STATIC_CALL(kvm_x86_##func, *(((struct kvm_x86_ops *)0)->func));
#define KVM_X86_OP_NULL KVM_X86_OP
#define KVM_X86_OP_OPTIONAL KVM_X86_OP
#define KVM_X86_OP_OPTIONAL_RET0 KVM_X86_OP
#include <asm/kvm-x86-ops.h>
static inline void kvm_ops_static_call_update(void)
{
#define KVM_X86_OP(func) \
#define __KVM_X86_OP(func) \
static_call_update(kvm_x86_##func, kvm_x86_ops.func);
#define KVM_X86_OP_NULL KVM_X86_OP
#define KVM_X86_OP(func) \
WARN_ON(!kvm_x86_ops.func); __KVM_X86_OP(func)
#define KVM_X86_OP_OPTIONAL __KVM_X86_OP
#define KVM_X86_OP_OPTIONAL_RET0(func) \
static_call_update(kvm_x86_##func, (void *)kvm_x86_ops.func ? : \
(void *)__static_call_return0);
#include <asm/kvm-x86-ops.h>
#undef __KVM_X86_OP
}
#define __KVM_HAVE_ARCH_VM_ALLOC
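Editor's note on the machinery above: asm/kvm-x86-ops.h is an X-macro header that is included twice with different definitions of KVM_X86_OP(), KVM_X86_OP_OPTIONAL() and KVM_X86_OP_OPTIONAL_RET0(), once to emit a DECLARE_STATIC_CALL() for every op and once inside kvm_ops_static_call_update() to patch each static call, substituting __static_call_return0 when an optional op is left NULL. The stand-alone sketch below models the same idea with ordinary function pointers instead of static calls, and with a FOR_EACH_OP list macro instead of re-including a header; every name in it (vendor_ops, call_*, ret0, demo_vcpu_run) is invented for the illustration.

/* Editor's sketch: user-space model of the kvm-x86-ops.h X-macro pattern. */
#include <stdio.h>

/* The "ops list": mandatory ops use OP(), optional ones OP_OPTIONAL_RET0(). */
#define FOR_EACH_OP(OP, OP_OPTIONAL_RET0) \
    OP(vcpu_run) \
    OP_OPTIONAL_RET0(set_tss_addr)

struct vendor_ops {                      /* stand-in for struct kvm_x86_ops */
    int (*vcpu_run)(void);
    int (*set_tss_addr)(void);
};

/* First pass: declare one global function pointer per op, the analogue of
 * the DECLARE_STATIC_CALL() emitted for every KVM_X86_OP* entry. */
#define DECLARE_OP(name)      static int (*call_##name)(void);
#define DECLARE_OP_RET0(name) static int (*call_##name)(void);
FOR_EACH_OP(DECLARE_OP, DECLARE_OP_RET0)
#undef DECLARE_OP
#undef DECLARE_OP_RET0

static int ret0(void) { return 0; }      /* analogue of __static_call_return0 */

/* Second pass: fill in the pointers from the vendor table, falling back to
 * ret0() when an optional op was left NULL (static_call_update() in KVM). */
static void ops_update(const struct vendor_ops *ops)
{
#define UPDATE_OP(name)      call_##name = ops->name;
#define UPDATE_OP_RET0(name) call_##name = ops->name ? ops->name : ret0;
    FOR_EACH_OP(UPDATE_OP, UPDATE_OP_RET0)
#undef UPDATE_OP
#undef UPDATE_OP_RET0
}

static int demo_vcpu_run(void) { return 42; }

int main(void)
{
    struct vendor_ops ops = { .vcpu_run = demo_vcpu_run }; /* set_tss_addr NULL */

    ops_update(&ops);
    printf("vcpu_run -> %d, set_tss_addr -> %d\n",
           call_vcpu_run(), call_set_tss_addr());          /* prints 42 and 0 */
    return 0;
}

A mandatory op is then invoked unconditionally, while in the kernel an optional op is either reached through static_call_cond() or, with the _RET0 variant, silently returns 0 as in the sketch.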
......@@ -1587,6 +1593,13 @@ void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
const struct kvm_memory_slot *memslot,
int start_level);
void kvm_mmu_slot_try_split_huge_pages(struct kvm *kvm,
const struct kvm_memory_slot *memslot,
int target_level);
void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
const struct kvm_memory_slot *memslot,
u64 start, u64 end,
int target_level);
void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
......@@ -1724,7 +1737,6 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
void kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val);
unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
......@@ -1769,9 +1781,9 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu);
void kvm_update_dr7(struct kvm_vcpu *vcpu);
int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
ulong roots_to_free);
void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu);
void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu);
gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception);
gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva,
......@@ -1878,7 +1890,7 @@ static inline bool kvm_is_supported_user_return_msr(u32 msr)
return kvm_find_user_return_msr(msr) >= 0;
}
u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, u64 ratio);
u64 kvm_scale_tsc(u64 tsc, u64 ratio);
u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
......@@ -1955,4 +1967,11 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
#define KVM_CLOCK_VALID_FLAGS \
(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
#define KVM_X86_VALID_QUIRKS \
(KVM_X86_QUIRK_LINT0_REENABLED | \
KVM_X86_QUIRK_CD_NW_CLEARED | \
KVM_X86_QUIRK_LAPIC_MMIO_HOLE | \
KVM_X86_QUIRK_OUT_7E_INC_RIP | \
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
#endif /* _ASM_X86_KVM_HOST_H */
......@@ -226,7 +226,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define AVIC_LOGICAL_ID_ENTRY_VALID_BIT 31
#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK (1 << 31)
#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK (0xFFULL)
#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK GENMASK_ULL(11, 0)
#define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK (0xFFFFFFFFFFULL << 12)
#define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK (1ULL << 62)
#define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK (1ULL << 63)
......
......@@ -126,13 +126,6 @@ config KVM_XEN
If in doubt, say "N".
config KVM_MMU_AUDIT
bool "Audit KVM MMU"
depends on KVM && TRACEPOINTS
help
This option adds an R/W KVM module parameter 'mmu_audit', which allows
auditing of KVM MMU events at runtime.
config KVM_EXTERNAL_WRITE_TRACKING
bool
......
......@@ -715,9 +715,30 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
entry = &array->entries[array->nent++];
memset(entry, 0, sizeof(*entry));
entry->function = function;
entry->index = index;
entry->flags = 0;
switch (function & 0xC0000000) {
case 0x40000000:
/* Hypervisor leaves are always synthesized by __do_cpuid_func. */
return entry;
case 0x80000000:
/*
* 0x80000021 is sometimes synthesized by __do_cpuid_func, which
* would result in out-of-bounds calls to do_host_cpuid.
*/
{
static int max_cpuid_80000000;
if (!READ_ONCE(max_cpuid_80000000))
WRITE_ONCE(max_cpuid_80000000, cpuid_eax(0x80000000));
if (function > READ_ONCE(max_cpuid_80000000))
return entry;
}
default:
break;
}
cpuid_count(entry->function, entry->index,
&entry->eax, &entry->ebx, &entry->ecx, &entry->edx);
......@@ -1061,7 +1082,15 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->edx = 0;
break;
case 0x80000000:
entry->eax = min(entry->eax, 0x8000001f);
entry->eax = min(entry->eax, 0x80000021);
/*
* Serializing LFENCE is reported in a multitude of ways,
* and NullSegClearsBase is not reported in CPUID on Zen2;
* help userspace by providing the CPUID leaf ourselves.
*/
if (static_cpu_has(X86_FEATURE_LFENCE_RDTSC)
|| !static_cpu_has_bug(X86_BUG_NULL_SEG))
entry->eax = max(entry->eax, 0x80000021);
break;
case 0x80000001:
cpuid_entry_override(entry, CPUID_8000_0001_EDX);
......@@ -1132,6 +1161,27 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->ebx &= ~GENMASK(11, 6);
}
break;
case 0x80000020:
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
break;
case 0x80000021:
entry->ebx = entry->ecx = entry->edx = 0;
/*
* Pass down these bits:
* EAX 0 NNDBP, Processor ignores nested data breakpoints
* EAX 2 LAS, LFENCE always serializing
* EAX 6 NSCB, Null selector clear base
*
* Other defined bits are for MSRs that KVM does not expose:
* EAX 3 SPCL, SMM page configuration lock
* EAX 13 PCMSR, Prefetch control MSR
*/
entry->eax &= BIT(0) | BIT(2) | BIT(6);
if (static_cpu_has(X86_FEATURE_LFENCE_RDTSC))
entry->eax |= BIT(2);
if (!static_cpu_has_bug(X86_BUG_NULL_SEG))
entry->eax |= BIT(6);
break;
/*Add support for Centaur's CPUID instruction*/
case 0xC0000000:
/*Just support up to 0xC0000004 now*/
......@@ -1241,8 +1291,7 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
if (sanity_check_entries(entries, cpuid->nent, type))
return -EINVAL;
array.entries = vzalloc(array_size(sizeof(struct kvm_cpuid_entry2),
cpuid->nent));
array.entries = kvcalloc(sizeof(struct kvm_cpuid_entry2), cpuid->nent, GFP_KERNEL);
if (!array.entries)
return -ENOMEM;
......@@ -1260,7 +1309,7 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
r = -EFAULT;
out_free:
vfree(array.entries);
kvfree(array.entries);
return r;
}
......
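Editor's note on the two cpuid.c hunks above: the guest-visible maximum extended leaf is clamped to 0x80000021 and then raised back to 0x80000021 whenever the host has a serializing LFENCE or lacks the null-segment bug, so that leaf 0x80000021 can be synthesized even on hosts that do not report it; the leaf itself passes through only EAX bits 0 (NNDBP), 2 (LAS) and 6 (NSCB), force-setting bits 2 and 6 from host state where warranted. The stand-alone sketch below replays those decisions outside KVM; the host_* parameters and helper names are invented for the illustration.

/* Editor's sketch of the leaf-0x80000021 synthesis decisions shown above. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define BIT(n) (1u << (n))

static uint32_t max_extended_leaf(uint32_t host_max_leaf,
                                  bool lfence_serializing, bool null_seg_bug)
{
    /* Mirrors: eax = min(eax, 0x80000021), then conditionally max(eax, 0x80000021). */
    uint32_t eax = host_max_leaf < 0x80000021 ? host_max_leaf : 0x80000021;

    if (lfence_serializing || !null_seg_bug)
        eax = 0x80000021;               /* make the synthesized leaf visible */
    return eax;
}

static uint32_t leaf_80000021_eax(uint32_t host_eax,
                                  bool lfence_serializing, bool null_seg_bug)
{
    /* Pass through only NNDBP (bit 0), LAS (bit 2) and NSCB (bit 6). */
    uint32_t eax = host_eax & (BIT(0) | BIT(2) | BIT(6));

    if (lfence_serializing)
        eax |= BIT(2);                  /* LFENCE is always serializing */
    if (!null_seg_bug)
        eax |= BIT(6);                  /* a null selector clears the base */
    return eax;
}

int main(void)
{
    /* Hypothetical Zen2-like host: max extended leaf 0x8000001f, serializing
     * LFENCE, no null-seg bug, so leaf 0x80000021 itself reads as zero. */
    printf("max extended leaf: 0x%x\n",
           (unsigned)max_extended_leaf(0x8000001f, true, false));
    printf("leaf 0x80000021 EAX: 0x%x\n",
           (unsigned)leaf_80000021_eax(0, true, false));   /* 0x44 */
    return 0;
}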
......@@ -1623,11 +1623,6 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
goto exception;
}
if (!seg_desc.p) {
err_vec = (seg == VCPU_SREG_SS) ? SS_VECTOR : NP_VECTOR;
goto exception;
}
dpl = seg_desc.dpl;
switch (seg) {
......@@ -1643,14 +1638,34 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
if (!(seg_desc.type & 8))
goto exception;
if (seg_desc.type & 4) {
/* conforming */
if (dpl > cpl)
goto exception;
} else {
/* nonconforming */
if (rpl > cpl || dpl != cpl)
if (transfer == X86_TRANSFER_RET) {
/* RET can never return to an inner privilege level. */
if (rpl < cpl)
goto exception;
/* Outer-privilege level return is not implemented */
if (rpl > cpl)
return X86EMUL_UNHANDLEABLE;
}
if (transfer == X86_TRANSFER_RET || transfer == X86_TRANSFER_TASK_SWITCH) {
if (seg_desc.type & 4) {
/* conforming */
if (dpl > rpl)
goto exception;
} else {
/* nonconforming */
if (dpl != rpl)
goto exception;
}
} else { /* X86_TRANSFER_CALL_JMP */
if (seg_desc.type & 4) {
/* conforming */
if (dpl > cpl)
goto exception;
} else {
/* nonconforming */
if (rpl > cpl || dpl != cpl)
goto exception;
}
}
/* in long-mode d/b must be clear if l is set */
if (seg_desc.d && seg_desc.l) {
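Editor's note on the CS checks reworked in the hunk above: the logic now splits by transfer type. A far RET may not return to a more privileged level and is validated against the new RPL (an outer-privilege return is punted as unhandled), a task switch is likewise validated against RPL, and CALL/JMP keep the old CPL-based rules; a conforming code segment only needs its DPL no higher than the privilege level being checked, a nonconforming one needs an exact DPL match. The stand-alone sketch below encodes that decision table; the enum and helper are invented for the illustration and ignore the unimplemented outer-privilege RET case.

/* Editor's sketch of the CS privilege checks restructured above. */
#include <stdio.h>
#include <stdbool.h>

enum transfer { TRANSFER_CALL_JMP, TRANSFER_RET, TRANSFER_TASK_SWITCH };

/* Returns true if loading CS should fault (the unimplemented outer-privilege
 * RET case, rpl > cpl, is not modeled; the emulator punts on it instead). */
static bool cs_check_faults(enum transfer t, bool conforming,
                            int cpl, int rpl, int dpl)
{
    if (t == TRANSFER_RET && rpl < cpl)
        return true;                        /* RET can never gain privilege */

    if (t == TRANSFER_RET || t == TRANSFER_TASK_SWITCH) {
        if (conforming)
            return dpl > rpl;
        return dpl != rpl;
    }
    /* CALL / JMP keep the CPL-based rules */
    if (conforming)
        return dpl > cpl;
    return rpl > cpl || dpl != cpl;
}

int main(void)
{
    /* Task switch taken while at CPL3 into a nonconforming DPL0/RPL0 code
     * segment: passes the new RPL-based check, the old CPL-based one faulted. */
    printf("task switch faults: %d\n",
           cs_check_faults(TRANSFER_TASK_SWITCH, false, 3, 0, 0));
    /* Direct JMP from CPL3 to a nonconforming DPL0 segment: still #GP. */
    printf("jmp faults: %d\n",
           cs_check_faults(TRANSFER_CALL_JMP, false, 3, 0, 0));
    return 0;
}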
......@@ -1667,6 +1682,10 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
case VCPU_SREG_TR:
if (seg_desc.s || (seg_desc.type != 1 && seg_desc.type != 9))
goto exception;
if (!seg_desc.p) {
err_vec = NP_VECTOR;
goto exception;
}
old_desc = seg_desc;
seg_desc.type |= 2; /* busy */
ret = ctxt->ops->cmpxchg_emulated(ctxt, desc_addr, &old_desc, &seg_desc,
......@@ -1691,6 +1710,11 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
break;
}
if (!seg_desc.p) {
err_vec = (seg == VCPU_SREG_SS) ? SS_VECTOR : NP_VECTOR;
goto exception;
}
if (seg_desc.s) {
/* mark segment as accessed */
if (!(seg_desc.type & 1)) {
......@@ -2216,9 +2240,6 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
/* Outer-privilege level return is not implemented */
if (ctxt->mode >= X86EMUL_MODE_PROT16 && (cs & 3) > cpl)
return X86EMUL_UNHANDLEABLE;
rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, cpl,
X86_TRANSFER_RET,
&new_desc);
......@@ -2615,8 +2636,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
}
static void
setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
struct desc_struct *cs, struct desc_struct *ss)
setup_syscalls_segments(struct desc_struct *cs, struct desc_struct *ss)
{
cs->l = 0; /* will be adjusted later */
set_desc_base(cs, 0); /* flat segment */
......@@ -2705,7 +2725,7 @@ static int em_syscall(struct x86_emulate_ctxt *ctxt)
if (!(efer & EFER_SCE))
return emulate_ud(ctxt);
setup_syscalls_segments(ctxt, &cs, &ss);
setup_syscalls_segments(&cs, &ss);
ops->get_msr(ctxt, MSR_STAR, &msr_data);
msr_data >>= 32;
cs_sel = (u16)(msr_data & 0xfffc);
......@@ -2773,7 +2793,7 @@ static int em_sysenter(struct x86_emulate_ctxt *ctxt)
if ((msr_data & 0xfffc) == 0x0)
return emulate_gp(ctxt, 0);
setup_syscalls_segments(ctxt, &cs, &ss);
setup_syscalls_segments(&cs, &ss);
ctxt->eflags &= ~(X86_EFLAGS_VM | X86_EFLAGS_IF);
cs_sel = (u16)msr_data & ~SEGMENT_RPL_MASK;
ss_sel = cs_sel + 8;
......@@ -2810,7 +2830,7 @@ static int em_sysexit(struct x86_emulate_ctxt *ctxt)
ctxt->mode == X86EMUL_MODE_VM86)
return emulate_gp(ctxt, 0);
setup_syscalls_segments(ctxt, &cs, &ss);
setup_syscalls_segments(&cs, &ss);
if ((ctxt->rex_prefix & 0x8) != 0x0)
usermode = X86EMUL_MODE_PROT64;
......@@ -3028,8 +3048,7 @@ static int load_state_from_tss16(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;
}
static int task_switch_16(struct x86_emulate_ctxt *ctxt,
u16 tss_selector, u16 old_tss_sel,
static int task_switch_16(struct x86_emulate_ctxt *ctxt, u16 old_tss_sel,
ulong old_tss_base, struct desc_struct *new_desc)
{
struct tss_segment_16 tss_seg;
......@@ -3167,8 +3186,7 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
return ret;
}
static int task_switch_32(struct x86_emulate_ctxt *ctxt,
u16 tss_selector, u16 old_tss_sel,
static int task_switch_32(struct x86_emulate_ctxt *ctxt, u16 old_tss_sel,
ulong old_tss_base, struct desc_struct *new_desc)
{
struct tss_segment_32 tss_seg;
......@@ -3276,10 +3294,9 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
old_tss_sel = 0xffff;
if (next_tss_desc.type & 8)
ret = task_switch_32(ctxt, tss_selector, old_tss_sel,
old_tss_base, &next_tss_desc);
ret = task_switch_32(ctxt, old_tss_sel, old_tss_base, &next_tss_desc);
else
ret = task_switch_16(ctxt, tss_selector, old_tss_sel,
ret = task_switch_16(ctxt, old_tss_sel,
old_tss_base, &next_tss_desc);
if (ret != X86EMUL_CONTINUE)
return ret;
......
......@@ -112,6 +112,9 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
if (!!auto_eoi_old == !!auto_eoi_new)
return;
if (!enable_apicv)
return;
down_write(&vcpu->kvm->arch.apicv_update_lock);
if (auto_eoi_new)
......@@ -1710,32 +1713,47 @@ int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
return kvm_hv_get_msr(vcpu, msr, pdata, host);
}
static __always_inline unsigned long *sparse_set_to_vcpu_mask(
struct kvm *kvm, u64 *sparse_banks, u64 valid_bank_mask,
u64 *vp_bitmap, unsigned long *vcpu_bitmap)
static void sparse_set_to_vcpu_mask(struct kvm *kvm, u64 *sparse_banks,
u64 valid_bank_mask, unsigned long *vcpu_mask)
{
struct kvm_hv *hv = to_kvm_hv(kvm);
bool has_mismatch = atomic_read(&hv->num_mismatched_vp_indexes);
u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
struct kvm_vcpu *vcpu;
int bank, sbank = 0;
unsigned long i;
u64 *bitmap;
BUILD_BUG_ON(sizeof(vp_bitmap) >
sizeof(*vcpu_mask) * BITS_TO_LONGS(KVM_MAX_VCPUS));
/*
* If vp_index == vcpu_idx for all vCPUs, fill vcpu_mask directly, else
* fill a temporary buffer and manually test each vCPU's VP index.
*/
if (likely(!has_mismatch))
bitmap = (u64 *)vcpu_mask;
else
bitmap = vp_bitmap;
memset(vp_bitmap, 0,
KVM_HV_MAX_SPARSE_VCPU_SET_BITS * sizeof(*vp_bitmap));
/*
* Each set of 64 VPs is packed into sparse_banks, with valid_bank_mask
* having a '1' for each bank that exists in sparse_banks. Sets must
* be in ascending order, i.e. bank0..bankN.
*/
memset(bitmap, 0, sizeof(vp_bitmap));
for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
KVM_HV_MAX_SPARSE_VCPU_SET_BITS)
vp_bitmap[bank] = sparse_banks[sbank++];
bitmap[bank] = sparse_banks[sbank++];
if (likely(!atomic_read(&hv->num_mismatched_vp_indexes))) {
/* for all vcpus vp_index == vcpu_idx */
return (unsigned long *)vp_bitmap;
}
if (likely(!has_mismatch))
return;
bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
bitmap_zero(vcpu_mask, KVM_MAX_VCPUS);
kvm_for_each_vcpu(i, vcpu, kvm) {
if (test_bit(kvm_hv_get_vpindex(vcpu), (unsigned long *)vp_bitmap))
__set_bit(i, vcpu_bitmap);
__set_bit(i, vcpu_mask);
}
return vcpu_bitmap;
}
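Editor's note on sparse_set_to_vcpu_mask() above: the Hyper-V sparse-set encoding uses a 64-bit valid_bank_mask to say which banks of 64 VPs are present, and then supplies one 64-bit bank of VP bits per set mask bit, in ascending bank order. The stand-alone sketch below expands that encoding into a flat VP bitmap and leaves out the kernel's extra remapping for guests whose VP index differs from the vCPU index; all names are invented for the illustration.

/* Editor's sketch: expand a Hyper-V sparse VP set into a flat VP bitmap. */
#include <stdio.h>
#include <stdint.h>

#define MAX_BANKS 64    /* valid_bank_mask is a u64, so at most 64 banks */

/* vp_bitmap must have MAX_BANKS elements; banks absent from valid_bank_mask
 * stay zero, mirroring the memset() in the function above. */
static void sparse_set_to_vp_bitmap(const uint64_t *sparse_banks,
                                    uint64_t valid_bank_mask,
                                    uint64_t *vp_bitmap)
{
    int bank, sbank = 0;

    for (bank = 0; bank < MAX_BANKS; bank++) {
        vp_bitmap[bank] = 0;
        if (valid_bank_mask & (1ULL << bank))
            vp_bitmap[bank] = sparse_banks[sbank++];   /* ascending bank order */
    }
}

int main(void)
{
    /* Banks 0 and 2 are present: they name VPs 1 and 5, and VPs 130 and 131. */
    uint64_t sparse_banks[] = { (1ULL << 1) | (1ULL << 5),
                                (1ULL << 2) | (1ULL << 3) };
    uint64_t valid_bank_mask = (1ULL << 0) | (1ULL << 2);
    uint64_t vp_bitmap[MAX_BANKS];
    int vp;

    sparse_set_to_vp_bitmap(sparse_banks, valid_bank_mask, vp_bitmap);
    for (vp = 0; vp < MAX_BANKS * 64; vp++)
        if (vp_bitmap[vp / 64] & (1ULL << (vp % 64)))
            printf("VP %d\n", vp);                     /* 1, 5, 130, 131 */
    return 0;
}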
struct kvm_hv_hcall {
......@@ -1743,6 +1761,7 @@ struct kvm_hv_hcall {
u64 ingpa;
u64 outgpa;
u16 code;
u16 var_cnt;
u16 rep_cnt;
u16 rep_idx;
bool fast;
......@@ -1750,22 +1769,60 @@ struct kvm_hv_hcall {
sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
};
static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool ex)
static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
int consumed_xmm_halves,
u64 *sparse_banks, gpa_t offset)
{
u16 var_cnt;
int i;
gpa_t gpa;
if (hc->var_cnt > 64)
return -EINVAL;
/* Ignore banks that cannot possibly contain a legal VP index. */
var_cnt = min_t(u16, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS);
if (hc->fast) {
/*
* Each XMM holds two sparse banks, but do not count halves that
* have already been consumed for hypercall parameters.
*/
if (hc->var_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
return HV_STATUS_INVALID_HYPERCALL_INPUT;
for (i = 0; i < var_cnt; i++) {
int j = i + consumed_xmm_halves;
if (j % 2)
sparse_banks[i] = sse128_hi(hc->xmm[j / 2]);
else
sparse_banks[i] = sse128_lo(hc->xmm[j / 2]);
}
return 0;
}
return kvm_read_guest(kvm, hc->ingpa + offset, sparse_banks,
var_cnt * sizeof(*sparse_banks));
}
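Editor's note on the fast-call path of kvm_get_sparse_vp_set() above: each 128-bit XMM register carries two 64-bit sparse banks, and the first consumed_xmm_halves halves were already spent on the fixed hypercall parameters, hence the j = i + consumed_xmm_halves indexing with lo/hi selection by j % 2. The stand-alone sketch below reproduces that indexing with a plain two-u64 struct standing in for sse128_t; the register count of 6 matches HV_HYPERCALL_MAX_XMM_REGISTERS as the editor recalls it, and the other names are invented.

/* Editor's sketch: pulling sparse banks out of packed 128-bit registers. */
#include <stdio.h>
#include <stdint.h>

#define MAX_XMM 6                 /* HV_HYPERCALL_MAX_XMM_REGISTERS upstream */

struct u128 { uint64_t lo, hi; }; /* stand-in for sse128_t */

/* Returns 0 on success, -1 if the banks do not fit in the unconsumed halves. */
static int get_sparse_banks_fast(const struct u128 *xmm, int var_cnt,
                                 int consumed_xmm_halves, uint64_t *banks)
{
    int i;

    if (var_cnt > 2 * MAX_XMM - consumed_xmm_halves)
        return -1;

    for (i = 0; i < var_cnt; i++) {
        int j = i + consumed_xmm_halves;     /* absolute 64-bit half index */

        banks[i] = (j % 2) ? xmm[j / 2].hi : xmm[j / 2].lo;
    }
    return 0;
}

int main(void)
{
    /* Flush-TLB-ex style call: xmm[0] already held the fixed parameters
     * (2 consumed halves), so the bank list starts in xmm[1].lo. */
    struct u128 xmm[MAX_XMM] = { { 0x1111, 0x2222 },
                                 { 0xaaaa, 0xbbbb },
                                 { 0xcccc, 0xdddd } };
    uint64_t banks[3];
    int i;

    if (!get_sparse_banks_fast(xmm, 3, 2, banks))
        for (i = 0; i < 3; i++)
            printf("bank[%d] = 0x%llx\n", i,
                   (unsigned long long)banks[i]);  /* 0xaaaa, 0xbbbb, 0xcccc */
    return 0;
}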
static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
{
struct kvm *kvm = vcpu->kvm;
struct hv_tlb_flush_ex flush_ex;
struct hv_tlb_flush flush;
u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
unsigned long *vcpu_mask;
DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
u64 valid_bank_mask;
u64 sparse_banks[64];
int sparse_banks_len;
u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
bool all_cpus;
if (!ex) {
/*
* The Hyper-V TLFS doesn't allow more than 64 sparse banks, e.g. the
* valid mask is a u64. Fail the build if KVM's max allowed number of
* vCPUs (>4096) would exceed this limit, KVM will need additional changes
* for Hyper-V support to avoid setting the guest up to fail.
*/
BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > 64);
if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE) {
if (hc->fast) {
flush.address_space = hc->ingpa;
flush.flags = hc->outgpa;
......@@ -1812,30 +1869,22 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
all_cpus = flush_ex.hv_vp_set.format !=
HV_GENERIC_SET_SPARSE_4K;
sparse_banks_len = bitmap_weight((unsigned long *)&valid_bank_mask, 64);
if (hc->var_cnt != bitmap_weight((unsigned long *)&valid_bank_mask, 64))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
if (!sparse_banks_len && !all_cpus)
if (all_cpus)
goto do_flush;
if (!hc->var_cnt)
goto ret_success;
if (!all_cpus) {
if (hc->fast) {
if (sparse_banks_len > HV_HYPERCALL_MAX_XMM_REGISTERS - 1)
return HV_STATUS_INVALID_HYPERCALL_INPUT;
for (i = 0; i < sparse_banks_len; i += 2) {
sparse_banks[i] = sse128_lo(hc->xmm[i / 2 + 1]);
sparse_banks[i + 1] = sse128_hi(hc->xmm[i / 2 + 1]);
}
} else {
gpa = hc->ingpa + offsetof(struct hv_tlb_flush_ex,
hv_vp_set.bank_contents);
if (unlikely(kvm_read_guest(kvm, gpa, sparse_banks,
sparse_banks_len *
sizeof(sparse_banks[0]))))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
}
}
if (kvm_get_sparse_vp_set(kvm, hc, 2, sparse_banks,
offsetof(struct hv_tlb_flush_ex,
hv_vp_set.bank_contents)))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
}
do_flush:
/*
* vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
* analyze it here, flush TLB regardless of the specified address space.
......@@ -1843,11 +1892,9 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
if (all_cpus) {
kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH_GUEST);
} else {
vcpu_mask = sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
vp_bitmap, vcpu_bitmap);
sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST,
vcpu_mask);
kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST, vcpu_mask);
}
ret_success:
......@@ -1875,21 +1922,18 @@ static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
}
}
static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool ex)
static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
{
struct kvm *kvm = vcpu->kvm;
struct hv_send_ipi_ex send_ipi_ex;
struct hv_send_ipi send_ipi;
u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
unsigned long *vcpu_mask;
DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
unsigned long valid_bank_mask;
u64 sparse_banks[64];
int sparse_banks_len;
u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
u32 vector;
bool all_cpus;
if (!ex) {
if (hc->code == HVCALL_SEND_IPI) {
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
sizeof(send_ipi))))
......@@ -1908,9 +1952,15 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
trace_kvm_hv_send_ipi(vector, sparse_banks[0]);
} else {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi_ex,
sizeof(send_ipi_ex))))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi_ex,
sizeof(send_ipi_ex))))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
} else {
send_ipi_ex.vector = (u32)hc->ingpa;
send_ipi_ex.vp_set.format = hc->outgpa;
send_ipi_ex.vp_set.valid_bank_mask = sse128_lo(hc->xmm[0]);
}
trace_kvm_hv_send_ipi_ex(send_ipi_ex.vector,
send_ipi_ex.vp_set.format,
......@@ -1918,22 +1968,20 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
vector = send_ipi_ex.vector;
valid_bank_mask = send_ipi_ex.vp_set.valid_bank_mask;
sparse_banks_len = bitmap_weight(&valid_bank_mask, 64) *
sizeof(sparse_banks[0]);
all_cpus = send_ipi_ex.vp_set.format == HV_GENERIC_SET_ALL;
if (hc->var_cnt != bitmap_weight(&valid_bank_mask, 64))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
if (all_cpus)
goto check_and_send_ipi;
if (!sparse_banks_len)
if (!hc->var_cnt)
goto ret_success;
if (kvm_read_guest(kvm,
hc->ingpa + offsetof(struct hv_send_ipi_ex,
vp_set.bank_contents),
sparse_banks,
sparse_banks_len))
if (kvm_get_sparse_vp_set(kvm, hc, 1, sparse_banks,
offsetof(struct hv_send_ipi_ex,
vp_set.bank_contents)))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
}
......@@ -1941,11 +1989,13 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
vcpu_mask = all_cpus ? NULL :
sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
vp_bitmap, vcpu_bitmap);
if (all_cpus) {
kvm_send_ipi_to_many(kvm, vector, NULL);
} else {
sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
kvm_send_ipi_to_many(kvm, vector, vcpu_mask);
}
ret_success:
return HV_STATUS_SUCCESS;
......@@ -2017,11 +2067,6 @@ int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
return ret;
}
bool kvm_hv_hypercall_enabled(struct kvm_vcpu *vcpu)
{
return vcpu->arch.hyperv_enabled && to_kvm_hv(vcpu->kvm)->hv_guest_os_id;
}
static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
{
bool longmode;
......@@ -2096,6 +2141,7 @@ static bool is_xmm_fast_hypercall(struct kvm_hv_hcall *hc)
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE:
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
case HVCALL_SEND_IPI_EX:
return true;
}
......@@ -2191,19 +2237,25 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
}
hc.code = hc.param & 0xffff;
hc.var_cnt = (hc.param & HV_HYPERCALL_VARHEAD_MASK) >> HV_HYPERCALL_VARHEAD_OFFSET;
hc.fast = !!(hc.param & HV_HYPERCALL_FAST_BIT);
hc.rep_cnt = (hc.param >> HV_HYPERCALL_REP_COMP_OFFSET) & 0xfff;
hc.rep_idx = (hc.param >> HV_HYPERCALL_REP_START_OFFSET) & 0xfff;
hc.rep = !!(hc.rep_cnt || hc.rep_idx);
trace_kvm_hv_hypercall(hc.code, hc.fast, hc.rep_cnt, hc.rep_idx,
hc.ingpa, hc.outgpa);
trace_kvm_hv_hypercall(hc.code, hc.fast, hc.var_cnt, hc.rep_cnt,
hc.rep_idx, hc.ingpa, hc.outgpa);
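Editor's note on the header parsing above: hc.var_cnt is the new field pulled out of the 64-bit hypercall input value alongside the existing ones. Per the Hyper-V TLFS as the editor recalls it (the exact bit positions are assumptions, they are not spelled out in this diff), the input packs the call code in bits 0-15, the fast flag in bit 16, the variable header count in bits 17-26, the rep count in bits 32-43 and the rep start index in bits 48-59, the rest being reserved. A stand-alone decoder sketch:

/* Editor's sketch: decoding the 64-bit Hyper-V hypercall input value. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct hcall_hdr {
    uint16_t code;
    bool     fast;
    uint16_t var_cnt;
    uint16_t rep_cnt;
    uint16_t rep_idx;
};

static struct hcall_hdr decode_hcall(uint64_t param)
{
    struct hcall_hdr hc = {
        .code    = param & 0xffff,
        .fast    = (param >> 16) & 1,
        .var_cnt = (param >> 17) & 0x3ff,   /* variable header size */
        .rep_cnt = (param >> 32) & 0xfff,
        .rep_idx = (param >> 48) & 0xfff,
    };
    return hc;
}

int main(void)
{
    /* A fast HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (0x0013) with two variable
     * headers and no rep, a combination the EX flush cases below accept. */
    uint64_t param = 0x0013ULL | (1ULL << 16) | (2ULL << 17);
    struct hcall_hdr hc = decode_hcall(param);

    printf("code=0x%x fast=%d var_cnt=%d rep_cnt=%d rep_idx=%d\n",
           (unsigned)hc.code, (int)hc.fast, hc.var_cnt, hc.rep_cnt, hc.rep_idx);
    return 0;
}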
if (unlikely(!hv_check_hypercall_access(hv_vcpu, hc.code))) {
ret = HV_STATUS_ACCESS_DENIED;
goto hypercall_complete;
}
if (unlikely(hc.param & HV_HYPERCALL_RSVD_MASK)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
goto hypercall_complete;
}
if (hc.fast && is_xmm_fast_hypercall(&hc)) {
if (unlikely(hv_vcpu->enforce_cpuid &&
!(hv_vcpu->cpuid_cache.features_edx &
......@@ -2217,14 +2269,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
switch (hc.code) {
case HVCALL_NOTIFY_LONG_SPIN_WAIT:
if (unlikely(hc.rep)) {
if (unlikely(hc.rep || hc.var_cnt)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
break;
}
kvm_vcpu_on_spin(vcpu, true);
break;
case HVCALL_SIGNAL_EVENT:
if (unlikely(hc.rep)) {
if (unlikely(hc.rep || hc.var_cnt)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
break;
}
......@@ -2234,7 +2286,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
fallthrough; /* maybe userspace knows this conn_id */
case HVCALL_POST_MESSAGE:
/* don't bother userspace if it has no way to handle it */
if (unlikely(hc.rep || !to_hv_synic(vcpu)->active)) {
if (unlikely(hc.rep || hc.var_cnt || !to_hv_synic(vcpu)->active)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
break;
}
......@@ -2247,46 +2299,43 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
kvm_hv_hypercall_complete_userspace;
return 0;
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST:
if (unlikely(!hc.rep_cnt || hc.rep_idx)) {
if (unlikely(hc.var_cnt)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
break;
}
ret = kvm_hv_flush_tlb(vcpu, &hc, false);
break;
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE:
if (unlikely(hc.rep)) {
fallthrough;
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
if (unlikely(!hc.rep_cnt || hc.rep_idx)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
break;
}
ret = kvm_hv_flush_tlb(vcpu, &hc, false);
ret = kvm_hv_flush_tlb(vcpu, &hc);
break;
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
if (unlikely(!hc.rep_cnt || hc.rep_idx)) {
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE:
if (unlikely(hc.var_cnt)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
break;
}
ret = kvm_hv_flush_tlb(vcpu, &hc, true);
break;
fallthrough;
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
if (unlikely(hc.rep)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
break;
}
ret = kvm_hv_flush_tlb(vcpu, &hc, true);
ret = kvm_hv_flush_tlb(vcpu, &hc);
break;
case HVCALL_SEND_IPI:
if (unlikely(hc.rep)) {
if (unlikely(hc.var_cnt)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
break;
}
ret = kvm_hv_send_ipi(vcpu, &hc, false);
break;
fallthrough;
case HVCALL_SEND_IPI_EX:
if (unlikely(hc.fast || hc.rep)) {
if (unlikely(hc.rep)) {
ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
break;
}
ret = kvm_hv_send_ipi(vcpu, &hc, true);
ret = kvm_hv_send_ipi(vcpu, &hc);
break;
case HVCALL_POST_DEBUG_DATA:
case HVCALL_RETRIEVE_DEBUG_DATA:
......@@ -2417,10 +2466,6 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
if (kvm_x86_ops.nested_ops->get_evmcs_version)
evmcs_ver = kvm_x86_ops.nested_ops->get_evmcs_version(vcpu);
/* Skip NESTED_FEATURES if eVMCS is not supported */
if (!evmcs_ver)
--nent;
if (cpuid->nent < nent)
return -E2BIG;
......@@ -2520,8 +2565,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
case HYPERV_CPUID_NESTED_FEATURES:
ent->eax = evmcs_ver;
if (evmcs_ver)
ent->eax |= HV_X64_NESTED_MSR_BITMAP;
break;
......
......@@ -89,7 +89,11 @@ static inline u32 kvm_hv_get_vpindex(struct kvm_vcpu *vcpu)
int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host);
int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host);
bool kvm_hv_hypercall_enabled(struct kvm_vcpu *vcpu);
static inline bool kvm_hv_hypercall_enabled(struct kvm_vcpu *vcpu)
{
return vcpu->arch.hyperv_enabled && to_kvm_hv(vcpu->kvm)->hv_guest_os_id;
}
int kvm_hv_hypercall(struct kvm_vcpu *vcpu);
void kvm_hv_irq_routing_update(struct kvm *kvm);
......
......@@ -437,13 +437,13 @@ static u32 pic_ioport_read(void *opaque, u32 addr)
return ret;
}
static void elcr_ioport_write(void *opaque, u32 addr, u32 val)
static void elcr_ioport_write(void *opaque, u32 val)
{
struct kvm_kpic_state *s = opaque;
s->elcr = val & s->elcr_mask;
}
static u32 elcr_ioport_read(void *opaque, u32 addr1)
static u32 elcr_ioport_read(void *opaque)
{
struct kvm_kpic_state *s = opaque;
return s->elcr;
......@@ -474,7 +474,7 @@ static int picdev_write(struct kvm_pic *s,
case 0x4d0:
case 0x4d1:
pic_lock(s);
elcr_ioport_write(&s->pics[addr & 1], addr, data);
elcr_ioport_write(&s->pics[addr & 1], data);
pic_unlock(s);
break;
default:
......@@ -505,7 +505,7 @@ static int picdev_read(struct kvm_pic *s,
case 0x4d0:
case 0x4d1:
pic_lock(s);
*data = elcr_ioport_read(&s->pics[addr & 1], addr);
*data = elcr_ioport_read(&s->pics[addr & 1]);
pic_unlock(s);
break;
default:
......
......@@ -54,9 +54,7 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
int trigger_mode,
int pin);
static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
unsigned long addr,
unsigned long length)
static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic)
{
unsigned long result = 0;
......@@ -593,7 +591,7 @@ static int ioapic_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
break;
case IOAPIC_REG_WINDOW:
result = ioapic_read_indirect(ioapic, addr, len);
result = ioapic_read_indirect(ioapic);
break;
default:
......
(20 file diffs collapsed in the original view.)
......@@ -101,7 +101,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
{
struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
if (!enable_pmu)
if (!vcpu->kvm->arch.enable_pmu)
return NULL;
switch (msr) {
......
(33 file diffs collapsed in the original view.)