- 04 1月, 2022 1 次提交
-
-
由 Zenghui Yu 提交于
kvm_arm_init_arch_resources() was renamed to kvm_arm_init_sve() in commit a3be836d ("KVM: arm/arm64: Demote kvm_arm_init_arch_resources() to just set up SVE"). Fix the function name in comment of kvm_vcpu_finalize_sve(). Signed-off-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211230141535.1389-1-yuzenghui@huawei.com
-
- 20 12月, 2021 2 次提交
-
-
由 Fuad Tabba 提交于
The barrier is there for power_off rather than power_state. Probably typo in commit 358b28f0 ("arm/arm64: KVM: Allow a VCPU to fully reset itself"). Signed-off-by: NFuad Tabba <tabba@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208193257.667613-3-tabba@google.com
-
由 Fuad Tabba 提交于
The comment for kvm_reset_vcpu() refers to the sysreg table as being the table above, probably because of the code extracted at commit f4672752 ("arm64: KVM: virtual CPU reset"). Fix the comment to remove the potentially confusing reference. Signed-off-by: NFuad Tabba <tabba@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208193257.667613-2-tabba@google.com
-
- 17 12月, 2021 1 次提交
-
-
由 Marc Zyngier 提交于
Ganapatrao reported that the kvm_pgtable->mmu pointer is more or less hardcoded to the main S2 mmu structure, while the nested code needs it to point to other instances (as we have one instance per nested context). Rework the initialisation of the kvm_pgtable structure so that this assumtion doesn't hold true anymore. This requires some minor changes to the order in which things are initialised (the mmu->arch pointer being the critical one). Reported-by: NGanapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Reviewed-by: NGanapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211129200150.351436-5-maz@kernel.org
-
- 16 12月, 2021 16 次提交
-
-
由 Quentin Perret 提交于
Make use of the newly introduced unshare hypercall during guest teardown to unmap guest-related data structures from the hyp stage-1. Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-15-qperret@google.com
-
由 Will Deacon 提交于
Introduce an unshare hypercall which can be used to unmap memory from the hypervisor stage-1 in nVHE protected mode. This will be useful to update the EL2 ownership state of pages during guest teardown, and avoids keeping dangling mappings to unreferenced portions of memory. Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-14-qperret@google.com
-
由 Will Deacon 提交于
Tearing down a previously shared memory region results in the borrower losing access to the underlying pages and returning them to the "owned" state in the owner. Implement a do_unshare() helper, along the same lines as do_share(), to provide this functionality for the host-to-hyp case. Reviewed-by: NAndrew Walbran <qwandor@google.com> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-13-qperret@google.com
-
由 Will Deacon 提交于
__pkvm_host_share_hyp() shares memory between the host and the hypervisor so implement it as an invocation of the new do_share() mechanism. Note that double-sharing is no longer permitted (as this allows us to reduce the number of page-table walks significantly), but is thankfully no longer relied upon by the host. Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-12-qperret@google.com
-
由 Will Deacon 提交于
By default, protected KVM isolates memory pages so that they are accessible only to their owner: be it the host kernel, the hypervisor at EL2 or (in future) the guest. Establishing shared-memory regions between these components therefore involves a transition for each page so that the owner can share memory with a borrower under a certain set of permissions. Introduce a do_share() helper for safely sharing a memory region between two components. Currently, only host-to-hyp sharing is implemented, but the code is easily extended to handle other combinations and the permission checks for each component are reusable. Reviewed-by: NAndrew Walbran <qwandor@google.com> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-11-qperret@google.com
-
由 Will Deacon 提交于
In preparation for adding additional locked sections for manipulating page-tables at EL2, introduce some simple wrappers around the host and hypervisor locks so that it's a bit easier to read and bit more difficult to take the wrong lock (or even take them in the wrong order). Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-10-qperret@google.com
-
由 Will Deacon 提交于
Explicitly name the combination of SW0 | SW1 as reserved in the pte and introduce a new PKVM_NOPAGE meta-state which, although not directly stored in the software bits of the pte, can be used to represent an entry for which there is no underlying page. This is distinct from an invalid pte, as stage-2 identity mappings for the host are created lazily and so an invalid pte there is the same as a valid mapping for the purposes of ownership information. This state will be used for permission checking during page transitions in later patches. Reviewed-by: NAndrew Walbran <qwandor@google.com> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-9-qperret@google.com
-
由 Quentin Perret 提交于
In order to simplify the page tracking infrastructure at EL2 in nVHE protected mode, move the responsibility of refcounting pages that are shared multiple times on the host. In order to do so, let's create a red-black tree tracking all the PFNs that have been shared, along with a refcount. Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-8-qperret@google.com
-
由 Quentin Perret 提交于
The create_hyp_mappings() function can currently be called at any point in time. However, its behaviour in protected mode changes widely depending on when it is being called. Prior to KVM init, it is used to create the temporary page-table used to bring-up the hypervisor, and later on it is transparently turned into a 'share' hypercall when the kernel has lost control over the hypervisor stage-1. In order to prepare the ground for also unsharing pages with the hypervisor during guest teardown, introduce a kvm_share_hyp() function to make it clear in which places a share hypercall should be expected, as we will soon need a matching unshare hypercall in all those places. Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-7-qperret@google.com
-
由 Will Deacon 提交于
Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor stage-1 mappings at EL2. Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-6-qperret@google.com
-
由 Will Deacon 提交于
kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback being provided by the memory-management operations for the page-table. Wire up this callback for the hypervisor stage-1 page-table. Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-5-qperret@google.com
-
由 Quentin Perret 提交于
In nVHE-protected mode, the hyp stage-1 page-table refcount is broken due to the lack of refcount support in the early allocator. Fix-up the refcount in the finalize walker, once the 'hyp_vmemmap' is up and running. Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-4-qperret@google.com
-
由 Quentin Perret 提交于
To prepare the ground for allowing hyp stage-1 mappings to be removed at run-time, update the KVM page-table code to maintain a correct refcount using the ->{get,put}_page() function callbacks. Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-3-qperret@google.com
-
由 Quentin Perret 提交于
In nVHE protected mode, the EL2 code uses a temporary allocator during boot while re-creating its stage-1 page-table. Unfortunately, the hyp_vmmemap is not ready to use at this stage, so refcounting pages is not possible. That is not currently a problem because hyp stage-1 mappings are never removed, which implies refcounting of page-table pages is unnecessary. In preparation for allowing hypervisor stage-1 mappings to be removed, provide stub implementations for {get,put}_page() in the early allocator. Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-2-qperret@google.com
-
由 Marc Zyngier 提交于
Running the KVM selftests results in these messages being dumped in the kernel console: [ 188.051073] kvm [469]: VGIC redist and dist frames overlap [ 188.056820] kvm [469]: VGIC redist and dist frames overlap [ 188.076199] kvm [469]: VGIC redist and dist frames overlap Being amle to trigger this from userspace is definitely not on, so demote these warnings to kvm_debug(). Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211216104507.1482017-1-maz@kernel.org
-
由 Marc Zyngier 提交于
When handling an error at the point where we try and register all the redistributors, we unregister all the previously registered frames by counting down from the failing index. However, the way the code is written relies on that index being a signed value. Which won't be true once we switch to an xarray-based vcpu set. Since this code is pretty awkward the first place, and that the failure mode is hard to spot, rewrite this loop to iterate over the vcpus upwards rather than downwards. Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211216104526.1482124-1-maz@kernel.org
-
- 15 12月, 2021 6 次提交
-
-
由 Quentin Perret 提交于
The kvm_host_owns_hyp_mappings() function should return true if and only if the host kernel is responsible for creating the hypervisor stage-1 mappings. That is only possible in standard non-VHE mode, or during boot in protected nVHE mode. But either way, none of this makes sense in VHE, so make sure to catch this case as well, hence making the function return sensible values in any context (VHE or not). Suggested-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-7-qperret@google.com
-
由 Quentin Perret 提交于
Now that GICv2 is disabled in nVHE protected mode there should be no other reason for the host to use create_hyp_io_mappings() or kvm_phys_addr_ioremap(). Add sanity checks to make sure that assumption remains true looking forward. Signed-off-by: NQuentin Perret <qperret@google.com> Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-6-qperret@google.com
-
由 Quentin Perret 提交于
The __io_map_base variable is used at EL2 to track the end of the hypervisor's "private" VA range in nVHE protected mode. However it doesn't need to be used outside of mm.c, so let's make it static to keep all the hyp VA allocation logic in one place. Signed-off-by: NQuentin Perret <qperret@google.com> Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-5-qperret@google.com
-
由 Quentin Perret 提交于
The hyp memory pool struct is sized to fit exactly the needs of the hypervisor stage-1 page-table allocator, so it is important it is not used for anything else. As it is currently used only from setup.c, reduce its visibility by marking it static. Signed-off-by: NQuentin Perret <qperret@google.com> Reviewed-by: NAndrew Walbran <qwandor@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-4-qperret@google.com
-
由 Quentin Perret 提交于
GICv2 requires having device mappings in guests and the hypervisor, which is incompatible with the current pKVM EL2 page ownership model which only covers memory. While it would be desirable to support pKVM with GICv2, this will require a lot more work, so let's make the current assumption clear until then. Co-developed-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NQuentin Perret <qperret@google.com> Acked-by: NWill Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com
-
由 Quentin Perret 提交于
The EL2 page allocator in protected mode maintains a per-pool max order value to optimize allocations when the memory region it covers is small. However, the max order value is currently under-estimated whenever the number of pages in the region is a power of two. Fix the estimation. Signed-off-by: NQuentin Perret <qperret@google.com> Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-2-qperret@google.com
-
- 06 12月, 2021 3 次提交
-
-
由 Will Deacon 提交于
kvm/hyp/reserved_mem.c contains host code executing at EL1 and is not linked into the hypervisor object. Move the file into kvm/pkvm.c and rework the headers so that the definitions shared between the host and the hypervisor live in asm/kvm_pkvm.h. Signed-off-by: NWill Deacon <will@kernel.org> Tested-by: NFuad Tabba <tabba@google.com> Reviewed-by: NFuad Tabba <tabba@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211202171048.26924-4-will@kernel.org
-
由 Will Deacon 提交于
In order to avoid exposing hypervisor (EL2) data structures directly to the host, generate hyp_constants.h to provide constants such as structure sizes to the host without dragging in the definitions themselves. Signed-off-by: NWill Deacon <will@kernel.org> Tested-by: NFuad Tabba <tabba@google.com> Reviewed-by: NFuad Tabba <tabba@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211202171048.26924-3-will@kernel.org
-
由 Rikard Falkeborn 提交于
The only usage of kvm_io_gic_ops is to make a comparison with its address and to pass its address to kvm_iodevice_init() which takes a pointer to const kvm_io_device_ops as input. Make it const to allow the compiler to put it in read-only memory. Signed-off-by: NRikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211204213518.83642-1-rikard.falkeborn@gmail.com
-
- 01 12月, 2021 6 次提交
-
-
由 Marc Zyngier 提交于
When running a KVM guest hosted on an ARMv8.7 machine, the host kernel complains that it doesn't know about the architected number of events. Fix it by adding the PMUver code corresponding to PMUv3 for ARMv8.7. Reviewed-by: NAlexandru Elisei <alexandru.elisei@arm.com> Tested-by: NAlexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211126115533.217903-1-maz@kernel.org
-
由 Marc Zyngier 提交于
With the transition to kvm_arch_vcpu_run_pid_change() to handle the "run once" activities, it becomes obvious that has_run_once is now an exact shadow of vcpu->pid. Replace vcpu->arch.has_run_once with a new vcpu_has_run_once() helper that directly checks for vcpu->pid, and get rid of the now unused field. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
由 Marc Zyngier 提交于
The kvm_arch_vcpu_run_pid_change() helper gets called on each PID change. The kvm_vcpu_first_run_init() helper gets run on the... first run(!) of a vcpu. As it turns out, the first run of a vcpu also triggers a PID change event (vcpu->pid is initially NULL). Use this property to merge these two helpers and get rid of another arm64-specific oddity. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
由 Marc Zyngier 提交于
Restructure kvm_vcpu_first_run_init() to set the has_run_once flag after having completed all the "run once" activities. This includes moving the flip of the userspace irqchip static key to a point where nothing can fail. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
由 Marc Zyngier 提交于
Having kvm_arch_vcpu_run_pid_change() inline doesn't bring anything to the table. Move it next to kvm_vcpu_first_run_init(), which will be convenient for what is next to come. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
由 Marc Zyngier 提交于
We currently map the SVE state to HYP on detection of a PID change. Although this matches what we do for FPSIMD, this is pretty pointless for SVE, as the buffer is per-vcpu and has nothing to do with the thread that is being run. Move the mapping of the SVE state to finalize-time, which is where we allocate the state memory, and thus the most logical place to do this. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
- 23 11月, 2021 3 次提交
-
-
由 Marc Zyngier 提交于
Now that we can track an equivalent of TIF_FOREIGN_FPSTATE, drop the mapping of current's thread_info at EL2. Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
由 Marc Zyngier 提交于
We currently have to maintain a mapping the thread_info structure at EL2 in order to be able to check the TIF_FOREIGN_FPSTATE flag. In order to eventually get rid of this, start with a vcpu flag that shadows the thread flag on each entry into the hypervisor. Reviewed-by: NMark Brown <broonie@kernel.org> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
由 Marc Zyngier 提交于
Now that we don't have any users left for __sve_save_state, remove it altogether. Should we ever need to save the SVE state from the hypervisor again, we can always re-introduce it. Suggested-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
- 22 11月, 2021 1 次提交
-
-
由 Marc Zyngier 提交于
The SVE host tracking in KVM is pretty involved. It relies on a set of flags tracking the ownership of the SVE register, as well as that of the EL0 access. It is also pretty scary: __hyp_sve_save_host() computes a thread_struct pointer and obtains a sve_state which gets directly accessed without further ado, even on nVHE. How can this even work? The answer to that is that it doesn't, and that this is mostly dead code. Closer examination shows that on executing a syscall, userspace loses its SVE state entirely. This is part of the ABI. Another thing to notice is that although the kernel provides helpers such as kernel_neon_begin()/end(), they only deal with the FP/NEON state, and not SVE. Given that you can only execute a guest as the result of a syscall, and that the kernel cannot use SVE by itself, it becomes pretty obvious that there is never any host SVE state to save, and that this code is only there to increase confusion. Get rid of the TIF_SVE tracking and host save infrastructure altogether. Reviewed-by: NMark Brown <broonie@kernel.org> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
- 18 11月, 2021 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
Generally, it doesn't make sense to return the recommended maximum number of vCPUs which exceeds the maximum possible number of vCPUs. Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs depending on whether it is a system-wide ioctl or a per-VM one. Previously, KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus() which is what gets returned by system-wide KVM_CAP_MAX_VCPUS. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211116163443.88707-2-vkuznets@redhat.com> Acked-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-