- 30 11月, 2015 8 次提交
-
-
This patch updates the routines (sca_*) to provide transparent access to and manipulation on the data for both Basic and Extended SCA in use. The kvm.arch.sca is generalized to (void *) to handle BSCA/ESCA cases. Also the kvm.arch.use_esca flag is provided. The actual functionality is kept the same. Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
This patch adds new structures and updates some existing ones to provide the base for Extended SCA functionality. The old sca_* structures were renamed to bsca_* to keep things uniform. The access to fields of SIGP controls were turned into bitfields instead of hardcoded bitmasks. Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
This patch provides SCA-aware helpers to create/delete a VCPU. This is to prepare for upcoming introduction of Extended SCA support. Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
This patch generalizes access to the SIGP controls, which is a part of SCA. This is to prepare for upcoming introduction of Extended SCA support. Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
This patch generalizes access to the IPTE controls, which is a part of SCA. This is to prepare for upcoming introduction of Extended SCA support. Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
Introduce sclp.has_hvs and sclp.has_esca to provide a way for kvm to check whether the extended-SCA and the home-virtual-SCA facilities are available. Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 David Hildenbrand 提交于
Let's rewrite this function to better reflect how we actually handle exit_code. By dropping out early we can save a few cycles. This especially speeds up sie exits caused by host irqs. Also, let's move the special -EOPNOTSUPP for intercepts to the place where it belongs and convert it to -EREMOTE. Reviewed-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 David Hildenbrand 提交于
Let's reuse the new common function for VPCU lookup by id. Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> [split out the new function into a separate patch]
-
- 26 11月, 2015 20 次提交
-
-
由 Takuya Yoshikawa 提交于
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
As kvm_mmu_get_page() was changed so that every parent pointer would not get into the sp->parent_ptes chain before the entry pointed to by it was set properly, we can use the for_each_rmap_spte macro instead of pte_list_walk(). Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte argument, link_shadow_page() follows that to set the parent entry so that the new mapping will point to the returned page table. Moving parent_pte handling there allows to clean up the code because parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and mmu_page_add_parent_pte(). In addition, the patch avoids calling mark_unsync() for other parents in the sp->parent_ptes chain than the newly added parent_pte, because they have been there since before the current page fault handling started. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
Make kvm_mmu_alloc_page() do just what its name tells to do, and remove the extra allocation error check and zero-initialization of parent_ptes: shadow page headers allocated by kmem_cache_zalloc() are always in the per-VCPU pools. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is placed right after the call to detect unrelated sptes which must not be found in the reverse-mapping list. Move this check in rmap_get_first/next() so that all call sites, not just the users of the for_each_rmap_spte() macro, will be checked the same way. One thing to keep in mind is that kvm_mmu_unlink_parents() also uses rmap_get_first() to handle parent sptes. The change will not break it because parent sptes are present, at least until drop_parent_pte() actually unlinks them, and not mmio-sptes. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the simple reverse mapping was implemented by commit cd4a4e53 ("[PATCH] KVM: MMU: Implement simple reverse mapping"). At that point, its role was clear and only rmap_add() and rmap_remove() were using it to select sptes that need to be reverse-mapped. Independently of that, is_shadow_present_pte() was first introduced by commit c7addb90 ("KVM: Allow not-present guest page faults to bypass kvm") to do bypass_guest_pf optimization, which does not exist any more. These two seem to have changed their roles somewhat, and is_rmap_spte() just calls is_shadow_present_pte() now. Since using both of them without clear distinction just makes the code confusing, remove is_rmap_spte(). Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
mmu_set_spte()'s code is based on the assumption that the emulate parameter has a valid pointer value if set_spte() returns true and write_fault is not zero. In other cases, emulate may be NULL, so a NULL-check is needed. Stop passing emulate pointer and make mmu_set_spte() return the emulate value instead to clean up this complex interface. Prefetch functions can just throw away the return value. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
Both __mmu_unsync_walk() and mmu_pages_clear_parents() have three line code which clears a bit in the unsync child bitmap; the former places it inside a loop block and uses a few goto statements to jump to it. A new helper function, clear_unsync_child_bit(), makes the code cleaner. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
New struct kvm_rmap_head makes the code type-safe to some extent. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yaowei Bai 提交于
In another patch kvm_is_visible_gfn is maken return bool due to this function only returns zero or one as its return value, let's also make kvmppc_visible_gpa return bool to keep consistent. No functional change. Signed-off-by: NYaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Commit 7a1638ce ("nEPT: Redefine EPT-specific link_shadow_page()", 2013-08-05) says: Since nEPT doesn't support A/D bit, we should not set those bit when building the shadow page table. but this is not necessary. Even though nEPT doesn't support A/D bits, and hence the vmcs12 EPT pointer will never enable them, we always use them for shadow page tables if available (see construct_eptp in vmx.c). So we can set the A/D bits freely in the shadow page table. This patch hence basically reverts commit 7a1638ce. Cc: Yang Zhang <yang.z.zhang@Intel.com> Cc: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Poor #AC was so unimportant until a few days ago that we were not even tracing its name correctly. But now it's all over the place. Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
RDTSCP was never supported for AMD CPUs, which nobody noticed because Linux does not use it. But exactly the fact that Linux does not use it makes the implementation very simple; we can freely trash MSR_TSC_AUX while running the guest. Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
If we do not do this, it is not properly saved and restored across migration. Windows notices due to its self-protection mechanisms, and is very upset about it (blue screen of death). Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrey Smetanin 提交于
A new vcpu exit is introduced to notify the userspace of the changes in Hyper-V SynIC configuration triggered by guest writing to the corresponding MSRs. Changes v4: * exit into userspace only if guest writes into SynIC MSR's Changes v3: * added KVM_EXIT_HYPERV types and structs notes into docs Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NDenis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrey Smetanin 提交于
SynIC (synthetic interrupt controller) is a lapic extension, which is controlled via MSRs and maintains for each vCPU - 16 synthetic interrupt "lines" (SINT's); each can be configured to trigger a specific interrupt vector optionally with auto-EOI semantics - a message page in the guest memory with 16 256-byte per-SINT message slots - an event flag page in the guest memory with 16 2048-bit per-SINT event flag areas The host triggers a SINT whenever it delivers a new message to the corresponding slot or flips an event flag bit in the corresponding area. The guest informs the host that it can try delivering a message by explicitly asserting EOI in lapic or writing to End-Of-Message (EOM) MSR. The userspace (qemu) triggers interrupts and receives EOM notifications via irqfd with resampler; for that, a GSI is allocated for each configured SINT, and irq_routing api is extended to support GSI-SINT mapping. Changes v4: * added activation of SynIC by vcpu KVM_ENABLE_CAP * added per SynIC active flag * added deactivation of APICv upon SynIC activation Changes v3: * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into docs Changes v2: * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors * add Hyper-V SynIC vectors into EOI exit bitmap * Hyper-V SyniIC SINT msr write logic simplified Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NDenis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrey Smetanin 提交于
The decision on whether to use hardware APIC virtualization used to be taken globally, based on the availability of the feature in the CPU and the value of a module parameter. However, under certain circumstances we want to control it on per-vcpu basis. In particular, when the userspace activates HyperV synthetic interrupt controller (SynIC), APICv has to be disabled as it's incompatible with SynIC auto-EOI behavior. To achieve that, introduce 'apicv_active' flag on struct kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv off. The flag is initialized based on the module parameter and CPU capability, and consulted whenever an APICv-specific action is performed. Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NDenis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrey Smetanin 提交于
The function to determine if the vector is handled by ioapic used to rely on the fact that only ioapic-handled vectors were set up to cause vmexits when virtual apic was in use. We're going to break this assumption when introducing Hyper-V synthetic interrupts: they may need to cause vmexits too. To achieve that, introduce a new bitmap dedicated specifically for ioapic-handled vectors, and populate EOI exit bitmap from it for now. Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NDenis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrey Smetanin 提交于
Actually kvm_arch_irq_routing_update() should be kvm_arch_post_irq_routing_update() as it's called at the end of irq routing update. This renaming frees kvm_arch_irq_routing_update function name. kvm_arch_irq_routing_update() weak function which will be used to update mappings for arch-specific irq routing entries (in particular, the upcoming Hyper-V synthetic interrupts). Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NDenis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 11月, 2015 7 次提交
-
-
由 Haozhong Zhang 提交于
This patch removes the vpid check when emulating nested invvpid instruction of type all-contexts invalidation. The existing code is incorrect because: (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate Translations Based on VPID", invvpid instruction does not check vpid in the invvpid descriptor when its type is all-contexts invalidation. (2) According to the same document, invvpid of type all-contexts invalidation does not require there is an active VMCS, so/and get_vmcs12() in the existing code may result in a NULL-pointer dereference. In practice, it can crash both KVM itself and L1 hypervisors that use invvpid (e.g. Xen). Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Mark Rutland 提交于
If we call __kvm_hyp_panic while a guest context is active, we call __restore_sysregs before acquiring the system register values for the panic, in the process throwing away the PAR_EL1 value at the point of the panic. This patch modifies __kvm_hyp_panic to stash the PAR_EL1 value prior to restoring host register values, enabling us to report the original values at the point of the panic. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Mark Rutland 提交于
Currently __kvm_hyp_panic uses %p for values which are not pointers, such as the ESR value. This can confusingly lead to "(null)" being printed for the value. Use %x instead, and only use %p for host pointers. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Christoffer Dall 提交于
We were setting the physical active state on the GIC distributor in a preemptible section, which could cause us to set the active state on different physical CPU from the one we were actually going to run on, hacoc ensues. Since we are no longer descheduling/scheduling soft timers in the flush/sync timer functions, simply moving the timer flush into a non-preemptible section. Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Marc Zyngier 提交于
Cortex-A57 parts up to r1p2 can misreport Stage 2 translation faults when a Stage 1 permission fault or device alignment fault should have been reported. This patch implements the workaround (which is to validate that the Stage-1 translation actually succeeds) by using code patching. Cc: stable@vger.kernel.org Reviewed-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Marc Zyngier 提交于
When running a 32bit guest under a 64bit hypervisor, the ARMv8 architecture defines a mapping of the 32bit registers in the 64bit space. This includes banked registers that are being demultiplexed over the 64bit ones. On exceptions caused by an operation involving a 32bit register, the HW exposes the register number in the ESR_EL2 register. It was so far understood that SW had to distinguish between AArch32 and AArch64 accesses (based on the current AArch32 mode and register number). It turns out that I misinterpreted the ARM ARM, and the clue is in D1.20.1: "For some exceptions, the exception syndrome given in the ESR_ELx identifies one or more register numbers from the issued instruction that generated the exception. Where the exception is taken from an Exception level using AArch32 these register numbers give the AArch64 view of the register." Which means that the HW is already giving us the translated version, and that we shouldn't try to interpret it at all (for example, doing an MMIO operation from the IRQ mode using the LR register leads to very unexpected behaviours). The fix is thus not to perform a call to vcpu_reg32() at all from vcpu_reg(), and use whatever register number is supplied directly. The only case we need to find out about the mapping is when we actively generate a register access, which only occurs when injecting a fault in a guest. Cc: stable@vger.kernel.org Reviewed-by: NRobin Murphy <robin.murphy@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Ard Biesheuvel 提交于
The open coded tests for checking whether a PTE maps a page as uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern, which is not guaranteed to work since the type of a mapping is not a set of mutually exclusive bits For HYP mappings, the type is an index into the MAIR table (i.e, the index itself does not contain any information whatsoever about the type of the mapping), and for stage-2 mappings it is a bit field where normal memory and device types are defined as follows: #define MT_S2_NORMAL 0xf #define MT_S2_DEVICE_nGnRE 0x1 I.e., masking *and* comparing with the latter matches on the former, and we have been getting lucky merely because the S2 device mappings also have the PTE_UXN bit set, or we would misidentify memory mappings as device mappings. Since the unmap_range() code path (which contains one instance of the flawed test) is used both for HYP mappings and stage-2 mappings, and considering the difference between the two, it is non-trivial to fix this by rewriting the tests in place, as it would involve passing down the type of mapping through all the functions. However, since HYP mappings and stage-2 mappings both deal with host physical addresses, we can simply check whether the mapping is backed by memory that is managed by the host kernel, and only perform the D-cache maintenance if this is the case. Cc: stable@vger.kernel.org Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: NPavel Fedin <p.fedin@samsung.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 22 11月, 2015 5 次提交
-
-
由 Helge Deller 提交于
Adjust the linker script and map_pages() to map kernel text and data on physical 1MB huge/large pages. Signed-off-by: NHelge Deller <deller@gmx.de>
-
由 Helge Deller 提交于
This patch adds huge page support to allow userspace to allocate huge pages and to use hugetlbfs filesystem on 32- and 64-bit Linux kernels. A later patch will add kernel support to map kernel text and data on huge pages. The only requirement is, that the kernel needs to be compiled for a PA8X00 CPU (PA2.0 architecture). Older PA1.X CPUs do not support variable page sizes. 64bit Kernels are compiled for PA2.0 by default. Technically on parisc multiple physical huge pages may be needed to emulate standard 2MB huge pages. Signed-off-by: NHelge Deller <deller@gmx.de>
-
由 Helge Deller 提交于
Use the 22bit instead of the 17bit branch instruction on a 64bit kernel to reach the do_syscall_trace_exit function from the gateway page. A huge page enabled kernel may need the additional branch distance bits. Signed-off-by: NHelge Deller <deller@gmx.de>
-
由 Helge Deller 提交于
For the 64bit kernel the initially 16 MB kernel memory might become too small if you build a kernel with many modules built-in and with kernel text and data areas mapped on huge pages. This patch increases the initial mapping to 32MB for 64bit kernels and keeps 16MB for 32bit kernels. Signed-off-by: NHelge Deller <deller@gmx.de>
-
由 Helge Deller 提交于
A fault vector on parisc needs to be 2K aligned. Furthermore the checksum of the fault vector needs to sum up to 0 which is being calculated and written at runtime. Up to now we aligned both PA20 and PA11 fault vectors on the same 4K page in order to easily write the checksum after having mapped the kernel read-only (by mapping this page only as read-write). But when we want to map the kernel text and data on huge pages this makes things harder. So, simplify it by aligning both fault vectors on 2K boundries and write the checksum before we map the page read-only. Signed-off-by: NHelge Deller <deller@gmx.de>
-