- 05 12月, 2017 1 次提交
-
-
由 Brijesh Singh 提交于
On AMD platforms, under certain conditions insn_len may be zero on #NPF. This can happen if a guest gets a page-fault on data access but the HW table walker is not able to read the instruction page (e.g instruction page is not present in memory). Typically, when insn_len is zero, x86_emulate_instruction() walks the guest page table and fetches the instruction bytes from guest memory. When SEV is enabled, the guest memory is encrypted with guest-specific key hence hypervisor will not able to fetch the instruction bytes. In those cases we simply restart the guest. I have encountered this issue when running kernbench inside the guest. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
-
- 25 10月, 2017 1 次提交
-
-
由 Mark Rutland 提交于
locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 10月, 2017 6 次提交
-
-
由 Paolo Bonzini 提交于
The x86 MMU if full of code that returns 0 and 1 for retry/emulate. Use the existing RET_MMIO_PF_RETRY/RET_MMIO_PF_EMULATE enum, renaming it to drop the MMIO part. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
It has always annoyed me a bit how SVM_EXIT_NPF is handled by pf_interception. This is also the only reason behind the under-documented need_unprotect argument to kvm_handle_page_fault. Let NPF go straight to kvm_mmu_page_fault, just like VMX does in handle_ept_violation and handle_ept_misconfig. Reviewed-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tim Hansen 提交于
Remove redundant null checks before calling kmem_cache_destroy. Found with make coccicheck M=arch/x86/kvm on linux-next tag next-20170929. Signed-off-by: NTim Hansen <devtimhansen@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Shakeel Butt 提交于
The kvm slabs can consume a significant amount of system memory and indeed in our production environment we have observed that a lot of machines are spending significant amount of memory that can not be left as system memory overhead. Also the allocations from these slabs can be triggered directly by user space applications which has access to kvm and thus a buggy application can leak such memory. So, these caches should be accounted to kmemcg. Signed-off-by: NShakeel Butt <shakeelb@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Hildenbrand 提交于
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 David Hildenbrand 提交于
Let's just drop the return. Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 10 10月, 2017 2 次提交
-
-
由 Ladi Prosek 提交于
is_last_gpte() is not equivalent to the pseudo-code given in commit 6bb69c9b ("KVM: MMU: simplify last_pte_bitmap") because an incorrect value of last_nonleaf_level may override the result even if level == 1. It is critical for is_last_gpte() to return true on level == 1 to terminate page walks. Otherwise memory corruption may occur as level is used as an index to various data structures throughout the page walking code. Even though the actual bug would be wherever the MMU is initialized (as in the previous patch), be defensive and ensure here that is_last_gpte() returns the correct value. This patch is also enough to fix CVE-2017-12188. Fixes: 6bb69c9b Cc: stable@vger.kernel.org Cc: Andy Honig <ahonig@google.com> Signed-off-by: NLadi Prosek <lprosek@redhat.com> [Panic if walk_addr_generic gets an incorrect level; this is a serious bug and it's not worth a WARN_ON where the recovery path might hide further exploitable issues; suggested by Andrew Honig. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ladi Prosek 提交于
The function updates context->root_level but didn't call update_last_nonleaf_level so the previous and potentially wrong value was used for page walks. For example, a zero value of last_nonleaf_level would allow a potential out-of-bounds access in arch/x86/mmu/paging_tmpl.h's walk_addr_generic function (CVE-2017-12188). Fixes: 155a97a3Signed-off-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 10月, 2017 1 次提交
-
-
由 Boqun Feng 提交于
Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call schedule() to reschedule in some cases. This could result in accidentally ending the current RCU read-side critical section early, causing random memory corruption in the guest, or otherwise preempting the currently running task inside between preempt_disable and preempt_enable. The difficulty to handle this well is because we don't know whether an async PF delivered in a preemptible section or RCU read-side critical section for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock() are both no-ops in that case. To cure this, we treat any async PF interrupting a kernel context as one that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing the schedule() path in that case. To do so, a second parameter for kvm_async_pf_task_wait() is introduced, so that we know whether it's called from a context interrupting the kernel, and the parameter is set properly in all the callsites. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: stable@vger.kernel.org Signed-off-by: NBoqun Feng <boqun.feng@gmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 26 8月, 2017 1 次提交
-
-
由 Brijesh Singh 提交于
The following commit: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM") uses __sme_clr() to remove the C-bit in rsvd_bits(). rsvd_bits() is just a simple function to return some 1 bits. Applying a mask based on properties of the host MMU is incorrect. Additionally, the masks computed by __reset_rsvds_bits_mask also apply to guest page tables, where the C bit is reserved since we don't emulate SME. The fix is to clear the C-bit from rsvd_bits_mask array after it has been populated from __reset_rsvds_bits_mask() Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: kvm@vger.kernel.org Cc: paolo.bonzini@gmail.com Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM") Link: http://lkml.kernel.org/r/20170825205540.123531-1-brijesh.singh@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 8月, 2017 3 次提交
-
-
由 Paolo Bonzini 提交于
update_permission_bitmask currently does a 128-iteration loop to, essentially, compute a constant array. Computing the 8 bits in parallel reduces it to 16 iterations, and is enough to speed it up substantially because many boolean operations in the inner loop become constants or simplify noticeably. Because update_permission_bitmask is actually the top item in the profile for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand clock cycles, or up to 30%: before after cpuid 35173 25954 vmcall 35122 27079 inl_from_pmtimer 52635 42675 inl_from_qemu 53604 44599 inl_from_kernel 38498 30798 outl_to_kernel 34508 28816 wr_tsc_adjust_msr 34185 26818 rd_tsc_adjust_msr 37409 27049 mmio-no-eventfd:pci-mem 50563 45276 mmio-wildcard-eventfd:pci-mem 34495 30823 mmio-datamatch-eventfd:pci-mem 35612 31071 portio-no-eventfd:pci-io 44925 40661 portio-wildcard-eventfd:pci-io 29708 27269 portio-datamatch-eventfd:pci-io 31135 27164 (I wrote a small C program to compare the tables for all values of CR0.WP, CR4.SMAP and CR4.SMEP, and they match). Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yu Zhang 提交于
Extends the shadow paging code, so that 5 level shadow page table can be constructed if VM is running in 5 level paging mode. Also extends the ept code, so that 5 level ept table can be constructed if maxphysaddr of VM exceeds 48 bits. Unlike the shadow logic, KVM should still use 4 level ept table for a VM whose physical address width is less than 48 bits, even when the VM is running in 5 level paging mode. Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com> [Unconditionally reset the MMU context in kvm_cpuid_update. Changing MAXPHYADDR invalidates the reserved bit bitmasks. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yu Zhang 提交于
Now we have 4 level page table and 5 level page table in 64 bits long mode, let's rename the PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL, then we can use PT64_ROOT_5LEVEL for 5 level page table, it's helpful to make the code more clear. Also PT64_ROOT_MAX_LEVEL is defined as 4, so that we can just redefine it to 5 whenever a replacement is needed for 5 level paging. Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 8月, 2017 3 次提交
-
-
由 Paolo Bonzini 提交于
There is currently some confusion between nested and L1 GPAs. The assignment to "direct" in kvm_mmu_page_fault tries to fix that, but it is not enough. What this patch does is fence off the MMIO cache completely when using shadow nested page tables, since we have neither a GVA nor an L1 GPA to put in the cache. This also allows some simplifications in kvm_mmu_page_fault and FNAME(page_fault). The EPT misconfig likewise does not have an L1 GPA to pass to kvm_io_bus_write, so that must be skipped for guest mode. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> [Changed comment to say "GPAs" instead of "L1's physical addresses", as per David's review. - Radim] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Brijesh Singh 提交于
When a guest causes a page fault which requires emulation, the vcpu->arch.gpa_available flag is set to indicate that cr2 contains a valid GPA. Currently, emulator_read_write_onepage() makes use of gpa_available flag to avoid a guest page walk for a known MMIO regions. Lets not limit the gpa_available optimization to just MMIO region. The patch extends the check to avoid page walk whenever gpa_available flag is set. Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> [Fix EPT=0 according to Wanpeng Li's fix, plus ensure VMX also uses the new code. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> [Moved "ret < 0" to the else brach, as per David's review. - Radim] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Paolo Bonzini 提交于
Calling handle_mmio_page_fault() has been unnecessary since commit e9ee956e ("KVM: x86: MMU: Move handle_mmio_page_fault() call to kvm_mmu_page_fault()", 2016-02-22). handle_mmio_page_fault() can now be made static. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 12 8月, 2017 2 次提交
-
-
由 Wanpeng Li 提交于
Bailing out immediately if there is no available mmu page to alloc. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089] irq event stamp: 20532 hardirqs last enabled at (20531): [<ffffffff8e9b6908>] restore_regs_and_iret+0x0/0x1d hardirqs last disabled at (20532): [<ffffffff8e9b7ae8>] apic_timer_interrupt+0x98/0xb0 softirqs last enabled at (8266): [<ffffffff8e9badc6>] __do_softirq+0x206/0x4c1 softirqs last disabled at (8253): [<ffffffff8e083918>] irq_exit+0xf8/0x100 CPU: 5 PID: 3089 Comm: warn_test Tainted: G OE 4.13.0-rc3+ #8 RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm] Call Trace: make_mmu_pages_available.isra.120+0x71/0xc0 [kvm] kvm_mmu_load+0x1cf/0x410 [kvm] kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 ? __this_cpu_preempt_check+0x13/0x20 This can be reproduced readily by ept=N and running syzkaller tests since many syzkaller testcases don't setup any memory regions. However, if ept=Y rmode identity map will be created, then kvm_mmu_calculate_mmu_pages() will extend the number of VM's mmu pages to at least KVM_MIN_ALLOC_MMU_PAGES which just hide the issue. I saw the scenario kvm->arch.n_max_mmu_pages == 0 && kvm->arch.n_used_mmu_pages == 1, so there is one active mmu page on the list, kvm_mmu_prepare_zap_page() fails to zap any pages, however prepare_zap_oldest_mmu_page() always returns true. It incurs infinite loop in make_mmu_pages_available() which causes mmu->lock softlockup. This patch fixes it by setting the return value of prepare_zap_oldest_mmu_page() according to whether or not there is mmu page zapped. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 10 8月, 2017 2 次提交
-
-
由 Paolo Bonzini 提交于
This is the same as commit 14727754 ("kvm: svm: Add support for additional SVM NPF error codes", 2016-11-23), but for Intel processors. In this case, the exit qualification field's bit 8 says whether the EPT violation occurred while translating the guest's final physical address or rather while translating the guest page tables. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Brijesh Singh 提交于
Commit 14727754 ("kvm: svm: Add support for additional SVM NPF error codes", 2016-11-23) added a new error code to aid nested page fault handling. The commit unprotects (kvm_mmu_unprotect_page) the page when we get a NPF due to guest page table walk where the page was marked RO. However, if an L0->L2 shadow nested page table can also be marked read-only when a page is read only in L1's nested page table. If such a page is accessed by L2 while walking page tables it can cause a nested page fault (page table walks are write accesses). However, after kvm_mmu_unprotect_page we may get another page fault, and again in an endless stream. To cover this use case, we qualify the new error_code check with vcpu->arch.mmu_direct_map so that the error_code check would run on L1 guest, and not the L2 guest. This avoids hitting the above scenario. Fixes: 14727754 Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 8月, 2017 1 次提交
-
-
由 Radim Krčmář 提交于
This patch turns guest_cpuid_has_XYZ(cpuid) into guest_cpuid_has(cpuid, X86_FEATURE_XYZ), which gets rid of many very similar helpers. When seeing a X86_FEATURE_*, we can know which cpuid it belongs to, but this information isn't in common code, so we recreate it for KVM. Add some BUILD_BUG_ONs to make sure that it runs nicely. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 7月, 2017 1 次提交
-
-
由 Tom Lendacky 提交于
Update the KVM support to work with SME. The VMCB has a number of fields where physical addresses are used and these addresses must contain the memory encryption mask in order to properly access the encrypted memory. Also, use the memory encryption mask when creating and using the nested page tables. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/89146eccfa50334409801ff20acd52a90fb5efcf.1500319216.git.thomas.lendacky@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 7月, 2017 2 次提交
-
-
由 Wanpeng Li 提交于
Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1, async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0, kvm_can_do_async_pf returns 0 if in guest mode. This is similar to what svm.c wanted to do all along, but it is only enabled for Linux as L1 hypervisor. Foreign hypervisors must never receive async page faults as vmexits, because they'd probably be very confused about that. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
This patch adds the L1 guest async page fault #PF vmexit handler, such by L1 similar to ordinary async page fault. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> [Passed insn parameters to kvm_mmu_page_fault().] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 03 7月, 2017 4 次提交
-
-
由 Peter Feiner 提交于
EPT A/D was enabled in the vmcs02 EPTP regardless of the vmcs12's EPTP value. The problem is that enabling A/D changes the behavior of L2's x86 page table walks as seen by L1. With A/D enabled, x86 page table walks are always treated as EPT writes. Commit ae1e2d10 ("kvm: nVMX: support EPT accessed/dirty bits", 2017-03-30) tried to work around this problem by clearing the write bit in the exit qualification for EPT violations triggered by page walks. However, that fixup introduced the opposite bug: page-table walks that actually set x86 A/D bits were *missing* the write bit in the exit qualification. This patch fixes the problem by disabling EPT A/D in the shadow MMU when EPT A/D is disabled in vmcs12's EPTP. Signed-off-by: NPeter Feiner <pfeiner@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Feiner 提交于
Adds the plumbing to disable A/D bits in the MMU based on a new role bit, ad_disabled. When A/D is disabled, the MMU operates as though A/D aren't available (i.e., using access tracking faults instead). To avoid SP -> kvm_mmu_page.role.ad_disabled lookups all over the place, A/D disablement is now stored in the SPTE. This state is stored in the SPTE by tweaking the use of SPTE_SPECIAL_MASK for access tracking. Rather than just setting SPTE_SPECIAL_MASK when an access-tracking SPTE is non-present, we now always set SPTE_SPECIAL_MASK for access-tracking SPTEs. Signed-off-by: NPeter Feiner <pfeiner@google.com> [Use role.ad_disabled even for direct (non-shadow) EPT page tables. Add documentation and a few MMU_WARN_ONs. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Feiner 提交于
Specify both a mask (i.e., bits to consider) and a value (i.e., pattern of bits that indicates a special PTE) for mmio SPTEs. On Intel, this lets us pack even more information into the (SPTE_SPECIAL_MASK | EPT_VMX_RWX_MASK) mask we use for access tracking liberating all (SPTE_SPECIAL_MASK | (non-misconfigured-RWX)) values. Signed-off-by: NPeter Feiner <pfeiner@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Feiner 提交于
The MMU always has hardware A bits or access tracking support, thus it's unnecessary to handle the scenario where we have neither. Signed-off-by: NPeter Feiner <pfeiner@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 6月, 2017 1 次提交
-
-
由 Wanpeng Li 提交于
INFO: task gnome-terminal-:1734 blocked for more than 120 seconds. Not tainted 4.12.0-rc4+ #8 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. gnome-terminal- D 0 1734 1015 0x00000000 Call Trace: __schedule+0x3cd/0xb30 schedule+0x40/0x90 kvm_async_pf_task_wait+0x1cc/0x270 ? __vfs_read+0x37/0x150 ? prepare_to_swait+0x22/0x70 do_async_page_fault+0x77/0xb0 ? do_async_page_fault+0x77/0xb0 async_page_fault+0x28/0x30 This is triggered by running both win7 and win2016 on L1 KVM simultaneously, and then gives stress to memory on L1, I can observed this hang on L1 when at least ~70% swap area is occupied on L0. This is due to async pf was injected to L2 which should be injected to L1, L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host actually), and L1 guest starts accumulating tasks stuck in D state in kvm_async_pf_task_wait() since missing PAGE_READY async_pfs. This patch fixes the hang by doing async pf when executing L1 guest. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 5月, 2017 1 次提交
-
-
由 Bandan Das 提交于
When KVM updates accessed/dirty bits, this hook can be used to invoke an arch specific function that implements/emulates dirty logging such as PML. Signed-off-by: NBandan Das <bsd@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 4月, 2017 1 次提交
-
-
由 Paolo Bonzini 提交于
Now use bit 6 of EPTP to optionally enable A/D bits for EPTP. Another thing to change is that, when EPT accessed and dirty bits are not in use, VMX treats accesses to guest paging structures as data reads. When they are in use (bit 6 of EPTP is set), they are treated as writes and the corresponding EPT dirty bit is set. The MMU didn't know this detail, so this patch adds it. We also have to fix up the exit qualification. It may be wrong because KVM sets bit 6 but the guest might not. L1 emulates EPT A/D bits using write permissions, so in principle it may be possible for EPT A/D bits to be used by L1 even though not available in hardware. The problem is that guest page-table walks will be treated as reads rather than writes, so they would not cause an EPT violation. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [Fixed typo in walk_addr_generic() comment and changed bit clear + conditional-set pattern in handle_ept_violation() to conditional-clear] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 02 3月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 2月, 2017 1 次提交
-
-
由 Masahiro Yamada 提交于
Fix typos and add the following to the scripts/spelling.txt: an user||a user an userspace||a userspace I also added "userspace" to the list since it is a common word in Linux. I found some instances for "an userfaultfd", but I did not add it to the list. I felt it is endless to find words that start with "user" such as "userland" etc., so must draw a line somewhere. Link: http://lkml.kernel.org/r/1481573103-11329-4-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 1月, 2017 4 次提交
-
-
由 Junaid Shahid 提交于
Before fast page fault restores an access track PTE back to a regular PTE, it now also verifies that the restored PTE would grant the necessary permissions for the faulting access to succeed. If not, it falls back to the slow page fault path. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Redo the page table walk in fast_page_fault when retrying so that we are working on the latest PTE even if the hierarchy changes. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Reword the comment to hopefully make it more clear. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Instead of the caller including the SPTE_SPECIAL_MASK in the masks being supplied to kvm_mmu_set_mmio_spte_mask() and kvm_mmu_set_mask_ptes(), those functions now themselves include the SPTE_SPECIAL_MASK. Note that bit 63 is now reset in the default MMIO mask. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 1月, 2017 1 次提交
-
-
由 Junaid Shahid 提交于
This change implements lockless access tracking for Intel CPUs without EPT A bits. This is achieved by marking the PTEs as not-present (but not completely clearing them) when clear_flush_young() is called after marking the pages as accessed. When an EPT Violation is generated as a result of the VM accessing those pages, the PTEs are restored to their original values. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-