提交 · 42bcbebf11dff0c2501831f1d86655eee506240f · openanolis / cloud-kernel

12 8月, 2017 1 次提交

KVM: MMU: Fix softlockup due to mmu_lock is held too long · 42bcbebf

由 Wanpeng Li 提交于 8月 10, 2017

watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089]
 irq event stamp: 20532
 hardirqs last  enabled at (20531): [<ffffffff8e9b6908>] restore_regs_and_iret+0x0/0x1d
 hardirqs last disabled at (20532): [<ffffffff8e9b7ae8>] apic_timer_interrupt+0x98/0xb0
 softirqs last  enabled at (8266): [<ffffffff8e9badc6>] __do_softirq+0x206/0x4c1
 softirqs last disabled at (8253): [<ffffffff8e083918>] irq_exit+0xf8/0x100
 CPU: 5 PID: 3089 Comm: warn_test Tainted: G           OE   4.13.0-rc3+ #8
 RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm]
 Call Trace:
  make_mmu_pages_available.isra.120+0x71/0xc0 [kvm]
  kvm_mmu_load+0x1cf/0x410 [kvm]
  kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2
  ? __this_cpu_preempt_check+0x13/0x20

This can be reproduced readily by ept=N and running syzkaller tests since
many syzkaller testcases don't setup any memory regions. However, if ept=Y
rmode identity map will be created, then kvm_mmu_calculate_mmu_pages() will
extend the number of VM's mmu pages to at least KVM_MIN_ALLOC_MMU_PAGES
which just hide the issue.

I saw the scenario kvm->arch.n_max_mmu_pages == 0 && kvm->arch.n_used_mmu_pages == 1,
so there is one active mmu page on the list, kvm_mmu_prepare_zap_page() fails
to zap any pages, however prepare_zap_oldest_mmu_page() always returns true.
It incurs infinite loop in make_mmu_pages_available() which causes mmu->lock
softlockup.

This patch fixes it by setting the return value of prepare_zap_oldest_mmu_page()
according to whether or not there is mmu page zapped.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

42bcbebf

10 8月, 2017 2 次提交

kvm: nVMX: Add support for fast unprotection of nested guest page tables · eebed243

由 Paolo Bonzini 提交于 11月 28, 2016

This is the same as commit 14727754 ("kvm: svm: Add support for
additional SVM NPF error codes", 2016-11-23), but for Intel processors.
In this case, the exit qualification field's bit 8 says whether the
EPT violation occurred while translating the guest's final physical
address or rather while translating the guest page tables.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

eebed243

KVM: SVM: Limit PFERR_NESTED_GUEST_PAGE error_code check to L1 guest · 64531a3b

由 Brijesh Singh 提交于 8月 07, 2017

Commit 14727754 ("kvm: svm: Add support for additional SVM NPF error
codes", 2016-11-23) added a new error code to aid nested page fault
handling.  The commit unprotects (kvm_mmu_unprotect_page) the page when
we get a NPF due to guest page table walk where the page was marked RO.

However, if an L0->L2 shadow nested page table can also be marked read-only
when a page is read only in L1's nested page table.  If such a page
is accessed by L2 while walking page tables it can cause a nested
page fault (page table walks are write accesses).  However, after
kvm_mmu_unprotect_page we may get another page fault, and again in an
endless stream.

To cover this use case, we qualify the new error_code check with
vcpu->arch.mmu_direct_map so that the error_code check would run on L1
guest, and not the L2 guest.  This avoids hitting the above scenario.

Fixes: 14727754
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

64531a3b

07 8月, 2017 1 次提交

KVM: x86: generalize guest_cpuid_has_ helpers · d6321d49

由 Radim Krčmář 提交于 8月 05, 2017

This patch turns guest_cpuid_has_XYZ(cpuid) into guest_cpuid_has(cpuid,
X86_FEATURE_XYZ), which gets rid of many very similar helpers.

When seeing a X86_FEATURE_*, we can know which cpuid it belongs to, but
this information isn't in common code, so we recreate it for KVM.

Add some BUILD_BUG_ONs to make sure that it runs nicely.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d6321d49

14 7月, 2017 2 次提交

KVM: async_pf: Let guest support delivery of async_pf from guest mode · 52a5c155

由 Wanpeng Li 提交于 7月 13, 2017

Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
kvm_can_do_async_pf returns 0 if in guest mode.

This is similar to what svm.c wanted to do all along, but it is only
enabled for Linux as L1 hypervisor.  Foreign hypervisors must never
receive async page faults as vmexits, because they'd probably be very
confused about that.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

52a5c155

KVM: async_pf: Add L1 guest async_pf #PF vmexit handler · 1261bfa3

由 Wanpeng Li 提交于 7月 13, 2017

This patch adds the L1 guest async page fault #PF vmexit handler, such
by L1 similar to ordinary async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
[Passed insn parameters to kvm_mmu_page_fault().]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

1261bfa3

03 7月, 2017 4 次提交

x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 · 995f00a6

由 Peter Feiner 提交于 6月 30, 2017

EPT A/D was enabled in the vmcs02 EPTP regardless of the vmcs12's EPTP
value. The problem is that enabling A/D changes the behavior of L2's
x86 page table walks as seen by L1. With A/D enabled, x86 page table
walks are always treated as EPT writes.

Commit ae1e2d10 ("kvm: nVMX: support EPT accessed/dirty bits",
2017-03-30) tried to work around this problem by clearing the write
bit in the exit qualification for EPT violations triggered by page
walks.  However, that fixup introduced the opposite bug: page-table walks
that actually set x86 A/D bits were *missing* the write bit in the exit
qualification.

This patch fixes the problem by disabling EPT A/D in the shadow MMU
when EPT A/D is disabled in vmcs12's EPTP.
Signed-off-by: NPeter Feiner <pfeiner@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

995f00a6

kvm: x86: mmu: allow A/D bits to be disabled in an mmu · ac8d57e5

由 Peter Feiner 提交于 6月 30, 2017

Adds the plumbing to disable A/D bits in the MMU based on a new role
bit, ad_disabled. When A/D is disabled, the MMU operates as though A/D
aren't available (i.e., using access tracking faults instead).

To avoid SP -> kvm_mmu_page.role.ad_disabled lookups all over the
place, A/D disablement is now stored in the SPTE. This state is stored
in the SPTE by tweaking the use of SPTE_SPECIAL_MASK for access
tracking. Rather than just setting SPTE_SPECIAL_MASK when an
access-tracking SPTE is non-present, we now always set
SPTE_SPECIAL_MASK for access-tracking SPTEs.
Signed-off-by: NPeter Feiner <pfeiner@google.com>
[Use role.ad_disabled even for direct (non-shadow) EPT page tables.  Add
 documentation and a few MMU_WARN_ONs. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ac8d57e5

x86: kvm: mmu: make spte mmio mask more explicit · dcdca5fe

由 Peter Feiner 提交于 6月 30, 2017

Specify both a mask (i.e., bits to consider) and a value (i.e.,
pattern of bits that indicates a special PTE) for mmio SPTEs. On
Intel, this lets us pack even more information into the
(SPTE_SPECIAL_MASK | EPT_VMX_RWX_MASK) mask we use for access
tracking liberating all (SPTE_SPECIAL_MASK | (non-misconfigured-RWX))
values.
Signed-off-by: NPeter Feiner <pfeiner@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dcdca5fe

x86: kvm: mmu: dead code thanks to access tracking · ce00053b

由 Peter Feiner 提交于 6月 30, 2017

The MMU always has hardware A bits or access tracking support, thus
it's unnecessary to handle the scenario where we have neither.
Signed-off-by: NPeter Feiner <pfeiner@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ce00053b

11 6月, 2017 1 次提交

KVM: async_pf: avoid async pf injection when in guest mode · 9bc1f09f

由 Wanpeng Li 提交于 6月 08, 2017

 INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
       Not tainted 4.12.0-rc4+ #8
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 gnome-terminal- D    0  1734   1015 0x00000000
 Call Trace:
  __schedule+0x3cd/0xb30
  schedule+0x40/0x90
  kvm_async_pf_task_wait+0x1cc/0x270
  ? __vfs_read+0x37/0x150
  ? prepare_to_swait+0x22/0x70
  do_async_page_fault+0x77/0xb0
  ? do_async_page_fault+0x77/0xb0
  async_page_fault+0x28/0x30

This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
and then gives stress to memory on L1, I can observed this hang on L1 when
at least ~70% swap area is occupied on L0.

This is due to async pf was injected to L2 which should be injected to L1,
L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
actually), and L1 guest starts accumulating tasks stuck in D state in
kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.

This patch fixes the hang by doing async pf when executing L1 guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9bc1f09f

09 5月, 2017 1 次提交

kvm: x86: Add a hook for arch specific dirty logging emulation · bab4165e

由 Bandan Das 提交于 5月 05, 2017

When KVM updates accessed/dirty bits, this hook can be used
to invoke an arch specific function that implements/emulates
dirty logging such as PML.
Signed-off-by: NBandan Das <bsd@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bab4165e

07 4月, 2017 1 次提交

kvm: nVMX: support EPT accessed/dirty bits · ae1e2d10

由 Paolo Bonzini 提交于 3月 30, 2017

Now use bit 6 of EPTP to optionally enable A/D bits for EPTP.  Another
thing to change is that, when EPT accessed and dirty bits are not in use,
VMX treats accesses to guest paging structures as data reads.  When they
are in use (bit 6 of EPTP is set), they are treated as writes and the
corresponding EPT dirty bit is set.  The MMU didn't know this detail,
so this patch adds it.

We also have to fix up the exit qualification.  It may be wrong because
KVM sets bit 6 but the guest might not.

L1 emulates EPT A/D bits using write permissions, so in principle it may
be possible for EPT A/D bits to be used by L1 even though not available
in hardware.  The problem is that guest page-table walks will be treated
as reads rather than writes, so they would not cause an EPT violation.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
[Fixed typo in walk_addr_generic() comment and changed bit clear +
 conditional-set pattern in handle_ept_violation() to conditional-clear]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

ae1e2d10

02 3月, 2017 1 次提交

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> · 3f07c014

由 Ingo Molnar 提交于 2月 08, 2017

We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

3f07c014

28 2月, 2017 1 次提交

scripts/spelling.txt: add "an user" pattern and fix typo instances · 9332ef9d

由 Masahiro Yamada 提交于 2月 27, 2017

Fix typos and add the following to the scripts/spelling.txt:

an user||a user
an userspace||a userspace

I also added "userspace" to the list since it is a common word in Linux.
I found some instances for "an userfaultfd", but I did not add it to the
list. I felt it is endless to find words that start with "user" such as
"userland" etc., so must draw a line somewhere.

Link: http://lkml.kernel.org/r/1481573103-11329-4-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9332ef9d

27 1月, 2017 4 次提交

kvm: x86: mmu: Verify that restored PTE has needed perms in fast page fault · d3e328f2

由 Junaid Shahid 提交于 12月 21, 2016

Before fast page fault restores an access track PTE back to a regular PTE,
it now also verifies that the restored PTE would grant the necessary
permissions for the faulting access to succeed. If not, it falls back
to the slow page fault path.
Signed-off-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d3e328f2

kvm: x86: mmu: Move pgtbl walk inside retry loop in fast_page_fault · d162f30a

由 Junaid Shahid 提交于 12月 21, 2016

Redo the page table walk in fast_page_fault when retrying so that we are
working on the latest PTE even if the hierarchy changes.
Signed-off-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d162f30a

kvm: x86: mmu: Update comment in mark_spte_for_access_track · 20d65236

由 Junaid Shahid 提交于 12月 21, 2016

Reword the comment to hopefully make it more clear.
Signed-off-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

20d65236

kvm: x86: mmu: Set SPTE_SPECIAL_MASK within mmu.c · 312b616b

由 Junaid Shahid 提交于 12月 21, 2016

Instead of the caller including the SPTE_SPECIAL_MASK in the masks being
supplied to kvm_mmu_set_mmio_spte_mask() and kvm_mmu_set_mask_ptes(),
those functions now themselves include the SPTE_SPECIAL_MASK.

Note that bit 63 is now reset in the default MMIO mask.
Signed-off-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

312b616b

09 1月, 2017 7 次提交

kvm: x86: mmu: Lockless access tracking for Intel CPUs without EPT A bits. · f160c7b7