1. 18 June 2021 (4 commits)
  2. 11 June 2021 (1 commit)
    • KVM: x86/mmu: Calculate and check "full" mmu_role for nested MMU · 654430ef
      Sean Christopherson committed
      Calculate and check the full mmu_role when initializing the MMU context
      for the nested MMU, where "full" means the bits and pieces of the role
      that aren't handled by kvm_calc_mmu_role_common().  While the nested MMU
      isn't used for shadow paging, things like the number of levels in the
      guest's page tables are surprisingly important when walking the guest
      page tables.  Failure to reinitialize the nested MMU context if L2's
      paging mode changes can result in unexpected and/or missed page faults,
      and likely other explosions.
      
      E.g. if an L1 vCPU is running both a 32-bit PAE L2 and a 64-bit L2, the
      "common" role calculation will yield the same role for both L2s.  If the
      64-bit L2 is run after the 32-bit PAE L2, L0 will fail to reinitialize
      the nested MMU context, ultimately resulting in a bad walk of L2's page
      tables as the MMU will still have a guest root_level of PT32E_ROOT_LEVEL.
      
        WARNING: CPU: 4 PID: 167334 at arch/x86/kvm/vmx/vmx.c:3075 ept_save_pdptrs+0x15/0xe0 [kvm_intel]
        Modules linked in: kvm_intel
        CPU: 4 PID: 167334 Comm: CPU 3/KVM Not tainted 5.13.0-rc1-d849817d5673-reqs #185
        Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
        RIP: 0010:ept_save_pdptrs+0x15/0xe0 [kvm_intel]
        Code: <0f> 0b c3 f6 87 d8 02 00
        RSP: 0018:ffffbba702dbba00 EFLAGS: 00010202
        RAX: 0000000000000011 RBX: 0000000000000002 RCX: ffffffff810a2c08
        RDX: ffff91d7bc30acc0 RSI: 0000000000000011 RDI: ffff91d7bc30a600
        RBP: ffff91d7bc30a600 R08: 0000000000000010 R09: 0000000000000007
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff91d7bc30a600
        R13: ffff91d7bc30acc0 R14: ffff91d67c123460 R15: 0000000115d7e005
        FS:  00007fe8e9ffb700(0000) GS:ffff91d90fb00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000029f15a001 CR4: 00000000001726e0
        Call Trace:
         kvm_pdptr_read+0x3a/0x40 [kvm]
         paging64_walk_addr_generic+0x327/0x6a0 [kvm]
         paging64_gva_to_gpa_nested+0x3f/0xb0 [kvm]
         kvm_fetch_guest_virt+0x4c/0xb0 [kvm]
         __do_insn_fetch_bytes+0x11a/0x1f0 [kvm]
         x86_decode_insn+0x787/0x1490 [kvm]
         x86_decode_emulated_instruction+0x58/0x1e0 [kvm]
         x86_emulate_instruction+0x122/0x4f0 [kvm]
         vmx_handle_exit+0x120/0x660 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0xe25/0x1cb0 [kvm]
         kvm_vcpu_ioctl+0x211/0x5a0 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x40/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: bf627a92 ("x86/kvm/mmu: check if MMU reconfiguration is needed in init_kvm_nested_mmu()")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210610220026.1364486-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
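      The pattern behind the fix generalizes: a cached role only works as a
      "no reinit needed" signal if it folds in every input that shapes the
      resulting context. A minimal, self-contained C model of that idea
      (all names here are hypothetical; the real union kvm_mmu_role in
      arch/x86/kvm is far richer):

        #include <stdint.h>
        #include <stdio.h>

        /* Hypothetical, heavily simplified stand-in for KVM's mmu_role. */
        union mmu_role {
                uint64_t word;
                struct {
                        uint64_t cr0_pg   : 1;
                        uint64_t cr4_pae  : 1;
                        uint64_t efer_lma : 1;
                        uint64_t level    : 4; /* guest root level: the input the bug dropped */
                } bits;
        };

        struct nested_mmu {
                union mmu_role role;
                int root_level;
        };

        /* Reinitialize only when the *full* role changes.  If 'level' were
         * left out of the comparison, a 32-bit PAE L2 (level 3) and a
         * 64-bit L2 (level 4) could yield the same role word and the stale
         * context would be kept, exactly the failure described above. */
        static void init_nested_mmu(struct nested_mmu *mmu, union mmu_role new_role)
        {
                if (new_role.word == mmu->role.word)
                        return;         /* cached context is still valid */

                mmu->role = new_role;
                mmu->root_level = (int)new_role.bits.level;
        }

        int main(void)
        {
                struct nested_mmu mmu = { .role.word = 0 };
                union mmu_role pae = { .bits = { .cr0_pg = 1, .cr4_pae = 1, .level = 3 } };
                union mmu_role lm  = { .bits = { .cr0_pg = 1, .cr4_pae = 1, .efer_lma = 1, .level = 4 } };

                init_nested_mmu(&mmu, pae);
                init_nested_mmu(&mmu, lm);      /* different word forces reinit */
                printf("root_level=%d\n", mmu.root_level);
                return 0;
        }

      With the level folded into the word, the PAE-then-64-bit sequence from
      the example above produces two distinct roles and forces the reinit.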
  3. 07 May 2021 (1 commit)
    • KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging · 03ca4589
      Sean Christopherson committed
      Disallow loading KVM SVM if 5-level paging is supported.  In theory, NPT
      for L1 should simply work, but there are unknowns with respect to how the
      guest's MAXPHYADDR will be handled by hardware.
      
      Nested NPT is more problematic, as running an L1 VMM that is using
      2-level page tables requires stacking single-entry PDP and PML4 tables in
      KVM's NPT for L2, as there are no equivalent entries in L1's NPT to
      shadow.  Barring hardware magic, for 5-level paging, KVM would need to
      stack another layer to handle PML5.
      
      Opportunistically rename the lm_root pointer, which is used for the
      aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to
      call out that it's specifically for PML4.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210505204221.1934471-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
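      The guard itself is small; a sketch of its shape at module load,
      assuming kernel context and the real x86 helper pgtable_l5_enabled()
      (the exact placement and error message in the actual patch may differ):

        /* Sketch; assumes kernel headers for pr_err(), -EINVAL and
         * pgtable_l5_enabled(). */
        static int __init svm_init(void)
        {
                /*
                 * NPT for L1 is untested with a 5-level host, and nested
                 * NPT would need a stacked PML5 with no L1 entry to
                 * shadow, so refuse to load rather than risk breakage.
                 */
                if (pgtable_l5_enabled()) {
                        pr_err("KVM doesn't yet support 5-level paging on AMD SVM\n");
                        return -EINVAL;
                }

                /* normal module initialization continues here */
                return 0;
        }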
  4. 20 April 2021 (2 commits)
    • KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returns · 4c6654bd
      Ben Gardon committed
      To avoid saddling a vCPU thread with the work of tearing down an entire
      paging structure, take a reference on each root before it becomes
      obsolete, so that the thread initiating the fast invalidation can tear
      down the paging structure and (most likely) release the last reference.
      As a bonus, this teardown can happen under the MMU lock in read mode so
      as not to block the progress of vCPU threads.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210401233736.638171-14-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
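      The ordering is the whole trick: pin every root before anything is
      marked obsolete, so the invalidating thread, not a vCPU, most likely
      ends up dropping the last reference. A sketch with hypothetical helper
      and iterator names standing in for KVM's real refcounting primitives,
      whose signatures differ:

        static void fast_zap_all(struct kvm *kvm)
        {
                struct kvm_mmu_page *root;

                write_lock(&kvm->mmu_lock);
                for_each_tdp_mmu_root(kvm, root) {
                        tdp_mmu_get_root(root);    /* pin before obsoleting */
                        root->role.invalid = true; /* no new references now */
                }
                write_unlock(&kvm->mmu_lock);

                /*
                 * Drop the pins under mmu_lock held for read: this thread
                 * most likely holds the last reference and tears the paging
                 * structures down without blocking vCPU progress.
                 */
                read_lock(&kvm->mmu_lock);
                for_each_tdp_mmu_root(kvm, root)
                        kvm_tdp_mmu_put_root(kvm, root);
                read_unlock(&kvm->mmu_lock);
        }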
    • KVM: x86/mmu: Fast invalidation for TDP MMU · b7cccd39
      Ben Gardon committed
      Provide a real mechanism for fast invalidation by marking roots as
      invalid so that their reference count will quickly fall to zero
      and they will be torn down.
      
      One negative side effect of this approach is that a vCPU thread will
      likely drop the last reference to a root and be saddled with the work of
      tearing down an entire paging structure. This issue will be resolved in
      a later commit.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210401233736.638171-13-bgardon@google.com>
      [Move the loop to tdp_mmu.c, otherwise compilation fails on 32-bit. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
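      Stripped of KVM specifics, the mechanism is an "invalid" flag that
      gates new references, so the count can only drain. A self-contained C
      model (deliberately simplified; the real code closes the
      check-versus-increment race with an inc-unless-zero primitive):

        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdlib.h>

        struct tdp_root {
                atomic_int refcount;    /* one reference per active user */
                atomic_bool invalid;    /* set by fast invalidation */
        };

        /* Users must re-check validity when picking up a root: an invalid
         * root takes no new references, so its count only falls. */
        static bool root_tryget(struct tdp_root *r)
        {
                if (atomic_load(&r->invalid))
                        return false;
                atomic_fetch_add(&r->refcount, 1);
                return true;
        }

        static void root_put(struct tdp_root *r)
        {
                if (atomic_fetch_sub(&r->refcount, 1) == 1) {
                        /* Last reference: this caller inherits the full
                         * teardown, the vCPU burden that the follow-up
                         * commit (4c6654bd) shifts to the invalidating
                         * thread. */
                        free(r);
                }
        }

        /* Fast invalidation itself is just the flag flip. */
        static void root_invalidate(struct tdp_root *r)
        {
                atomic_store(&r->invalid, true);
        }

        int main(void)
        {
                struct tdp_root *r = calloc(1, sizeof(*r));

                root_tryget(r);         /* a vCPU starts using the root */
                root_invalidate(r);     /* fast invalidation: no new users */
                root_put(r);            /* last ref drops; teardown here */
                return 0;
        }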
  5. 19 April 2021 (6 commits)
  6. 17 April 2021 (9 commits)
  7. 08 April 2021 (1 commit)
    • KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp · 315f02c6
      Paolo Bonzini committed
      Right now, if a call to kvm_tdp_mmu_zap_sp returns false, the caller
      will skip the TLB flush, which is wrong.  There are two ways to fix
      it:
      
      - since kvm_tdp_mmu_zap_sp will not yield and therefore will not flush
        the TLB itself, we could change the call to kvm_tdp_mmu_zap_sp to
        use "flush |= ..."
      
      - or we can chain the flush argument through kvm_tdp_mmu_zap_sp down
        to __kvm_tdp_mmu_zap_gfn_range.  Note that kvm_tdp_mmu_zap_sp will
        neither yield nor flush, so flush would never go from true to
        false.
      
      This patch does the former to simplify application to stable kernels,
      and to make it even clearer that kvm_tdp_mmu_zap_sp will not flush.
      
      Cc: seanjc@google.com
      Fixes: 048f4980 ("KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping")
      Cc: <stable@vger.kernel.org> # 5.10.x: 048f4980: KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
      Cc: <stable@vger.kernel.org> # 5.10.x: 33a31641: KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
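      The former fix is effectively one operator at the call site in the NX
      recovery loop; a sketch of the before/after (context abbreviated, not
      the literal diff):

        /* Before: a 'false' return overwrote a flush already owed, e.g.
         * one accumulated by the legacy MMU earlier in the loop. */
        if (is_tdp_mmu_page(sp))
                flush = kvm_tdp_mmu_zap_sp(kvm, sp);

        /* After: OR the result in.  kvm_tdp_mmu_zap_sp() never yields and
         * never flushes itself, so 'flush' can never legitimately go from
         * true back to false. */
        if (is_tdp_mmu_page(sp))
                flush |= kvm_tdp_mmu_zap_sp(kvm, sp);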
  8. 31 March 2021 (2 commits)
    • KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages · 33a31641
      Sean Christopherson committed
      Prevent the TDP MMU from yielding when zapping a gfn range during NX
      page recovery.  If a flush is pending from a previous invocation of the
      zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
      has not accumulated a flush for the current invocation, then yielding
      will release mmu_lock with stale TLB entries.
      
      That being said, this isn't technically a bug fix in the current code, as
      the TDP MMU will never yield in this case.  tdp_mmu_iter_cond_resched()
      will yield if and only if it has made forward progress, as defined by the
      current gfn vs. the last yielded (or starting) gfn.  Because zapping a
      single shadow page is guaranteed to (a) find that page and (b) step
      sideways at the level of the shadow page, the TDP iter will break its loop
      before getting a chance to yield.
      
      But that is all very, very subtle, and will break at the slightest sneeze,
      e.g. zapping while holding mmu_lock for read would break as the TDP MMU
      wouldn't be guaranteed to see the present shadow page, and thus could step
      sideways at a lower level.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-4-seanjc@google.com>
      [Add lockdep assertion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
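      A sketch of the resulting helper; signatures are approximate, and the
      lockdep assertion is the piece noted above as added on application:

        /* Sketch: zap a single shadow page without ever yielding.
         * Yielding would drop mmu_lock having flushed only what *this*
         * invocation accumulated, leaving a caller's pending flush
         * unserved and stale TLB entries live. */
        bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
        {
                gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level);

                lockdep_assert_held_write(&kvm->mmu_lock);

                return __kvm_tdp_mmu_zap_gfn_range(kvm, kvm_mmu_page_as_id(sp),
                                                   sp->gfn, end,
                                                   false, /* can_yield */
                                                   false  /* flush */);
        }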
    • KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping · 048f4980
      Sean Christopherson committed
      Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which
      does the flush itself if and only if it yields (which it will never do in
      this particular scenario), and otherwise expects the caller to do the
      flush.  If pages are zapped from the TDP MMU but not the legacy MMU, then
      no flush will occur.
      
      Fixes: 29cf0f50 ("kvm: x86/mmu: NX largepage recovery for TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-3-seanjc@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
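      On the caller side the pattern is simply to honor the returned
      boolean; a sketch of the tail of such a path (the function names are
      the real KVM ones of that era, the surrounding context is abbreviated):

        /* The zap helper reports whether a TLB flush is still owed; since
         * kvm_tdp_mmu_zap_gfn_range() only flushes when it yields, and it
         * cannot yield in this scenario, the caller must flush itself. */
        flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end);
        if (flush)
                kvm_flush_remote_tlbs(kvm);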
  9. 18 March 2021 (1 commit)
    • x86: Fix various typos in comments · d9f6e12f
      Ingo Molnar committed
      Fix ~144 single-word typos in arch/x86/ code comments.
      
      Doing this in a single commit should reduce the churn.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-kernel@vger.kernel.org
  10. 15 March 2021 (13 commits)