提交 · a2855afc7ee88475e8feb16840b23f787bfc994d · openeuler / Kernel

04 2月, 2021 7 次提交

KVM: x86/mmu: Allow parallel page faults for the TDP MMU · a2855afc

由 Ben Gardon 提交于 2月 02, 2021

Make the last few changes necessary to enable the TDP MMU to handle page
faults in parallel while holding the mmu_lock in read mode.
Reviewed-by: NPeter Feiner <pfeiner@google.com>
Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-24-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a2855afc

KVM: x86/mmu: Use an rwlock for the x86 MMU · 531810ca

由 Ben Gardon 提交于 2月 02, 2021

Add a read / write lock to be used in place of the MMU spinlock on x86.
The rwlock will enable the TDP MMU to handle page faults, and other
operations in parallel in future commits.
Reviewed-by: NPeter Feiner <pfeiner@google.com>
Signed-off-by: NBen Gardon <bgardon@google.com>

Message-Id: <20210202185734.1680553-19-bgardon@google.com>
[Introduce virt/kvm/mmu_lock.h - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

531810ca

KVM: x86/mmu: Fix braces in kvm_recover_nx_lpages · 8d1a182e

由 Ben Gardon 提交于 2月 02, 2021

No functional change intended.

Fixes: 29cf0f50 ("kvm: x86/mmu: NX largepage recovery for TDP MMU")
Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-10-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8d1a182e

KVM: x86/mmu: Add '__func__' in rmap_printk() · 805a0f83

由 Stephen Zhang 提交于 1月 27, 2021

Given the common pattern:

rmap_printk("%s:"..., __func__,...)

we could improve this by adding '__func__' in rmap_printk().
Signed-off-by: NStephen Zhang <stephenzhangzsd@gmail.com>
Message-Id: <1611713325-3591-1-git-send-email-stephenzhangzsd@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

805a0f83

KVM: x86: use static calls to reduce kvm_x86_ops overhead · b3646477

由 Jason Baron 提交于 1月 14, 2021

Convert kvm_x86_ops to use static calls. Note that all kvm_x86_ops are
covered here except for 'pmu_ops and 'nested ops'.

Here are some numbers running cpuid in a loop of 1 million calls averaged
over 5 runs, measured in the vm (lower is better).

Intel Xeon 3000MHz:

           |default    |mitigations=off
-------------------------------------
vanilla    |.671s      |.486s
static call|.573s(-15%)|.458s(-6%)

AMD EPYC 2500MHz:

           |default    |mitigations=off
-------------------------------------
vanilla    |.710s      |.609s
static call|.664s(-6%) |.609s(0%)

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: NJason Baron <jbaron@akamai.com>
Message-Id: <e057bf1b8a7ad15652df6eeba3f907ae758d3399.1610680941.git.jbaron@akamai.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b3646477

KVM: x86/mmu: Remove the defunct update_pte() paging hook · c5e2184d

由 Sean Christopherson 提交于 1月 14, 2021

Remove the update_pte() shadow paging logic, which was obsoleted by
commit 4731d4c7 ("KVM: MMU: out of sync shadow core"), but never
removed.  As pointed out by Yu, KVM never write protects leaf page
tables for the purposes of shadow paging, and instead marks their
associated shadow page as unsync so that the guest can write PTEs at
will.

The update_pte() path, which predates the unsync logic, optimizes COW
scenarios by refreshing leaf SPTEs when they are written, as opposed to
zapping the SPTE, restarting the guest, and installing the new SPTE on
the subsequent fault.  Since KVM no longer write-protects leaf page
tables, update_pte() is unreachable and can be dropped.
Reported-by: NYu Zhang <yu.c.zhang@intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210115004051.4099250-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c5e2184d

KVM: x86: Zap the oldest MMU pages, not the newest · 8fc51726

由 Sean Christopherson 提交于 1月 13, 2021

Walk the list of MMU pages in reverse in kvm_mmu_zap_oldest_mmu_pages().
The list is FIFO, meaning new pages are inserted at the head and thus
the oldest pages are at the tail.  Using a "forward" iterator causes KVM
to zap MMU pages that were just added, which obliterates guest
performance once the max number of shadow MMU pages is reached.

Fixes: 6b82ef2c ("KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages")
Reported-by: NZdenek Kaspar <zkaspar82@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210113205030.3481307-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8fc51726

08 1月, 2021 4 次提交

KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte() · 9aa41879

由 Sean Christopherson 提交于 12月 17, 2020

Check only the terminal leaf for a "!PRESENT || MMIO" SPTE when looking
for reserved bits on valid, non-MMIO SPTEs. The get_walk() helpers
terminate their walks if a not-present or MMIO SPTE is encountered, i.e.
the non-terminal SPTEs have already been verified to be regular SPTEs.
This eliminates an extra check-and-branch in a relatively hot loop.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-5-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9aa41879

KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array · dde81f94

由 Sean Christopherson 提交于 12月 17, 2020

Bump the size of the sptes array by one and use the raw level of the
SPTE to index into the sptes array.  Using the SPTE level directly
improves readability by eliminating the need to reason out why the level
is being adjusted when indexing the array.  The array is on the stack
and is not explicitly initialized; bumping its size is nothing more than
a superficial adjustment to the stack frame.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-4-seanjc@google.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dde81f94

KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE · 39b4d43e

由 Sean Christopherson 提交于 12月 17, 2020

Get the so called "root" level from the low level shadow page table
walkers instead of manually attempting to calculate it higher up the
stack, e.g. in get_mmio_spte().  When KVM is using PAE shadow paging,
the starting level of the walk, from the callers perspective, is not
the CR3 root but rather the PDPTR "root".  Checking for reserved bits
from the CR3 root causes get_mmio_spte() to consume uninitialized stack
data due to indexing into sptes[] for a level that was not filled by
get_walk().  This can result in false positives and/or negatives
depending on what garbage happens to be on the stack.

Opportunistically nuke a few extra newlines.

Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Reported-by: NRichard Herbert <rherbert@sympatico.ca>
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

39b4d43e

KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte() · 2aa07893

由 Sean Christopherson 提交于 12月 17, 2020

Return -1 from the get_walk() helpers if the shadow walk doesn't fill at
least one spte, which can theoretically happen if the walk hits a
not-present PDPTR.  Returning the root level in such a case will cause
get_mmio_spte() to return garbage (uninitialized stack data).  In
practice, such a scenario should be impossible as KVM shouldn't get a
reserved-bit page fault with a not-present PDPTR.

Note, using mmu->root_level in get_walk() is wrong for other reasons,
too, but that's now a moot point.

Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2aa07893

28 11月, 2020 1 次提交

kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT · 9a2a0d3c

由 Vitaly Kuznetsov 提交于 11月 26, 2020

Commit 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused
the following WARNING on an Intel Ice Lake CPU:

 get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy:
 ------ spte 0xb80a0 level 5.
 ------ spte 0xfcd210107 level 4.
 ------ spte 0x1004c40107 level 3.
 ------ spte 0x1004c41107 level 2.
 ------ spte 0x1db00000000b83b6 level 1.
 WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm]
...
 Call Trace:
  ? kvm_io_bus_get_first_dev+0x55/0x110 [kvm]
  vcpu_enter_guest+0xaa1/0x16a0 [kvm]
  ? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel]
  ? skip_emulated_instruction+0xaa/0x150 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm]

The guest triggering this crashes. Note, this happens with the traditional
MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out,
there was a subtle change in the above mentioned commit. Previously,
walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level'
which is returned by shadow_walk_init() and this equals to
'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to
'int root = vcpu->arch.mmu->root_level'.

The difference between 'root_level' and 'shadow_root_level' on CPUs
supporting 5-level page tables is that in some case we don't want to
use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48'
kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used,
the corresponding SPTE will fail '__is_rsvd_bits_set()' check.

Revert to using 'shadow_root_level'.

Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9a2a0d3c

15 11月, 2020 2 次提交

KVM: Don't allocate dirty bitmap if dirty ring is enabled · 044c59c4

由 Peter Xu 提交于 9月 30, 2020

Because kvm dirty rings and kvm dirty log is used in an exclusive way,
Let's avoid creating the dirty_bitmap when kvm dirty ring is enabled.
At the meantime, since the dirty_bitmap will be conditionally created
now, we can't use it as a sign of "whether this memory slot enabled
dirty tracking".  Change users like that to check against the kvm
memory slot flags.

Note that there still can be chances where the kvm memory slot got its
dirty_bitmap allocated, _if_ the memory slots are created before
enabling of the dirty rings and at the same time with the dirty
tracking capability enabled, they'll still with the dirty_bitmap.
However it should not hurt much (e.g., the bitmaps will always be
freed if they are there), and the real users normally won't trigger
this because dirty bit tracking flag should in most cases only be
applied to kvm slots only before migration starts, that should be far
latter than kvm initializes (VM starts).
Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20201001012226.5868-1-peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

044c59c4

KVM: X86: Implement ring-based dirty memory tracking · fb04a1ed

由 Peter Xu 提交于 9月 30, 2020

This patch is heavily based on previous work from Lei Cao
<lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]

KVM currently uses large bitmaps to track dirty memory. These bitmaps
are copied to userspace when userspace queries KVM for its dirty page
information. The use of bitmaps is mostly sufficient for live
migration, as large parts of memory are be dirtied from one log-dirty
pass to another. However, in a checkpointing system, the number of
dirty pages is small and in fact it is often bounded---the VM is
paused when it has dirtied a pre-defined number of pages. Traversing a
large, sparsely populated bitmap to find set bits is time-consuming,
as is copying the bitmap to user-space.

A similar issue will be there for live migration when the guest memory
is huge while the page dirty procedure is trivial. In that case for
each dirty sync we need to pull the whole dirty bitmap to userspace
and analyse every bit even if it's mostly zeros.

The preferred data structure for above scenarios is a dense list of
guest frame numbers (GFN). This patch series stores the dirty list in
kernel memory that can be memory mapped into userspace to allow speedy
harvesting.

This patch enables dirty ring for X86 only. However it should be
easily extended to other archs as well.

[1] https://patchwork.kernel.org/patch/10471409/Signed-off-by: NLei Cao <lei.cao@stratus.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20201001012222.5767-1-peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fb04a1ed

08 11月, 2020 1 次提交

KVM: x86/mmu: fix counting of rmap entries in pte_list_add · c6c4f961

由 Li RongQing 提交于 9月 27, 2020

Fix an off-by-one style bug in pte_list_add() where it failed to
account the last full set of SPTEs, i.e. when desc->sptes is full
and desc->more is NULL.

Merge the two "PTE_LIST_EXT-1" checks as part of the fix to avoid
an extra comparison.
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <1601196297-24104-1-git-send-email-lirongqing@baidu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c6c4f961

31 10月, 2020 1 次提交

KVM: x86: replace static const variables with macros · 8a967d65

由 Paolo Bonzini 提交于 10月 30, 2020

Even though the compiler is able to replace static const variables with
their value, it will warn about them being unused when Linux is built with W=1.
Use good old macros instead, this is not C++.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8a967d65

23 10月, 2020 10 次提交

kvm: x86/mmu: NX largepage recovery for TDP MMU · 29cf0f50

由 Ben Gardon 提交于 10月 14, 2020

When KVM maps a largepage backed region at a lower level in order to
make it executable (i.e. NX large page shattering), it reduces the TLB
performance of that region. In order to avoid making this degradation
permanent, KVM must periodically reclaim shattered NX largepages by
zapping them and allowing them to be rebuilt in the page fault handler.

With this patch, the TDP MMU does not respect KVM's rate limiting on
reclaim. It traverses the entire TDP structure every time. This will be
addressed in a future patch.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-21-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

29cf0f50

kvm: x86/mmu: Don't clear write flooding count for direct roots · daa5b6c1

由 Ben Gardon 提交于 10月 14, 2020

Direct roots don't have a write flooding count because the guest can't
affect that paging structure. Thus there's no need to clear the write
flooding count on a fast CR3 switch for direct roots.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-20-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

daa5b6c1

kvm: x86/mmu: Support MMIO in the TDP MMU · 95fb5b02

由 Ben Gardon 提交于 10月 14, 2020

In order to support MMIO, KVM must be able to walk the TDP paging
structures to find mappings for a given GFN. Support this walk for
the TDP MMU.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

v2: Thanks to Dan Carpenter and kernel test robot for finding that root
was used uninitialized in get_mmio_spte.
Signed-off-by: NBen Gardon <bgardon@google.com>
Reported-by: Nkernel test robot <lkp@intel.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Message-Id: <20201014182700.2888246-19-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

95fb5b02

kvm: x86/mmu: Support write protection for nesting in tdp MMU · 46044f72

由 Ben Gardon 提交于 10月 14, 2020

To support nested virtualization, KVM will sometimes need to write
protect pages which are part of a shadowed paging structure or are not
writable in the shadowed paging structure. Add a function to write
protect GFN mappings for this purpose.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-18-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

46044f72

kvm: x86/mmu: Support disabling dirty logging for the tdp MMU · 14881998

由 Ben Gardon 提交于 10月 14, 2020

Dirty logging ultimately breaks down MMU mappings to 4k granularity.
When dirty logging is no longer needed, these granaular mappings
represent a useless performance penalty. When dirty logging is disabled,
search the paging structure for mappings that could be re-constituted
into a large page mapping. Zap those mappings so that they can be
faulted in again at a higher mapping level.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-17-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

14881998

kvm: x86/mmu: Support dirty logging for the TDP MMU · a6a0b05d

由 Ben Gardon 提交于 10月 14, 2020

Dirty logging is a key feature of the KVM MMU and must be supported by
the TDP MMU. Add support for both the write protection and PML dirty
logging modes.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-16-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a6a0b05d

kvm: x86/mmu: Support changed pte notifier in tdp MMU · 1d8dd6b3

由 Ben Gardon 提交于 10月 14, 2020

In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
a hook and handle the change_pte MMU notifier.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-15-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1d8dd6b3

kvm: x86/mmu: Add access tracking for tdp_mmu · f8e14497

由 Ben Gardon 提交于 10月 14, 2020

In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. The
main Linux MM uses the access tracking MMU notifiers for swap and other
features. Add hooks to handle the test/flush HVA (range) family of
MMU notifiers.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-14-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f8e14497

kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU · 063afacd

由 Ben Gardon 提交于 10月 14, 2020

In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
hooks to handle the invalidate range family of MMU notifiers.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-13-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

063afacd

kvm: x86/mmu: Add TDP MMU PF handler · bb18842e

由 Ben Gardon 提交于 10月 14, 2020

Add functions to handle page faults in the TDP MMU. These page faults
are currently handled in much the same way as the x86 shadow paging
based MMU, however the ordering of some operations is slightly
different. Future patches will add eager NX splitting, a fast page fault
handler, and parallel page faults.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-11-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bb18842e

22 10月, 2020 10 次提交

kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg · 7d945312

由 Ben Gardon 提交于 10月 14, 2020

In order to avoid creating executable hugepages in the TDP MMU PF
handler, remove the dependency between disallowed_hugepage_adjust and
the shadow_walk_iterator. This will open the function up to being used
by the TDP MMU PF handler in a future patch.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-10-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7d945312

kvm: x86/mmu: Support zapping SPTEs in the TDP MMU · faaf05b0

由 Ben Gardon 提交于 10月 14, 2020

Add functions to zap SPTEs to the TDP MMU. These are needed to tear down
TDP MMU roots properly and implement other MMU functions which require
tearing down mappings. Future patches will add functions to populate the
page tables, but as for this patch there will not be any work for these
functions to do.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-8-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

faaf05b0

kvm: x86/mmu: Add functions to handle changed TDP SPTEs · 2f2fad08

由 Ben Gardon 提交于 10月 14, 2020

The existing bookkeeping done by KVM when a PTE is changed is spread
around several functions. This makes it difficult to remember all the
stats, bitmaps, and other subsystems that need to be updated whenever a
PTE is modified. When a non-leaf PTE is marked non-present or becomes a
leaf PTE, page table memory must also be freed. To simplify the MMU and
facilitate the use of atomic operations on SPTEs in future patches, create
functions to handle some of the bookkeeping required as a result of
a change.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2f2fad08

kvm: x86/mmu: Allocate and free TDP MMU roots · 02c00b3a

由 Ben Gardon 提交于 10月 14, 2020

The TDP MMU must be able to allocate paging structure root pages and track
the usage of those pages. Implement a similar, but separate system for root
page allocation to that of the x86 shadow paging implementation. When
future patches add synchronization model changes to allow for parallel
page faults, these pages will need to be handled differently from the
x86 shadow paging based MMU's root pages.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

02c00b3a

kvm: x86/mmu: Init / Uninit the TDP MMU · fe5db27d

由 Ben Gardon 提交于 10月 14, 2020

The TDP MMU offers an alternative mode of operation to the x86 shadow
paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU
will require new fields that need to be initialized and torn down. Add
hooks into the existing KVM MMU initialization process to do that
initialization / cleanup. Currently the initialization and cleanup
fucntions do not do very much, however more operations will be added in
future patches.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-4-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe5db27d

KVM: mmu: extract spte.h and spte.c · 5a9624af

由 Paolo Bonzini 提交于 10月 16, 2020

The SPTE format will be common to both the shadow and the TDP MMU.

Extract code that implements the format to a separate module, as a
first step towards adding the TDP MMU and putting mmu.c on a diet.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5a9624af

KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp · cb3eedab

由 Paolo Bonzini 提交于 9月 28, 2020

The TDP MMU's own function for the changed-PTE notifier will need to be
update a PTE in the exact same way as the shadow MMU. Rather than
re-implementing this logic, factor the SPTE creation out of kvm_set_pte_rmapp.

Extracted out of a patch by Ben Gardon. <bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cb3eedab

kvm: x86/mmu: Separate making SPTEs from set_spte · 799a4190

由 Ben Gardon 提交于 10月 14, 2020

Separate the functions for generating leaf page table entries from the
function that inserts them into the paging structure. This refactoring
will facilitate changes to the MMU sychronization model to use atomic
compare / exchanges (which are not guaranteed to succeed) instead of a
monolithic MMU lock.

No functional change expected.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This commit introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

799a4190

kvm: mmu: Separate making non-leaf sptes from link_shadow_page · cc4674d0

由 Ben Gardon 提交于 9月 25, 2020

The TDP MMU page fault handler will need to be able to create non-leaf
SPTEs to build up the paging structures. Rather than re-implementing the
function, factor the SPTE creation out of link_shadow_page.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20200925212302.3979661-9-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cc4674d0

kvm x86/mmu: Make struct kernel_param_ops definitions const · d5d6c18d

由 Joe Perches 提交于 10月 03, 2020

These should be const, so make it so.
Signed-off-by: NJoe Perches <joe@perches.com>
Message-Id: <ed95eef4f10fc1317b66936c05bc7dd8f943a6d5.1601770305.git.joe@perches.com>
Reviewed-by: NBen Gardon <bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d5d6c18d

28 9月, 2020 4 次提交

KVM: x86/mmu: Move individual kvm_mmu initialization into common helper · 04d28e37

由 Sean Christopherson 提交于 9月 23, 2020

Move initialization of 'struct kvm_mmu' fields into alloc_mmu_pages() to
consolidate code, and rename the helper to __kvm_mmu_create().

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923163314.8181-1-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

04d28e37

KVM: x86/mmu: Track write/user faults using bools · e88b8093

由 Sean Christopherson 提交于 9月 23, 2020

Use bools to track write and user faults throughout the page fault paths
and down into mmu_set_spte().  The actual usage is purely boolean, but
that's not obvious without digging into all paths as the current code
uses a mix of bools (TDP and try_async_pf) and ints (shadow paging and
mmu_set_spte()).

No true functional change intended (although the pgprintk() will now
print 0/1 instead of 0/PFERR_WRITE_MASK).
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923183735.584-9-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e88b8093

KVM: x86/mmu: Hoist ITLB multi-hit workaround check up a level · dcc70651

由 Sean Christopherson 提交于 9月 23, 2020

Move the "ITLB multi-hit workaround enabled" check into the callers of
disallowed_hugepage_adjust() to make it more obvious that the helper is
specific to the workaround, and to be consistent with the accounting,
i.e. account_huge_nx_page() is called if and only if the workaround is
enabled.

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923183735.584-8-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dcc70651

KVM: x86/mmu: Account NX huge page disallowed iff huge page was requested · 5bcaf3e1

由 Sean Christopherson 提交于 9月 23, 2020

Condition the accounting of a disallowed huge NX page on the original
requested level of the page being greater than the current iterator
level. This does two things: accounts the page if and only if a huge
page was actually disallowed, and accounts the shadow page if and only
if it was the level at which the huge page was disallowed. For the
latter case, the previous logic would account all shadow pages used to
create the translation for the forced small page, e.g. even PML4, which
can't be a huge page on current hardware, would be accounted as having
been a disallowed huge page when using 5-level EPT.

The overzealous accounting is purely a performance issue, i.e. the
recovery thread will spuriously zap shadow pages, but otherwise the bad
behavior is harmless.

Cc: Junaid Shahid <junaids@google.com>
Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923183735.584-6-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5bcaf3e1

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功