提交 · 2f4f337248cd5660040b7e09b7287a7a0a861f3f · openeuler / Kernel

24 10月, 2010 18 次提交

KVM: MMU: move audit to a separate file · 2f4f3372

由 Xiao Guangrong 提交于 8月 30, 2010

Move the audit code from arch/x86/kvm/mmu.c to arch/x86/kvm/mmu_audit.c
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

2f4f3372

KVM: MMU: support disable/enable mmu audit dynamicly · 8b1fe17c

由 Xiao Guangrong 提交于 8月 30, 2010

Add a r/w module parameter named 'mmu_audit', it can control audit
enable/disable:

enable:
  echo 1 > /sys/module/kvm/parameters/mmu_audit

disable:
  echo 0 > /sys/module/kvm/parameters/mmu_audit

This patch not change the logic
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

8b1fe17c

KVM: MMU: remove count_rmaps() · 8e0e8afa

由 Xiao Guangrong 提交于 8月 28, 2010

Nothing is checked in count_rmaps(), so remove it
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

8e0e8afa

KVM: MMU: rewrite audit_mappings_page() function · 365fb3fd

由 Xiao Guangrong 提交于 8月 28, 2010

There is a bugs in this function, we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in
atomic context(kvm_mmu_audit() is called under the spinlock(mmu_lock)'s protection).

This patch fix it by:
- introduce gfn_to_pfn_atomic instead of gfn_to_pfn
- get the mapping gfn from kvm_mmu_page_get_gfn()

And it adds 'notrap' ptes check in unsync/direct sps
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

365fb3fd

KVM: MMU: fix wrong not write protected sp report · bc32ce21

由 Xiao Guangrong 提交于 8月 28, 2010

The audit code reports some sp not write protected in current code, it's just the
bug in audit_write_protection(), since:

- the invalid sp not need write protected
- using uninitialize local variable('gfn')
- call kvm_mmu_audit() out of mmu_lock's protection
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

bc32ce21

KVM: MMU: check rmap for every spte · 0beb8d66

由 Xiao Guangrong 提交于 8月 28, 2010

The read-only spte also has reverse mapping, so fix the code to check them,
also modify the function name to fit its doing
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

0beb8d66

KVM: MMU: fix compile warning in audit code · 9ad17b10

由 Xiao Guangrong 提交于 8月 28, 2010

fix:

arch/x86/kvm/mmu.c: In function ‘kvm_mmu_unprotect_page’:
arch/x86/kvm/mmu.c:1741: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c:1745: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_unshadow’:
arch/x86/kvm/mmu.c:1761: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘set_spte’:
arch/x86/kvm/mmu.c:2005: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_set_spte’:
arch/x86/kvm/mmu.c:2033: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 7 has type ‘gfn_t’
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

9ad17b10

KVM: MMU: prefetch ptes when intercepted guest #PF · 957ed9ef

由 Xiao Guangrong 提交于 8月 22, 2010

Support prefetch ptes when intercept guest #PF, avoid to #PF by later
access

If we meet any failure in the prefetch path, we will exit it and
not try other ptes to avoid become heavy path
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

957ed9ef

KVM: MMU: fix missing percpu counter destroy · 45bf21a8

由 Wei Yongjun 提交于 8月 23, 2010

commit ad05c88266b4cce1c820928ce8a0fb7690912ba1
(KVM: create aggregate kvm_total_used_mmu_pages value)
introduce percpu counter kvm_total_used_mmu_pages but never
destroy it, this may cause oops when rmmod & modprobe.
Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: NTim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

45bf21a8

KVM: MMU: fix regression from rework mmu_shrink() code · 80b63faf

由 Xiaotian Feng 提交于 8月 24, 2010

Latest kvm mmu_shrink code rework makes kernel changes kvm->arch.n_used_mmu_pages/
kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, which is called
by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or
kvm_mmu_available_pages(vcpu->kvm) is unchanged after kvm_mmu_prepare_zap_page(),
This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages loops forever.
Moving kvm_mmu_commit_zap_page would make the while loop performs as normal.
Reported-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NXiaotian Feng <dfeng@redhat.com>
Tested-by: NAvi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

80b63faf

KVM: create aggregate kvm_total_used_mmu_pages value · 45221ab6

由 Dave Hansen 提交于 8月 19, 2010

Of slab shrinkers, the VM code says:

 * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
 * querying the cache size, so a fastpath for that case is appropriate.

and it *means* it.  Look at how it calls the shrinkers:

    nr_before = (*shrinker->shrink)(0, gfp_mask);
    shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);

So, if you do anything stupid in your shrinker, the VM will doubly
punish you.

The mmu_shrink() function takes the global kvm_lock, then acquires
every VM's kvm->mmu_lock in sequence.  If we have 100 VMs, then
we're going to take 101 locks.  We do it twice, so each call takes
202 locks.  If we're under memory pressure, we can have each cpu
trying to do this.  It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.

This is guaranteed to optimize at least half of those lock
aquisitions away.  It removes the need to take any of the locks
when simply trying to count objects.

A 'percpu_counter' can be a large object, but we only have one
of these for the entire system.  There are not any better
alternatives at the moment, especially ones that handle CPU
hotplug.
Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: NTim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

45221ab6

KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages · 49d5ca26

由 Dave Hansen 提交于 8月 19, 2010

Doing this makes the code much more readable.  That's
borne out by the fact that this patch removes code.  "used"
also happens to be the number that we need to return back to
the slab code when our shrinker gets called.  Keeping this
value as opposed to free makes the next patch simpler.

So, 'struct kvm' is kzalloc()'d.  'struct kvm_arch' is a
structure member (and not a pointer) of 'struct kvm'.  That
means they start out zeroed.  I _think_ they get initialized
properly by kvm_mmu_change_mmu_pages().  But, that only happens
via kvm ioctls.

Another benefit of storing 'used' intead of 'free' is
that the values are consistent from the moment the structure is
allocated: no negative "used" value.
Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: NTim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

49d5ca26

KVM: rename x86 kvm->arch.n_alloc_mmu_pages · 39de71ec

由 Dave Hansen 提交于 8月 19, 2010

arch.n_alloc_mmu_pages is a poor choice of name. This value truly
means, "the number of pages which _may_ be allocated".  But,
reading the name, "n_alloc_mmu_pages" implies "the number of allocated
mmu pages", which is dead wrong.

It's really the high watermark, so let's give it a name to match:
nr_max_mmu_pages.  This change will make the next few patches
much more obvious and easy to read.
Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: NTim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

39de71ec

KVM: abstract kvm x86 mmu->n_free_mmu_pages · e0df7b9f

由 Dave Hansen 提交于 8月 19, 2010

"free" is a poor name for this value.  In this context, it means,
"the number of mmu pages which this kvm instance should be able to
allocate."  But "free" implies much more that the objects are there
and ready for use.  "available" is a much better description, especially
when you see how it is calculated.

In this patch, we abstract its use into a function.  We'll soon
replace the function's contents by calculating the value in a
different way.

All of the reads of n_free_mmu_pages are taken care of in this
patch.  The modification sites will be handled in a patch
later in the series.
Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: NTim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

e0df7b9f

KVM: MMU: mark page dirty only when page is really written · 4132779b

由 Xiao Guangrong 提交于 8月 02, 2010

Mark page dirty only when this page is really written, it's more exacter,
and also can fix dirty page marking in speculation path
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

4132779b

KVM: MMU: move bits lost judgement into a separate function · 8672b721

由 Xiao Guangrong 提交于 8月 02, 2010

Introduce spte_has_volatile_bits() function to judge whether spte
bits will miss, it's more readable and can help us to cleanup code
later
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

8672b721

KVM: MMU: using kvm_set_pfn_accessed() instead of mark_page_accessed() · 251464c4

由 Xiao Guangrong 提交于 8月 02, 2010

It's a small cleanup that using using kvm_set_pfn_accessed() instead
of mark_page_accessed()
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

251464c4

KVM: MMU: remove valueless output message · 19ada5c4

由 Xiao Guangrong 提交于 7月 27, 2010

After commit 53383eaad08d, the '*spte' has updated before call
rmap_remove()(in most case it's 'shadow_trap_nonpresent_pte'), so
remove this information from error message
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

19ada5c4

07 8月, 2010 1 次提交

x86, kvm: Remove cast obsoleted by set_64bit() prototype cleanup · 7645e432

由 H. Peter Anvin 提交于 8月 06, 2010

KVM ended up having to put a pretty ugly wrapper around set_64bit()
in order to get the type right.  Now set_64bit() takes the expected
u64 type, and this wrapper can be cleaned up.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Cc: Avi Kivity <avi@redhat.com>
LKML-Reference: <4C5C4E7A.8040603@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7645e432

02 8月, 2010 17 次提交

KVM: MMU: using __xchg_spte more smarter · 9a3aad70

由 Xiao Guangrong 提交于 7月 16, 2010

Sometimes, atomically set spte is not needed, this patch call __xchg_spte()
more smartly

Note: if the old mapping's access bit is already set, we no need atomic operation
since the access bit is not lost
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

9a3aad70

KVM: MMU: cleanup spte set and accssed/dirty tracking · e4b502ea

由 Xiao Guangrong 提交于 7月 16, 2010

Introduce set_spte_track_bits() to cleanup current code
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

e4b502ea

KVM: MMU: don't atomicly set spte if it's not present · be233d49

由 Xiao Guangrong 提交于 7月 16, 2010

If the old mapping is not present, the spte.a is not lost, so no need
atomic operation to set it
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

be233d49

KVM: MMU: fix page dirty tracking lost while sync page · 9ed5520d

由 Xiao Guangrong 提交于 7月 16, 2010

In sync-page path, if spte.writable is changed, it will lose page dirty
tracking, for example:

assume spte.writable = 0 in a unsync-page, when it's synced, it map spte
to writable(that is spte.writable = 1), later guest write spte.gfn, it means
spte.gfn is dirty, then guest changed this mapping to read-only, after it's
synced,  spte.writable = 0

So, when host release the spte, it detect spte.writable = 0 and not mark page
dirty
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

9ed5520d

KVM: MMU: fix broken page accessed tracking with ept enabled · daa3db69

由 Xiao Guangrong 提交于 7月 16, 2010

In current code, if ept is enabled(shadow_accessed_mask = 0), the page
accessed tracking is lost.
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

daa3db69

KVM: MMU: add missing reserved bits check in speculative path · fa1de2bf

由 Xiao Guangrong 提交于 7月 16, 2010

In the speculative path, we should check guest pte's reserved bits just as
the real processor does
Reported-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

fa1de2bf

KVM: MMU: fix mmu notifier invalidate handler for huge spte · 6e3e243c

由 Andrea Arcangeli 提交于 7月 16, 2010

The index wasn't calculated correctly (off by one) for huge spte so KVM guest
was unstable with transparent hugepages.
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Reviewed-by: NReviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

6e3e243c

KVM: MMU: Add validate_direct_spte() helper · a357bd22

由 Avi Kivity 提交于 7月 13, 2010

Add a helper to verify that a direct shadow page is valid wrt the required
access permissions; drop the page if it is not valid.
Reviewed-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

a357bd22

KVM: MMU: Add drop_large_spte() helper · a3aa51cf

由 Avi Kivity 提交于 7月 13, 2010

To clarify spte fetching code, move large spte handling into a helper.
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

a3aa51cf

KVM: MMU: Use __set_spte to link shadow pages · 121eee97

由 Avi Kivity 提交于 7月 13, 2010

To avoid split accesses to 64 bit sptes on i386, use __set_spte() to link
shadow pages together.

(not technically required since shadow pages are __GFP_KERNEL, so upper 32
bits are always clear)
Reviewed-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

121eee97

KVM: MMU: Add link_shadow_page() helper · 32ef26a3

由 Avi Kivity 提交于 7月 13, 2010

To simplify the process of fetching an spte, add a helper that links
a shadow page to an spte.
Reviewed-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

32ef26a3

KVM: Return EFAULT from kvm ioctl when guest accesses bad area · edba23e5

由 Gleb Natapov 提交于 7月 07, 2010

Currently if guest access address that belongs to memory slot but is not
backed up by page or page is read only KVM treats it like MMIO access.
Remove that capability. It was never part of the interface and should
not be relied upon.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

edba23e5

KVM: MMU: Don't drop accessed bit while updating an spte · b79b93f9

由 Avi Kivity 提交于 6月 06, 2010

__set_spte() will happily replace an spte with the accessed bit set with
one that has the accessed bit clear.  Add a helper update_spte() which checks
for this condition and updates the page flag if needed.
Signed-off-by: NAvi Kivity <avi@redhat.com>

b79b93f9

KVM: MMU: Atomically check for accessed bit when dropping an spte · a9221dd5

由 Avi Kivity 提交于 6月 06, 2010

Currently, in the window between the check for the accessed bit, and actually
dropping the spte, a vcpu can access the page through the spte and set the bit,
which will be ignored by the mmu.

Fix by using an exchange operation to atmoically fetch the spte and drop it.
Signed-off-by: NAvi Kivity <avi@redhat.com>

a9221dd5

KVM: MMU: Move accessed/dirty bit checks from rmap_remove() to drop_spte() · ce061867

由 Avi Kivity 提交于 6月 06, 2010

Since we need to make the check atomic, move it to the place that will
set the new spte.
Signed-off-by: NAvi Kivity <avi@redhat.com>

ce061867

KVM: MMU: Introduce drop_spte() · be38d276

由 Avi Kivity 提交于 6月 06, 2010

When we call rmap_remove(), we (almost) always immediately follow it by
an __set_spte() to a nonpresent pte.  Since we need to perform the two
operations atomically, to avoid losing the dirty and accessed bits, introduce
a helper drop_spte() and convert all call sites.

The operation is still nonatomic at this point.
Signed-off-by: NAvi Kivity <avi@redhat.com>

be38d276

KVM: VMX: fix tlb flush with invalid root · dd180b3e

由 Xiao Guangrong 提交于 7月 03, 2010

Commit 341d9b535b6c simplify reload logic while entry guest mode, it
can avoid unnecessary sync-root if KVM_REQ_MMU_RELOAD and
KVM_REQ_MMU_SYNC both set.

But, it cause a issue that when we handle 'KVM_REQ_TLB_FLUSH', the
root is invalid, it is triggered during my test:

Kernel BUG at ffffffffa00212b8 [verbose debug info unavailable]
......

Fixed by directly return if the root is not ready.
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

dd180b3e

01 8月, 2010 4 次提交

KVM: Remove unnecessary divide operations · 82855413

由 Joerg Roedel 提交于 7月 01, 2010

This patch converts unnecessary divide and modulo operations
in the KVM large page related code into logical operations.
This allows to convert gfn_t to u64 while not breaking 32
bit builds.
Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

82855413

KVM: MMU: fix writable sync sp mapping · 36a2e677

由 Xiao Guangrong 提交于 6月 30, 2010

While we sync many unsync sp at one time(in mmu_sync_children()),
we may mapping the spte writable, it's dangerous, if one unsync
sp's mapping gfn is another unsync page's gfn.

For example:

SP1.pte[0] = P
SP2.gfn's pfn = P
[SP1.pte[0] = SP2.gfn's pfn]

First, we write protected SP1 and SP2, but SP1 and SP2 are still the
unsync sp.

Then, sync SP1 first, it will detect SP1.pte[0].gfn only has one unsync-sp,
that is SP2, so it will mapping it writable, but we plan to sync SP2 soon,
at this point, the SP2->unsync is not reliable since later we sync SP2 but
SP2->gfn is already writable.

So the final result is: SP2 is the sync page but SP2.gfn is writable.

This bug will corrupt guest's page table, fixed by mark read-only mapping
if the mapped gfn has shadow pages.
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

36a2e677

KVM: Add mini-API for vcpu->requests · a8eeb04a

由 Avi Kivity 提交于 5月 10, 2010

Makes it a little more readable and hackable.
Signed-off-by: NAvi Kivity <avi@redhat.com>

a8eeb04a

KVM: Remove memory alias support · a1f4d395

由 Avi Kivity 提交于 6月 21, 2010

As advertised in feature-removal-schedule.txt.  Equivalent support is provided
by overlapping memory regions.
Signed-off-by: NAvi Kivity <avi@redhat.com>

a1f4d395

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功