Commits · 01c168ac3d6568fed0373d82bd2db2b9339aab16 · openeuler / raspberrypi-kernel

01 August, 2010 · 16 commits

KVM: MMU: don't check PT_WRITABLE_MASK directly · 01c168ac

Submitted by Gui Jianfeng on May 27, 2010

Since we have is_writable_pte(), make use of it.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Calculate correct base gfn for direct non-DIR level · c9fa0b3b

Submitted by Lai Jiangshan on May 26, 2010

In Documentation/kvm/mmu.txt:
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.

However, the base gfn calculation in __direct_map() is incorrect:
it does not produce the right value when level=3 or 4.

Fix by using PT64_LVL_ADDR_MASK() which accounts for all levels correctly.
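A minimal C sketch of the difference, assuming the x86 64-bit layout (4 KiB pages, 9 address bits per level). The macro shapes are modeled on the kernel's PT64_* definitions; `direct_base_gfn()` and `direct_base_gfn_old()` are hypothetical helpers. `parent_level` is the level of the spte being filled, so the new direct sp sits one level below it and covers everything that spte translates.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12
#define PT64_LEVEL_BITS 9

/* Physical-address bits 51:12 of a 64-bit page-table entry. */
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~((1ULL << PAGE_SHIFT) - 1))

/* Clear every address bit translated below 'level' (level 1 = 4 KiB pte). */
#define PT64_LVL_ADDR_MASK(level) \
	(PT64_BASE_ADDR_MASK & \
	 ~((1ULL << (PAGE_SHIFT + ((level) - 1) * PT64_LEVEL_BITS)) - 1))

/* Old-style mask: always aligns to a level-2 (2 MiB) boundary. */
#define PT64_DIR_BASE_ADDR_MASK \
	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))

/* Base gfn of the direct sp created under a spte at 'parent_level'. */
static uint64_t direct_base_gfn(uint64_t addr, int parent_level)
{
	return (addr & PT64_LVL_ADDR_MASK(parent_level)) >> PAGE_SHIFT;
}

/* Previous behaviour: only correct when parent_level == 2. */
static uint64_t direct_base_gfn_old(uint64_t addr)
{
	return (addr & PT64_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
}
```

For address 0x40523000 at level 3 the new helper yields gfn 0x40000 (1 GiB aligned), while the old mask still yields 0x40400 (2 MiB aligned).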
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Don't allocate gfns page for direct mmu pages · 2032a93d

Submitted by Lai Jiangshan on May 26, 2010

When sp->role.direct is set, sp->gfns does not contain any essential
information: the leaf sptes reachable from this sp cover a contiguous
guest physical memory range (a linear range).
So sp->gfns[i] (if it were set) would equal sp->gfn + i (at PT_PAGE_TABLE_LEVEL).
Since this value can be calculated when needed, it is not essential information.

This means we don't need sp->gfns when sp->role.direct=1,
which saves one page for every kvm_mmu_page.

Note:
  Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
  or kvm_mmu_page_set_gfn().
  It is only exposed in FNAME(sync_page).
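A simplified model of the two accessors. `struct mmu_page` and the helper names are hypothetical stand-ins for kvm_mmu_page and kvm_mmu_page_get_gfn()/kvm_mmu_page_set_gfn(). The shift generalizes "sp->gfn + i" to higher levels on the assumption that each entry at level L spans 512^(L-1) guest frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT64_LEVEL_BITS 9

typedef uint64_t gfn_t;

/* Hypothetical cut-down shadow page: only the fields used here. */
struct mmu_page {
	gfn_t gfn;      /* base gfn (direct) or guest page-table gfn */
	int level;      /* 1 == PT_PAGE_TABLE_LEVEL */
	int direct;     /* role.direct */
	gfn_t *gfns;    /* NULL for direct pages: no page allocated */
};

static gfn_t mmu_page_get_gfn(const struct mmu_page *sp, int index)
{
	if (!sp->direct)
		return sp->gfns[index];
	/* Linear range: compute instead of storing. */
	return sp->gfn + ((gfn_t)index << ((sp->level - 1) * PT64_LEVEL_BITS));
}

static void mmu_page_set_gfn(struct mmu_page *sp, int index, gfn_t gfn)
{
	if (sp->direct)
		assert(gfn == mmu_page_get_gfn(sp, index)); /* BUG_ON() in kernel */
	else
		sp->gfns[index] = gfn;
}
```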
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: allow more page become unsync at getting sp time · 9f1a122f

Submitted by Xiao Guangrong on May 24, 2010

Allow more pages to become unsync when getting an sp: if we need to create a new
shadow page for a gfn but it is not allowed to become unsync (level > 1), we
should sync all of the gfn's unsync pages.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: allow more page become unsync at gfn mapping time · 9cf5cf5a

Submitted by Xiao Guangrong on May 24, 2010

In the current code, a shadow page can become unsync only if it is the only
shadow page for its gfn. This rule is too strict; in fact, we can let all
last-level mapping pages (i.e., pte pages) become unsync, and sync them at
invlpg or TLB flush time.

This patch allows more pages to become unsync at gfn mapping time.
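A minimal C sketch of the relaxed rule, loosely modeled on the kernel's write-protection check at mapping time. `struct shadow_page`, `need_write_protect()` and the flat array of pages per gfn are hypothetical simplifications: the gfn stays write-protected if any of its shadow pages is an upper-level table; otherwise all of them may be marked unsync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PT_PAGE_TABLE_LEVEL 1

/* Hypothetical view of one shadow page indexed by a gfn. */
struct shadow_page {
	int level;
	bool unsync;
};

/*
 * Returns true if the gfn must stay write-protected. Otherwise every
 * shadow page of the gfn is a pte page and all are marked unsync,
 * to be resynced at invlpg or TLB flush time.
 */
static bool need_write_protect(struct shadow_page *pages, size_t n,
			       bool can_unsync)
{
	bool need_unsync = false;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].level != PT_PAGE_TABLE_LEVEL)
			return true;    /* upper-level tables never go unsync */
		if (!pages[i].unsync) {
			if (!can_unsync)
				return true;
			need_unsync = true;
		}
	}
	if (need_unsync)
		for (size_t i = 0; i < n; i++)
			pages[i].unsync = true;
	return false;
}
```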
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Update Red Hat copyrights · 221d059d

Submitted by Avi Kivity on May 23, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: don't write-protect if have new mapping to unsync page · e02aa901

Submitted by Xiao Guangrong on May 15, 2010

Two cases may occur in the kvm_mmu_get_page() function:

- In one case, the target sp is already in the cache. If the sp is unsync,
  we only need to update it to ensure that this mapping is valid; we neither
  mark it sync nor write-protect sp->gfn, since this does not break the
  unsync rule (one shadow page per gfn).

- In the other case, the target sp does not exist, so we need to create a
  new sp for the gfn, i.e., the gfn may have another shadow page. To keep
  the unsync rule, we should sync (mark sync and write-protect) the gfn's
  unsync shadow pages.
  After enabling multiple unsync shadow pages, we sync those shadow pages
  only when the new sp is not allowed to become unsync (this also follows
  the unsync rule; the new rule is: allow all pte pages to become unsync).
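The two cases can be sketched as a small C model. Everything here (`struct sp`, `struct gfn_pages`, `get_page()`, the `updates` counter standing in for a transient sync) is hypothetical; the real function also handles hashing, parent ptes and failed syncs, which are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SP 8

/* Hypothetical shadow page for one gfn, identified by its role word. */
struct sp {
	unsigned role;   /* stands in for sp->role.word */
	int level;
	bool unsync;
	int updates;     /* counts transient (content-only) syncs */
};

struct gfn_pages {
	struct sp sp[MAX_SP];
	size_t n;
};

static int get_page(struct gfn_pages *g, unsigned role, int level)
{
	bool need_sync = false;

	for (size_t i = 0; i < g->n; i++) {
		if (g->sp[i].unsync)
			need_sync = true;
		if (g->sp[i].role != role)
			continue;
		/* Case 1: cached. Refresh an unsync page, keep it unsync. */
		if (g->sp[i].unsync)
			g->sp[i].updates++;
		return (int)i;
	}

	/* Case 2: a new sp is needed for this gfn. */
	assert(g->n < MAX_SP);
	if (level > 1 && need_sync)
		for (size_t i = 0; i < g->n; i++)
			g->sp[i].unsync = false;   /* mark sync + write-protect */
	g->sp[g->n] = (struct sp){ .role = role, .level = level };
	return (int)g->n++;
}
```

Creating another pte-level sp (level 1) leaves existing unsync pages alone; only a new upper-level sp forces them back to sync.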
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: split kvm_sync_page() function · 1d9dc7e0

Submitted by Xiao Guangrong on May 15, 2010

Split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient()
to clarify the code, as Avi suggested.

kvm_sync_page_transient() only updates the shadow page; it neither marks it
sync nor write-protects sp->gfn. It will be used by a later patch.
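One plausible shape for the split is a shared helper with a flag, sketched below in C. `struct sp`, `do_sync_page()`, `sync_page()` and `sync_page_transient()` are hypothetical names; the boolean fields stand in for the unsync-list unlink and rmap_write_protect().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified shadow-page state. */
struct sp {
	bool unsync;
	bool gfn_write_protected;
	int resyncs;        /* times sptes were refreshed from guest ptes */
};

static void do_sync_page(struct sp *sp, bool clear_unsync)
{
	if (clear_unsync) {
		sp->gfn_write_protected = true;   /* rmap_write_protect() */
		sp->unsync = false;               /* leave the unsync state */
	}
	sp->resyncs++;                            /* stands in for sync_page() */
}

/* Full sync: page becomes sync again and its gfn is write-protected. */
static void sync_page(struct sp *sp) { do_sync_page(sp, true); }

/* Transient: refresh the contents only; the page stays unsync. */
static void sync_page_transient(struct sp *sp) { do_sync_page(sp, false); }
```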
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: remove rmap before clear spte · 6d74229f

Submitted by Xiao Guangrong on May 13, 2010

Remove the rmap entry before clearing the spte; otherwise BUG_ON() will trigger
in some functions, such as rmap_write_protect().
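Why the order matters, as a C model under one stated assumption: the rmap removal helper inspects the spte value and silently ignores a non-present spte (the model's `rmap_remove()` does this). Clearing first therefore leaves a stale rmap entry that a later walker trips over. All types and helpers here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_PRESENT_MASK  (1ULL << 0)
#define PT_WRITABLE_MASK (1ULL << 1)
#define RMAP_MAX 4

/* Hypothetical reverse map: the sptes currently mapping one gfn. */
struct rmap {
	uint64_t *sptes[RMAP_MAX];
	size_t n;
};

static void rmap_add(struct rmap *r, uint64_t *sptep)
{
	assert(r->n < RMAP_MAX);
	r->sptes[r->n++] = sptep;
}

/* Assumption: non-present sptes are treated as "not in the rmap". */
static void rmap_remove(struct rmap *r, uint64_t *sptep)
{
	if (!(*sptep & PT_PRESENT_MASK))
		return;
	for (size_t i = 0; i < r->n; i++)
		if (r->sptes[i] == sptep) {
			r->sptes[i] = r->sptes[--r->n];
			return;
		}
}

/* Walker that, like rmap_write_protect(), expects only present sptes. */
static void rmap_write_protect(struct rmap *r)
{
	for (size_t i = 0; i < r->n; i++) {
		assert(*r->sptes[i] & PT_PRESENT_MASK);   /* BUG_ON() */
		*r->sptes[i] &= ~PT_WRITABLE_MASK;
	}
}

static void drop_spte(struct rmap *r, uint64_t *sptep)
{
	rmap_remove(r, sptep);   /* first, while the spte is still present */
	*sptep = 0;              /* then clear it */
}
```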
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: use proper cache object freeing function · e8ad9a70

Submitted by Xiao Guangrong on May 13, 2010

Use kmem_cache_free to free objects allocated by kmem_cache_alloc.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Clean up duplicate assignment · 62ad0755

Submitted by Sheng Yang on May 12, 2010

mmu.free() already sets root_hpa to INVALID_PAGE, so there is no need to set it
again in destroy_kvm_mmu().

kvm_x86_ops->set_cr4() and set_efer() already assign cr4/efer to
vcpu->arch.cr4/efer, so there is no need to do it again afterwards.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: pass correct parameter to kvm_mmu_free_some_pages · 24955b6c

Submitted by Marcelo Tosatti on May 12, 2010

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Fix free memory accounting race in mmu_alloc_roots() · f0f5933a

Submitted by Avi Kivity on May 10, 2010

We drop the mmu lock between freeing memory and allocating the roots; this
allows some other vcpu to sneak in and allocate memory.

While the race is benign (resulting only in temporary overallocation, not oom)
it is simple and easy to fix by moving the freeing close to the allocation.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: inject #UD if instruction emulation fails and exit to userspace · 6d77dbfc

Submitted by Gleb Natapov on May 10, 2010

Do not kill the VM when instruction emulation fails. Instead, inject #UD and
report the failure to userspace. Userspace may choose to reenter the guest if
the vcpu is in user mode (cpl == 3), in which case the guest OS will kill the
offending process and continue running.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed · 54a4f023

Submitted by Gui Jianfeng on May 05, 2010

Currently, kvm_mmu_zap_page() returns the number of freed child sps.
This may confuse the caller, because the caller doesn't know the actual number
of freed pages. Make kvm_mmu_zap_page() return the number of pages it actually
freed.
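A C model of the new return value: zapped unsync children are counted, and the page itself is counted only if it is really freed. The assumption (stated in the comment) is that a page still in use as a root is only invalidated, not freed. `struct sp`, `zap_page()` and `zap_unsync_children()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shadow page with a few unsync children. */
struct sp {
	int root_count;          /* > 0 while used as a root: not freed yet */
	bool freed;
	struct sp *children[4];
	int nr_children;
};

static int zap_page(struct sp *sp);

static int zap_unsync_children(struct sp *sp)
{
	int zapped = 0;

	for (int i = 0; i < sp->nr_children; i++)
		zapped += zap_page(sp->children[i]);
	sp->nr_children = 0;
	return zapped;
}

/* Returns the number of pages actually freed, including sp itself. */
static int zap_page(struct sp *sp)
{
	int ret = zap_unsync_children(sp);

	if (!sp->root_count) {
		sp->freed = true;
		ret++;                   /* count self */
	}
	/* else: only invalidated; freed later, so not counted now */
	return ret;
}
```

A shrinker that subtracts this value from its page count then stays accurate even when one zap frees several pages.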
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages · bf998156

Submitted by Huang Ying on May 31, 2010

In common cases, a guest SRAO MCE causes the corresponding poisoned page
to be unmapped and SIGBUS to be sent to QEMU-KVM; QEMU-KVM then relays
the MCE to the guest OS.

However, it has been reported that if the poisoned page is accessed by the
guest after unmapping but before the MCE is relayed to the guest OS,
userspace is killed.

The reason is as follows. Because the poisoned page has been unmapped,
a guest access causes a guest exit and kvm_mmu_page_fault is called.
kvm_mmu_page_fault cannot get the poisoned page for the fault address,
so kernel and userspace MMIO processing are tried in turn. During
userspace MMIO processing, the poisoned page is accessed again, and
userspace is killed by force_sig_info.

To fix the bug, kvm_mmu_page_fault sends a HWPOISON signal to QEMU-KVM
and does not try kernel or userspace MMIO processing for the poisoned
page.
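The fixed control flow can be modeled as a classification of the looked-up pfn. The enums and `handle_fault()` are hypothetical; the point is only that a poisoned pfn is checked before, and never falls through to, the MMIO path.

```c
#include <assert.h>

/* Hypothetical outcomes of a guest page fault in this model. */
enum fault_result {
	FAULT_MAPPED,           /* normal RAM: install the spte */
	FAULT_HWPOISON_SIGNAL,  /* poisoned page: signal QEMU-KVM (SIGBUS) */
	FAULT_MMIO,             /* no backing page: kernel, then user MMIO */
};

/* Hypothetical pfn classification, standing in for the real lookup. */
enum pfn_kind { PFN_NORMAL, PFN_HWPOISON, PFN_ERROR };

static enum fault_result handle_fault(enum pfn_kind kind)
{
	switch (kind) {
	case PFN_HWPOISON:
		/*
		 * Report the poisoned gfn and stop. Falling through to MMIO
		 * emulation would touch the page again and kill the process.
		 */
		return FAULT_HWPOISON_SIGNAL;
	case PFN_ERROR:
		return FAULT_MMIO;
	default:
		return FAULT_MAPPED;
	}
}
```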

[xiao: fix warning introduced by avi]
Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

19 July, 2010 · 1 commit

mm: add context argument to shrinker callback · 7f8275d0

Submitted by Dave Chinner on July 19, 2010

The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
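The embedding pattern, sketched in userspace C. `container_of()` is redefined locally with `offsetof`; `struct shrinker` here is a cut-down stand-in (the kernel callback of this era also takes a gfp_mask, omitted for brevity), and `struct fs_cache` is a hypothetical per-filesystem cache.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down shrinker: the callback receives the shrinker itself. */
struct shrinker {
	int (*shrink)(struct shrinker *s, int nr_to_scan);
};

/* A per-filesystem cache that embeds its own shrinker. */
struct fs_cache {
	int nr_objects;
	struct shrinker shrinker;
};

static int fs_cache_shrink(struct shrinker *s, int nr_to_scan)
{
	/* Recover the owning cache: no global state needed. */
	struct fs_cache *cache = container_of(s, struct fs_cache, shrinker);

	if (nr_to_scan > cache->nr_objects)
		nr_to_scan = cache->nr_objects;
	cache->nr_objects -= nr_to_scan;
	return cache->nr_objects;    /* objects remaining */
}
```

Two caches can share the same callback, because each invocation finds its own cache through the embedded shrinker.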
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

13 July, 2010 · 1 commit

KVM: MMU: flush remote tlbs when overwriting spte with different pfn · 91546356

Submitted by Xiao Guangrong on June 30, 2010

After removing an rmap entry, we should flush the TLBs of all vcpus.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

09 June, 2010 · 2 commits

KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 page · 69325a12

Submitted by Avi Kivity on May 27, 2010

If cr0.wp=0, we have to allow the guest kernel access to a page with pte.w=0.
We do that by setting spte.w=1, since the host cr0.wp must remain set so the
host can write protect pages. Once we allow write access, we must remove
user access otherwise we mistakenly allow the user to write the page.
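A C sketch of the permission adjustment, using the x86 pte bit positions for R/W (bit 1) and U/S (bit 2). `spte_access()` is a hypothetical helper that only models the U/W bits; the real code also tracks other access bits and the TDP case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_WRITABLE_MASK (1ULL << 1)
#define PT_USER_MASK     (1ULL << 2)

/*
 * U/W bits of an spte shadowing 'gpte' for the faulting access.
 * The host always runs with CR0.WP=1, so a guest supervisor write that
 * CR0.WP=0 permits must be granted through spte.w itself.
 */
static uint64_t spte_access(uint64_t gpte, bool cr0_wp,
			    bool write_fault, bool user_fault)
{
	uint64_t spte = gpte & (PT_WRITABLE_MASK | PT_USER_MASK);

	if (write_fault && !user_fault && !cr0_wp &&
	    !(gpte & PT_WRITABLE_MASK)) {
		spte |= PT_WRITABLE_MASK;   /* allow the kernel write */
		spte &= ~PT_USER_MASK;      /* or user mode could write too */
	}
	return spte;
}
```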
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: invalidate and flush on spte small->large page size change · 3be2264b

Submitted by Marcelo Tosatti on May 28, 2010

Always invalidate spte and flush TLBs when changing page size, to make
sure different sized translations for the same address are never cached
in a CPU's TLB.

Currently the only case where this occurs is when a non-leaf spte pointer is
overwritten by a leaf, large spte entry. This can happen after dirty
logging is disabled on a memslot, for example.
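A C model of the rule for entries above level 1, where the PS bit (bit 7) distinguishes a large leaf from a pointer to a lower-level table. `set_dir_spte()` and the `tlb_flushes` counter are hypothetical stand-ins for the spte update and the remote TLB flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_PRESENT_MASK   (1ULL << 0)
#define PT_PAGE_SIZE_MASK (1ULL << 7)   /* PS: large leaf above level 1 */

static int tlb_flushes;                  /* stands in for remote flushes */

static bool is_present(uint64_t spte)   { return spte & PT_PRESENT_MASK; }
static bool is_large_pte(uint64_t spte) { return spte & PT_PAGE_SIZE_MASK; }

/*
 * Install 'new_spte' at a level > 1. If the old entry pointed to a
 * smaller-page table and the new one is a large leaf, invalidate and
 * flush first, so no TLB can hold both sizes for one address.
 */
static void set_dir_spte(uint64_t *sptep, uint64_t new_spte)
{
	if (is_present(*sptep) && !is_large_pte(*sptep) &&
	    is_large_pte(new_spte)) {
		*sptep = 0;         /* unlink the child table */
		tlb_flushes++;      /* flush every vcpu's TLB */
	}
	*sptep = new_spte;
}
```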

Noticed by Andrea.

KVM-Stable-Tag
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

19 May, 2010 · 5 commits

KVM: MMU: Segregate shadow pages with different cr0.wp · 3dbe1415

Submitted by Avi Kivity on May 12, 2010

When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte
having u/s=0 and r/w=1.  This allows excessive access if the guest sets
cr0.wp=1 and accesses through this spte.

Fix by making cr0.wp part of the base role; we'll have different sptes for
the two cases and the problem disappears.
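Why adding a role bit separates the shadows, as a C sketch. `union page_role` is a hypothetical cut-down version of the real role union (the field set and widths are assumptions); shadow pages are shared only when the whole role word matches, so a differing cr0_wp bit yields distinct pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down page role; the real union has more fields. */
union page_role {
	unsigned word;
	struct {
		unsigned level  : 4;
		unsigned direct : 1;
		unsigned access : 3;
		unsigned cr0_wp : 1;   /* guest CR0.WP is now part of the role */
	};
};

/* A cached shadow page is reused only on an exact role match. */
static bool can_share(union page_role a, union page_role b)
{
	return a.word == b.word;
}
```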
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots · 8facbbff

Submitted by Avi Kivity on May 04, 2010

On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.

Push the spinlock into mmu_alloc_roots(), and only take it after we've read
the pdptr.
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: move unsync/sync tracepoints to proper place · 5e1b3ddb

Submitted by Xiao Guangrong on April 28, 2010

Move the unsync/sync tracepoints to the proper place; this makes it easier
to measure how long a page stays unsync.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Fix mmu shrinker error · d35b8dd9

Submitted by Gui Jianfeng on April 27, 2010

kvm_mmu_remove_one_alloc_mmu_page() assumes that kvm_mmu_zap_page() reclaims
only one sp, but that's not the case. This causes the mmu shrinker to return
a wrong number. This patch fixes the counting error.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: fix hashing for TDP and non-paging modes · 5a7388c2

Submitted by Eric Northup on April 26, 2010

For TDP mode, avoid creating multiple page table roots for the single
guest-to-host physical address map by fixing the inputs used for the
shadow page table hash in mmu_alloc_roots().
Signed-off-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

17 May, 2010 · 15 commits

KVM: MMU: cleanup for function unaccount_shadowed() · 77a1a715

Submitted by Wei Yongjun on April 16, 2010

Since gfn does not change in the for loop, we do not need to call
gfn_to_memslot_unaliased() inside the loop; it is safe to move
it out.
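The loop-invariant hoisting, sketched in C. `struct memslot`, `gfn_to_memslot()` and the per-level counters are hypothetical simplifications of the real memslot and large-page accounting; the `lookups` counter just shows the slot is now found once per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_LEVELS 3   /* hypothetical number of accounted levels */

/* Hypothetical memslot with one write counter per level. */
struct memslot {
	uint64_t base_gfn, npages;
	int write_count[NR_LEVELS];
};

static int lookups;   /* counts slot lookups */

static struct memslot *gfn_to_memslot(struct memslot *slots, int n,
				      uint64_t gfn)
{
	lookups++;
	for (int i = 0; i < n; i++)
		if (gfn - slots[i].base_gfn < slots[i].npages)
			return &slots[i];
	return NULL;
}

static void unaccount_shadowed(struct memslot *slots, int n, uint64_t gfn)
{
	/* gfn is loop-invariant: look the slot up once, outside the loop. */
	struct memslot *slot = gfn_to_memslot(slots, n, gfn);

	assert(slot != NULL);
	for (int level = 0; level < NR_LEVELS; level++) {
		slot->write_count[level]--;
		assert(slot->write_count[level] >= 0);   /* WARN_ON() */
	}
}
```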
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Get rid of dead function gva_to_page() · 2a059bf4

Submitted by Gui Jianfeng on April 16, 2010

Nobody uses gva_to_page() anymore; get rid of it.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Remove unused variable in rmap_next() · b2fc15a5

Submitted by Gui Jianfeng on April 16, 2010

Remove an unused variable in rmap_next().
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: use the correct RCU API for PROVE_RCU=y · 90d83dc3

Submitted by Lai Jiangshan on April 19, 2010

The RCU/SRCU APIs have changed to support proving RCU usage.

I got the following dmesg output with PROVE_RCU=y because we used the incorrect API.
This patch converts rcu_dereference() to srcu_dereference() or related APIs.

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm]
 #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
 [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3
 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
 [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
 [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm]
 [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm]
 [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
 [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
 [<ffffffff810a8692>] ? unlock_page+0x27/0x2c
 [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1
 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm]
 [<ffffffff81060cfa>] ? up_read+0x23/0x3d
 [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6
 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db
 [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241
 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116
 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13
 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a
 [<ffffffff810021db>] system_call_fastpath+0x16/0x1b
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: cleanup for hlist walk restart · 3246af0e

Submitted by Xiao Guangrong on April 16, 2010

Quote from Avi:

|Just change the assignment to a 'goto restart;' please,
|I don't like playing with list_for_each internals.
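The restart pattern, sketched on a hypothetical singly linked bucket. The assumption modeled here is that zapping one entry may also unlink others (possibly the saved `next`), so whenever the zap removed more than the current node, the walk starts over with `goto restart` instead of patching iterator internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hash-bucket entry. */
struct node {
	int gfn;
	int group;          /* zapping one member zaps the whole group */
	bool linked;
	struct node *next;
};

/* Unlink every node of 'group'; returns how many extra nodes went away. */
static int zap(struct node **head, int group)
{
	int removed = 0;

	for (struct node **pp = head; *pp; ) {
		if ((*pp)->group == group) {
			(*pp)->linked = false;
			*pp = (*pp)->next;
			removed++;
		} else {
			pp = &(*pp)->next;
		}
	}
	return removed - 1;
}

static void zap_gfn(struct node **head, int gfn)
{
	struct node *n, *next;

restart:
	for (n = *head; n; n = next) {
		next = n->next;         /* safe against removing n itself */
		if (n->gfn == gfn && zap(head, n->group))
			goto restart;   /* 'next' may be gone: start over */
	}
}
```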
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: remove unused parameter in mmu_parent_walk() · 6b18493d

Submitted by Xiao Guangrong on April 16, 2010

'vcpu' is unused; remove it.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: remove unused struct kvm_unsync_walk · 1b8c7934

Submitted by Xiao Guangrong on April 16, 2010

Remove 'struct kvm_unsync_walk' since it's not used.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Replace role.glevels with role.cr4_pae · 5b7e0102

Submitted by Avi Kivity on April 14, 2010

There is no real distinction between glevels=3 and glevels=4; both have
exactly the same format and the code is treated exactly the same way.  Drop
role.glevels and replace it with role.cr4_pae (which is meaningful).  This
simplifies the code a bit.

As a side effect, it allows sharing shadow page tables between pae and
longmode guest page tables at the same guest page.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: remove unused field · f84cbb05

Submitted by Xiao Guangrong on April 06, 2010

kvm_mmu_page.oos_link is not used, so remove it
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: cleanup/fix mmu audit code · 805d32de

Submitted by Xiao Guangrong on April 01, 2010

This patch does:
- 'sp' parameter in inspect_spte_fn() is not used, so remove it
- fix 'kvm' and 'slots' not being defined in count_rmaps()
- fix a bug in inspect_spte_has_rmap()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Disassociate direct maps from guest levels · 84b0c8c6

Submitted by Avi Kivity on March 14, 2010

Direct maps are linear translations for a section of memory, used for
real mode or with large pages.  As such, they are independent of the guest
levels.

Teach the mmu about this by making page->role.glevels = 0 for direct maps.
This allows direct maps to be shared among real mode and the various paging
modes.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: check reserved bits only if CR4.PSE=1 or CR4.PAE=1 · f815bce8

Submitted by Xiao Guangrong on March 19, 2010

- Check reserved bits only if CR4.PAE=1 or CR4.PSE=1 when guest #PF occurs
- Fix a typo in reset_rsvds_bits_mask()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Reinstate pte prefetch on invlpg · 08e850c6

Submitted by Avi Kivity on March 15, 2010

Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:

  "The processor may create entries in paging-structure caches for
   translations required for prefetches and for accesses that are a
   result of speculative execution that would never actually occur
   in the executed code path."

And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.

Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter.  If a race occurred, do not install the pte.
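The counter check, modeled single-threaded in C. `struct mmu`, `invlpg_begin()` and `invlpg_prefetch()` are hypothetical; in KVM the counter and the shadow entry are protected by the mmu lock, and the gpte is read from guest memory between the two steps with the lock dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state, guarded by the mmu lock in KVM. */
struct mmu {
	unsigned invlpg_counter;   /* bumped by every invlpg */
	uint64_t shadow_pte;       /* the prefetched entry */
};

/* Step 1 (lock held): count this invlpg and remember the value. */
static unsigned invlpg_begin(struct mmu *m)
{
	return ++m->invlpg_counter;
}

/*
 * Step 2 (lock dropped): the gpte is read from guest memory.
 * Step 3 (lock retaken): install only if no other invlpg ran meanwhile.
 */
static bool invlpg_prefetch(struct mmu *m, unsigned seen, uint64_t gpte)
{
	if (m->invlpg_counter != seen)
		return false;          /* raced with a newer invlpg: skip */
	m->shadow_pte = gpte;
	return true;
}
```

An older invlpg that finishes last sees a changed counter and drops its stale gpte instead of overwriting the newer one.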
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write() · 72016f3a

Submitted by Avi Kivity on March 15, 2010

kvm_mmu_pte_write() reads guest ptes in two different occasions, both to
allow a 32-bit pae guest to update a pte with 4-byte writes. Consolidate
these into a single read, which also allows us to consolidate another read
from an invlpg speculating a gpte into the shadow page table.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: remove redundant initialization of page->private · d4f64b6c

Submitted by Minchan Kim on March 10, 2010

The prep_new_page() function in the page allocator calls set_page_private(page, 0),
so there is no need to reinitialize the page's private field.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>