Commits · 35149e2129fe34fc8cb5917e1ecf5156b0fa3415 · openanolis / cloud-kernel

27 Apr, 2008 · 18 commits

KVM: MMU: Don't assume struct page for x86 · 35149e21

Authored by Anthony Liguori on Apr 02, 2008

This patch introduces a gfn_to_pfn() function and corresponding functions like
kvm_release_pfn_dirty().  Using these new functions, we can modify the x86
MMU to no longer assume that it can always get a struct page for any given gfn.

We don't want to eliminate gfn_to_page() entirely because a number of places
assume they can do gfn_to_page() and then kmap() the results.  When we support
IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will
succeed.

This does not implement support for avoiding reference counting for reserved
RAM or for IO memory.  However, it should make those things pretty
straightforward.

Since we're only introducing new common symbols, I don't think it will break
the non-x86 architectures but I haven't tested those.  I've tested Intel,
AMD, NPT, and hugetlbfs with Windows and Linux guests.

[avi: fix overflow when shifting left pfns by adding casts]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

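The bracketed note about casts refers to a general C pitfall: a frame number held in a 32-bit type loses its top bits when shifted left by the page shift, unless it is widened first. The sketch below is only an illustration of that pitfall, not the patch itself. It assumes 4 KiB pages, and `pfn32_t` and both function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12            /* assumption: 4 KiB pages */

typedef uint32_t pfn32_t;        /* hypothetical: models a 32-bit unsigned long */

/* The shift is done in 32 bits, so the top bits are lost before widening. */
static uint64_t pfn_to_addr_truncating(pfn32_t pfn)
{
    return (uint64_t)(uint32_t)(pfn << PAGE_SHIFT);
}

/* Widen first, then shift: the full physical address survives. */
static uint64_t pfn_to_addr_widened(pfn32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}
```

With a frame number at the 4 GiB boundary, the first helper wraps to zero while the second keeps the address intact.
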
KVM: MMU: prepopulate guest pages after write-protecting · bed1d1df

Authored by Marcelo Tosatti on Apr 04, 2008

Zdenek reported a bug where a looping "dmsetup status" eventually hangs
on SMP guests.

The problem is that kvm_mmu_get_page() prepopulates the shadow MMU
before write protecting the guest page tables. By doing so, it leaves a
window open where the guest can mark a pte as present while the host has
shadow cached such pte as "notrap". Accesses to such address will fault
in the guest without the host having a chance to fix the situation.

Fix by moving the write protection before the pte prefetch.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Only mark_page_accessed() if the page was accessed by the guest · fcd6dbac

Authored by Avi Kivity on Apr 03, 2008

If the accessed bit is not set, the guest has never accessed this page
(at least through this spte), so there's no need to mark the page
accessed.  This provides more accurate data for the eviction algorithm.

Noted by Andrea Arcangeli.
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: allow the vm to shrink the kvm mmu shadow caches · 3ee16c81

Authored by Izik Eidus on Mar 30, 2008

Allow the Linux memory manager to reclaim memory in the kvm shadow cache.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: unify slots_lock usage · 3200f405

Authored by Marcelo Tosatti on Mar 29, 2008

Unify slots_lock acquisition around vcpu_run(). This is simpler and less
error-prone.

Also fix some callsites that were not grabbing the lock properly.

[avi: drop slots_lock while in guest mode to avoid holding the lock
      for indefinite periods]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Introduce and use spte_to_page() · 0b49ea86

Authored by Avi Kivity on Mar 23, 2008

Encapsulate the pte mask'n'shift in a function.
Signed-off-by: Avi Kivity <avi@qumranet.com>

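As a rough illustration of the "mask'n'shift" being encapsulated: flag bits are masked off the shadow pte and the remaining address bits are shifted down to a frame number. This is a sketch under stated assumptions, not the kernel's code. `SPTE_ADDR_MASK` and `spte_to_pfn()` are hypothetical names, the mask assumes address bits 12 to 51, and the step from frame number to `struct page` is left out.

```c
#include <stdint.h>

#define PAGE_SHIFT     12                      /* assumption: 4 KiB pages */
#define SPTE_ADDR_MASK 0x000ffffffffff000ULL   /* hypothetical: address bits 12..51 */

/* Drop the flag bits, then shift the address bits down to a frame number. */
static uint64_t spte_to_pfn(uint64_t spte)
{
    return (spte & SPTE_ADDR_MASK) >> PAGE_SHIFT;
}
```

Keeping the mask and shift in one helper means callers no longer repeat the bit arithmetic at each use site.
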
KVM: MMU: fix dirty bit setting when removing write permissions · 855149aa

Authored by Izik Eidus on Mar 20, 2008

When mmu_set_spte() checks whether a page related to an spte should be released
as dirty or clean, it checks whether the shadow pte was writable. But if
rmap_write_protect() is called, shadow ptes that were writable can become
read-only, and therefore mmu_set_spte() will release the pages as clean.

This patch fixes the issue by marking the page as dirty inside
rmap_write_protect().
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Set the accessed bit on non-speculative shadow ptes · 947da538

Authored by Avi Kivity on Mar 18, 2008

If we populate a shadow pte due to a fault (and not speculatively due to a
pte write) then we can set the accessed bit on it, as we know it will be
set immediately on the next guest instruction.  This saves a read-modify-write
operation.
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: hypercall based pte updates and TLB flushes · 2f333bcb

Authored by Marcelo Tosatti on Feb 22, 2008

Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - one mmu_op hypercall instead of one per op
 - allow 64-bit gpa on hypercall
 - don't pass host errors (-ENOMEM) to guest]

[akpm: warning fix on i386]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: replace remaining __FUNCTION__ occurances · b8688d51

Authored by Harvey Harrison on Mar 03, 2008

__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

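For reference, `__func__` is a predefined identifier in standard C (C99 and later) that names the enclosing function, so it works without compiler extensions. A minimal sketch, where `current_function()` is a hypothetical name chosen for illustration:

```c
/* __func__ behaves like a static const char array holding the function name. */
static const char *current_function(void)
{
    return __func__;
}
```

Because the array has static storage duration, returning a pointer to it is safe.
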
KVM: MMU: large page support · 05da4558

Authored by Marcelo Tosatti on Feb 23, 2008

Create large page mappings if the guest PTEs are marked as such and
the underlying memory is hugetlbfs backed.  If the large page contains
write-protected pages, a large pte is not used.

Gives a consistent 2% improvement for data copies on ram mounted
filesystem, without NPT/EPT.

Anthony measures a 4% improvement on 4-way kernbench, with NPT.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: ignore zapped root pagetables · 2e53d63a

Authored by Marcelo Tosatti on Feb 20, 2008

Mark zapped root pagetables as invalid and ignore such pages during lookup.

This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: add TDP support to the KVM MMU · fb72d167

Authored by Joerg Roedel on Feb 07, 2008

This patch contains the changes to the KVM MMU necessary for support of the
Nested Paging feature in AMD Barcelona and Phenom Processors.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: make the __nonpaging_map function generic · 4d9976bb

Authored by Joerg Roedel on Feb 07, 2008

The mapping function for the nonpaging case in the softmmu does basically the
same as required for Nested Paging. Make this function generic so it can be
used for both.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: export information about NPT to generic x86 code · 18552672

Authored by Joerg Roedel on Feb 07, 2008

The generic x86 code has to know if the specific implementation uses Nested
Paging. In the generic code Nested Paging is called Two Dimensional Paging
(TDP) to avoid confusion with (future) TDP implementations of other vendors.
This patch exports the availability of TDP to the generic x86 code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Decouple mmio from shadow page tables · d196e343

Authored by Avi Kivity on Jan 24, 2008

Currently an mmio guest pte is encoded in the shadow pagetable as a
not-present trapping pte, with the SHADOW_IO_MARK bit set. However
nothing is ever done with this information, so maintaining it is a
useless complication.

This patch moves the check for mmio to before shadow ptes are instantiated,
so the shadow code is never invoked for ptes that reference mmio. The code
is simpler, and with future work, can be made to handle mmio concurrently.
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Simplify hash table indexing · 1ae0a13d

Authored by Dong, Eddie on Jan 07, 2008

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Update shadow ptes on partial guest pte writes · 489f1d65

Authored by Dong, Eddie on Jan 07, 2008

A partial guest pte write will leave shadow_trap_nonpresent_pte
in the spte, which generates a vmexit at the next guest access through that pte.

This patch improves this by reading the full guest pte in advance and thus
being able to update the spte and eliminate the vmexit.

This helps pae guests which use two 32-bit writes to set a single 64-bit pte.

[truncation fix by Eric]
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

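To make the PAE case concrete, the sketch below models a 64-bit guest pte that is updated with two 32-bit stores, and a helper that reads back the whole entry after either store. It is a simplified user-space model under stated assumptions, not the patch's code. All three function names are hypothetical.

```c
#include <stdint.h>

/* Model of a 64-bit guest pte that a PAE guest updates with two 32-bit stores. */
static void write_pte_low(uint64_t *pte, uint32_t lo)
{
    *pte = (*pte & 0xffffffff00000000ULL) | lo;
}

static void write_pte_high(uint64_t *pte, uint32_t hi)
{
    *pte = (*pte & 0x00000000ffffffffULL) | ((uint64_t)hi << 32);
}

/*
 * Hypothetical helper: after intercepting a partial write, read back the
 * whole entry instead of reasoning from the 32 bits that were written.
 */
static uint64_t read_full_gpte(const uint64_t *pte)
{
    return *pte;
}
```

After the second store, the full read returns the complete 64-bit entry, which is the value a shadow entry would need to be derived from.
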
25 Mar, 2008 · 3 commits

KVM: MMU: Fix memory leak on guest demand faults · e48bb497

Authored by Avi Kivity on Mar 23, 2008

While backporting 72dc67a6, a gfn_to_page()
call was duplicated instead of moved (due to an unrelated patch not being
present in mainline).  This caused a page reference leak, resulting in a
fairly massive memory leak.

Fix by removing the extraneous gfn_to_page() call.
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: handle page removal with shadow mapping · 15aaa819

Authored by Marcelo Tosatti on Mar 17, 2008

Do not assume that a shadow mapping will always point to the same host
frame number.  Fixes crash with madvise(MADV_DONTNEED).

[avi: move after first printk(), add another printk()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Fix is_rmap_pte() with io ptes · 4b1a80fa

Authored by Avi Kivity on Mar 23, 2008

is_rmap_pte() doesn't take into account io ptes, which have the avail bit set.
Signed-off-by: Avi Kivity <avi@qumranet.com>

04 Mar, 2008 · 3 commits

KVM: MMU: Fix race when instantiating a shadow pte · f7d9c7b7

Authored by Avi Kivity on Feb 26, 2008

For improved concurrency, the guest walk is performed concurrently with other
vcpus.  This means that we need to revalidate the guest ptes once we have
write-protected the guest page tables, at which point they can no longer be
modified.

The current code attempts to avoid this check if the shadow page table is not
new, on the assumption that if it has existed before, the guest could not have
modified the pte without the shadow lock.  However the assumption is incorrect,
as the racing vcpu could have modified the pte, then instantiated the shadow
page, before our vcpu regains control:

  vcpu0        vcpu1

  fault
  walk pte

               modify pte
               fault in same pagetable
               instantiate shadow page

  lookup shadow page
  conclude it is old
  instantiate spte based on stale guest pte

We could do something clever with generation counters, but a test run by
Marcelo suggests this is unnecessary and we can just do the revalidation
unconditionally.  The pte will be in the processor cache and the check can
be quite fast.
Signed-off-by: Avi Kivity <avi@qumranet.com>

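The unconditional revalidation described above amounts to comparing the guest pte seen during the unlocked walk with its current value once the guest page table is write-protected. A minimal single-threaded sketch, assuming a hypothetical `struct walk_result` and helper name (neither is taken from the kernel source):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical walker state: the guest pte value seen during the unlocked walk. */
struct walk_result {
    uint64_t cached_gpte;
};

/*
 * Called once the guest page table is write-protected. If the entry changed
 * since the walk, the caller should retry rather than build an spte from it.
 */
static bool gpte_unchanged(const struct walk_result *walk, const uint64_t *gpte)
{
    return *gpte == walk->cached_gpte;
}
```

The check is a single load and compare, which fits the commit's observation that it can be quite fast.
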
KVM: make MMU_DEBUG compile again · 24993d53

Authored by Marcelo Tosatti on Feb 14, 2008

the cr3 variable is now inside the vcpu->arch structure.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: remove the usage of the mmap_sem for the protection of the memory slots. · 72dc67a6

Authored by Izik Eidus on Feb 10, 2008

This patch replaces the mmap_sem lock for the memory slots with a new
kvm private lock. It is needed because until now there were cases where
kvm accesses user memory while holding the mmap semaphore.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

31 Jan, 2008 · 8 commits

KVM: MMU: Fix dirty page setting for pages removed from rmap · 75e68e60

Authored by Izik Eidus on Jan 12, 2008

Right now rmap_remove won't set the page as dirty if the shadow pte
pointing to this page had write access and then became read-only.
This patch fixes that by setting the page as dirty when an spte changes from
write to read-only access.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Move kvm_free_some_pages() into critical section · eb787d10

Authored by Avi Kivity on Dec 31, 2007

If some other cpu steals mmu pages between our check and an attempt to
allocate, we can run out of mmu pages.  Fix by moving the check into the
same critical section as the allocation.
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Switch to mmu spinlock · aaee2c94

Authored by Marcelo Tosatti on Dec 20, 2007

Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.

Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte() · d7824fff

Authored by Avi Kivity on Dec 30, 2007

Since gfn_to_page() is a sleeping function, and we want to make the core mmu
spinlocked, we need to pass the page from the walker context (which can sleep)
to the shadow context (which cannot).

[marcelo: avoid recursive locking of mmap_sem]
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Concurrent guest walkers · 10589a46

Authored by Marcelo Tosatti on Dec 20, 2007

Do not hold kvm->lock mutex across the entire pagefault code,
only acquire it in places where it is necessary, such as mmu
hash list, active list, rmap and parent pte handling.

Allow concurrent guest walkers by switching walk_addr() to use
mmap_sem in read-mode.

And get rid of the lockless __gfn_to_page.

[avi: move kvm_mmu_pte_write() locking inside the function]
[avi: add locking for real mode]
[avi: fix cmpxchg locking]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Add cache miss statistic · dfc5aa00

Authored by Avi Kivity on Dec 18, 2007

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Coalesce remote tlb flushes · caa5b8a5

Authored by Eddie Dong on Dec 18, 2007

Host side TLB flush can be merged together if multiple
spte need to be write-protected.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

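The coalescing idea can be sketched as follows: rather than flushing after each spte is write-protected, record that a flush is needed and issue a single one at the end. This is a user-space model under stated assumptions, not the kernel implementation. `remote_tlb_flush()` is a hypothetical stand-in that only counts calls, and the write bit is assumed to be bit 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPTE_WRITABLE (1ULL << 1)   /* assumption: bit 1 is the write bit */

static unsigned int flush_count;    /* counts simulated remote flushes */

static void remote_tlb_flush(void)  /* hypothetical stand-in for the real flush */
{
    flush_count++;
}

/* Clear the write bit on every spte, then flush once if anything changed. */
static void write_protect_all(uint64_t *sptes, size_t n)
{
    bool need_flush = false;

    for (size_t i = 0; i < n; i++) {
        if (sptes[i] & SPTE_WRITABLE) {
            sptes[i] &= ~SPTE_WRITABLE;
            need_flush = true;
        }
    }
    if (need_flush)
        remote_tlb_flush();
}
```

Write-protecting several writable entries costs one flush in this model, and a second pass over already read-only entries costs none.
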
KVM: Move arch dependent files to new directory arch/x86/kvm/ · edf88417

Authored by Avi Kivity on Dec 16, 2007

This paves the way for multiple architecture support.  Note that while
ioapic.c could potentially be shared with ia64, it is also moved.
Signed-off-by: Avi Kivity <avi@qumranet.com>

30 Jan, 2008 · 8 commits

KVM: Portability: Move mmu-related fields to kvm_arch · f05e70ac

Authored by Zhang Xiantao on Dec 14, 2007

This patch moves mmu-related fields to kvm_arch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Portability: Split mmu-related static inline functions to mmu.h · 1d737c8a

Authored by Zhang Xiantao on Dec 14, 2007

Since these functions need to know the details of the kvm or kvm_vcpu structure,
they can't be put in x86.h.  Create mmu.h to hold them.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Portability: Introduce kvm_vcpu_arch · ad312c7c

Authored by Zhang Xiantao on Dec 13, 2007

Move all the architecture-specific fields in kvm_vcpu into a new struct
kvm_vcpu_arch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Fix SMP shadow instantiation race · 7819026e

Authored by Marcelo Tosatti on Dec 11, 2007

There is a race where VCPU0 is shadowing a pagetable entry while VCPU1
is updating it, which results in a stale shadow copy.

Fix that by comparing the contents of the cached guest pte with the
current guest pte after write-protecting the guest pagetable.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Use mmu_set_spte() for real-mode shadows · e833240f

Authored by Avi Kivity on Dec 09, 2007

In addition to removing some duplicated code, this also handles the unlikely
case of real-mode code updating a guest page table.  This can happen when
one vcpu (in real mode) touches a second vcpu's (in protected mode) page
tables, or if a vcpu switches to real mode, touches page tables, and switches
back.
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Adjust mmu_set_spte() debug code for gpte removal · bc750ba8

Authored by Avi Kivity on Dec 09, 2007

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Move set_pte() into guest paging mode independent code · 1c4f1fd6

Authored by Avi Kivity on Dec 09, 2007

As set_pte() no longer references either a gpte or the guest walker, we can
move it out of paging mode dependent code (which compiles twice and is
generally nasty).
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Fix inherited permissions for emulated guest pte updates · 41074d07

Authored by Avi Kivity on Dec 09, 2007

When we emulate a guest pte write, we fail to apply the correct inherited
permissions from the parent ptes. Now that we store inherited permissions
in the shadow page, we can use that to update the pte permissions correctly.
Signed-off-by: Avi Kivity <avi@qumranet.com>

openanolis / cloud-kernel · synced successfully about 1 year ago