提交 · 54987b7afa902e886b3a751c056c2a4d4701020e · Linux-御风守护者 / linux

05 9月, 2014 1 次提交

KVM: x86: propagate exception from permission checks on the nested page fault · 54987b7a

由 Paolo Bonzini 提交于 9月 02, 2014

Currently, if a permission error happens during the translation of
the final GPA to HPA, walk_addr_generic returns 0 but does not fill
in walker->fault.  To avoid this, add an x86_exception* argument
to the translate_gpa function, and let it fill in walker->fault.
The nested_page_fault field will be true, since the walk_mmu is the
nested_mmu and translate_gpu instead operates on the "outer" (NPT)
instance.
Reported-by: NValentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

54987b7a

03 9月, 2014 5 次提交

KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD · a0c0feb5

由 Paolo Bonzini 提交于 9月 02, 2014

Bit 8 would be the "global" bit, which does not quite make sense for non-leaf
page table entries. Intel ignores it; AMD ignores it in PDEs, but reserves it
in PDPEs and PML4Es. The SVM test is relying on this behavior, so enforce it.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a0c0feb5

KVM: mmio: cleanup kvm_set_mmio_spte_mask · d1431483

由 Tiejun Chen 提交于 9月 01, 2014

Just reuse rsvd_bits() inside kvm_set_mmio_spte_mask()
for slightly better code.
Signed-off-by: NTiejun Chen <tiejun.chen@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d1431483

kvm: x86: fix stale mmio cache bug · 56f17dd3

由 David Matlack 提交于 8月 18, 2014

The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:

(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.

(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.

(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.

This patch fixes the issue by using the memslot generation number
to validate the mmio cache.

Cc: stable@vger.kernel.org
Signed-off-by: NDavid Matlack <dmatlack@google.com>
[xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

56f17dd3

kvm: fix potentially corrupt mmio cache · ee3d1570

由 David Matlack 提交于 8月 18, 2014

vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not aquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.

If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
And much of the x86/kvm code does not maintain a single rcu_dereference
of the current memslots during each exit.

We can prevent the following case:

   vcpu (CPU 0)                             | thread (CPU 1)
--------------------------------------------+--------------------------
1  vm exit                                  |
2  srcu_read_unlock(&kvm->srcu)             |
3  decide to cache something based on       |
     old memslots                           |
4                                           | change memslots
                                            | (increments generation)
5                                           | synchronize_srcu(&kvm->srcu);
6  retrieve generation # from new memslots  |
7  tag cache with new memslot generation    |
8  srcu_read_unlock(&kvm->srcu)             |
...                                         |
   <action based on cache occurs even       |
    though the caching decision was based   |
    on the old memslots>                    |
...                                         |
   <action *continues* to occur until next  |
    memslot generation change, which may    |
    be never>                               |
                                            |

By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).

Keeping the existing increment is not strictly necessary, but we
do keep it and just move it for consistency from update_memslots to
install_new_memslots.  It invalidates old cached MMIOs immediately,
instead of having to wait for the end of synchronize_srcu_expedited,
which makes the code more clearly correct in case CPU 1 is preempted
right after synchronize_srcu() returns.

To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE.  This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited.  Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.

Cc: stable@vger.kernel.org
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ee3d1570

KVM: do not bias the generation number in kvm_current_mmio_generation · 00f034a1

由 Paolo Bonzini 提交于 8月 20, 2014

The next patch will give a meaning (a la seqcount) to the low bit of the
generation number.  Ensure that it matches between kvm->memslots->generation
and kvm_current_mmio_generation().

Cc: stable@vger.kernel.org
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

00f034a1

07 5月, 2014 1 次提交

KVM: x86: Mark bit 7 in long-mode PDPTE according to 1GB pages support · 5f7dde7b

由 Nadav Amit 提交于 5月 07, 2014

In long-mode, bit 7 in the PDPTE is not reserved only if 1GB pages are
supported by the CPU. Currently the bit is considered by KVM as always
reserved.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5f7dde7b

24 4月, 2014 4 次提交

KVM: MMU: flush tlb out of mmu lock when write-protect the sptes · 198c74f4

由 Xiao Guangrong 提交于 4月 17, 2014

Now we can flush all the TLBs out of the mmu lock without TLB corruption when
write-proect the sptes, it is because:
- we have marked large sptes readonly instead of dropping them that means we
  just change the spte from writable to readonly so that we only need to care
  the case of changing spte from present to present (changing the spte from
  present to nonpresent will flush all the TLBs immediately), in other words,
  the only case we need to care is mmu_spte_update()

- in mmu_spte_update(), we haved checked
  SPTE_HOST_WRITEABLE | PTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, that
  means it does not depend on PT_WRITABLE_MASK anymore
Acked-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

198c74f4

KVM: MMU: flush tlb if the spte can be locklessly modified · 7f31c959

由 Xiao Guangrong 提交于 4月 17, 2014

Relax the tlb flush condition since we will write-protect the spte out of mmu
lock. Note lockless write-protection only marks the writable spte to readonly
and the spte can be writable only if both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are set (that are tested by spte_is_locklessly_modifiable)

This patch is used to avoid this kind of race:

      VCPU 0                         VCPU 1
lockless wirte protection:
      set spte.w = 0
                                 lock mmu-lock

                                 write protection the spte to sync shadow page,
                                 see spte.w = 0, then without flush tlb

				 unlock mmu-lock

                                 !!! At this point, the shadow page can still be
                                     writable due to the corrupt tlb entry
     Flush all TLB
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

7f31c959

KVM: MMU: lazily drop large spte · c126d94f

由 Xiao Guangrong 提交于 4月 17, 2014

Currently, kvm zaps the large spte if write-protected is needed, the later
read can fault on that spte. Actually, we can make the large spte readonly
instead of making them un-present, the page fault caused by read access can
be avoided

The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter.  This removes the need for the return value.

This version has fixed the issue reported in 6b73a960, the reason of that
issue is that fast_page_fault() directly sets the readonly large spte to
writable but only dirty the first page into the dirty-bitmap that means
other pages are missed. Fixed it by only the normal sptes (on the
PT_PAGE_TABLE_LEVEL level) can be fast fixed
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

c126d94f

KVM: MMU: properly check last spte in fast_page_fault() · 92a476cb

由 Xiao Guangrong 提交于 4月 17, 2014

Using sp->role.level instead of @level since @level is not got from the
page table hierarchy

There is no issue in current code since the fast page fault currently only
fixes the fault caused by dirty-log that is always on the last level
(level = 1)

This patch makes the code more readable and avoids potential issue in the
further development
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

92a476cb

17 4月, 2014 1 次提交

KVM: x86: Fix page-tables reserved bits · cd9ae5fe

由 Nadav Amit 提交于 4月 04, 2014

KVM does not handle the reserved bits of x86 page tables correctly:
In PAE, bits 5:8 are reserved in the PDPTE.
In IA-32e, bit 8 is not reserved.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

cd9ae5fe

15 4月, 2014 2 次提交

KVM: Rename variable smep to cr4_smep · 66386ade

由 Feng Wu 提交于 4月 01, 2014

Rename variable smep to cr4_smep, which can better reflect the
meaning of the variable.
Signed-off-by: NFeng Wu <feng.wu@intel.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

66386ade

KVM: Add SMAP support when setting CR4 · 97ec8c06

由 Feng Wu 提交于 4月 01, 2014

This patch adds SMAP handling logic when setting CR4 for guests

Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
way to detect SMAP violation.
Signed-off-by: NFeng Wu <feng.wu@intel.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

97ec8c06

27 2月, 2014 1 次提交

KVM: MMU: drop read-only large sptes when creating lower level sptes · 404381c5

由 Marcelo Tosatti 提交于 2月 24, 2014

Read-only large sptes can be created due to read-only faults as
follows:

- QEMU pagetable entry that maps guest memory is read-only
due to COW.
- Guest read faults such memory, COW is not broken, because
it is a read-only fault.
- Enable dirty logging, large spte not nuked because it is read-only.
- Write-fault on such memory causes guest to loop endlessly
(which must go down to level 1 because dirty logging is enabled).

Fix by dropping large spte when necessary.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

404381c5

30 1月, 2014 1 次提交

KVM: async_pf: Provide additional direct page notification · e0ead41a

由 Dominik Dingel 提交于 6月 06, 2013

By setting a Kconfig option, the architecture can control when
guest notifications will be presented by the apf backend.
There is the default batch mechanism, working as before, where the vcpu
thread should pull in this information.
Opposite to this, there is now the direct mechanism, that will push the
information to the guest.
This way s390 can use an already existing architecture interface.

Still the vcpu thread should call check_completion to cleanup leftovers.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

e0ead41a

15 1月, 2014 1 次提交

KVM: x86: handle invalid root_hpa everywhere · 37f6a4e2

由 Marcelo Tosatti 提交于 1月 03, 2014

Rom Freiman <rom@stratoscale.com> notes other code paths vulnerable to
bug fixed by 989c6b34.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

37f6a4e2

21 12月, 2013 1 次提交

KVM: MMU: handle invalid root_hpa at __direct_map · 989c6b34

由 Marcelo Tosatti 提交于 12月 19, 2013

It is possible for __direct_map to be called on invalid root_hpa
(-1), two examples:

1) try_async_pf -> can_do_async_pf
    -> vmx_interrupt_allowed -> nested_vmx_vmexit
2) vmx_handle_exit -> vmx_interrupt_allowed -> nested_vmx_vmexit

Then to load_vmcs12_host_state and kvm_mmu_reset_context.

Check for this possibility, let fault exception be regenerated.

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=924916Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

989c6b34

03 10月, 2013 4 次提交

KVM: mmu: change useless int return types to void · 8a3c1a33

由 Paolo Bonzini 提交于 10月 02, 2013

kvm_mmu initialization is mostly filling in function pointers, there is
no way for it to fail.  Clean up unused return values.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

8a3c1a33

KVM: mmu: unify destroy_kvm_mmu with kvm_mmu_unload · 95f93af4

由 Paolo Bonzini 提交于 10月 02, 2013

They do the same thing, and destroy_kvm_mmu can be confused with
kvm_mmu_destroy.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

95f93af4

KVM: mmu: remove uninteresting MMU "new_cr3" callbacks · d8d173da

由 Paolo Bonzini 提交于 10月 02, 2013

The new_cr3 MMU callback has been a wrapper for mmu_free_roots since commit
e676505a (KVM: MMU: Force cr3 reload with two dimensional paging on mov
cr3 emulation, 2012-07-08).

The commit message mentioned that "mmu_free_roots() is somewhat of an overkill,
but fixing that is more complicated and will be done after this minimal fix".
One year has passed, and no one really felt the need to do a different fix.
Wrap the call with a kvm_mmu_new_cr3 function for clarity, but remove the
callback.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

d8d173da

KVM: mmu: remove uninteresting MMU "free" callbacks · 20626094

由 Paolo Bonzini 提交于 10月 02, 2013

The free MMU callback has been a wrapper for mmu_free_roots since mmu_free_roots
itself was introduced (commit 17ac10ad, [PATCH] KVM: MU: Special treatment
for shadow pae root pages, 2007-01-05), and has always been the same for all
MMU cases. Remove the indirection as it is useless.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

20626094

30 9月, 2013 1 次提交

KVM: Convert kvm_lock back to non-raw spinlock · 2f303b74

由 Paolo Bonzini 提交于 9月 25, 2013

In commit e935b837 ("KVM: Convert kvm_lock to raw_spinlock"),
the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
function tries to grab the (non-raw) mmu_lock within the scope of
the raw locked kvm_lock being held.  This leads to the following:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]

Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
Call Trace:
 [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
 [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
 [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
 [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
 [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
 [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
 [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
 [<ffffffff811185bf>] kswapd+0x18f/0x490
 [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
 [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
 [<ffffffff81060d2b>] kthread+0xdb/0xe0
 [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
 [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
 [<ffffffff81060c50>] ? __init_kthread_worker+0x

After the previous patch, kvm_lock need not be a raw spinlock anymore,
so change it back.
Reported-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Cc: kvm@vger.kernel.org
Cc: gleb@redhat.com
Cc: jan.kiszka@siemens.com
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2f303b74

11 9月, 2013 1 次提交

shrinker: convert remaining shrinkers to count/scan API · 70534a73

由 Dave Chinner 提交于 8月 28, 2013

Convert the remaining couple of random shrinkers in the tree to the new
API.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NGlauber Costa <glommer@openvz.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

70534a73

29 8月, 2013 1 次提交

KVM: MMU: remove unused parameter · e5552fd2

由 Xiao Guangrong 提交于 7月 30, 2013

vcpu in page_fault_can_be_fast() is not used so remove it
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

e5552fd2

07 8月, 2013 7 次提交

nEPT: Nested INVEPT · bfd0a56b

由 Nadav Har'El 提交于 8月 05, 2013

If we let L1 use EPT, we should probably also support the INVEPT instruction.

In our current nested EPT implementation, when L1 changes its EPT table
for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
the course of this modification already calls INVEPT. But if last level
of shadow page is unsync not all L1's changes to EPT12 are intercepted,
which means roots need to be synced when L1 calls INVEPT. Global INVEPT
should not be different since roots are synced by kvm_mmu_load() each
time EPTP02 changes.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NNadav Har'El <nyh@il.ibm.com>
Signed-off-by: NJun Nakajima <jun.nakajima@intel.com>
Signed-off-by: NXinhao Xu <xinhao.xu@intel.com>
Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bfd0a56b

nEPT: MMU context for nested EPT · 155a97a3

由 Nadav Har'El 提交于 8月 05, 2013

KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NNadav Har'El <nyh@il.ibm.com>
Signed-off-by: NJun Nakajima <jun.nakajima@intel.com>
Signed-off-by: NXinhao Xu <xinhao.xu@intel.com>
Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

155a97a3

nEPT: Add nEPT violation/misconfigration support · 25d92081

由 Yang Zhang 提交于 8月 06, 2013

Inject nEPT fault to L1 guest. This patch is original from Xinhao.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NJun Nakajima <jun.nakajima@intel.com>
Signed-off-by: NXinhao Xu <xinhao.xu@intel.com>
Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

25d92081

nEPT: correctly check if remote tlb flush is needed for shadowed EPT tables · 53166229

由 Gleb Natapov 提交于 8月 05, 2013

need_remote_flush() assumes that shadow page is in PT64 format, but
with addition of nested EPT this is no longer always true. Fix it by
bits definitions that depend on host shadow page type.
Reported-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

53166229

nEPT: Redefine EPT-specific link_shadow_page() · 7a1638ce

由 Yang Zhang 提交于 8月 05, 2013

Since nEPT doesn't support A/D bit, so we should not set those bit
when build shadow page table.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a1638ce

nEPT: Add EPT tables support to paging_tmpl.h · 37406aaa

由 Nadav Har'El 提交于 8月 05, 2013

This is the first patch in a series which adds nested EPT support to KVM's
nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use
EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest
to set its own cr3 and take its own page faults without either of L0 or L1
getting involved. This often significanlty improves L2's performance over the
previous two alternatives (shadow page tables over EPT, and shadow page
tables over shadow page tables).

This patch adds EPT support to paging_tmpl.h.

paging_tmpl.h contains the code for reading and writing page tables. The code
for 32-bit and 64-bit tables is very similar, but not identical, so
paging_tmpl.h is #include'd twice in mmu.c, once with PTTTYPE=32 and once
with PTTYPE=64, and this generates the two sets of similar functions.

There are subtle but important differences between the format of EPT tables
and that of ordinary x86 64-bit page tables, so for nested EPT we need a
third set of functions to read the guest EPT table and to write the shadow
EPT table.

So this patch adds third PTTYPE, PTTYPE_EPT, which creates functions (prefixed
with "EPT") which correctly read and write EPT tables.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NNadav Har'El <nyh@il.ibm.com>
Signed-off-by: NJun Nakajima <jun.nakajima@intel.com>
Signed-off-by: NXinhao Xu <xinhao.xu@intel.com>
Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

37406aaa

nEPT: Move common code to paging_tmpl.h · 0ad805a0

由 Nadav Har'El 提交于 8月 05, 2013

For preparation, we just move gpte_access(), prefetch_invalid_gpte(),
s_rsvd_bits_set(), protect_clean_gpte() and is_dirty_gpte() from mmu.c
to paging_tmpl.h.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NNadav Har'El <nyh@il.ibm.com>
Signed-off-by: NJun Nakajima <jun.nakajima@intel.com>
Signed-off-by: NXinhao Xu <xinhao.xu@intel.com>
Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: NJun Nakajima <jun.nakajima@intel.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0ad805a0

29 7月, 2013 1 次提交

KVM: x86: rename EMULATE_DO_MMIO · ac0a48c3

由 Paolo Bonzini 提交于 6月 25, 2013

The next patch will reuse it for other userspace exits than MMIO,
namely debug events.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ac0a48c3

18 7月, 2013 2 次提交

KVM: x86: Avoid zapping mmio sptes twice for generation wraparound · e6dff7d1

由 Takuya Yoshikawa 提交于 7月 04, 2013

Now that kvm_arch_memslots_updated() catches every increment of the
memslots->generation, checking if the mmio generation has reached its
maximum value is enough.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e6dff7d1

KVM: MMU: avoid fast page fault fixing mmio page fault · 1c118b82

由 Xiao Guangrong 提交于 7月 18, 2013

Currently, fast page fault incorrectly tries to fix mmio page fault when
the generation number is invalid (spte.gen != kvm.gen). It then returns
to guest to retry the fault since it sees the last spte is nonpresent.
This causes an infinite loop.

Since fast page fault only works for direct mmu, the issue exists when
1) tdp is enabled. It is only triggered only on AMD host since on Intel host
the mmio page fault is recognized as ept-misconfig whose handler call
fault-page path with error_code = 0

2) guest paging is disabled. Under this case, the issue is hardly discovered
since paging disable is short-lived and the sptes will be invalid after
memslot changed for 150 times

Fix it by filtering out MMIO page faults in page_fault_can_be_fast.
Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1c118b82

27 6月, 2013 5 次提交

KVM: MMU: Inform users of mmio generation wraparound · 7a2e8aaf

由 Takuya Yoshikawa 提交于 6月 21, 2013

Without this information, users will just see unexpected performance
problems and there is little chance we will get good reports from them:
note that mmio generation is increased even when we just start, or stop,
dirty logging for some memory slot, in which case users cannot expect
all shadow pages to be zapped.

printk_ratelimited() is used for this taking into account the problems
that we can see the information many times when we start multiple VMs
and guests can trigger this by reading ROM in a loop for example.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a2e8aaf

KVM: MMU: document clear_spte_count · accaefe0

由 Xiao Guangrong 提交于 6月 19, 2013

Document it to Documentation/virtual/kvm/mmu.txt
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

accaefe0

KVM: MMU: drop kvm_mmu_zap_mmio_sptes · a8eca9dc

由 Xiao Guangrong 提交于 6月 10, 2013

Drop kvm_mmu_zap_mmio_sptes and use kvm_mmu_invalidate_zap_all_pages
instead to handle mmio generation number overflow
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a8eca9dc

KVM: MMU: init kvm generation close to mmio wrap-around value · 69c9ea93

由 Xiao Guangrong 提交于 6月 07, 2013

Then it has the chance to trigger mmio generation number wrap-around
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
[Change from MMIO_MAX_GEN - 13 to MMIO_MAX_GEN - 150, because 13 is
 very close to the number of calls to KVM_SET_USER_MEMORY_REGION
 before the guest is started and there is any chance to create any
 spte. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

69c9ea93

KVM: MMU: add tracepoint for check_mmio_spte · 089504c0

由 Xiao Guangrong 提交于 6月 07, 2013

It is useful for debug mmio spte invalidation
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

089504c0

Linux-御风守护者 / linux 与 Fork 源项目一致

Linux-御风守护者 / linux
与 Fork 源项目一致