- 20 September 2012, 7 commits
-
-
Committed by Avi Kivity
Instead of branchy code depending on level, gpte.ps, and mmu configuration, prepare everything in a bitmap during mode changes and look it up at runtime. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
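As a rough illustration of this precomputation (a standalone sketch; the helper names, bit layout, and leaf rules below are assumptions, not the actual mmu.c code), the bitmap is filled once per mode change and the per-step check becomes a shift and a bit test:

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_PAGE_SIZE_BIT 7          /* gpte.PS bit position, assumed for this sketch */

struct mmu_ctx {
    uint8_t last_pte_bitmap;        /* bit (2*(level-1) + ps) => gpte is a leaf */
};

/* Recomputed only on rare mode changes (paging mode, PSE, ...). */
static void update_last_pte_bitmap(struct mmu_ctx *mmu, int levels, bool pse)
{
    mmu->last_pte_bitmap = 0;
    for (int level = 1; level <= levels; level++) {
        for (int ps = 0; ps <= 1; ps++) {
            bool leaf = (level == 1) || (ps && pse && level == 2);
            if (leaf)
                mmu->last_pte_bitmap |= 1u << (2 * (level - 1) + ps);
        }
    }
}

/* Hot walk path: no branches on level, ps, or mmu configuration. */
static bool is_last_gpte(const struct mmu_ctx *mmu, int level, uint64_t gpte)
{
    unsigned ps = (gpte >> PT_PAGE_SIZE_BIT) & 1;
    return (mmu->last_pte_bitmap >> (2 * (level - 1) + ps)) & 1;
}
```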
-
Committed by Avi Kivity
The page table walk is coded as an infinite loop, with a special case on the last pte. Code it as an ordinary loop with a termination condition on the last pte (large page or walk length exhausted), and put the last-pte handling code after the loop, where it belongs. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
walk_addr_generic() permission checks are a maze of branchy code, which is performed four times per lookup. It depends on the type of access, efer.nxe, cr0.wp, cr4.smep, and in the near future, cr4.smap. Optimize this away by precalculating all variants and storing them in a bitmap. The bitmap is recalculated when rarely-changing variables change (cr0, cr4) and is indexed by the often-changing variables (page fault error code, pte access permissions). The permission check is moved to the end of the loop, since otherwise an SMEP fault could be reported as a false positive when PDE.U=1 but PTE.U=0. Noted by Xiao Guangrong. The result is short, branch-free code. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
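A user-space sketch of the precomputed table (the PFERR_* values match the x86 page-fault error code bits, but the struct, helper names, and access-bit layout are illustrative assumptions): update_permission_bitmask() runs only when the rarely-changing state changes, and the per-walk check shrinks to one index plus one bit test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error code bits (the often-changing index). */
#define PFERR_WRITE  (1u << 1)
#define PFERR_USER   (1u << 2)
#define PFERR_FETCH  (1u << 4)

/* Accumulated pte access bits (the other index). */
#define ACC_WRITE    (1u << 0)
#define ACC_USER     (1u << 1)
#define ACC_EXEC     (1u << 2)

struct walker_ctx {
    /* permissions[pfec index]: bit set => that access combination would fault. */
    uint8_t permissions[8];
};

/* Recomputed when cr0.wp / cr4.smep / efer.nxe change (rare). */
static void update_permission_bitmask(struct walker_ctx *w, bool cr0_wp, bool smep)
{
    for (unsigned pfec = 0; pfec < 8; pfec++) {
        bool wf = pfec & 1, uf = pfec & 2, ff = pfec & 4;
        uint8_t map = 0;
        for (unsigned acc = 0; acc < 8; acc++) {
            bool fault = false;
            if (ff && !(acc & ACC_EXEC))
                fault = true;                           /* NX */
            if (ff && smep && !uf && (acc & ACC_USER))
                fault = true;                           /* SMEP */
            if (wf && !(acc & ACC_WRITE) && (uf || cr0_wp))
                fault = true;                           /* write to read-only page */
            if (uf && !(acc & ACC_USER))
                fault = true;                           /* user access to kernel page */
            if (fault)
                map |= 1u << acc;
        }
        w->permissions[pfec] = map;
    }
}

/* Hot path: branch-free check at the end of the walk. */
static bool permission_fault(const struct walker_ctx *w, unsigned pfec, unsigned pte_access)
{
    unsigned idx = ((pfec & PFERR_WRITE) >> 1) |
                   ((pfec & PFERR_USER)  >> 1) |
                   ((pfec & PFERR_FETCH) >> 2);
    return (w->permissions[idx] >> (pte_access & 7)) & 1;
}
```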
-
Committed by Avi Kivity
While unspecified, the behaviour of Intel processors is to first perform the page table walk and then, if the walk was successful, to atomically update the accessed and dirty bits of the walked paging elements. While we are not required to follow this exactly, doing so will allow us to perform the access permissions check after the walk is complete, rather than after each walk step. (The tricky case is SMEP: a zero in any pte's U bit makes the referenced page a supervisor page, so we can't fault on a one bit during the walk itself.) Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
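A minimal sketch of the deferred update (standalone C11 atomics stand in for the real cmpxchg on guest memory; the struct below is invented for illustration): every walked entry is remembered, and accessed/dirty bits are set only after the whole walk and the permission check have succeeded.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_ACCESSED_MASK  (1u << 5)
#define PT_DIRTY_MASK     (1u << 6)
#define MAX_LEVELS        4

struct walk_result {
    _Atomic uint64_t *ptes[MAX_LEVELS];  /* pointers to every walked entry */
    int levels;
};

/* Called only after the complete walk (and the permission check) succeeded. */
static void mark_accessed_dirty(struct walk_result *w, bool write_fault)
{
    for (int i = 0; i < w->levels; i++) {
        uint64_t mask = PT_ACCESSED_MASK;
        if (write_fault && i == w->levels - 1)
            mask |= PT_DIRTY_MASK;       /* dirty only on the leaf, only for writes */

        uint64_t old = atomic_load(w->ptes[i]);
        while ((old & mask) != mask &&
               !atomic_compare_exchange_weak(w->ptes[i], &old, old | mask))
            ;                            /* retry if the guest raced with us */
    }
}
```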
-
Committed by Avi Kivity
We no longer rely on paging_tmpl.h defines, so we can move the function to mmu.c. Rely on zero extension to 64 bits to get the correct nx behaviour. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
If nx is disabled, then if gpte[63] is set we will hit a reserved-bit-set fault before checking permissions, so we can ignore the setting of efer.nxe. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
gpte_access() computes the access permissions of a guest pte and also write-protects clean gptes. This is wrong when we are servicing a write fault (since we'll be setting the dirty bit momentarily) but correct when instantiating a speculative spte, or when servicing a read fault (since we'll want to trap a following write in order to set the dirty bit). It doesn't seem to hurt in practice, but in order to make the code readable, push the write protection out of gpte_access() and into a new protect_clean_gpte() which is called explicitly when needed. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
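A small sketch of the resulting split (the masks and helpers below mirror the x86 pte bits but are illustrative, not the exact kernel definitions): gpte_access() stays a pure translation, and protect_clean_gpte() applies the write-protection policy only where the caller wants it.

```c
#include <stdint.h>

#define PT_WRITABLE_MASK  (1u << 1)
#define PT_USER_MASK      (1u << 2)
#define PT_DIRTY_MASK     (1u << 6)
#define PT64_NX_MASK      (1ull << 63)

#define ACC_EXEC_MASK     1u
#define ACC_WRITE_MASK    2u
#define ACC_USER_MASK     4u
#define ACC_ALL           (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)

/* Pure translation: gpte bits -> access mask, no policy decisions. */
static unsigned gpte_access(uint64_t gpte)
{
    unsigned access = ACC_ALL;

    if (!(gpte & PT_WRITABLE_MASK))
        access &= ~ACC_WRITE_MASK;
    if (!(gpte & PT_USER_MASK))
        access &= ~ACC_USER_MASK;
    if (gpte & PT64_NX_MASK)
        access &= ~ACC_EXEC_MASK;
    return access;
}

/* Policy: a clean gpte is mapped read-only so the first write traps and
 * lets us set the dirty bit.  Called for speculative sptes and read
 * faults, not when servicing a write fault. */
static unsigned protect_clean_gpte(unsigned access, uint64_t gpte)
{
    if (!(gpte & PT_DIRTY_MASK))
        access &= ~ACC_WRITE_MASK;
    return access;
}
```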
-
- 06 August 2012, 1 commit
-
-
Committed by Xiao Guangrong
After commit a2766325, the error pfn is replaced by the error code, so it no longer needs to be released. [The patch has been compile-tested for powerpc.] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 11 July 2012, 1 commit
-
-
Committed by Xiao Guangrong
The P bit of the page fault error code is missing from this tracepoint; fix it by passing the full error code. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 19 April 2012, 1 commit
-
-
Committed by Davidlohr Bueso
It's much cleaner to use PT_PAGE_TABLE_LEVEL than its numeric value. Signed-off-by: Davidlohr Bueso <dave@gnu.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
- 20 March 2012, 1 commit
-
-
Committed by Cong Wang
Acked-by: Avi Kivity <avi@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Cong Wang <amwang@redhat.com>
-
- 27 December 2011, 4 commits
-
-
Committed by Xiao Guangrong
The tracepoint is only used to audit mmu code; it should not be exposed to userspace, so replace it with a jump label. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
Write-flooding detection does not work well. When we handle a write to a guest page table, if the last speculative spte has not been accessed we treat the page as write-flooded. However, speculative sptes are created on many paths (pte prefetch, page sync, ...), so the last speculative spte may not point to the written page at all, and the written page can still be accessed via other sptes; depending on the Accessed bit of the last speculative spte is therefore not enough. Instead of detecting whether the page was accessed, detect whether the spte is accessed after it is written: if the spte is not accessed but is written frequently, treat it as no longer being a page table, or as not used for a long time. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
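A much-simplified sketch of that heuristic (the threshold and field name are assumptions for illustration): a per-shadow-page counter grows on every write to the guest page table and is reset whenever the shadow page is actually used for translation; once it crosses the threshold, the page is treated as write-flooded and zapped.

```c
#include <stdbool.h>

#define WRITE_FLOODING_THRESHOLD 3   /* illustrative value */

struct shadow_page {
    int write_flooding_count;
    /* ... other shadow page state ... */
};

/* Called whenever the shadow page is used to translate a fault. */
static void sp_mark_used(struct shadow_page *sp)
{
    sp->write_flooding_count = 0;
}

/* Called when the guest writes the page backing this shadow page.
 * Returns true if the page looks write-flooded and should be zapped. */
static bool sp_track_write(struct shadow_page *sp)
{
    return ++sp->write_flooding_count >= WRITE_FLOODING_THRESHOLD;
}
```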
-
Committed by Xiao Guangrong
Fast-prefetch the spte for the unsync shadow page on the invlpg path. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
Use mmu_page_zap_pte directly to zap the spte in FNAME(invlpg), and remove the code that was duplicated between FNAME(invlpg) and FNAME(sync_page). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 26 September 2011, 2 commits
-
-
Committed by Yang, Wei Y
This patch fixes kvm-unit-tests hanging and the PT_ACCESSED_MASK bit being set incorrectly in the case of an SMEP fault. The code updated 'eperm' after the variable had already been checked. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded. On SVM it is not possible to implement this, but on VMX it is possible and was indeed implemented until nested SVM changed this to unconditionally read PDPTEs dynamically. This has a noticeable impact when running PAE guests. Fix by changing the MMU to read PDPTRs from the cache, falling back to reading from memory for the nested MMU. Signed-off-by: Avi Kivity <avi@redhat.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
- 24 July 2011, 7 commits
-
-
Committed by Xiao Guangrong
The idea is from Avi: | We could cache the result of a miss in an spte by using a reserved bit, and | checking the page fault error code (or seeing if we get an ept violation or | ept misconfiguration), so if we get repeated mmio on a page, we don't need to | search the slot list/tree. | (https://lkml.org/lkml/2011/2/22/221) When the page fault is caused by mmio, we cache the info in the shadow page table and also set the reserved bits in the shadow page table, so if the mmio access happens again we can quickly identify it and emulate it directly. Searching the mmio gfn in memslots is heavy since we need to walk all memslots; this feature reduces that cost and also avoids walking the guest page table for soft mmu. [jan: fix operator precedence issue] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
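A standalone sketch of the caching scheme (the bit positions and helper names are assumptions; a real implementation must choose reserved bits based on the CPU's physical address width): the mmio gfn and access bits are packed into an spte whose reserved bit is set, so the next fault on it is recognized without searching memslots or re-walking the guest page table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout: one bit the hardware treats as reserved, plus the
 * mmio gfn and access bits packed into the low part of the spte. */
#define SPTE_MMIO_MASK   (1ull << 51)
#define MMIO_GFN_SHIFT   12
#define MMIO_ACCESS_MASK 0x7ull

static uint64_t make_mmio_spte(uint64_t gfn, unsigned access)
{
    return SPTE_MMIO_MASK | (gfn << MMIO_GFN_SHIFT) | (access & MMIO_ACCESS_MASK);
}

static bool is_mmio_spte(uint64_t spte)
{
    return (spte & SPTE_MMIO_MASK) != 0;
}

/* On a fault: if the cached spte is an mmio spte, emulate the access
 * immediately instead of searching the memslot list again. */
static bool handle_cached_mmio(uint64_t spte, uint64_t *gfn, unsigned *access)
{
    if (!is_mmio_spte(spte))
        return false;
    *gfn    = (spte & ~SPTE_MMIO_MASK) >> MMIO_GFN_SHIFT;
    *access = spte & MMIO_ACCESS_MASK;
    return true;
}
```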
-
Committed by Xiao Guangrong
Introduce handle_abnormal_pfn to handle a fault pfn on the page fault path, and mmu_invalid_pfn to handle a fault pfn on the prefetch path. This is preparatory work for mmio page fault support. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
The idea is from Avi: | Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as | it used to be, since unsync pages always use shadow_trap_nonpresent_pte, | and since we convert between the two nonpresent_ptes during sync and unsync. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
If 'pt_write' is true, we need to emulate the fault. In a later patch we will need to emulate the fault even when it is not a pt_write event, so rename it to better fit its meaning. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
gw->pte_access is the final access permission, since it is already unified with gw->pt_access when we walk the guest page table: FNAME(walk_addr_generic): pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true); Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
If the dirty bit is not set, we can make the pte access read-only to avoid handling the dirty bit everywhere. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
If the page fault is caused by mmio, we can cache the mmio info; later we do not need to walk the guest page table and can quickly tell it is an mmio fault while we emulate the mmio instruction. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 12 July 2011, 4 commits
-
-
Committed by Takuya Yoshikawa
Suggested by Ingo and Avi. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Takuya Yoshikawa
The current name does not explain the meaning well, so give it a better name, "retry_walk", to show that we are trying the walk again. This was suggested by Ingo Molnar. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Takuya Yoshikawa
Avoid a two-step jump to the error handling part. This eliminates the use of the variables present and rsvd_fault. We also use the const type qualifier to show that write/user/fetch_fault do not change in the function. Both of these were suggested by Ingo Molnar. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
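As an illustration only (a self-contained skeleton, not the actual walk_addr_generic(); the table handling and reserved-bit mask are made up for the sketch), a single error label plus const fault flags avoids the intermediate present/rsvd_fault bookkeeping:

```c
#include <stdbool.h>
#include <stdint.h>

struct fault_info {
    bool write, user, fetch;   /* copied straight from the const inputs */
    bool rsvd;                 /* reserved bits were set in a pte */
    bool present;              /* the faulting pte was present */
};

/* Skeleton only: 'tables' hands in one page table per level, whereas the
 * real walker reads the next table's address out of the pte itself. */
static int walk(uint64_t *const tables[], int levels, uint64_t addr,
                const bool write_fault, const bool user_fault,
                const bool fetch_fault, struct fault_info *fi)
{
    bool rsvd = false;

    for (int level = levels; level > 0; level--) {
        uint64_t pte = tables[level - 1][(addr >> (level * 9 + 3)) & 0x1ff];

        if (!(pte & 1))                    /* not present: jump straight out */
            goto error;
        if (pte & (0xffull << 52)) {       /* illustrative reserved-bit mask */
            rsvd = true;
            goto error;
        }
        /* ... accumulate permissions, stop early on a large page ... */
    }
    return 0;

error:                                     /* one place builds the fault info */
    fi->write   = write_fault;
    fi->user    = user_fault;
    fi->fetch   = fetch_fault;
    fi->rsvd    = rsvd;
    fi->present = rsvd;                    /* a reserved-bit fault hit a present pte */
    return -1;
}
```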
-
Committed by Yang, Wei Y
This patch adds instruction fetch checking when walking the guest page table, to implement SMEP when emulating instead of executing natively. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
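A hedged sketch of such a check (names are illustrative; the real walker folds this into its permission handling): when emulating, an instruction fetch must be refused if NX marks the page non-executable, or if SMEP is on and a supervisor-mode fetch targets a user-accessible page.

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_USER_MASK  (1u << 2)
#define PT64_NX_MASK  (1ull << 63)

/* Returns true if an instruction fetch from a page with the given
 * accumulated pte bits must be rejected while we emulate for the guest. */
static bool fetch_fault(uint64_t pte_bits, bool user_mode,
                        bool nx_enabled, bool smep_enabled)
{
    if (nx_enabled && (pte_bits & PT64_NX_MASK))
        return true;                       /* NX: page is not executable */
    if (!user_mode && smep_enabled && (pte_bits & PT_USER_MASK))
        return true;                       /* SMEP: kernel fetch from a user page */
    return false;
}
```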
-
- 20 June 2011, 1 commit
-
-
Committed by Borislav Petkov
On 3.0-rc1 I get: In file included from arch/x86/kvm/mmu.c:2856: arch/x86/kvm/paging_tmpl.h: In function ‘paging32_walk_addr_generic’: arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function In file included from arch/x86/kvm/mmu.c:2852: arch/x86/kvm/paging_tmpl.h: In function ‘paging64_walk_addr_generic’: arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function caused by 6e2ca7d1. According to Takuya Yoshikawa, ptep_user won't be used uninitialized, so shut up gcc. Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Link: http://lkml.kernel.org/r/20110530094604.GC21833@liondog.tnic Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 22 May 2011, 7 commits
-
-
Committed by Takuya Yoshikawa
The address of the gpte was already calculated and stored in ptep_user before entering cmpxchg_gpte(). This patch makes cmpxchg_gpte() use that, to make it clear that we are using the same address during walk_addr_generic(). Note that the unlikely annotations are used to show that the conditions are something unusual rather than for performance. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Takuya Yoshikawa
This way, we can avoid checking the user space address many times when we read the guest memory. Although we could do the same for writes if we checked which slots are writable, we do not care about writes for now: reading guest memory happens more often than writing. [avi: change VERIFY_READ to VERIFY_WRITE] Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Takuya Yoshikawa
When we optimized walk_addr_generic() by not using the generic guest memory reader, we replaced copy_from_user() with get_user(): commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51 ("KVM: MMU: Optimize guest page table walk") and commit 15e2ac9a43d4d7d08088e404fddf2533a8e7d52e ("KVM: MMU: Fix 64-bit paging breakage on x86_32"). But as Andi pointed out later, copy_from_user() does the same as get_user() as long as we give it a constant size, so use copy_from_user() to clean up the code. The only noticeable regression introduced by this is 64-bit gpte reading on x86_32 hosts, which is needed for PAE guests. This can be mitigated by implementing an 8-byte get_user() for x86_32, if needed. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Takuya Yoshikawa
Fix the regression introduced by commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51 ("KVM: MMU: Optimize guest page table walk"). On x86_32, get_user() does not support 64-bit values and we fail to build KVM at the point of 64-bit paging. This patch fixes this by using get_user() twice for that case. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Avi Kivity <avi@redhat.com>
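A rough user-space sketch of the workaround (get_user32() below is a stand-in for the real 32-bit get_user(); the exact code in paging_tmpl.h may differ): on a 32-bit little-endian host the 64-bit gpte is fetched as two 32-bit halves and reassembled, accepting that the two reads are not atomic with respect to guest updates.

```c
#include <stdint.h>

/* Stand-in for a 32-bit get_user(): reads one 4-byte value.
 * The real helper also validates the user-space address. */
static int get_user32(uint32_t *dst, const uint32_t *src)
{
    *dst = *src;
    return 0;
}

/* Read a 64-bit gpte as two 32-bit halves (little-endian x86 layout).
 * Note the two reads are not atomic with respect to guest updates. */
static int read_gpte64(uint64_t *gpte, const uint64_t *uaddr)
{
    const uint32_t *p = (const uint32_t *)uaddr;
    uint32_t lo, hi;

    if (get_user32(&lo, p) || get_user32(&hi, p + 1))
        return -1;
    *gpte = ((uint64_t)hi << 32) | lo;
    return 0;
}
```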
-
Committed by Avi Kivity
walk_addr_generic() is a hot path and is also hard for the cpu to predict - some of the parameters (fetch_fault in particular) vary wildly from invocation to invocation. Add unlikely() annotations where appropriate; all walk failures are considered unlikely, as are cases where we have to mark the accessed or dirty bit, as they are slow paths both in kvm and on real processors. Signed-off-by: Avi Kivity <avi@redhat.com>
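For reference, a small self-contained example of the annotation style (the unlikely() macro is the standard __builtin_expect idiom; the pte masks and the helper are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

#define PT_PRESENT_MASK  1u
#define PT_ACCESSED_MASK (1u << 5)

/* Walk failures and accessed/dirty updates are rare, so tell the
 * compiler to lay out the fast path without taken branches. */
static int check_pte(uint64_t pte, int *need_mark_accessed)
{
    if (unlikely(!(pte & PT_PRESENT_MASK)))
        return -1;                       /* not-present fault: slow path */

    if (unlikely(!(pte & PT_ACCESSED_MASK)))
        *need_mark_accessed = 1;         /* rare: pte seen for the first time */

    return 0;
}
```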
-
Committed by Takuya Yoshikawa
This patch optimizes the guest page table walk by using get_user() instead of copy_from_user(). With this patch applied, paging64_walk_addr_generic() has become about 0.5us to 1.0us faster on my Phenom II machine with NPT on. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Roedel, Joerg
This patch makes the cmpxchg_gpte() function aware of the difference between l1-gfns and l2-gfns when nested virtualization is in use. This fixes a potential data-corruption problem in the l1-guest and makes the code work correctly (at least as correctly as the hardware which is emulated in this code) again. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 11 May 2011, 1 commit
-
-
Committed by Xiao Guangrong
The mmu_seq verification can be removed since we get the pfn under the protection of mmu_lock. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 18 March 2011, 3 commits
-
-
Committed by Lucas De Marchi
They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Xiao Guangrong
This patch does two things: call vcpu->arch.mmu.update_pte directly, and use gfn_to_pfn_atomic in the update_pte path. The suggestion is from Avi. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
These macros are not used, so remove them. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-