Commits · d13bc5b5a1f9eafd59331baa1d1d32e1867f57b5 · openanolis / cloud-kernel

11 Jul, 2012 2 commits

KVM: MMU: abstract spte write-protect · d13bc5b5

Authored by Xiao Guangrong on Jun 20, 2012

Introduce a common function to abstract spte write-protect and
clean up the code.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d13bc5b5

KVM: MMU: return bool in __rmap_write_protect · 2f84569f

Authored by Xiao Guangrong on Jun 20, 2012

The return value of __rmap_write_protect is either 1 or 0; use
true/false instead.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2f84569f

09 Jul, 2012 1 commit

KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation · e676505a

Authored by Avi Kivity on Jul 08, 2012

Currently the MMU's ->new_cr3() callback does nothing when guest paging
is disabled or when two-dimensional paging (e.g. EPT on Intel) is active.
This means that an emulated write to cr3 can be lost; kvm_set_cr3() will
write vcpu->arch.cr3, but the GUEST_CR3 field in the VMCS will retain its
old value and this is what the guest sees.

This bug did not have any effect until now because:
- with unrestricted guest, or with svm, we never emulate a mov cr3 instruction
- without unrestricted guest, and with paging enabled, we also never emulate a
  mov cr3 instruction
- without unrestricted guest, but with paging disabled, the guest's cr3 is
  ignored until the guest enables paging; at this point the value from arch.cr3
  is loaded correctly by the mov cr0 instruction which turns on paging

However, the patchset that enables big real mode causes us to emulate mov cr3
instructions in protected mode sometimes (when guest state is not virtualizable
by vmx); this mov cr3 is effectively ignored and will crash the guest.

The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3
reload.  This is awkward because now all the new_cr3 callbacks do the same
thing, and because mmu_free_roots() is somewhat of an overkill; but fixing
that is more complicated and will be done after this minimal fix.

Observed in the Windows XP 32-bit installer while bringing up secondary vcpus.
Signed-off-by: Avi Kivity <avi@redhat.com>

e676505a

14 Jun, 2012 1 commit

KVM: x86: change PT_FIRST_AVAIL_BITS_SHIFT to avoid conflict with EPT Dirty bit · 00763e41

Authored by Xudong Hao on Jun 07, 2012

The EPT Dirty bit uses bit 9, as defined in the Intel SDM; to avoid a
conflict, change PT_FIRST_AVAIL_BITS_SHIFT to 10.
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

00763e41
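
The conflict can be illustrated with a small standalone sketch. It assumes the previous value of PT_FIRST_AVAIL_BITS_SHIFT was 9, which the message does not state but the conflict with bit 9 implies. The names EPT_DIRTY_BIT, first_avail_mask() and conflicts_with_ept_dirty() are hypothetical and exist only for this illustration; they are not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 9 is the EPT Dirty bit, as stated in the commit message. */
#define EPT_DIRTY_BIT (1ULL << 9)

/* Hypothetical helper: mask of the first software-available bit. */
static uint64_t first_avail_mask(unsigned int shift)
{
	assert(shift < 64);
	return 1ULL << shift;
}

/* True when the first software-available bit would alias the EPT Dirty bit. */
static bool conflicts_with_ept_dirty(unsigned int shift)
{
	return (first_avail_mask(shift) & EPT_DIRTY_BIT) != 0;
}
```

Under the assumed old shift of 9 the check reports a conflict; with the new shift of 10 it does not.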

12 Jun, 2012 1 commit

KVM: MMU: Remove unused parameter from mmu_memory_cache_alloc() · 80feb89a

Authored by Takuya Yoshikawa on May 29, 2012

The size is not needed to return one of the pre-allocated objects.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

80feb89a

06 Jun, 2012 1 commit

KVM: disable uninitialized var warning · 79f702a6

Authored by Michael S. Tsirkin on Jun 03, 2012

I see this in 3.5-rc1:

arch/x86/kvm/mmu.c: In function ‘kvm_test_age_rmapp’:
arch/x86/kvm/mmu.c:1271: warning: ‘iter.desc’ may be used uninitialized in this function

The line in question was introduced by commit
1e3f42f0

 static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
                              unsigned long data)
 {
-       u64 *spte;
+       u64 *sptep;
+       struct rmap_iterator iter;   <- line 1271
        int young = 0;

        /*

The reason I think is that the compiler assumes that
the rmap value could be 0, so

static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator
*iter)
{
        if (!rmap)
                return NULL;

        if (!(rmap & 1)) {
                iter->desc = NULL;
                return (u64 *)rmap;
        }

        iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
        iter->pos = 0;
        return iter->desc->sptes[iter->pos];
}

will not initialize iter.desc, but the compiler isn't
smart enough to see that

        for (sptep = rmap_get_first(*rmapp, &iter); sptep;
             sptep = rmap_get_next(&iter)) {

will immediately exit in this case.
I checked by adding
        if (!*rmapp)
                goto out;
on top which is clearly equivalent but disables the warning.

This patch uses uninitialized_var to disable the warning without
increasing code size.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

79f702a6

05 Jun, 2012 2 commits

KVM: MMU: do not iterate over all VMs in mmu_shrink() · 19526396

Authored by Gleb Natapov on Jun 04, 2012

mmu_shrink() needlessly iterates over all VMs even though it will not
attempt to free mmu pages from more than one of them. Fix that and also
check the used mmu pages count outside of the VM lock to skip inactive VMs faster.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

19526396

KVM: VMX: Use EPT Access bit in response to memory notifiers · 3f6d8c8a

Authored by Xudong Hao on May 22, 2012

Signed-off-by: Haitao Shan <haitao.shan@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3f6d8c8a

28 May, 2012 1 commit

KVM: MMU: fix huge page adapted on non-PAE host · c3586667

Authored by Xiao Guangrong on May 28, 2012

The huge page size is 4M on a non-PAE host, but a 2M page size is used in
transparent_hugepage_adjust(), so the page we get after adjusting the
mapping level is not the head page and the BUG_ON() is triggered.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c3586667

17 May, 2012 1 commit

KVM: MMU: Don't use RCU for lockless shadow walking · c142786c

Authored by Avi Kivity on May 14, 2012

Using RCU for lockless shadow walking can increase the amount of memory
in use by the system, since RCU grace periods are unpredictable.  We also
have an unconditional write to a shared variable (reader_counter), which
isn't good for scaling.

Replace that with a scheme similar to x86's get_user_pages_fast(): disable
interrupts during lockless shadow walk to force the freer
(kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
processor with interrupts enabled.

We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
kvm_flush_remote_tlbs() from avoiding the IPI.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c142786c

19 Apr, 2012 1 commit

KVM: MMU: use page table level macro · f71fa31f

Authored by Davidlohr Bueso on Apr 18, 2012

It's much cleaner to use PT_PAGE_TABLE_LEVEL than its numeric value.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f71fa31f

08 Apr, 2012 4 commits

KVM: MMU: Improve iteration through sptes from rmap · 1e3f42f0

Authored by Takuya Yoshikawa on Mar 21, 2012

Iteration using rmap_next(), the actual body is pte_list_next(), is
inefficient: every time we call it we start from checking whether rmap
holds a single spte or points to a descriptor which links more sptes.

In the case of shadow paging, this quadratic total iteration cost is a
problem.  Even for two dimensional paging, with EPT/NPT on, in which we
almost always have a single mapping, the extra checks at the end of the
iteration should be eliminated.

This patch fixes this by introducing rmap_iterator which keeps the
iteration context for the next search.  Furthermore the implementation
of rmap_next() is split into two functions, rmap_get_first() and
rmap_get_next(), to avoid repeatedly checking whether the rmap being
iterated on has only one spte.

Although there seemed to be only a slight change for EPT/NPT, the actual
improvement was significant: we observed that GET_DIRTY_LOG for 1GB
dirty memory became 15% faster than before.  This is probably because
the new code makes branch prediction easier.

Note: we just remove pte_list_next() because we can think of parent_ptes
as a reverse mapping.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

1e3f42f0
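
The iterator idea can be sketched in userspace C. rmap_get_first() below follows the function quoted in the "disable uninitialized var warning" message above; rmap_get_next() and rmap_count() are a reconstruction written for this illustration, not a copy of the kernel code, and the types are plain stdint stand-ins for the kernel's. The low bit of the rmap word is assumed to distinguish a single spte pointer from a descriptor chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_LIST_EXT 3	/* illustrative value; any value >= 1 works here */

struct pte_list_desc {
	uint64_t *sptes[PTE_LIST_EXT];
	struct pte_list_desc *more;
};

struct rmap_iterator {
	struct pte_list_desc *desc;	/* current descriptor, or NULL */
	int pos;			/* index inside desc->sptes */
};

/* Bit 0 clear: rmap is one spte pointer. Bit 0 set: descriptor chain. */
static uint64_t *rmap_get_first(uintptr_t rmap, struct rmap_iterator *iter)
{
	if (!rmap)
		return NULL;
	if (!(rmap & 1)) {
		iter->desc = NULL;
		return (uint64_t *)rmap;
	}
	iter->desc = (struct pte_list_desc *)(rmap & ~(uintptr_t)1);
	iter->pos = 0;
	/* A descriptor is assumed to always hold at least one spte. */
	assert(iter->desc->sptes[0] != NULL);
	return iter->desc->sptes[iter->pos];
}

/* Call only after rmap_get_first() returned a non-NULL spte. */
static uint64_t *rmap_get_next(struct rmap_iterator *iter)
{
	if (iter->desc) {
		if (iter->pos < PTE_LIST_EXT - 1) {
			uint64_t *sptep = iter->desc->sptes[++iter->pos];

			if (sptep)
				return sptep;
		}
		/* This descriptor is exhausted; move to the next one. */
		iter->desc = iter->desc->more;
		if (iter->desc) {
			iter->pos = 0;
			return iter->desc->sptes[0];
		}
	}
	return NULL;
}

/* Hypothetical helper: count the sptes reachable from one rmap word. */
static int rmap_count(uintptr_t rmap)
{
	struct rmap_iterator iter;
	uint64_t *sptep;
	int n = 0;

	for (sptep = rmap_get_first(rmap, &iter); sptep;
	     sptep = rmap_get_next(&iter))
		n++;
	return n;
}
```

Because the single-spte versus descriptor check happens once in rmap_get_first(), each later step only advances the saved position, which is the property the message credits for the speedup.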

KVM: MMU: Make pte_list_desc fit cache lines well · 220f773a

Authored by Takuya Yoshikawa on Mar 21, 2012

We have PTE_LIST_EXT + 1 pointers in this structure and these 40/20
bytes do not fit cache lines well. Furthermore, some allocators may
use 64/32-byte objects for the pte_list_desc cache.

This patch solves this problem by changing PTE_LIST_EXT from 4 to 3.

For shadow paging, the new size is still large enough to hold both the
kernel and process mappings for usual anonymous pages. For file
mappings, there may be a slight change in the cache usage.

Note: with EPT/NPT we almost always have a single spte in each reverse
mapping and we will not see any change by this.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

220f773a
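
The size arithmetic can be checked with a short sketch. The struct layout (an array of spte pointers followed by one link pointer) is an assumption derived from the "PTE_LIST_EXT + 1 pointers" wording, the 64-byte cache line is an assumed typical value, and the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed descriptor layout, parameterized by the number of spte slots. */
#define DEFINE_PTE_LIST_DESC(name, ext)		\
	struct name {				\
		uint64_t *sptes[ext];		\
		struct name *more;		\
	}

DEFINE_PTE_LIST_DESC(pte_list_desc_ext4, 4);	/* before the patch */
DEFINE_PTE_LIST_DESC(pte_list_desc_ext3, 3);	/* after the patch */

/* True when objects of this size tile a cache line with no remainder. */
static int fits_cache_line(size_t obj_size, size_t line_size)
{
	assert(obj_size > 0);
	return line_size % obj_size == 0;
}
```

With four slots the descriptor is five pointers wide (40 bytes on 64-bit, 20 on 32-bit) and straddles 64-byte lines; with three slots it is four pointers wide (32 or 16 bytes) and packs evenly.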

KVM: Avoid checking huge page mappings in get_dirty_log() · 5dc99b23

Authored by Takuya Yoshikawa on Mar 01, 2012

We dropped such mappings when we enabled dirty logging, and we will never
create new ones until we stop the logging.

For this we introduce a new function which can be used to write protect
a range of PT level pages: although we do not need to care about a range
of pages at this point, the following patch will need this feature to
optimize the write protection of many pages.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

5dc99b23

KVM: MMU: Split the main body of rmap_write_protect() off from others · a0ed4607

Authored by Takuya Yoshikawa on Mar 01, 2012

We will use this in the following patch to implement another function
which needs to write protect pages using the rmap information.

Note that there is a small change in debug printing for large pages:
we do not differentiate them from others to avoid duplicating code.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

a0ed4607

08 Mar, 2012 3 commits

KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask · 4d6931c3

Authored by Davidlohr Bueso on Mar 05, 2012

The reset_rsvds_bits_mask() function can use the guest walker's root level
number instead of using a separate 'level' variable.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

4d6931c3

KVM: Introduce kvm_memory_slot::arch and move lpage_info into it · db3fe4eb

Authored by Takuya Yoshikawa on Feb 08, 2012

Some members of kvm_memory_slot are not used by every architecture.

This patch is the first step to make this difference clear by
introducing kvm_memory_slot::arch; lpage_info is moved into it.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

db3fe4eb

KVM: Introduce gfn_to_index() which returns the index for a given level · fb03cb6f

Authored by Takuya Yoshikawa on Feb 08, 2012

This patch cleans up the code and removes the "(void)level;" warning
suppressor.

Note that we can also use this for PT_PAGE_TABLE_LEVEL to treat every
level uniformly later.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fb03cb6f
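
A helper of this kind could look like the sketch below. The message only says that gfn_to_index() returns the index for a given level; the body here, the HPAGE_GFN_SHIFT macro, and the assumption that each level above the 4K page level covers 9 more gfn bits (as on x86) are illustrative guesses rather than the patch's actual code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumption: level 1 is the 4K level, each higher level adds 9 gfn bits. */
#define HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Index of gfn within a slot starting at base_gfn, in units of the level's page size. */
static gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
{
	assert(level >= 1 && gfn >= base_gfn);
	return (gfn >> HPAGE_GFN_SHIFT(level)) -
	       (base_gfn >> HPAGE_GFN_SHIFT(level));
}
```

At level 1 the shift is zero, so the same expression yields the plain page offset within the slot, which is what allows every level to be treated uniformly.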

05 Mar, 2012 6 commits

KVM: MMU: Remove unused kvm parameter from rmap_next() · e4b35cc9

Authored by Takuya Yoshikawa on Jan 17, 2012

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e4b35cc9

KVM: MMU: Remove unused kvm parameter from __gfn_to_rmap() · 9373e2c0

Authored by Takuya Yoshikawa on Jan 17, 2012

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

9373e2c0

KVM: MMU: unnecessary NX state assignment · 4a58ae61

Authored by Davidlohr Bueso on Jan 06, 2012

We can remove the first ->nx state assignment since it is assigned afterwards anyway.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4a58ae61

KVM: MMU: remove the redundant get_written_sptes · a138fe75

Authored by Xiao Guangrong on Dec 16, 2011

get_written_sptes is called twice in kvm_mmu_pte_write; one of the calls
can be removed.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a138fe75

KVM: MMU: Add missing large page accounting to drop_large_spte() · 6addd1aa

Authored by Takuya Yoshikawa on Nov 29, 2011

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

6addd1aa

KVM: MMU: Remove for_each_unsync_children() macro · 37178b8b

Authored by Takuya Yoshikawa on Nov 29, 2011

There is only one user of it and for_each_set_bit() does the same.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

37178b8b

13 Jan, 2012 1 commit

module_param: make bool parameters really bool (arch) · 476bc001

Authored by Rusty Russell on Jan 13, 2012

module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

476bc001

27 Dec, 2011 15 commits

KVM: MMU: Drop unused return value of kvm_mmu_remove_some_alloc_mmu_pages · 3d56cbdf

Authored by Jan Kiszka on Dec 02, 2011

freed_pages is never evaluated, so remove it along with the return code
that kvm_mmu_remove_some_alloc_mmu_pages so far delivered to its only user.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3d56cbdf

KVM: MMU: audit: inline audit function · e37fa785

Authored by Xiao Guangrong on Nov 30, 2011

Inline the audit function and do a little cleanup.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e37fa785

KVM: MMU: remove oos_shadow parameter · d750ea28

Authored by Xiao Guangrong on Nov 28, 2011

The unsync code should be stable now; maybe it is time to remove this
parameter to clean up the code a little bit.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d750ea28

KVM: MMU: move the relevant mmu code to mmu.c · e459e322

Authored by Xiao Guangrong on Nov 28, 2011

Move the mmu code in kvm_arch_vcpu_init() to kvm_mmu_create().
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e459e322

KVM: MMU: audit: replace mmu audit tracepoint with jump-label · 0375f7fa

Authored by Xiao Guangrong on Nov 28, 2011

The tracepoint is only used to audit mmu code and should not be exposed
to the user; replace it with a jump label.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0375f7fa

KVM: introduce kvm_for_each_memslot macro · be6ba0f0

Authored by Xiao Guangrong on Nov 24, 2011

Introduce kvm_for_each_memslot to walk all valid memslots.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

be6ba0f0

KVM: introduce KVM_MEM_SLOTS_NUM macro · 93a5cef0

Authored by Xiao Guangrong on Nov 24, 2011

Introduce the KVM_MEM_SLOTS_NUM macro to replace
KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

93a5cef0

KVM: Optimize dirty logging by rmap_write_protect() · 95d4c16c

Authored by Takuya Yoshikawa on Nov 14, 2011

Currently, write protecting a slot needs to walk all the shadow pages
and check the ones which have a pte mapping a page in it.

The walk is overly heavy when there are not many dirty pages in that slot,
and checking the shadow pages would result in unwanted cache pollution.

To mitigate this problem, we use rmap_write_protect() and check only
the sptes which can be reached from gfns marked in the dirty bitmap
when the number of dirty pages is less than that of shadow pages.

This criterion is reasonable in its meaning and worked well in our test:
write protection became several times faster than before when the ratio of
dirty pages was low and was not worse even when the ratio was near the
criterion.

Note that the locking for this write protection becomes fine grained.
The reason why this is safe is described in the comments.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

95d4c16c
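
The selection criterion and the bitmap walk described above can be sketched as follows. This is a userspace illustration only: use_rmap_write_protect(), for_each_dirty_gfn() and the protect_fn callback are hypothetical names, and the real patch operates on kernel memslots and locking that are not modeled here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Criterion from the message: take the rmap path when dirty pages are fewer. */
static bool use_rmap_write_protect(unsigned long nr_dirty_pages,
				   unsigned long nr_shadow_pages)
{
	return nr_dirty_pages < nr_shadow_pages;
}

/* Hypothetical callback: write protect one gfn offset within the slot. */
typedef void (*protect_fn)(unsigned long gfn_offset, void *opaque);

/* Visit every set bit of the dirty bitmap; returns the number of pages visited. */
static unsigned long for_each_dirty_gfn(const unsigned long *bitmap,
					unsigned long npages,
					protect_fn fn, void *opaque)
{
	unsigned long i, visited = 0;

	assert(bitmap != NULL);
	for (i = 0; i < npages; i++) {
		if (bitmap[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) {
			if (fn)
				fn(i, opaque);
			visited++;
		}
	}
	return visited;
}
```

When the criterion is false, a caller would fall back to the full shadow page walk, so the cost stays bounded by whichever of the two counts is smaller.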

KVM: MMU: Split gfn_to_rmap() into two functions · 9b9b1492

Authored by Takuya Yoshikawa on Nov 14, 2011

rmap_write_protect() calls gfn_to_rmap() for each level with gfn fixed.
This results in calling gfn_to_memslot() repeatedly with that gfn.

This patch introduces __gfn_to_rmap() which takes the slot as an
argument to avoid this.

This is also needed for the following dirty logging optimization.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

9b9b1492

KVM: MMU: Clean up BUG_ON() conditions in rmap_write_protect() · d6eebf8b

Authored by Takuya Yoshikawa on Nov 14, 2011

Remove redundant checks and use the is_large_pte() macro.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

d6eebf8b

KVM: MMU: remove KVM host pv mmu support · fb920458

Authored by Chris Wright on Nov 01, 2011

The host side pv mmu support has been marked for feature removal in
January 2011.  It's not in use, is slower than shadow or hardware
assisted paging, and a maintenance burden.  It's November 2011, time to
remove it.
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fb920458

KVM: MMU: improve write flooding detected · a30f47cb

Authored by Xiao Guangrong on Sep 22, 2011

Detecting write-flooding does not work well. When we handle a page write, if
the last speculative spte is not accessed, we treat the page as
write-flooded. However, sptes can be created speculatively on many paths,
such as pte prefetch and page sync, which means the last speculative spte
may not point to the written page, and the written page can be accessed via
other sptes. Depending on the Accessed bit of the last speculative spte is
therefore not enough.

Instead of detecting whether the page is accessed, we can detect whether the
spte is accessed after it is written. If the spte is not accessed but is
written frequently, we treat the page as not being a page table, or as not
having been used for a long time.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a30f47cb

KVM: MMU: fix detecting misaligned accessed · 5d9ca30e

Authored by Xiao Guangrong on Sep 22, 2011

Sometimes we modify only the last byte of a pte to update a status bit.
For example, clear_bit is used to clear the r/w bit in the Linux kernel, and
the 'andb' instruction is used in that function. In this case,
kvm_mmu_pte_write treats the write as a misaligned access and zaps the
shadow page table.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

5d9ca30e
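
One way such a check could be written is sketched below. The function name, signature and the exact conditions are assumptions made for illustration; the message itself does not show the code. The sketch also assumes a little-endian layout in which the status bits live in the byte at the pte's aligned starting offset, so a one-byte write there is treated as a status update rather than a misaligned access.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * offset: byte offset of the write within the guest page table page
 * bytes:  number of bytes written
 */
static bool detect_write_misaligned(unsigned int pte_size,
				    unsigned int offset, unsigned int bytes)
{
	unsigned int misaligned;

	assert(bytes > 0 && (pte_size == 4 || pte_size == 8));

	/* One byte at the start of a pte: a status-bit update, e.g. andb in clear_bit. */
	if (!(offset & (pte_size - 1)) && bytes == 1)
		return false;

	/* The first and last written bytes fall into different ptes. */
	misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
	/* Other writes narrower than 4 bytes are not treated as pte updates. */
	misaligned |= bytes < 4;

	return misaligned != 0;
}
```

A full 8-byte write at an aligned offset and a one-byte write at the start of a pte both pass, while a write that crosses a pte boundary is still reported as misaligned.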

KVM: MMU: split kvm_mmu_pte_write function · 889e5cbc

Authored by Xiao Guangrong on Sep 22, 2011

kvm_mmu_pte_write is too long; split it for better readability.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

889e5cbc

KVM: MMU: remove unnecessary kvm_mmu_free_some_pages · f8734352

Authored by Xiao Guangrong on Sep 22, 2011

In kvm_mmu_pte_write, we do not need to allocate a shadow page, so calling
kvm_mmu_free_some_pages is unnecessary.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f8734352

openanolis / cloud-kernel synced successfully more than 1 year ago