Commits · 14fa67ee95d4f7313fbf149fe37faccb903857c8 · openeuler / Kernel

26 Sep, 2011 · 7 commits

KVM: x86: get_msr support for HV_X64_MSR_APIC_ASSIST_PAGE · 14fa67ee

Authored by Mike Waychison on Jul 21, 2011

"get" support for the HV_X64_MSR_APIC_ASSIST_PAGE msr was missing, even
though it is explicitly enumerated as something the vmm should save in
msrs_to_save and reported to userland via the KVM_GET_MSR_INDEX_LIST
ioctl.

Add "get" support for HV_X64_MSR_APIC_ASSIST_PAGE.  We simply return the
guest visible value of this register, which seems to be correct as a set
on the register is validated for us already.

Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

14fa67ee
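
The "get" path described in the commit above can be pictured with a minimal userspace sketch. The struct and function names here are hypothetical stand-ins, not the kernel's own code; only the MSR name comes from the commit message, and the MSR number is the one the Hyper-V interface assigns to it.

```c
#include <stdint.h>

#define HV_X64_MSR_APIC_ASSIST_PAGE 0x40000073u

/* Hypothetical stand-in for per-vcpu state that an earlier "set" validated. */
struct vcpu_hv_state {
	uint64_t hv_vapic;
};

/* Returns 0 and fills *data on success, 1 for an MSR this sketch ignores. */
static int get_msr_hyperv_sketch(const struct vcpu_hv_state *vcpu,
				 uint32_t msr, uint64_t *data)
{
	switch (msr) {
	case HV_X64_MSR_APIC_ASSIST_PAGE:
		/* Return the guest-visible value unchanged. */
		*data = vcpu->hv_vapic;
		return 0;
	default:
		return 1;
	}
}
```

Because the value was checked when it was written, the read side in this sketch needs no further validation.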

KVM: x86: Raise the hard VCPU count limit · 8c3ba334

Authored by Sasha Levin on Jul 18, 2011

The patch raises the hard limit of VCPU count to 254.

This will allow developers to easily work on scalability
and will allow users to test high VCPU setups easily without
patching the kernel.

To prevent possible issues with current setups, KVM_CAP_NR_VCPUS
now returns the recommended VCPU limit (which is still 64) - this
should be a safe value for everybody, while a new KVM_CAP_MAX_VCPUS
returns the hard limit which is now 254.

Cc: Avi Kivity <avi@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Suggested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8c3ba334
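
A userspace tool could treat the two limits from the commit above roughly as follows. This is only a sketch: `check_vcpu_count` and the verdict enum are hypothetical, and the two numbers are the values quoted in the commit message. A real program would query them with `ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS)` and `KVM_CAP_MAX_VCPUS` instead of hard-coding them.

```c
/* Values quoted in the commit message (assumed here, not queried). */
#define RECOMMENDED_VCPUS 64
#define HARD_MAX_VCPUS    254

enum vcpu_verdict {
	VCPU_OK,                /* within the recommended limit */
	VCPU_ABOVE_RECOMMENDED, /* allowed, but beyond the tested range */
	VCPU_REJECTED           /* beyond the hard limit, or not positive */
};

/* Hypothetical helper: classify a requested VCPU count. */
static enum vcpu_verdict check_vcpu_count(int requested, int recommended,
					  int hard_max)
{
	if (requested < 1 || requested > hard_max)
		return VCPU_REJECTED;
	if (requested > recommended)
		return VCPU_ABOVE_RECOMMENDED;
	return VCPU_OK;
}
```

For example, `check_vcpu_count(128, RECOMMENDED_VCPUS, HARD_MAX_VCPUS)` yields `VCPU_ABOVE_RECOMMENDED`, which a tool might turn into a warning rather than an error.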

KVM: x86: cleanup the code of read/write emulation · 22388a3c

Authored by Xiao Guangrong on Jul 13, 2011

Use the read/write operation abstraction to remove the duplicated code.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

22388a3c

KVM: x86: abstract the operation for read/write emulation · 77d197b2

Authored by Xiao Guangrong on Jul 13, 2011

The operations of read emulation and write emulation are very similar, so we
can abstract them; a later patch uses this abstraction to clean up the
duplicated code.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

77d197b2

KVM: x86: fix broken read emulation spans a page boundary · ca7d58f3

Authored by Xiao Guangrong on Jul 13, 2011

If the range spans a page boundary, the mmio access can be broken; fix it the
same way as write emulation.

We already have the guest physical address, so use it to read guest data
directly and avoid walking the guest page table again.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ca7d58f3
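
Handling an access that spans a page boundary comes down to splitting it at the boundary and emulating each part separately. The sketch below shows only the split computation, assuming 4 KiB pages and an access no larger than one page; the macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096ULL                  /* assumed page size */
#define GUEST_PAGE_MASK (~(GUEST_PAGE_SIZE - 1)) /* page frame bits */

/*
 * Return how many of `bytes` lie in the page that contains `addr`.
 * Assumes 1 <= bytes <= GUEST_PAGE_SIZE.
 */
static size_t first_chunk_len(uint64_t addr, size_t bytes)
{
	/* First and last byte are on different pages iff their frame bits differ. */
	if (((addr + bytes - 1) ^ addr) & GUEST_PAGE_MASK)
		return (size_t)(-addr & ~GUEST_PAGE_MASK); /* bytes up to the boundary */
	return bytes;
}
```

A caller would emulate the first `first_chunk_len(addr, bytes)` bytes, then the remainder starting at the next page.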

KVM: x86 emulator: fix Src2CL decode · 9be3be1f

Authored by Avi Kivity on Sep 13, 2011

Src2CL decode (used for double width shifts) erroneously decodes only bit 3
of %rcx, instead of bits 7:0.

Fix by decoding %cl in its entirety.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9be3be1f
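
The difference between "bit 3" and "bits 7:0" in the commit above is just the mask applied to %rcx. The following sketch contrasts the two; the function names are hypothetical and the snippet is not the emulator's actual decode code.

```c
#include <stdint.h>

/* Buggy decode: mask 0x8 keeps only bit 3 of %rcx. */
static uint8_t src2cl_buggy(uint64_t rcx)
{
	return (uint8_t)(rcx & 0x8);
}

/* Fixed decode: mask 0xff keeps all of %cl (bits 7:0). */
static uint8_t src2cl_fixed(uint64_t rcx)
{
	return (uint8_t)(rcx & 0xff);
}
```

With `rcx = 0x0c`, the buggy variant yields a shift count of 8 while the fixed one yields 12.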

KVM: MMU: fix incorrect return of spte · 41bc3186

Authored by Zhao Jin on Sep 19, 2011

__update_clear_spte_slow should return the original spte, while the
current code returns the low half of the original spte combined with the
high half of the new spte.

Signed-off-by: Zhao Jin <cronozhj@gmail.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

41bc3186
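
One way such a mixed return value can arise is a chained assignment on the high half. The sketch below is a plain userspace illustration with hypothetical names; it uses ordinary stores where the kernel would use an atomic exchange, and it is not claimed to be the exact kernel code.

```c
#include <stdint.h>

/* A 64-bit spte stored as two 32-bit halves, as on a 32-bit host. */
struct split_spte {
	uint32_t low;
	uint32_t high;
};

static uint64_t join_spte(struct split_spte s)
{
	return ((uint64_t)s.high << 32) | s.low;
}

/* Buggy: the chained assignment copies the NEW high half into orig. */
static uint64_t update_clear_buggy(struct split_spte *sptep,
				   struct split_spte new_spte)
{
	struct split_spte orig;

	orig.low = sptep->low;
	sptep->low = new_spte.low;
	orig.high = sptep->high = new_spte.high;
	return join_spte(orig);
}

/* Fixed: save the old high half before overwriting it. */
static uint64_t update_clear_fixed(struct split_spte *sptep,
				   struct split_spte new_spte)
{
	struct split_spte orig;

	orig.low = sptep->low;
	sptep->low = new_spte.low;
	orig.high = sptep->high;
	sptep->high = new_spte.high;
	return join_spte(orig);
}
```

Both variants leave the same new value in `*sptep`; they differ only in what the caller sees as the "original" spte.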

17 Aug, 2011 · 1 commit

KVM: uses TASKSTATS, depends on NET · df3d8ae1

Authored by Randy Dunlap on Aug 02, 2011

CONFIG_TASKSTATS just had a change to use netlink, including
a change to "depends on NET". Since "select" does not follow
dependencies, KVM also needs to depend on NET to prevent build
errors when CONFIG_NET is not enabled.

Sample of the reported "undefined reference" build errors:

taskstats.c:(.text+0x8f686): undefined reference to `nla_put'
taskstats.c:(.text+0x8f721): undefined reference to `nla_reserve'
taskstats.c:(.text+0x8f8fb): undefined reference to `init_net'
taskstats.c:(.text+0x8f905): undefined reference to `netlink_unicast'
taskstats.c:(.text+0x8f934): undefined reference to `kfree_skb'
taskstats.c:(.text+0x8f9e9): undefined reference to `skb_clone'
taskstats.c:(.text+0x90060): undefined reference to `__alloc_skb'
taskstats.c:(.text+0x901e9): undefined reference to `skb_put'
taskstats.c:(.init.text+0x4665): undefined reference to `genl_register_family'
taskstats.c:(.init.text+0x4699): undefined reference to `genl_register_ops'
taskstats.c:(.init.text+0x4710): undefined reference to `genl_unregister_ops'
taskstats.c:(.init.text+0x471c): undefined reference to `genl_unregister_family'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

df3d8ae1

27 Jul, 2011 · 2 commits

KVM: fix TASK_DELAY_ACCT kconfig warning · fd079fac

Authored by Randy Dunlap on Jul 25, 2011

Fix kconfig dependency warning:

warning: (KVM) selects TASK_DELAY_ACCT which has unmet direct dependencies (TASKSTATS)

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Avi Kivity <avi@redhat.com>

fd079fac

atomic: use <linux/atomic.h> · 60063497

Authored by Arun Sharma on Jul 26, 2011

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

60063497
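
For readers unfamiliar with the helper named in the commit above, "increment unless zero" can be sketched in portable C11 as a compare-and-swap loop. This is a userspace analogue with a hypothetical name, not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is zero; return true if the increment happened. */
static bool inc_not_zero_sketch(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}
```

The typical use is taking a reference on an object only while its reference count has not already dropped to zero.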

24 Jul, 2011 · 18 commits

KVM: MMU: trace mmio page fault · 4f022648

Authored by Xiao Guangrong on Jul 12, 2011

Add tracepoints to trace mmio page faults.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4f022648

KVM: MMU: mmio page fault support · ce88decf

Authored by Xiao Guangrong on Jul 12, 2011

The idea is from Avi:

| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)

When a page fault is caused by mmio, we cache the info in the shadow page
table and also set the reserved bits in the shadow page table entry, so if the
same mmio access faults again, we can quickly identify it and emulate it directly.

Searching for the mmio gfn in memslots is expensive since we need to walk all
memslots; this feature reduces that cost and also avoids walking the guest page
table for soft mmu.

[jan: fix operator precedence issue]
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ce88decf
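
The caching idea in the commit above can be sketched as encoding the gfn and access bits into an spte that also carries a reserved-bit marker. The marker value, macro names, and helper names below are assumptions chosen for illustration; the real mask depends on the hardware and on KVM's configuration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define ACC_WRITE_MASK    (1u << 1)
#define ACC_USER_MASK     (1u << 2)
/* Hypothetical marker: reserved bits that no valid mapping would set. */
#define SKETCH_MMIO_MASK  (0xffULL << 49)

/* Build an spte that records "this gfn is mmio" plus the access bits. */
static uint64_t make_mmio_spte(uint64_t gfn, unsigned access)
{
	access &= ACC_WRITE_MASK | ACC_USER_MASK;
	return SKETCH_MMIO_MASK | access | (gfn << SKETCH_PAGE_SHIFT);
}

/* A later fault on the same page is recognised by the marker alone. */
static bool is_mmio_spte(uint64_t spte)
{
	return (spte & SKETCH_MMIO_MASK) == SKETCH_MMIO_MASK;
}

static uint64_t mmio_spte_gfn(uint64_t spte)
{
	return (spte & ~SKETCH_MMIO_MASK) >> SKETCH_PAGE_SHIFT;
}

static unsigned mmio_spte_access(uint64_t spte)
{
	return (unsigned)(spte & ((1ULL << SKETCH_PAGE_SHIFT) - 1));
}
```

With this encoding, a repeated mmio fault can recover the gfn straight from the spte without searching the memslots.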

KVM: MMU: reorganize struct kvm_shadow_walk_iterator · dd3bfd59

Authored by Xiao Guangrong on Jul 12, 2011

Reorganize it to make better use of the cache.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

dd3bfd59

KVM: MMU: lockless walking shadow page table · c2a2ac2b

Authored by Xiao Guangrong on Jul 12, 2011

Use rcu to protect shadow page tables from being freed, so we can walk them
safely; the walk should run fast and is needed by mmio page fault.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c2a2ac2b

KVM: MMU: do not need atomicly to set/clear spte · 603e0651

Authored by Xiao Guangrong on Jul 12, 2011

Now, the spte only changes from nonpresent to present or from present to
nonpresent, so we can use a trick to set/clear the spte non-atomically, as the
linux kernel does.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

603e0651

KVM: MMU: introduce the rules to modify shadow page table · 1df9f2dc

Authored by Xiao Guangrong on Jul 12, 2011

Introduce some interfaces to modify the spte as the linux kernel does:
- mmu_spte_clear_track_bits: sets the spte from present to nonpresent and
  tracks the stat bits (accessed/dirty) of the spte
- mmu_spte_clear_no_track: the same as mmu_spte_clear_track_bits, except that
  it does not track the stat bits
- mmu_spte_set: sets the spte from nonpresent to present
- mmu_spte_update: only updates the stat bits

Setting an spte from present to present is no longer allowed. Later, we can
drop the atomic operation for X86_32 hosts; this is also preparatory work for
reading the spte on X86_32 hosts outside of the mmu lock.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

1df9f2dc

KVM: MMU: abstract some functions to handle fault pfn · d7c55201

Authored by Xiao Guangrong on Jul 12, 2011

Introduce handle_abnormal_pfn to handle the fault pfn on the page fault path,
and introduce mmu_invalid_pfn to handle the fault pfn on the prefetch path.

This is preparatory work for mmio page fault support.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d7c55201

KVM: MMU: filter out the mmio pfn from the fault pfn · fce92dce

Authored by Xiao Guangrong on Jul 12, 2011

If the page fault is caused by mmio, the gfn cannot be found in memslots, and
'bad_pfn' is returned on the gfn_to_hva path, so we can use 'bad_pfn' to identify
the mmio page fault.
And, to clarify the meaning of the mmio pfn, we return the fault page instead of
the bad page when the gfn is not allowed to be prefetched.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fce92dce

KVM: MMU: remove bypass_guest_pf · c3707958

Authored by Xiao Guangrong on Jul 12, 2011

The idea is from Avi:
| Maybe it's time to kill off bypass_guest_pf=1.  It's not as effective as
| it used to be, since unsync pages always use shadow_trap_nonpresent_pte,
| and since we convert between the two nonpresent_ptes during sync and unsync.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c3707958

KVM: MMU: split kvm_mmu_free_page · bd4c86ea

Authored by Xiao Guangrong on Jul 12, 2011

Split kvm_mmu_free_page to kvm_mmu_isolate_page and
kvm_mmu_free_page

One is used to remove the page from the cache under the mmu lock and the other
is used to free the page table outside of the mmu lock.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

bd4c86ea

KVM: MMU: count used shadow pages on prepareing path · aa6bd187

Authored by Xiao Guangrong on Jul 12, 2011

Move counting used shadow pages from the committing path to the preparing path
to reduce tlb flushes on some paths.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

aa6bd187

KVM: MMU: rename 'pt_write' to 'emulate' · b90a0e6c

Authored by Xiao Guangrong on Jul 12, 2011

If 'pt_write' is true, we need to emulate the fault. In a later patch, we
need to emulate the fault even though it is not a pt_write event, so rename
it to better fit the meaning.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b90a0e6c

KVM: MMU: cleanup for FNAME(fetch) · b36c7a7c

Authored by Xiao Guangrong on Jul 12, 2011

gw->pte_access is the final access permission, since it is unified with
gw->pt_access when we walked guest page table:

FNAME(walk_addr_generic):
	pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true);

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b36c7a7c

KVM: MMU: optimize to handle dirty bit · 640d9b0d

Authored by Xiao Guangrong on Jul 12, 2011

If the dirty bit is not set, we can make the pte access read-only to avoid
handling the dirty bit everywhere.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

640d9b0d
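
The dirty-bit optimization in the commit above can be sketched as dropping write permission when a last-level guest pte has not been dirtied yet, so the first write faults and the dirty bit is handled in one place. The bit positions follow the x86 page table format; the function name and the `last` parameter are modeled on the `gpte_access` call quoted in the previous commit, and the body is an assumption rather than the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page table entry bits. */
#define PT_WRITABLE_MASK (1ULL << 1)
#define PT_USER_MASK     (1ULL << 2)
#define PT_DIRTY_MASK    (1ULL << 6)

/* Access bits used by the sketch. */
#define ACC_EXEC_MASK  1u
#define ACC_WRITE_MASK 2u
#define ACC_USER_MASK  4u

/* Derive access rights from a guest pte; `last` marks the final level. */
static unsigned gpte_access_sketch(uint64_t gpte, bool last)
{
	unsigned access = (unsigned)(gpte & (PT_WRITABLE_MASK | PT_USER_MASK))
			  | ACC_EXEC_MASK;

	/* A clean last-level pte is treated as read-only. */
	if (last && !(gpte & PT_DIRTY_MASK))
		access &= ~ACC_WRITE_MASK;
	return access;
}
```

Once the guest pte has its dirty bit set, the same call returns the full write permission.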

KVM: MMU: cache mmio info on page fault path · bebb106a

Authored by Xiao Guangrong on Jul 12, 2011

If the page fault is caused by mmio, we can cache the mmio info. Later, when we
emulate the mmio instruction, we do not need to walk the guest page table and
can quickly tell that it is an mmio fault.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

bebb106a
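
A one-entry cache like the one described in the commit above could look like the following sketch. The struct layout, the explicit `valid` flag, and the function names are hypothetical; 4 KiB pages are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  (~((1ULL << SKETCH_PAGE_SHIFT) - 1))

/* Hypothetical per-vcpu cache of the last mmio translation. */
struct mmio_cache {
	uint64_t gva;    /* page-aligned guest virtual address */
	uint64_t gfn;    /* guest frame number it maps to */
	unsigned access; /* access bits seen at fault time */
	bool valid;
};

/* Called on the page fault path once the fault is known to be mmio. */
static void cache_mmio_info(struct mmio_cache *c, uint64_t gva,
			    uint64_t gfn, unsigned access)
{
	c->gva = gva & SKETCH_PAGE_MASK;
	c->gfn = gfn;
	c->access = access;
	c->valid = true;
}

/* Called by the emulator: on a hit, no guest page table walk is needed. */
static bool lookup_mmio_gpa(const struct mmio_cache *c, uint64_t gva,
			    uint64_t *gpa)
{
	if (!c->valid || c->gva != (gva & SKETCH_PAGE_MASK))
		return false;
	*gpa = (c->gfn << SKETCH_PAGE_SHIFT) | (gva & ~SKETCH_PAGE_MASK);
	return true;
}
```

A miss simply falls back to the normal translation path, so the cache only ever saves work.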

KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code · af7cc7d1

Authored by Xiao Guangrong on Jul 12, 2011

Introduce vcpu_mmio_gva_to_gpa to translate the gva to gpa; we can use it
to clean up the code shared between read emulation and write emulation.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

af7cc7d1

KVM: MMU: do not update slot bitmap if spte is nonpresent · ffb61bb3

Authored by Xiao Guangrong on Jul 12, 2011

Set the slot bitmap only if the spte is present.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ffb61bb3

KVM: MMU: fix walking shadow page table · 052331be

Authored by Xiao Guangrong on Jul 12, 2011

Properly check the last mapping, and do not walk to the next level if the last
spte is met.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

052331be

23 Jul, 2011 · 1 commit

virtio: expose for non-virtualization users too · e7254219

Authored by Ohad Ben-Cohen on Jul 05, 2011

virtio has been so far used only in the context of virtualization,
and the virtio Kconfig was sourced directly by the relevant arch
Kconfigs when VIRTUALIZATION was selected.

Now that we start using virtio for inter-processor communications,
we need to source the virtio Kconfig outside of the virtualization
scope too.

Moreover, some architectures might use virtio for both virtualization
and inter-processor communications, so directly sourcing virtio
might yield unexpected results due to conflicting selections.

The simple solution offered by this patch is to always source virtio's
Kconfig in drivers/Kconfig, and remove it from the appropriate arch
Kconfigs. Additionally, a virtio menu entry has been added so virtio
drivers don't show up in the general drivers menu.

This way anyone can use virtio, though it's arguably less accessible
(and neat!) for virtualization users now.

Note: some architectures (mips and sh) seem to have a VIRTUALIZATION
menu merely for sourcing virtio's Kconfig, so that menu is removed too.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

e7254219

14 Jul, 2011 · 1 commit

KVM: Steal time implementation · c9aaa895

Authored by Glauber Costa on Jul 11, 2011

To implement steal time, we need the hypervisor to pass the guest
information about how much time was spent running other processes
outside the VM, while the vcpu had meaningful work to do - halt
time does not count.

This information is acquired through the run_delay field of
delayacct/schedstats infrastructure, that counts time spent in a
runqueue but not running.

Steal time is per-cpu information, so the traditional MSR-based
infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the
address of the memory area containing information about steal time.

This patch contains the hypervisor part of the steal time infrastructure,
and can be backported independently of the guest portion.

[avi, yongjie: export delayacct_on, to avoid build failures in some configs]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Rik van Riel <riel@redhat.com>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c9aaa895
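
The accounting described in the commit above amounts to publishing the growth of run_delay since the last update. The sketch below models that in userspace; the struct layouts, field names, and the "bump the version by two" convention are assumptions for illustration and are not taken from the commit message.

```c
#include <stdint.h>

/* Hypothetical guest-visible area whose address the MSR would hold. */
struct steal_time_area {
	uint64_t steal;   /* accumulated stolen time, in nanoseconds */
	uint32_t version; /* changed on every update so readers can detect it */
	uint32_t flags;
};

/* Hypothetical host-side bookkeeping for one vcpu. */
struct vcpu_steal_state {
	uint64_t last_run_delay; /* run_delay seen at the previous update */
	struct steal_time_area area;
};

/* Add the time spent waiting on a runqueue since the last update. */
static void record_steal_time(struct vcpu_steal_state *st,
			      uint64_t run_delay_now)
{
	uint64_t delta = run_delay_now - st->last_run_delay;

	st->last_run_delay = run_delay_now;
	st->area.steal += delta;
	st->area.version += 2; /* assumed convention: stays even when stable */
}
```

Because run_delay only counts time spent runnable but not running, time the vcpu spends halted does not show up in `steal`.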

12 Jul, 2011 · 10 commits

KVM: MMU: Introduce is_last_gpte() to clean up walk_addr_generic() · 3c8c652a

Authored by Takuya Yoshikawa on Jul 01, 2011

Suggested by Ingo and Avi.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3c8c652a

KVM: MMU: Rename the walk label in walk_addr_generic() · 92c1c1e8

Authored by Takuya Yoshikawa on Jul 01, 2011

The current name does not explain the meaning well.  So give it a better
name "retry_walk" to show that we are trying the walk again.

This was suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

92c1c1e8

KVM: MMU: Clean up the error handling of walk_addr_generic() · 134291bf

Authored by Takuya Yoshikawa on Jul 01, 2011

Avoid a two-step jump to the error handling part.  This eliminates the use
of the variables present and rsvd_fault.

We also use the const type qualifier to show that write/user/fetch_fault
do not change in the function.

Both of these were suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

134291bf

Revert "KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB" · f8f7e5ee

Authored by Marcelo Tosatti on Jun 21, 2011

This reverts commit bee931d31e588b8eb86b7edee32fac2d16930cd7.

TLB flush should be done lazily during guest entry, in
kvm_mmu_load().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f8f7e5ee

KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB · 45bd07b9

Authored by Avi Kivity on Jun 12, 2011

kvm_set_cr0() and kvm_set_cr4(), and possibly other functions,
assume that kvm_mmu_reset_context() flushes the guest TLB.  However,
it does not.

Fix by flushing the tlb (and syncing the new root as well).

Signed-off-by: Avi Kivity <avi@redhat.com>

45bd07b9

KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0 · 411c588d

Authored by Avi Kivity on Jun 06, 2011

When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
the kernel to write to them).  Unfortunately this also allows the kernel
to fetch from these pages, even if CR4.SMEP is set.

Adjust for this by also setting NX on the spte in these circumstances.

Signed-off-by: Avi Kivity <avi@redhat.com>

411c588d
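
The adjustment in the commit above can be sketched as a small transformation of the shadow pte. The helper name and the `guest_smep` parameter are hypothetical; the bit positions follow the x86 page table format.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page table entry bits. */
#define PT_WRITABLE_MASK (1ULL << 1)
#define PT_USER_MASK     (1ULL << 2)
#define PT64_NX_MASK     (1ULL << 63)

/*
 * Hypothetical helper for the CR0.WP=0 case: a user page is mapped as a
 * writable kernel page so that kernel writes succeed.
 */
static uint64_t map_user_page_as_kernel(uint64_t spte, bool guest_smep)
{
	spte |= PT_WRITABLE_MASK;
	spte &= ~PT_USER_MASK;

	/* With SMEP, the kernel must not be able to fetch from this page. */
	if (guest_smep)
		spte |= PT64_NX_MASK;
	return spte;
}
```

Without the NX bit, turning the page into a kernel page would silently defeat SMEP for it.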

KVM: Enable ERMS feature support for KVM · a01c8f9b

Authored by Yang, Wei on Jun 14, 2011

This patch exposes the ERMS feature to KVM guests.

With Enhanced REP MOVSB/STOSB (ERMS), the fast-string implementation of these
instructions attempts to move as much of the data as possible with larger
load/store sizes.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a01c8f9b

KVM: Expose RDWRGSFS bit to KVM guests · 176f61da

Authored by Yang, Wei on Jun 14, 2011

This patch exposes the RDWRGSFS bit to KVM guests.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

176f61da

KVM: Add RDWRGSFS support when setting CR4 · 74dc2b4f

Authored by Yang, Wei on Jun 14, 2011

This patch adds RDWRGSFS support when setting CR4.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

74dc2b4f

KVM: Enable DRNG feature support for KVM · 4a00efdf

Authored by Yang, Wei Y on Jun 13, 2011

This patch exposes the DRNG feature to KVM guests.

The RDRAND instruction can provide software with sequences of
random numbers generated from white noise.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4a00efdf

openeuler / Kernel · synced successfully more than 1 year ago