Commits · 8d8f4e9f66ab36e4fcc75eca1e828af8466309f1 · openanolis / cloud-kernel

24 Oct, 2010 40 commits

KVM: x86 emulator: support byte/word opcode pairs · 8d8f4e9f

Authored by Avi Kivity on Aug 26, 2010

Many x86 instructions come in byte and word variants distinguished by bit
0 of the opcode.  Add macros to aid in defining them.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8d8f4e9f
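As a rough illustration of the idea (not the emulator's actual table), a pair macro can expand to two adjacent table entries, with the even opcode carrying a byte-operand flag. The names `ByteOp`, `ModRM`, `D`, `D2bv`, `struct opcode` and `is_byte_op` are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decode flag: the operand is 8 bits wide. */
#define ByteOp (1u << 0)
/* Hypothetical decode flag: the instruction has a ModRM byte. */
#define ModRM  (1u << 1)

struct opcode {
    uint32_t flags;
};

/* One table entry with the given flags. */
#define D(_f) { .flags = (_f) }

/*
 * Two adjacent entries: the even opcode (bit 0 clear) is the byte
 * variant, the odd opcode (bit 0 set) is the word variant.
 */
#define D2bv(_f) D((_f) | ByteOp), D(_f)

/* Opcodes 0x00 and 0x01: ADD r/m8, r8 and ADD r/m16/32, r16/32. */
static const struct opcode opcode_table[] = {
    D2bv(ModRM),
};

/* Returns nonzero if the opcode at this index uses byte operands. */
static int is_byte_op(unsigned int opcode)
{
    return (opcode_table[opcode].flags & ByteOp) != 0;
}
```

Writing the pair once keeps the two variants from drifting apart when the shared flags change.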

KVM: x86 emulator: refuse SrcMemFAddr (e.g. LDS) with register operand · 081bca0e

Authored by Avi Kivity on Aug 26, 2010

SrcMemFAddr is not defined when the modrm operand designates a register
instead of a memory address.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

081bca0e

KVM: x86 emulator: get rid of "restart" in emulation context. · d2ddd1c4

Authored by Gleb Natapov on Aug 25, 2010

x86_emulate_insn() will return 1 if the instruction can be restarted
without re-entering the guest.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d2ddd1c4

KVM: x86 emulator: move string instruction completion check into separate function · 3e2f65d5

Authored by Gleb Natapov on Aug 25, 2010

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3e2f65d5

KVM: x86 emulator: Rename variable that shadows another local variable. · 6e2fb2ca

Authored by Gleb Natapov on Aug 25, 2010

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6e2fb2ca

KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a) · cc4feed5

Authored by Wei Yongjun on Aug 25, 2010

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

cc4feed5

KVM: S390: Export kvm_virtio.h · a3c321c6

Authored by Alexander Graf on Aug 24, 2010

As suggested by Christian, we should expose headers to user space with
information that might be valuable there. The s390 virtio interface is
one of those cases. It defines an ABI between hypervisor and guest, so
it should be exposed to user space.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a3c321c6

KVM: S390: Add virtio hotplug add support · cefa33e2

Authored by Alexander Graf on Aug 24, 2010

The one big missing feature in s390-virtio was hotplugging. This is no more.
This patch implements hotplug add support, so you can add new devices
to the guest on the fly.

Keep in mind that this needs a patch for qemu to actually leverage the
functionality.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

cefa33e2

KVM: S390: take a full byte as ext_param indicator · fc678d67

Authored by Alexander Graf on Aug 24, 2010

Currently the ext_param field only distinguishes between "config change" and
"vring interrupt". We can do a lot more with it though, so let's enable a
full byte of possible values and move the constants to #defines while at it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

fc678d67

KVM: MMU: combine guest pte read between fetch and pte prefetch · 189be38d

Authored by Xiao Guangrong on Aug 22, 2010

Combine the guest pte reads done by the guest pte check in the fetch path
and by pte prefetch.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

189be38d

KVM: MMU: prefetch ptes when intercepted guest #PF · 957ed9ef

Authored by Xiao Guangrong on Aug 22, 2010

Prefetch ptes when a guest #PF is intercepted, to avoid #PFs on later
accesses.

If anything fails in the prefetch path, we exit it and do not try
other ptes, so that it does not become a heavy path.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

957ed9ef

KVM: MMU: introduce gfn_to_page_many_atomic() function · 48987781

Authored by Xiao Guangrong on Aug 22, 2010

Introduce this function to get the pages of consecutive gfns; it can
reduce gup overhead.  Used by a later patch.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

48987781

KVM: MMU: introduce hva_to_pfn_atomic function · 887c08ac

Authored by Xiao Guangrong on Aug 22, 2010

Introduce hva_to_pfn_atomic(); it is the fast path and can be used in
atomic context.  A later patch will use it.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

887c08ac

export __get_user_pages_fast() function · 45888a0c

Authored by Xiao Guangrong on Aug 22, 2010

This function is used by KVM to pin a process's pages in atomic context.

Define a 'weak' version of the function for architectures that do not
support it.
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

45888a0c

KVM: x86: Add timekeeping documentation · f392eb25

Authored by Zachary Amsden on Aug 19, 2010

Basic informational document about x86 timekeeping and how KVM
is affected.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f392eb25

KVM: x86: Fix a possible backwards warp of kvmclock · 1d5f066e

Authored by Zachary Amsden on Aug 19, 2010

Kernel time, which advances in discrete steps, may progress much slower
than TSC.  As a result, when kvmclock is adjusted to a new base, the
apparent time to the guest, which runs at a much higher, nsec scaled
rate based on the current TSC, may have already been observed to have
a larger value (kernel_ns + scaled tsc) than the value to which we are
setting it (kernel_ns + 0).

We must instead compute the clock as potentially observed by the guest
for kernel_ns to make sure it does not go backwards.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1d5f066e
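The clamp described in this message can be sketched roughly as follows. This is a minimal model, not the KVM code: `struct clock_state`, its fields, `cycles_to_ns` and `pick_clock_base` are hypothetical names, and the TSC is assumed to run at a fixed rate given in kHz.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-VCPU state for this sketch:
 *   last_kernel_ns - kernel time used as the previous kvmclock base
 *   tsc_timestamp  - host TSC value sampled at that base
 *   last_guest_tsc - latest TSC value the guest could have observed
 */
struct clock_state {
    uint64_t last_kernel_ns;
    uint64_t tsc_timestamp;
    uint64_t last_guest_tsc;
};

/*
 * Convert TSC cycles to nanoseconds for a TSC running at tsc_khz.
 * Overflow for very large deltas is ignored in this sketch.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
    return cycles * 1000000ULL / tsc_khz;
}

/*
 * Pick the new kvmclock base. The guest may already have seen
 * last_kernel_ns plus the scaled TSC delta, so never go below that.
 */
static uint64_t pick_clock_base(const struct clock_state *s,
                                uint64_t kernel_ns, uint64_t tsc_khz)
{
    uint64_t max_seen_ns = s->last_kernel_ns +
        cycles_to_ns(s->last_guest_tsc - s->tsc_timestamp, tsc_khz);

    return kernel_ns > max_seen_ns ? kernel_ns : max_seen_ns;
}
```

With a 1 GHz TSC, a guest that has run 3,000,000 cycles past a base of 1 ms may have seen 4 ms, so a fresh kernel time of 2 ms is raised to 4 ms rather than used directly.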

x86: pvclock: Move scale_delta into common header · 347bb444

Authored by Zachary Amsden on Aug 19, 2010

The scale_delta function for shift / multiply with 31-bit
precision moves to a common header so it can be used by both
kernel and kvm module.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

347bb444
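For readers unfamiliar with the shift / multiply scaling, a portable sketch is shown below. It assumes the multiplier is a 32-bit fixed-point fraction (`mul_frac / 2^32`) applied after a signed shift; the kernel's own implementation uses architecture-specific code, so treat this only as an approximation of the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a cycle delta: shift first, then multiply by a 32-bit
 * fixed-point fraction (mul_frac / 2^32). Result bits above 64
 * are discarded.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* Compute (delta * mul_frac) >> 32 without a 128-bit type. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffULL;

    return hi * mul_frac + ((lo * mul_frac) >> 32);
}
```

A multiplier of `0x80000000` represents one half, so 1000 cycles with no shift scale to 500.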

KVM: x86: Add clock sync request to hardware enable · ca84d1a2

Authored by Zachary Amsden on Aug 19, 2010

If there are active VCPUs which are marked as belonging to
a particular hardware CPU, request a clock sync for them when
enabling hardware; the TSC could be desynchronized on a newly
arriving CPU, and we need to recompute guests' system time
relative to boot after a suspend event.

This covers both cases.

Note that it is acceptable to take the spinlock, as either
no other tasks will be running and no locks held (BSP after
resume), or other tasks will be guaranteed to drop the lock
relatively quickly (AP on CPU_STARTING).

Noting we now get clock synchronization requests for VCPUs
which are starting up (or restarting), it is tempting to
attempt to remove the arch/x86/kvm/x86.c CPU hot-notifiers
at this time, however it is not correct to do so; they are
required for systems with non-constant TSC as the frequency
may not be known immediately after the processor has started
until the cpufreq driver has had a chance to run and query
the chipset.

Updated: implement better locking semantics for hardware_enable

Removed the hack of dropping and retaking the lock by adding the
semantic that we always hold kvm_lock when hardware_enable is
called.  The one place that doesn't need to worry about it is
resume, as resuming a frozen CPU, the spinlock won't be taken.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ca84d1a2

KVM: x86: Robust TSC compensation · 46543ba4

Authored by Zachary Amsden on Aug 19, 2010

Make the match of TSC find TSC writes that are close to each other
instead of perfectly identical; this allows the compensator to also
work in migration / suspend scenarios.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

46543ba4
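The change from exact to approximate matching might look roughly like the check below. The function name and the `max_diff_cycles` tolerance are assumptions for this sketch; the commit message does not state the threshold the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if a new guest TSC write is close enough to the previous
 * one to be treated as an attempt to set the same value.
 * max_diff_cycles is a hypothetical tolerance chosen by the caller.
 */
static int tsc_write_matches(uint64_t last_write, uint64_t new_write,
                             uint64_t max_diff_cycles)
{
    uint64_t diff = new_write > last_write ? new_write - last_write
                                           : last_write - new_write;

    return diff < max_diff_cycles;
}
```

A tolerance of 1 reproduces the old exact-match behaviour, while a larger value accepts writes that drifted apart slightly, for example across a migration.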

KVM: x86: Add helper functions for time computation · 759379dd

Authored by Zachary Amsden on Aug 19, 2010

Add a helper function to compute the kernel time and convert nanoseconds
back to CPU specific cycles.  Note that these must not be called in preemptible
context, as that would mean the kernel could enter software suspend state,
which would cause non-atomic operation.

Also, convert the KVM_SET_CLOCK / KVM_GET_CLOCK ioctls to use the kernel
time helper; these should be boot-based as well.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

759379dd

KVM: x86: Fix deep C-state TSC desynchronization · 48434c20

Authored by Zachary Amsden on Aug 19, 2010

When CPUs with unstable TSCs enter deep C-state, TSC may stop
running.  This causes us to require resynchronization.  Since
we can't tell when this may potentially happen, we assume the
worst by forcing re-compensation for it at every point the VCPU
task is descheduled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

48434c20

KVM: x86: Unify TSC logic · e48672fa

Authored by Zachary Amsden on Aug 19, 2010

Move the TSC control logic from the vendor backends into x86.c
by adding adjust_tsc_offset to x86 ops.  Now all TSC decisions
can be done in one place.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e48672fa

KVM: x86: Warn about unstable TSC · 6755bae8

Authored by Zachary Amsden on Aug 19, 2010

If creating an SMP guest with unstable host TSC, issue a warning.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6755bae8

KVM: x86: Make cpu_tsc_khz updates use local CPU · 8cfdc000

Authored by Zachary Amsden on Aug 19, 2010

This simplifies much of the init code; we can now simply always
call tsc_khz_changed, optionally passing it a new value, or letting
it figure out the existing value (while interrupts are disabled, and
thus, by inference from the rule, not raceful against CPU hotplug or
frequency updates, which will issue IPIs to the local CPU to perform
this very same task).
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8cfdc000

KVM: x86: TSC reset compensation · f38e098f

Authored by Zachary Amsden on Aug 19, 2010

Attempt to synchronize TSCs which are reset to the same value.  In the
case of a reliable hardware TSC, we can just re-use the same offset, but
on non-reliable hardware, we can get closer by adjusting the offset to
match the elapsed time.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f38e098f

KVM: x86: Move TSC offset writes to common code · 99e3e30a

Authored by Zachary Amsden on Aug 19, 2010

Also, ensure that the storing of the offset and the reading of the TSC
are never preempted by taking a spinlock.  While the lock is overkill
now, it is useful later in this patch series.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

99e3e30a

KVM: x86: Convert TSC writes to TSC offset writes · f4e1b3c8

Authored by Zachary Amsden on Aug 19, 2010

Change svm / vmx to be the same internally and write TSC offset
instead of bare TSC in helper functions.  Isolated as a single
patch to contain code movement.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f4e1b3c8

KVM: x86: Drop vm_init_tsc · ae38436b

Authored by Zachary Amsden on Aug 19, 2010

This is used only by the VMX code, and is not done properly;
if the TSC is indeed backwards, it is out of sync, and will
need proper handling in the logic at each and every CPU change.
For now, drop this test during init as misguided.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ae38436b

KVM: MMU: fix missing percpu counter destroy · 45bf21a8

Authored by Wei Yongjun on Aug 23, 2010

commit ad05c88266b4cce1c820928ce8a0fb7690912ba1
(KVM: create aggregate kvm_total_used_mmu_pages value)
introduced the percpu counter kvm_total_used_mmu_pages but never
destroys it; this may cause an oops on rmmod & modprobe.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

45bf21a8

KVM: MMU: fix regression from rework mmu_shrink() code · 80b63faf

Authored by Xiaotian Feng on Aug 24, 2010

The latest kvm mmu_shrink code rework makes the kernel change kvm->arch.n_used_mmu_pages/
kvm->arch.n_max_mmu_pages in kvm_mmu_free_page/kvm_mmu_alloc_page, which are called
by kvm_mmu_commit_zap_page. So kvm->arch.n_used_mmu_pages or
kvm_mmu_available_pages(vcpu->kvm) is unchanged after kvm_mmu_prepare_zap_page().
This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages to loop forever.
Moving kvm_mmu_commit_zap_page makes the while loop perform as normal.
Reported-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Tested-by: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

80b63faf

KVM: x86 emulator: add JrCXZ instruction emulation · e4abac67

Authored by Wei Yongjun on Aug 19, 2010

Add JrCXZ instruction emulation (opcode 0xe3).
Used by the FreeBSD boot loader.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e4abac67
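For context, the semantics of JrCXZ (JCXZ / JECXZ / JRCXZ) can be sketched as below: the address size selects how much of RCX is tested, and the short jump is taken only if that part is zero. The function and parameter names are hypothetical, and truncation of the resulting instruction pointer to the operand size is left out of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the next instruction pointer for JCXZ/JECXZ/JRCXZ.
 * ad_bytes is the address size in bytes (2, 4 or 8); it selects
 * CX, ECX or RCX as the register to test. rel is the signed 8-bit
 * displacement and next_ip the address of the following instruction.
 */
static uint64_t jrcxz_target(uint64_t rcx, int ad_bytes,
                             uint64_t next_ip, int8_t rel)
{
    uint64_t mask = ad_bytes == 8 ? ~0ULL
                                  : (1ULL << (ad_bytes * 8)) - 1;

    if ((rcx & mask) == 0)
        return next_ip + (int64_t)rel;
    return next_ip;
}
```

Note that with a 16-bit address size only CX is examined, so a register value of `0x10000` still takes the jump.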

KVM: x86 emulator: add LDS/LES/LFS/LGS/LSS instruction emulation · 09b5f4d3

Authored by Wei Yongjun on Aug 23, 2010

Add LDS/LES/LFS/LGS/LSS instruction emulation.
(opcode 0xc4, 0xc5, 0x0f 0xb2, 0x0f 0xb4~0xb5)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

09b5f4d3

KVM: create aggregate kvm_total_used_mmu_pages value · 45221ab6

Authored by Dave Hansen on Aug 19, 2010

Of slab shrinkers, the VM code says:

 * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
 * querying the cache size, so a fastpath for that case is appropriate.

and it *means* it.  Look at how it calls the shrinkers:

    nr_before = (*shrinker->shrink)(0, gfp_mask);
    shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);

So, if you do anything stupid in your shrinker, the VM will doubly
punish you.

The mmu_shrink() function takes the global kvm_lock, then acquires
every VM's kvm->mmu_lock in sequence.  If we have 100 VMs, then
we're going to take 101 locks.  We do it twice, so each call takes
202 locks.  If we're under memory pressure, we can have each cpu
trying to do this.  It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.

This is guaranteed to optimize at least half of those lock
acquisitions away.  It removes the need to take any of the locks
when simply trying to count objects.

A 'percpu_counter' can be a large object, but we only have one
of these for the entire system.  There are not any better
alternatives at the moment, especially ones that handle CPU
hotplug.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

45221ab6
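The counting fast path this message describes can be modelled in a few lines. This is a user-space sketch only: a plain `long` stands in for the `percpu_counter`, and `zap_pages`, `locks_taken` and `mmu_shrink_sketch` are hypothetical names used to make the lock-free query visible.

```c
#include <assert.h>

/* Stand-in for the aggregate counter (a percpu_counter in the commit). */
static long total_used_mmu_pages;
/* Counts lock acquisitions so the fast path can be checked. */
static int locks_taken;

/* Hypothetical slow path: takes a lock and frees up to nr_to_scan pages. */
static void zap_pages(int nr_to_scan)
{
    locks_taken++;
    while (nr_to_scan-- > 0 && total_used_mmu_pages > 0)
        total_used_mmu_pages--;
}

/*
 * Shrinker callback sketch. nr_to_scan == 0 is a pure size query,
 * so it is answered from the counter without taking any lock.
 */
static long mmu_shrink_sketch(int nr_to_scan)
{
    if (nr_to_scan != 0)
        zap_pages(nr_to_scan);
    return total_used_mmu_pages;
}
```

Because the VM calls the shrinker once with `nr_to_scan == 0` before each real scan, answering the query from the counter removes the locking from that first call entirely.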

KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages · 49d5ca26

Authored by Dave Hansen on Aug 19, 2010

Doing this makes the code much more readable.  That's
borne out by the fact that this patch removes code.  "used"
also happens to be the number that we need to return back to
the slab code when our shrinker gets called.  Keeping this
value as opposed to free makes the next patch simpler.

So, 'struct kvm' is kzalloc()'d.  'struct kvm_arch' is a
structure member (and not a pointer) of 'struct kvm'.  That
means they start out zeroed.  I _think_ they get initialized
properly by kvm_mmu_change_mmu_pages().  But, that only happens
via kvm ioctls.

Another benefit of storing 'used' instead of 'free' is
that the values are consistent from the moment the structure is
allocated: no negative "used" value.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

49d5ca26

KVM: rename x86 kvm->arch.n_alloc_mmu_pages · 39de71ec

Authored by Dave Hansen on Aug 19, 2010

arch.n_alloc_mmu_pages is a poor choice of name. This value truly
means, "the number of pages which _may_ be allocated".  But,
reading the name, "n_alloc_mmu_pages" implies "the number of allocated
mmu pages", which is dead wrong.

It's really the high watermark, so let's give it a name to match:
nr_max_mmu_pages.  This change will make the next few patches
much more obvious and easy to read.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

39de71ec

KVM: abstract kvm x86 mmu->n_free_mmu_pages · e0df7b9f

Authored by Dave Hansen on Aug 19, 2010

"free" is a poor name for this value.  In this context, it means,
"the number of mmu pages which this kvm instance should be able to
allocate."  But "free" implies much more: that the objects are there
and ready for use.  "available" is a much better description, especially
when you see how it is calculated.

In this patch, we abstract its use into a function.  We'll soon
replace the function's contents by calculating the value in a
different way.

All of the reads of n_free_mmu_pages are taken care of in this
patch.  The modification sites will be handled in a patch
later in the series.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e0df7b9f

KVM: x86 emulator: implement CWD (opcode 99) · 61429142

Authored by Avi Kivity on Aug 19, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>

61429142

KVM: x86 emulator: implement IMUL REG, R/M, IMM (opcode 69) · d46164db

Authored by Avi Kivity on Aug 18, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>

d46164db

KVM: x86 emulator: add Src2Imm decoding · 7db41eb7

Authored by Avi Kivity on Aug 18, 2010

Needed for 3-operand IMUL.
Signed-off-by: Avi Kivity <avi@redhat.com>

7db41eb7

KVM: x86 emulator: consolidate immediate decode into a function · 39f21ee5

Authored by Avi Kivity on Aug 18, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>

39f21ee5

openanolis / cloud-kernel synced successfully about 1 year ago
