Commits · b0a94d4e23201c7559bb8f8657cfb629561288f2 · openeuler / Kernel

06 Dec, 2012 10 commits

KVM: PPC: Book3S PR: Emulate PURR, SPURR and DSCR registers · b0a94d4e

Authored by Paul Mackerras on Nov 04, 2012

This adds basic emulation of the PURR and SPURR registers.  We assume
we are emulating a single-threaded core, so these advance at the same
rate as the timebase.  A Linux kernel running on a POWER7 expects to
be able to access these registers and is not prepared to handle a
program interrupt on accessing them.

This also adds a very minimal emulation of the DSCR (data stream
control register).  Writes are ignored and reads return zero.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
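
The behavior described in this message can be sketched in plain C as follows. This is not the KVM emulation code: the `sketch_*` names and the enum values are hypothetical, and only the behavior (PURR and SPURR follow the timebase, DSCR reads as zero and ignores writes) is taken from the message. Writes to PURR and SPURR are left out because the message does not describe them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register identifiers, not the real SPR numbers. */
enum sketch_spr { SKETCH_PURR, SKETCH_SPURR, SKETCH_DSCR, SKETCH_OTHER };

/* Emulate a read. Returns false when the register is not handled here. */
static bool sketch_emulate_mfspr(enum sketch_spr spr, uint64_t timebase,
                                 uint64_t *val)
{
    switch (spr) {
    case SKETCH_PURR:
    case SKETCH_SPURR:
        /* Single-threaded core assumed: advance at the timebase rate. */
        *val = timebase;
        return true;
    case SKETCH_DSCR:
        *val = 0;               /* reads return zero */
        return true;
    default:
        return false;
    }
}

/* Emulate a write. DSCR writes are accepted but the value is dropped. */
static bool sketch_emulate_mtspr(enum sketch_spr spr, uint64_t val)
{
    (void)val;
    return spr == SKETCH_DSCR;
}
```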

KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages · 1cc8ed0b

Authored by Paul Mackerras on Nov 21, 2012

Currently, if the guest does an H_PROTECT hcall requesting that the
permissions on a HPT entry be changed to allow writing, we make the
requested change even if the page is marked read-only in the host
Linux page tables.  This is a problem since it would for instance
allow a guest to modify a page that KSM has decided can be shared
between multiple guests.

To fix this, if the new permissions for the page allow writing, we need
to look up the memslot for the page, work out the host virtual address,
and look up the Linux page tables to get the PTE for the page.  If that
PTE is read-only, we reduce the HPTE permissions to read-only.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
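
The permission decision described in this message reduces to a small predicate. The sketch below assumes the memslot lookup and page-table walk have already produced a yes/no answer for the host PTE; `sketch_hpte_may_write` is a hypothetical name, not a kernel function.

```c
#include <stdbool.h>

/*
 * Decide whether the guest's HPTE may be writable.
 * guest_wants_write: the permissions requested via H_PROTECT allow writes.
 * host_pte_writable: the host Linux PTE for the page allows writes.
 */
static bool sketch_hpte_may_write(bool guest_wants_write, bool host_pte_writable)
{
    if (!guest_wants_write)
        return false;           /* read-only requests need no host lookup */
    return host_pte_writable;   /* read-only host PTE forces read-only HPTE */
}
```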

KVM: PPC: Book3S HV: Report correct HPT entry index when reading HPT · 05dd85f7

Authored by Paul Mackerras on Nov 21, 2012

This fixes a bug in the code which allows userspace to read out the
contents of the guest's hashed page table (HPT).  On the second and
subsequent passes through the HPT, when we are reporting only those
entries that have changed, we were incorrectly initializing the index
field of the header with the index of the first entry we skipped
rather than the first changed entry.  This fixes it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
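
A minimal sketch of the corrected ordering, assuming the per-entry "changed" state is available as a simple array: skip the unchanged entries first, and only then take the index for the header. The function name and the array representation are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Find the index to put in the block header on a "changed entries only"
 * pass. Returns n when nothing from `start` onward has changed.
 */
static uint32_t sketch_first_changed(const bool *changed, size_t n,
                                     uint32_t start)
{
    uint32_t i = start;

    while (i < n && !changed[i])
        i++;                    /* skipped entries must not set the index */
    return i;                   /* header index = first changed entry */
}
```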

KVM: PPC: Book3S HV: Reset reverse-map chains when resetting the HPT · a64fd707

Authored by Paul Mackerras on Nov 21, 2012

With HV-style KVM, we maintain reverse-mapping lists that enable us to
find all the HPT (hashed page table) entries that reference each guest
physical page, with the heads of the lists in the memslot->arch.rmap
arrays. When we reset the HPT (i.e. when we reboot the VM), we clear
out all the HPT entries but we were not clearing out the reverse
mapping lists. The result is that as we create new HPT entries, the
lists get corrupted, which can easily lead to loops, resulting in the
host kernel hanging when it tries to traverse those lists.

This fixes the problem by zeroing out all the reverse mapping lists
when we zero out the HPT. This incidentally means that we are also
zeroing our record of the referenced and changed bits (not the bits
in the Linux PTEs, used by the Linux MM subsystem, but the bits used
by the KVM_GET_DIRTY_LOG ioctl, and those used by kvm_age_hva() and
kvm_test_age_hva()).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT · a2932923

Authored by Paul Mackerras on Nov 19, 2012

A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
this fd return the contents of the HPT (hashed page table), writes
create and/or remove entries in the HPT.  There is a new capability,
KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
takes an argument structure with the index of the first HPT entry to
read out and a set of flags.  The flags indicate whether the user is
intending to read or write the HPT, and whether to return all entries
or only the "bolted" entries (those with the bolted bit, 0x10, set in
the first doubleword).

This is intended for use in implementing qemu's savevm/loadvm and for
live migration.  Therefore, on reads, the first pass returns information
about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
end of the HPT, it returns from the read.  Subsequent reads only return
information about HPTEs that have changed since they were last read.
A read that finds no changed HPTEs in the HPT following where the last
read finished will return 0 bytes.

The format of the data provides a simple run-length compression of the
invalid entries.  Each block of data starts with a header that indicates
the index (position in the HPT, which is just an array), the number of
valid entries starting at that index (may be zero), and the number of
invalid entries following those valid entries.  The valid entries, 16
bytes each, follow the header.  The invalid entries are not explicitly
represented.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>
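
The stream format described in this message can be walked with a few lines of C. The sketch below is an illustration only: `struct sketch_htab_header` and its field widths are assumptions, so the real layout should be taken from the KVM API documentation. What it shows is the run-length idea: valid entries take 16 bytes each, invalid entries take none.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout; the field widths are assumptions. */
struct sketch_htab_header {
    uint32_t index;      /* position of the first entry in the HPT */
    uint16_t n_valid;    /* valid entries that follow this header */
    uint16_t n_invalid;  /* invalid entries after them, not in the stream */
};

#define SKETCH_HPTE_SIZE 16  /* each valid entry is 16 bytes */

/*
 * Walk a buffer returned by read() and count the valid entries.
 * Returns -1 if the buffer ends in the middle of a block.
 */
static long sketch_count_valid(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    long total = 0;

    while (off < len) {
        struct sketch_htab_header hdr;
        size_t payload;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr));  /* avoid unaligned access */
        off += sizeof(hdr);

        payload = (size_t)hdr.n_valid * SKETCH_HPTE_SIZE;
        if (len - off < payload)
            return -1;
        off += payload;         /* invalid entries occupy no bytes */
        total += hdr.n_valid;
    }
    return total;
}
```

A real consumer would also use `index` and `n_invalid` to place the entries and to clear the invalid slots in its own copy of the table.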

KVM: PPC: Book3S HV: Make a HPTE removal function available · 6b445ad4

Authored by Paul Mackerras on Nov 19, 2012

This makes a HPTE removal function, kvmppc_do_h_remove(), available
outside book3s_hv_rm_mmu.c.  This will be used by the HPT writing
code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs · 44e5f6be

Authored by Paul Mackerras on Nov 19, 2012

This uses a bit in our record of the guest view of the HPTE to record
when the HPTE gets modified. We use a reserved bit for this, and ensure
that this bit is always cleared in HPTE values returned to the guest.

The recording of modified HPTEs is only done if other code indicates
its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
The reason for this is that when later commits add facilities for
userspace to read the HPT, the first pass of reading the HPT will be
quicker if there are no (or very few) HPTEs marked as modified,
rather than having most HPTEs marked as modified.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
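
The mechanism in this message amounts to setting a bookkeeping bit on modification and masking it out of anything the guest sees. In the sketch below the bit position and the `sketch_*` names are hypothetical; the message only says that a reserved bit is used.

```c
#include <stdint.h>

/* Hypothetical bit position; the commit only says a reserved bit is used. */
#define SKETCH_HPTE_MODIFIED (1ULL << 62)

/* Record a modification, but only if someone has registered interest. */
static uint64_t sketch_note_modified(uint64_t guest_view, int mod_interest)
{
    if (mod_interest)
        guest_view |= SKETCH_HPTE_MODIFIED;
    return guest_view;
}

/* Value handed back to the guest: the bookkeeping bit is always cleared. */
static uint64_t sketch_value_for_guest(uint64_t guest_view)
{
    return guest_view & ~SKETCH_HPTE_MODIFIED;
}
```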

KVM: PPC: Book3S HV: Fix bug causing loss of page dirty state · 4879f241

Authored by Paul Mackerras on Nov 19, 2012

This fixes a bug where adding a new guest HPT entry via the H_ENTER
hcall would lose the "changed" bit in the reverse map information
for the guest physical page being mapped.  The result was that the
KVM_GET_DIRTY_LOG ioctl could return a zero bit for the page even though
the page had been modified by the guest.

This fixes it by only modifying the index and present bits in the
reverse map entry, thus preserving the reference and change bits.
We were also unnecessarily setting the reference bit, and this
fixes that too.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
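
The fix described in this message is a masked update. The sketch below assumes a made-up layout for a reverse-map entry (the `SKETCH_RMAP_*` masks are hypothetical) and shows how updating only the index and present bits keeps the referenced and changed bits intact.

```c
#include <stdint.h>

/* Hypothetical layout of a reverse-map entry, for illustration only. */
#define SKETCH_RMAP_INDEX      0x00000000ffffffffULL
#define SKETCH_RMAP_PRESENT    (1ULL << 32)
#define SKETCH_RMAP_CHANGED    (1ULL << 62)
#define SKETCH_RMAP_REFERENCED (1ULL << 63)

/* Point the entry at a new HPTE, touching only the index and present bits. */
static uint64_t sketch_rmap_set_index(uint64_t rmap, uint32_t hpte_index)
{
    rmap &= ~(SKETCH_RMAP_INDEX | SKETCH_RMAP_PRESENT);
    rmap |= hpte_index | SKETCH_RMAP_PRESENT;
    return rmap;   /* referenced and changed bits are left as they were */
}
```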

KVM: PPC: Book3S HV: Restructure HPT entry creation code · 7ed661bf

Authored by Paul Mackerras on Nov 13, 2012

This restructures the code that creates HPT (hashed page table)
entries so that it can be called in situations where we don't have a
struct vcpu pointer, only a struct kvm pointer. It also fixes a bug
where kvmppc_map_vrma() would corrupt the guest R4 value.

Most of the work of kvmppc_virtmode_h_enter is now done by a new
function, kvmppc_virtmode_do_h_enter, which itself calls another new
function, kvmppc_do_h_enter, which contains most of the old
kvmppc_h_enter. The new kvmppc_do_h_enter takes explicit arguments
for the place to return the HPTE index, the Linux page tables to use,
and whether it is being called in real mode, thus removing the need
for it to have the vcpu as an argument.

Currently kvmppc_map_vrma creates the VRMA (virtual real mode area)
HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily
to handle H_ENTER hcalls from the guest that need to pin a page of
memory. Since H_ENTER returns the index of the created HPTE in R4,
kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4
in the case when it gets called from kvmppc_map_vrma on the first
VCPU_RUN ioctl. With this, kvmppc_map_vrma instead calls
kvmppc_virtmode_do_h_enter with the address of a dummy word as the
place to store the HPTE index, thus avoiding corrupting the guest R4.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Support eventfd · 0e673fb6

Authored by Alexander Graf on Oct 09, 2012

In order to support the generic eventfd infrastructure on PPC, we need
to call into the generic KVM in-kernel device mmio code.

Signed-off-by: Alexander Graf <agraf@suse.de>

02 Dec, 2012 1 commit

KVM: x86: Fix uninitialized return code · 45e3cc7d

Authored by Jan Kiszka on Dec 02, 2012

This is a regression caused by 18595411.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

01 Dec, 2012 2 commits

KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635

Authored by Will Auld on Nov 29, 2012

CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported

Basic design is to emulate the MSR by allowing reads and writes to a guest
vcpu specific location to store the value of the emulated MSR while adding
the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
is of course as long as the "use TSC counter offsetting" VM-execution control
is enabled as well as the IA32_TSC_ADJUST control.

However, because hardware will only return the TSC + IA32_TSC_ADJUST +
vmcs tsc_offset for a guest process when it does an rdtsc (with the correct
settings), the value of our virtualized IA32_TSC_ADJUST must be stored in one
of these three locations. The argument against storing it in the actual MSR
is performance. This is likely to be seldom used while the save/restore is
required on every transition. IA32_TSC_ADJUST was created as a way to solve
some issues with writing TSC itself so that is not an option either.

The remaining option, defined above as our solution has the problem of
returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
done here) as mentioned above. However, more problematic is that storing the
data in vmcs tsc_offset will have a different semantic effect on the system
than does using the actual MSR. This is illustrated in the following example:

The hypervisor sets IA32_TSC_ADJUST, then the guest sets it and a guest
process performs a rdtsc. In this case the guest process will get
TSC + IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset including
IA32_TSC_ADJUST_guest. While the total system semantics changed, the semantics
as seen by the guest do not, and hence this will not cause a problem.

Signed-off-by: Will Auld <will.auld@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
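
The basic design in this message (keep the MSR value in a per-vcpu location and add it to the vmcs tsc_offset) can be sketched as below. The struct and function names are hypothetical and the `tsc_offset` field merely stands in for the vmcs field; this is not the KVM code.

```c
#include <stdint.h>

/* Hypothetical per-vcpu state, not the kernel's data structures. */
struct sketch_vcpu {
    int64_t tsc_adjust;  /* emulated IA32_TSC_ADJUST value */
    int64_t tsc_offset;  /* stands in for the vmcs tsc_offset field */
};

/* Guest write to the emulated MSR: fold the change into tsc_offset. */
static void sketch_write_tsc_adjust(struct sketch_vcpu *v, int64_t new_val)
{
    v->tsc_offset += new_val - v->tsc_adjust;
    v->tsc_adjust = new_val;
}

/* What the guest sees from rdtsc when TSC offsetting is enabled. */
static uint64_t sketch_guest_rdtsc(const struct sketch_vcpu *v,
                                   uint64_t host_tsc)
{
    return host_tsc + (uint64_t)v->tsc_offset;
}
```

Applying only the difference between the new and old MSR values means repeated writes do not accumulate in the offset.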

KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46

Authored by Will Auld on Nov 29, 2012

In order to track who initiated the call (host or guest) to modify an msr
value I have changed function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than having this information passed as
individual parameters.

The initial use for this capability is for updating the IA32_TSC_ADJUST msr
while setting the tsc value. It is anticipated that this capability is
useful for other tasks.

Signed-off-by: Will Auld <will.auld@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
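
A rough sketch of the parameter change described in this message: the (index, data, caller) information travels as one struct pointer, so a handler can tell host writes from guest writes. The struct, its field names, and the handler are hypothetical illustrations, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bundle of the (index, data, caller) information. */
struct sketch_msr_data {
    bool host_initiated;  /* true when the host, not the guest, wrote */
    uint32_t index;       /* which MSR */
    uint64_t data;        /* value being written */
};

/* A handler receives one pointer and can branch on who made the call. */
static int sketch_set_msr(const struct sketch_msr_data *msr,
                          uint64_t *guest_writes)
{
    if (!msr->host_initiated)
        (*guest_writes)++;      /* e.g. extra bookkeeping for guest writes */
    return 0;
}
```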

30 Nov, 2012 1 commit

KVM: VMX: fix memory order between loading vmcs and clearing vmcs · 5a560f8b

Authored by Xiao Guangrong on Nov 28, 2012

vmcs->cpu indicates the cpu on which the vmcs is loaded; -1 means the vmcs
is not loaded on any cpu.

If a vcpu loads a vmcs with vmcs->cpu == -1, the vmcs can be added directly to
the cpu's percpu list. The list can be corrupted if the cpu prefetches the
vmcs's list before reading vmcs->cpu. Likewise, we should remove the vmcs from
the list before making vmcs->cpu == -1 visible.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

29 Nov, 2012 2 commits

KVM: VMX: fix invalid cpu passed to smp_call_function_single · e6c7d321

Authored by Xiao Guangrong on Nov 28, 2012

In loaded_vmcs_clear, loaded_vmcs->cpu is the first parameter passed to
smp_call_function_single. If the target cpu is going down (cpu hot remove),
loaded_vmcs->cpu can become -1, and then -1 is passed to
smp_call_function_single.

This can be triggered when a vcpu is being destroyed, because
loaded_vmcs_clear is called in a preemptible context.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
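
One way to avoid passing -1, sketched here under the assumption that reading the field once and validating the local copy is sufficient; the message itself does not spell out the fix. `sketch_call_on_cpu` is a hypothetical stand-in for the cross-CPU call, and all names are invented for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cross-CPU call; records the target. */
static int sketch_last_target = -2;
static void sketch_call_on_cpu(int cpu) { sketch_last_target = cpu; }

/*
 * Read the CPU number once into a local and validate it before use,
 * so a concurrent change to -1 can never reach the cross-CPU call.
 * Returns true if the call was issued.
 */
static bool sketch_vmcs_clear(const volatile int *vmcs_cpu)
{
    int cpu = *vmcs_cpu;        /* single read; later changes are ignored */

    if (cpu == -1)
        return false;           /* not loaded anywhere: nothing to clear */
    sketch_call_on_cpu(cpu);
    return true;
}
```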

KVM: use is_idle_task() instead of idle_cpu() to decide when to halt in async_pf · 859f8450

Authored by Gleb Natapov on Nov 28, 2012

As Frederic pointed out, idle_cpu() may return false even if the async
fault happened in the idle task, if a wakeup is pending. In this case the
code will try to put the idle task to sleep. Fix this by using
is_idle_task() to check for the idle task.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

28 Nov, 2012 16 commits

KVM: x86: update pvclock area conditionally, on cpu migration · d98d07ca

Authored by Marcelo Tosatti on Nov 27, 2012

As requested by Glauber, do not update kvmclock area on vcpu->pcpu
migration, in case the host has stable TSC.

This is to reduce cacheline bouncing.

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: require matched TSC offsets for master clock · b48aa97e

Authored by Marcelo Tosatti on Nov 27, 2012

With master clock, a pvclock clock read calculates:

ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]

Where 'rdtsc' is the host TSC.

system_timestamp and tsc_timestamp are unique, one tuple
per VM: the "master clock".

Given a host with synchronized TSCs, it's obvious that
guest TSC must be matched for the above to guarantee monotonicity.

Allow master clock usage only if guest TSCs are synchronized.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
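
The clock-read formula in this message can be written out directly. The sketch below uses hypothetical names and omits TSC-to-nanosecond scaling, as the formula in the message does; it exists only to show why vcpus with different tsc_offset values read different times from the same master clock tuple.

```c
#include <stdint.h>

/* One <system_timestamp, tsc_timestamp> tuple per VM: the master clock. */
struct sketch_master_clock {
    uint64_t system_timestamp;
    uint64_t tsc_timestamp;
};

/*
 * ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]
 * TSC-to-nanosecond scaling is left out, as in the formula above.
 */
static uint64_t sketch_pvclock_read(const struct sketch_master_clock *mc,
                                    uint64_t host_tsc, int64_t tsc_offset)
{
    uint64_t guest_tsc = host_tsc + (uint64_t)tsc_offset;

    return mc->system_timestamp + (guest_tsc - mc->tsc_timestamp);
}
```

With the same host TSC value, two vcpus whose offsets differ get readings that differ by the same amount, so a task moving between them could see time go backwards.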

KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization · 42897d86

Authored by Marcelo Tosatti on Nov 27, 2012

TSC initialization will soon make use of online_vcpus.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag · d828199e

Authored by Marcelo Tosatti on Nov 27, 2012

KVM added a global variable to guarantee monotonicity in the guest.
One of the reasons for that is that the time between

	1. ktime_get_ts(&timespec);
	2. rdtscll(tsc);

Is variable. That is, given a host with stable TSC, suppose that
two VCPUs read the same time via ktime_get_ts() above.

The time required to execute 2. is not the same on those two instances
executing in different VCPUS (cache misses, interrupts...).

If the TSC value that is used by the host to interpolate when
calculating the monotonic time is the same value used to calculate
the tsc_timestamp value stored in the pvclock data structure, and
a single <system_timestamp, tsc_timestamp> tuple is visible to all
vcpus simultaneously, this problem disappears. See comment on top
of pvclock_update_vm_gtod_copy for details.

Monotonicity is then guaranteed by synchronicity of the host TSCs
and guest TSCs.

Set TSC stable pvclock flag in that case, allowing the guest to read
clock from userspace.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: notifier for clocksource changes · 16e8d74d

Authored by Marcelo Tosatti on Nov 27, 2012

Register a notifier for clocksource change event. In case
the host switches to a clock other than TSC, disable master
clock usage.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: pass host_tsc to read_l1_tsc · 886b470c

Authored by Marcelo Tosatti on Nov 27, 2012

Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: vdso: pvclock gettime support · 51c19b4f

Authored by Marcelo Tosatti on Nov 27, 2012

Improve performance of time system calls when using Linux pvclock,
by reading time info from a fixmap-visible copy of pvclock data.

Originally from Jeremy Fitzhardinge.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: kvm guest: pvclock vsyscall support · 3dc4f7cf

Authored by Marcelo Tosatti on Nov 27, 2012

Hook into generic pvclock vsyscall code, with the aim to
allow userspace to have visibility into pvclock data.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: generic pvclock vsyscall initialization · 71056ae2

Authored by Marcelo Tosatti on Nov 27, 2012

Originally from Jeremy Fitzhardinge.

Introduce generic, non-hypervisor-specific pvclock initialization
routines.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: add note about rdtsc barriers · 189e1173

Authored by Marcelo Tosatti on Nov 27, 2012

As noted by Gleb, not advertising SSE2 support implies
no RDTSC barriers.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: introduce helper to read flags · 2697902b

Authored by Marcelo Tosatti on Nov 27, 2012

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: create helper for pvclock data retrieval · dce2db0a

Authored by Marcelo Tosatti on Nov 27, 2012

Originally from Jeremy Fitzhardinge.

So code can be reused.

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: remove pvclock_shadow_time · 42b5637d

Authored by Marcelo Tosatti on Nov 27, 2012

Originally from Jeremy Fitzhardinge.

We can copy the information directly from "struct pvclock_vcpu_time_info",
so remove pvclock_shadow_time.

Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: make sure rdtsc doesnt speculate out of region · b01578de

Authored by Marcelo Tosatti on Nov 27, 2012

Originally from Jeremy Fitzhardinge.

pvclock_get_time_values, which contains the memory barriers,
will be removed by the next patch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: kvmclock: allocate pvclock shared memory area · 7069ed67

Authored by Marcelo Tosatti on Nov 27, 2012

We want to expose the pvclock shared memory areas, which
the hypervisor periodically updates, to userspace.

For a linear mapping from userspace, it is necessary that
entire page sized regions are used for array of pvclock
structures.

There is no such guarantee with per cpu areas, therefore move
to memblock_alloc based allocation.

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: retain pvclock guest stopped bit in guest memory · 78c0337a

Authored by Marcelo Tosatti on Nov 27, 2012

Otherwise it's possible for an unrelated KVM_REQ_UPDATE_CLOCK (such as due to CPU
migration) to clear the bit.

Noticed by Paolo Bonzini.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

14 Nov, 2012 3 commits

KVM: remove unnecessary return value check · 807f12e5

Authored by Guo Chao on Nov 02, 2012

No need to check return value before breaking switch.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: fix return value of kvm_vm_ioctl_set_tss_addr() · 951179ce

Authored by Guo Chao on Nov 02, 2012

Return value of this function will be that of ioctl().

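#include <stdio.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm_fd, vm_fd;

	kvm_fd = open("/dev/kvm", O_RDWR);
	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
	/* This address is rejected by the kernel; perror() shows the errno. */
	ioctl(vm_fd, KVM_SET_TSS_ADDR, 0xfffff000);
	perror("");
	return 0;
}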

Output is "Operation not permitted". That's not what
we want.

Return -EINVAL in this case.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: do not kfree error pointer · 18595411

Authored by Guo Chao on Nov 02, 2012

We should avoid kfree()ing an error pointer in kvm_vcpu_ioctl() and
kvm_arch_vcpu_ioctl().

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
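
The pitfall in this message can be illustrated in userspace. The helpers below imitate the kernel's ERR_PTR()/IS_ERR() convention of encoding a negative errno in a pointer value; they are hypothetical re-implementations for the example, with free() standing in for kfree().

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR()/IS_ERR() helpers. */
static void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Free only real allocations; an encoded error must never reach free(). */
static int sketch_release(void *p)
{
    if (sketch_is_err(p))
        return 0;               /* nothing was allocated */
    free(p);
    return 1;
}
```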

31 Oct, 2012 1 commit

PPC: ePAPR: Convert hcall header to uapi (round 2) · 63a19091

Authored by Alexander Graf on Oct 31, 2012

The new uapi framework splits kernel internal and user space exported
bits of header files more cleanly. Adjust the ePAPR header accordingly.

Signed-off-by: Alexander Graf <agraf@suse.de>

30 Oct, 2012 4 commits

KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte() · 8b5869ad

Authored by Paul Mackerras on Oct 15, 2012

This fixes an error in the inline asm in try_lock_hpte() where we
were erroneously using a register number as an immediate operand.
The bug only affects an error path, and in fact the code will still
work as long as the compiler chooses some register other than r0
for the "bits" variable.  Nevertheless it should still be fixed.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0 · 9f8c8c78

Authored by Paul Mackerras on Oct 15, 2012

Commit 55b665b0 ("KVM: PPC: Book3S HV: Provide a way for userspace
to get/set per-vCPU areas") includes a check on the length of the
dispatch trace log (DTL) to make sure the buffer is at least one entry
long.  This is appropriate when registering a buffer, but the
interface also allows for any existing buffer to be unregistered by
specifying a zero address.  In this case the length check is not
appropriate.  This makes the check conditional on the address being
non-zero.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
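
The conditional check described in this message is small enough to show in full. In this sketch the entry size and the function name are hypothetical; the point is only that a zero address (unregister) bypasses the minimum-length test.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DTL_ENTRY_SIZE 48  /* hypothetical size of one log entry */

/* Returns 0 if the request is acceptable, -1 otherwise. */
static int sketch_check_dtl(uint64_t addr, size_t len)
{
    /* addr == 0 unregisters the buffer, so the length is not checked */
    if (addr != 0 && len < SKETCH_DTL_ENTRY_SIZE)
        return -1;
    return 0;
}
```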

KVM: PPC: Book3S HV: Fix accounting of stolen time · c7b67670

Authored by Paul Mackerras on Oct 15, 2012

Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.

In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis.  Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we need to count the time when a vcpu task
is preempted while it is not running as part of a vcore as stolen also.

To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state.  Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu->arch.tbacct_lock.  This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped.  Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run · 8455d79e

Authored by Paul Mackerras on Oct 15, 2012

Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
