提交 · 4879f241720cda3e6c18a1713bf9b2ed2de14ee4 · openeuler / Kernel

06 12月, 2012 4 次提交

KVM: PPC: Book3S HV: Fix bug causing loss of page dirty state · 4879f241

由 Paul Mackerras 提交于 11月 19, 2012

This fixes a bug where adding a new guest HPT entry via the H_ENTER
hcall would lose the "changed" bit in the reverse map information
for the guest physical page being mapped.  The result was that the
KVM_GET_DIRTY_LOG could return a zero bit for the page even though
the page had been modified by the guest.

This fixes it by only modifying the index and present bits in the
reverse map entry, thus preserving the reference and change bits.
We were also unnecessarily setting the reference bit, and this
fixes that too.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

4879f241

KVM: PPC: Book3S HV: Restructure HPT entry creation code · 7ed661bf

由 Paul Mackerras 提交于 11月 13, 2012

This restructures the code that creates HPT (hashed page table)
entries so that it can be called in situations where we don't have a
struct vcpu pointer, only a struct kvm pointer. It also fixes a bug
where kvmppc_map_vrma() would corrupt the guest R4 value.

Most of the work of kvmppc_virtmode_h_enter is now done by a new
function, kvmppc_virtmode_do_h_enter, which itself calls another new
function, kvmppc_do_h_enter, which contains most of the old
kvmppc_h_enter. The new kvmppc_do_h_enter takes explicit arguments
for the place to return the HPTE index, the Linux page tables to use,
and whether it is being called in real mode, thus removing the need
for it to have the vcpu as an argument.

Currently kvmppc_map_vrma creates the VRMA (virtual real mode area)
HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily
to handle H_ENTER hcalls from the guest that need to pin a page of
memory. Since H_ENTER returns the index of the created HPTE in R4,
kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4
in the case when it gets called from kvmppc_map_vrma on the first
VCPU_RUN ioctl. With this, kvmppc_map_vrma instead calls
kvmppc_virtmode_do_h_enter with the address of a dummy word as the
place to store the HPTE index, thus avoiding corrupting the guest R4.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

7ed661bf

KVM: PPC: Support eventfd · 0e673fb6

由 Alexander Graf 提交于 10月 09, 2012

In order to support the generic eventfd infrastructure on PPC, we need
to call into the generic KVM in-kernel device mmio code.
Signed-off-by: NAlexander Graf <agraf@suse.de>

0e673fb6

KVM: Distangle eventfd code from irqchip · 914daba8

由 Alexander Graf 提交于 10月 09, 2012

The current eventfd code assumes that when we have eventfd, we also have
irqfd for in-kernel interrupt delivery. This is not necessarily true. On
PPC we don't have an in-kernel irqchip yet, but we can still support easily
support eventfd.
Signed-off-by: NAlexander Graf <agraf@suse.de>

914daba8

02 12月, 2012 1 次提交

KVM: x86: Fix uninitialized return code · 45e3cc7d

由 Jan Kiszka 提交于 12月 02, 2012

This is a regression caused by 18595411.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

45e3cc7d

01 12月, 2012 2 次提交

KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635

由 Will Auld 提交于 11月 29, 2012

CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported

Basic design is to emulate the MSR by allowing reads and writes to a guest
vcpu specific location to store the value of the emulated MSR while adding
the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
is of course as long as the "use TSC counter offsetting" VM-execution control
is enabled as well as the IA32_TSC_ADJUST control.

However, because hardware will only return the TSC + IA32_TSC_ADJUST +
vmsc tsc_offset for a guest process when it does and rdtsc (with the correct
settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one
of these three locations. The argument against storing it in the actual MSR
is performance. This is likely to be seldom used while the save/restore is
required on every transition. IA32_TSC_ADJUST was created as a way to solve
some issues with writing TSC itself so that is not an option either.

The remaining option, defined above as our solution has the problem of
returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
done here) as mentioned above. However, more problematic is that storing the
data in vmcs tsc_offset will have a different semantic effect on the system
than does using the actual MSR. This is illustrated in the following example:

The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest
process performs a rdtsc. In this case the guest process will get
TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including
IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics
as seen by the guest do not and hence this will not cause a problem.
Signed-off-by: NWill Auld <will.auld@intel.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

ba904635

KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46

由 Will Auld 提交于 11月 29, 2012

In order to track who initiated the call (host or guest) to modify an msr
value I have changed function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than having this information passed as
individual parameters.

The initial use for this capability is for updating the IA32_TSC_ADJUST msr
while setting the tsc value. It is anticipated that this capability is
useful for other tasks.
Signed-off-by: NWill Auld <will.auld@intel.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

8fe8ab46

30 11月, 2012 2 次提交

KVM: Fix user memslot overlap check · 5419369e

由 Alex Williamson 提交于 11月 29, 2012

Prior to memory slot sorting this loop compared all of the user memory
slots for overlap with new entries.  With memory slot sorting, we're
just checking some number of entries in the array that may or may not
be user slots.  Instead, walk all the slots with kvm_for_each_memslot,
which has the added benefit of terminating early when we hit the first
empty slot, and skip comparison to private slots.

Cc: stable@vger.kernel.org
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

5419369e

KVM: VMX: fix memory order between loading vmcs and clearing vmcs · 5a560f8b

由 Xiao Guangrong 提交于 11月 28, 2012

vmcs->cpu indicates whether it exists on the target cpu, -1 means the vmcs
does not exist on any vcpu

If vcpu load vmcs with vmcs.cpu = -1, it can be directly added to cpu's percpu
list. The list can be corrupted if the cpu prefetch the vmcs's list before
reading vmcs->cpu. Meanwhile, we should remove vmcs from the list before
making vmcs->vcpu == -1 be visible
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

5a560f8b

29 11月, 2012 2 次提交

KVM: VMX: fix invalid cpu passed to smp_call_function_single · e6c7d321

由 Xiao Guangrong 提交于 11月 28, 2012

In loaded_vmcs_clear, loaded_vmcs->cpu is the fist parameter passed to
smp_call_function_single, if the target cpu is downing (doing cpu hot remove),
loaded_vmcs->cpu can become -1 then -1 is passed to smp_call_function_single

It can be triggered when vcpu is being destroyed, loaded_vmcs_clear is called
in the preemptionable context
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

e6c7d321

KVM: use is_idle_task() instead of idle_cpu() to decide when to halt in async_pf · 859f8450

由 Gleb Natapov 提交于 11月 28, 2012

As Frederic pointed idle_cpu() may return false even if async fault
happened in the idle task if wake up is pending. In this case the code
will try to put idle task to sleep. Fix this by using is_idle_task() to
check for idle task.
Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

859f8450

28 11月, 2012 18 次提交

KVM: x86: update pvclock area conditionally, on cpu migration · d98d07ca

由 Marcelo Tosatti 提交于 11月 27, 2012

As requested by Glauber, do not update kvmclock area on vcpu->pcpu
migration, in case the host has stable TSC.

This is to reduce cacheline bouncing.
Acked-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

d98d07ca

KVM: x86: require matched TSC offsets for master clock · b48aa97e

由 Marcelo Tosatti 提交于 11月 27, 2012

With master clock, a pvclock clock read calculates:

ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]

Where 'rdtsc' is the host TSC.

system_timestamp and tsc_timestamp are unique, one tuple
per VM: the "master clock".

Given a host with synchronized TSCs, its obvious that
guest TSC must be matched for the above to guarantee monotonicity.

Allow master clock usage only if guest TSCs are synchronized.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

b48aa97e

M
KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization · 42897d86
由 Marcelo Tosatti 提交于 11月 27, 2012
```
TSC initialization will soon make use of online_vcpus.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
```
42897d86

KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag · d828199e

由 Marcelo Tosatti 提交于 11月 27, 2012

KVM added a global variable to guarantee monotonicity in the guest.
One of the reasons for that is that the time between

	1. ktime_get_ts(&timespec);
	2. rdtscll(tsc);

Is variable. That is, given a host with stable TSC, suppose that
two VCPUs read the same time via ktime_get_ts() above.

The time required to execute 2. is not the same on those two instances
executing in different VCPUS (cache misses, interrupts...).

If the TSC value that is used by the host to interpolate when
calculating the monotonic time is the same value used to calculate
the tsc_timestamp value stored in the pvclock data structure, and
a single <system_timestamp, tsc_timestamp> tuple is visible to all
vcpus simultaneously, this problem disappears. See comment on top
of pvclock_update_vm_gtod_copy for details.

Monotonicity is then guaranteed by synchronicity of the host TSCs
and guest TSCs.

Set TSC stable pvclock flag in that case, allowing the guest to read
clock from userspace.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

d828199e

KVM: x86: notifier for clocksource changes · 16e8d74d

由 Marcelo Tosatti 提交于 11月 27, 2012

Register a notifier for clocksource change event. In case
the host switches to clock other than TSC, disable master
clock usage.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

16e8d74d

time: export time information for KVM pvclock · e0b306fe

由 Marcelo Tosatti 提交于 11月 27, 2012

As suggested by John, export time data similarly to how its
done by vsyscall support. This allows KVM to retrieve necessary
information to implement vsyscall support in KVM guests.
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

e0b306fe

KVM: x86: pass host_tsc to read_l1_tsc · 886b470c

由 Marcelo Tosatti 提交于 11月 27, 2012

Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc().
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

886b470c

x86: vdso: pvclock gettime support · 51c19b4f

由 Marcelo Tosatti 提交于 11月 27, 2012

Improve performance of time system calls when using Linux pvclock,
by reading time info from fixmap visible copy of pvclock data.

Originally from Jeremy Fitzhardinge.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

51c19b4f

x86: kvm guest: pvclock vsyscall support · 3dc4f7cf

由 Marcelo Tosatti 提交于 11月 27, 2012

Hook into generic pvclock vsyscall code, with the aim to
allow userspace to have visibility into pvclock data.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

3dc4f7cf

x86: pvclock: generic pvclock vsyscall initialization · 71056ae2

由 Marcelo Tosatti 提交于 11月 27, 2012

Originally from Jeremy Fitzhardinge.

Introduce generic, non hypervisor specific, pvclock initialization
routines.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

71056ae2

sched: add notifier for cross-cpu migrations · 582b336e

由 Marcelo Tosatti 提交于 11月 27, 2012

Originally from Jeremy Fitzhardinge.
Acked-by: NIngo Molnar <mingo@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

582b336e

x86: pvclock: add note about rdtsc barriers · 189e1173

由 Marcelo Tosatti 提交于 11月 27, 2012

As noted by Gleb, not advertising SSE2 support implies
no RDTSC barriers.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

189e1173

x86: pvclock: introduce helper to read flags · 2697902b

由 Marcelo Tosatti 提交于 11月 27, 2012

Acked-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

2697902b

x86: pvclock: create helper for pvclock data retrieval · dce2db0a

由 Marcelo Tosatti 提交于 11月 27, 2012

Originally from Jeremy Fitzhardinge.

So code can be reused.
Acked-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

dce2db0a

x86: pvclock: remove pvclock_shadow_time · 42b5637d

由 Marcelo Tosatti 提交于 11月 27, 2012

Originally from Jeremy Fitzhardinge.

We can copy the information directly from "struct pvclock_vcpu_time_info",
remove pvclock_shadow_time.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

42b5637d

x86: pvclock: make sure rdtsc doesnt speculate out of region · b01578de

由 Marcelo Tosatti 提交于 11月 27, 2012

Originally from Jeremy Fitzhardinge.

pvclock_get_time_values, which contains the memory barriers
will be removed by next patch.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

b01578de

x86: kvmclock: allocate pvclock shared memory area · 7069ed67

由 Marcelo Tosatti 提交于 11月 27, 2012

We want to expose the pvclock shared memory areas, which
the hypervisor periodically updates, to userspace.

For a linear mapping from userspace, it is necessary that
entire page sized regions are used for array of pvclock
structures.

There is no such guarantee with per cpu areas, therefore move
to memblock_alloc based allocation.
Acked-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

7069ed67

KVM: x86: retain pvclock guest stopped bit in guest memory · 78c0337a

由 Marcelo Tosatti 提交于 11月 27, 2012

Otherwise its possible for an unrelated KVM_REQ_UPDATE_CLOCK (such as due to CPU
migration) to clear the bit.

Noticed by Paolo Bonzini.
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

78c0337a

14 11月, 2012 3 次提交

KVM: remove unnecessary return value check · 807f12e5

由 Guo Chao 提交于 11月 02, 2012

No need to check return value before breaking switch.
Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

807f12e5

KVM: x86: fix return value of kvm_vm_ioctl_set_tss_addr() · 951179ce

由 Guo Chao 提交于 11月 02, 2012

Return value of this function will be that of ioctl().

#include <stdio.h>
#include <linux/kvm.h>

int main () {
	int fd;
	fd = open ("/dev/kvm", 0);
	fd = ioctl (fd, KVM_CREATE_VM, 0);
	ioctl (fd, KVM_SET_TSS_ADDR, 0xfffff000);
	perror ("");
	return 0;
}

Output is "Operation not permitted". That's not what
we want.

Return -EINVAL in this case.
Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

951179ce

KVM: do not kfree error pointer · 18595411

由 Guo Chao 提交于 11月 02, 2012

We should avoid kfree()ing error pointer in kvm_vcpu_ioctl() and
kvm_arch_vcpu_ioctl().
Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

18595411

01 11月, 2012 2 次提交

Merge branch 'for-queue' of https://github.com/agraf/linux-2.6 into queue · f026399f

由 Marcelo Tosatti 提交于 10月 31, 2012

* 'for-queue' of https://github.com/agraf/linux-2.6:
  PPC: ePAPR: Convert hcall header to uapi (round 2)
  KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()
  KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0
  KVM: PPC: Book3S HV: Fix accounting of stolen time
  KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
  KVM: PPC: Book3S HV: Fixes for late-joining threads
  KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
  KVM: PPC: Book3S HV: Fix some races in starting secondary threads
  KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
  PPC: ePAPR: Convert header to uapi
  KVM: PPC: Move mtspr/mfspr emulation into own functions
  KVM: Documentation: Fix reentry-to-be-consistent paragraph
  KVM: PPC: 44x: fix DCR read/write

f026399f

KVM: SVM: update MAINTAINERS entry · 7de609c8

由 Joerg Roedel 提交于 10月 29, 2012

I have no access to my AMD email address anymore. Update
entry in MAINTAINERS to the new address.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NJoerg Roedel <joro@8bytes.org>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

7de609c8

31 10月, 2012 2 次提交

PPC: ePAPR: Convert hcall header to uapi (round 2) · 63a19091

由 Alexander Graf 提交于 10月 31, 2012

The new uapi framework splits kernel internal and user space exported
bits of header files more cleanly. Adjust the ePAPR header accordingly.
Signed-off-by: NAlexander Graf <agraf@suse.de>

63a19091

A
Merge commit 'origin/queue' into for-queue · 0588000e
由 Alexander Graf 提交于 10月 31, 2012
```
Conflicts:
	arch/powerpc/include/asm/Kbuild
	arch/powerpc/include/uapi/asm/Kbuild
```
0588000e

30 10月, 2012 4 次提交

KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte() · 8b5869ad

由 Paul Mackerras 提交于 10月 15, 2012

This fixes an error in the inline asm in try_lock_hpte() where we
were erroneously using a register number as an immediate operand.
The bug only affects an error path, and in fact the code will still
work as long as the compiler chooses some register other than r0
for the "bits" variable.  Nevertheless it should still be fixed.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

8b5869ad

KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0 · 9f8c8c78

由 Paul Mackerras 提交于 10月 15, 2012

Commit 55b665b0 ("KVM: PPC: Book3S HV: Provide a way for userspace
to get/set per-vCPU areas") includes a check on the length of the
dispatch trace log (DTL) to make sure the buffer is at least one entry
long.  This is appropriate when registering a buffer, but the
interface also allows for any existing buffer to be unregistered by
specifying a zero address.  In this case the length check is not
appropriate.  This makes the check conditional on the address being
non-zero.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

9f8c8c78

KVM: PPC: Book3S HV: Fix accounting of stolen time · c7b67670

由 Paul Mackerras 提交于 10月 15, 2012

Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.

In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis.  Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we need to count the time when a vcpu task
is preempted while it is not running as part of a vcore as stolen also.

To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state.  Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu->arch.tbacct_lock.  This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped.  Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

c7b67670

KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run · 8455d79e

由 Paul Mackerras 提交于 10月 15, 2012

Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

8455d79e

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功