Commits · d98d07ca7e0347d712d54a865af323c4aee04bc2 · openeuler / Kernel

28 Nov, 2012 18 commits

KVM: x86: update pvclock area conditionally, on cpu migration · d98d07ca

Committed by Marcelo Tosatti on Nov 27, 2012

As requested by Glauber, do not update kvmclock area on vcpu->pcpu
migration, in case the host has stable TSC.

This is to reduce cacheline bouncing.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: require matched TSC offsets for master clock · b48aa97e

Committed by Marcelo Tosatti on Nov 27, 2012

With master clock, a pvclock clock read calculates:

ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]

Where 'rdtsc' is the host TSC.

system_timestamp and tsc_timestamp are unique, one tuple
per VM: the "master clock".

Given a host with synchronized TSCs, it is obvious that the
guest TSCs must be matched for the above to guarantee monotonicity.

Allow master clock usage only if guest TSCs are synchronized.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
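
The read formula in this commit message can be sketched in C. This is an illustration only, not the kernel code: the struct and function names are hypothetical, and it assumes one TSC tick equals one nanosecond, so the scaling that real pvclock code applies is left out.

```c
#include <assert.h>
#include <stdint.h>

/* One <system_timestamp, tsc_timestamp> tuple shared by the whole VM. */
struct master_clock {
	uint64_t system_timestamp; /* nanoseconds at the time of the snapshot */
	uint64_t tsc_timestamp;    /* guest TSC value at the same snapshot */
};

/*
 * ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]
 * Assumption for illustration: 1 TSC tick == 1 ns, no mult/shift scaling.
 */
static uint64_t pvclock_read(const struct master_clock *mc,
			     uint64_t host_tsc, uint64_t tsc_offset)
{
	uint64_t guest_tsc = host_tsc + tsc_offset;

	/* The snapshot must not be newer than the TSC value being read. */
	assert(guest_tsc >= mc->tsc_timestamp);
	return mc->system_timestamp + (guest_tsc - mc->tsc_timestamp);
}
```

With a tuple of (1000, 5000), a vcpu whose offset is 50 reads 1150 at host TSC 5100, while a vcpu whose offset is 0 reads 1110 at the later host TSC 5110. The second read is earlier than the first, which is why the master clock is only allowed when the guest TSC offsets match.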

KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization · 42897d86

Committed by Marcelo Tosatti on Nov 27, 2012

TSC initialization will soon make use of online_vcpus.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag · d828199e

Committed by Marcelo Tosatti on Nov 27, 2012

KVM added a global variable to guarantee monotonicity in the guest.
One of the reasons for that is that the time between

	1. ktime_get_ts(&timespec);
	2. rdtscll(tsc);

Is variable. That is, given a host with stable TSC, suppose that
two VCPUs read the same time via ktime_get_ts() above.

The time required to execute 2. is not the same on those two instances
executing in different VCPUS (cache misses, interrupts...).

If the TSC value that is used by the host to interpolate when
calculating the monotonic time is the same value used to calculate
the tsc_timestamp value stored in the pvclock data structure, and
a single <system_timestamp, tsc_timestamp> tuple is visible to all
vcpus simultaneously, this problem disappears. See comment on top
of pvclock_update_vm_gtod_copy for details.

Monotonicity is then guaranteed by synchronicity of the host TSCs
and guest TSCs.

Set TSC stable pvclock flag in that case, allowing the guest to read
clock from userspace.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
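
The problem described in this commit message can be made concrete with a small C sketch. It is a simplified model with hypothetical names, assuming one TSC tick equals one nanosecond: each vcpu gets its own tuple, and step 2 (the TSC read) lands a different number of ticks after step 1 on each of them.

```c
#include <assert.h>
#include <stdint.h>

struct clock_tuple {
	uint64_t system_timestamp; /* value obtained in step 1 */
	uint64_t tsc_timestamp;    /* value obtained in step 2 */
};

/* Assumption: 1 TSC tick == 1 ns, so no scaling is applied. */
static uint64_t tuple_read(const struct clock_tuple *t, uint64_t tsc_now)
{
	assert(tsc_now >= t->tsc_timestamp);
	return t->system_timestamp + (tsc_now - t->tsc_timestamp);
}

/*
 * How far the clock appears to jump backwards when a task reads it through
 * tuple "first" at first_tsc and then through tuple "second" at the later
 * second_tsc. A result of 0 means the two reads were monotonic.
 */
static uint64_t backwards_jump(const struct clock_tuple *first,
			       uint64_t first_tsc,
			       const struct clock_tuple *second,
			       uint64_t second_tsc)
{
	uint64_t a = tuple_read(first, first_tsc);
	uint64_t b = tuple_read(second, second_tsc);

	return b < a ? a - b : 0;
}
```

If two vcpus record the same system time 1000 but TSC values 5000 and 5040, a read through the first tuple at TSC 6000 followed by a read through the second at TSC 6010 goes backwards by 30 ns. Reading both times through one shared tuple gives 0, which is the property the single VM-wide tuple provides.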

KVM: x86: notifier for clocksource changes · 16e8d74d

Committed by Marcelo Tosatti on Nov 27, 2012

Register a notifier for the clocksource change event. In case
the host switches to a clock other than the TSC, disable master
clock usage.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

time: export time information for KVM pvclock · e0b306fe

Committed by Marcelo Tosatti on Nov 27, 2012

As suggested by John, export time data similarly to how it is
done by vsyscall support. This allows KVM to retrieve the necessary
information to implement vsyscall support in KVM guests.
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: pass host_tsc to read_l1_tsc · 886b470c

Committed by Marcelo Tosatti on Nov 27, 2012

Allow the caller to pass the host TSC value to kvm_x86_ops->read_l1_tsc().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: vdso: pvclock gettime support · 51c19b4f

Committed by Marcelo Tosatti on Nov 27, 2012

Improve performance of time system calls when using Linux pvclock,
by reading time info from a fixmap-visible copy of the pvclock data.

Originally from Jeremy Fitzhardinge.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: kvm guest: pvclock vsyscall support · 3dc4f7cf

Committed by Marcelo Tosatti on Nov 27, 2012

Hook into the generic pvclock vsyscall code, with the aim of
giving userspace visibility into pvclock data.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: generic pvclock vsyscall initialization · 71056ae2

Committed by Marcelo Tosatti on Nov 27, 2012

Originally from Jeremy Fitzhardinge.

Introduce generic, non-hypervisor-specific pvclock initialization
routines.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

sched: add notifier for cross-cpu migrations · 582b336e

Committed by Marcelo Tosatti on Nov 27, 2012

Originally from Jeremy Fitzhardinge.
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: add note about rdtsc barriers · 189e1173

Committed by Marcelo Tosatti on Nov 27, 2012

As noted by Gleb, not advertising SSE2 support implies
no RDTSC barriers.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: introduce helper to read flags · 2697902b

Committed by Marcelo Tosatti on Nov 27, 2012

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: create helper for pvclock data retrieval · dce2db0a

Committed by Marcelo Tosatti on Nov 27, 2012

Originally from Jeremy Fitzhardinge.

So that the code can be reused.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: remove pvclock_shadow_time · 42b5637d

Committed by Marcelo Tosatti on Nov 27, 2012

Originally from Jeremy Fitzhardinge.

We can copy the information directly from "struct pvclock_vcpu_time_info",
so remove pvclock_shadow_time.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: pvclock: make sure rdtsc doesnt speculate out of region · b01578de

Committed by Marcelo Tosatti on Nov 27, 2012

Originally from Jeremy Fitzhardinge.

pvclock_get_time_values, which contains the memory barriers,
will be removed by the next patch.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: kvmclock: allocate pvclock shared memory area · 7069ed67

Committed by Marcelo Tosatti on Nov 27, 2012

We want to expose the pvclock shared memory areas, which
the hypervisor periodically updates, to userspace.

For a linear mapping from userspace, it is necessary that
entire page-sized regions are used for the array of pvclock
structures.

There is no such guarantee with per cpu areas, therefore move
to memblock_alloc based allocation.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
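
The page-size requirement can be illustrated with a short C sketch of the size calculation. The constants and function names are assumptions for illustration (a 4096-byte page and one 64-byte slot per vcpu), not values taken from this commit.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES    4096u /* assumed page size */
#define PVCLOCK_SLOT_BYTES 64u   /* assumed per-vcpu slot size */

/* Round a byte count up to a whole number of pages. */
static size_t page_align(size_t bytes)
{
	return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES * PAGE_SIZE_BYTES;
}

/* Bytes to reserve so the per-vcpu array occupies entire pages. */
static size_t pvclock_area_bytes(size_t nr_cpus)
{
	assert(nr_cpus > 0);
	return page_align(nr_cpus * PVCLOCK_SLOT_BYTES);
}
```

Under these assumptions 64 vcpus fit in exactly one page and a 65th vcpu pushes the reservation to two pages, so the region mapped into userspace never shares a page with unrelated data.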

KVM: x86: retain pvclock guest stopped bit in guest memory · 78c0337a

Committed by Marcelo Tosatti on Nov 27, 2012

Otherwise it is possible for an unrelated KVM_REQ_UPDATE_CLOCK (such as
one due to CPU migration) to clear the bit.

Noticed by Paolo Bonzini.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
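
A minimal C sketch of the idea, assuming the flag values match the pvclock ABI header and using a hypothetical helper name: when the host writes a fresh flags byte, it carries over the guest-stopped bit that is already present in guest memory.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the pvclock ABI header; treat as illustrative. */
#define PVCLOCK_TSC_STABLE_BIT (1u << 0)
#define PVCLOCK_GUEST_STOPPED  (1u << 1)

/*
 * Hypothetical helper: combine the flags the host wants to publish with the
 * stopped bit currently stored in guest memory, so an unrelated clock update
 * does not clear a notification the guest has not consumed yet.
 */
static uint8_t merge_pvclock_flags(uint8_t host_flags, uint8_t guest_flags)
{
	uint8_t merged = host_flags | (guest_flags & PVCLOCK_GUEST_STOPPED);

	/* Everything the host asked for must still be set. */
	assert((merged & host_flags) == host_flags);
	return merged;
}
```

Only the stopped bit is retained from the guest copy. Other bits, such as the TSC stable bit, are taken from what the host computes for the current update.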

14 Nov, 2012 3 commits

KVM: remove unnecessary return value check · 807f12e5

Committed by Guo Chao on Nov 02, 2012

No need to check return value before breaking switch.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: fix return value of kvm_vm_ioctl_set_tss_addr() · 951179ce

Committed by Guo Chao on Nov 02, 2012

Return value of this function will be that of ioctl().

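#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main () {
	int fd;
	fd = open ("/dev/kvm", 0);
	/* Reuse fd for the VM file descriptor returned by KVM_CREATE_VM. */
	fd = ioctl (fd, KVM_CREATE_VM, 0);
	/* This call fails; perror() then prints the errno it left behind. */
	ioctl (fd, KVM_SET_TSS_ADDR, 0xfffff000);
	perror ("");
	return 0;
}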

Output is "Operation not permitted". That's not what
we want.

Return -EINVAL in this case.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
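
The "Operation not permitted" output is what a bare return value of -1 looks like from userspace, because errno 1 is EPERM. The C sketch below models that. The shape of the range check (a TSS area spanning three pages below 4 GiB) and both function names are assumptions made for illustration; the commit message itself only states that -EINVAL is returned.

```c
#include <assert.h>
#include <errno.h>

#define PAGE_SIZE_BYTES 4096u /* assumed page size */

/*
 * Hypothetical stand-in for the range check: the base address must leave
 * room for three pages before the 32-bit address space wraps.
 */
static int check_tss_addr(unsigned int addr)
{
	if (addr > (unsigned int)(0u - 3u * PAGE_SIZE_BYTES))
		return -EINVAL; /* a bare "return -1" would read as -EPERM */
	return 0;
}

/* A negative return value -E from an ioctl handler is reported as errno E. */
static int errno_seen_by_userspace(int kernel_ret)
{
	assert(kernel_ret < 0);
	return -kernel_ret;
}
```

With this model, a handler that returns -1 yields EPERM, while the address 0xfffff000 from the reproducer yields EINVAL.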

KVM: do not kfree error pointer · 18595411

Committed by Guo Chao on Nov 02, 2012

We should avoid kfree()ing an error pointer in kvm_vcpu_ioctl() and
kvm_arch_vcpu_ioctl().
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
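
The pattern behind this fix can be sketched in userspace C. The helpers below imitate the kernel's error-pointer convention and are not the kernel implementation; `dup_buffer` and `use_buffer` are hypothetical names, and `free()` stands in for `kfree()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR() and IS_ERR() helpers. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure as an encoded error pointer. */
static void *dup_buffer(size_t len)
{
	void *p;

	if (len == 0)
		return err_ptr(-EINVAL);
	p = malloc(len);
	return p ? p : err_ptr(-ENOMEM);
}

/* Returns 0 or a negative errno, and never passes an error pointer to free(). */
static int use_buffer(size_t len)
{
	int r = 0;
	void *p = dup_buffer(len);

	if (is_err(p)) {
		r = (int)(intptr_t)p;
		p = NULL; /* free(NULL) is a no-op; freeing an error pointer is not */
		goto out;
	}
	/* ... work with p ... */
out:
	assert(!is_err(p));
	free(p);
	return r;
}
```

Resetting the pointer to NULL before jumping to the shared cleanup label keeps a single exit path while making the final free harmless.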

01 Nov, 2012 2 commits

Merge branch 'for-queue' of https://github.com/agraf/linux-2.6 into queue · f026399f

Committed by Marcelo Tosatti on Oct 31, 2012

* 'for-queue' of https://github.com/agraf/linux-2.6:
  PPC: ePAPR: Convert hcall header to uapi (round 2)
  KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()
  KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0
  KVM: PPC: Book3S HV: Fix accounting of stolen time
  KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
  KVM: PPC: Book3S HV: Fixes for late-joining threads
  KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
  KVM: PPC: Book3S HV: Fix some races in starting secondary threads
  KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
  PPC: ePAPR: Convert header to uapi
  KVM: PPC: Move mtspr/mfspr emulation into own functions
  KVM: Documentation: Fix reentry-to-be-consistent paragraph
  KVM: PPC: 44x: fix DCR read/write


KVM: SVM: update MAINTAINERS entry · 7de609c8

Committed by Joerg Roedel on Oct 29, 2012

I have no access to my AMD email address anymore. Update
entry in MAINTAINERS to the new address.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

31 Oct, 2012 2 commits

PPC: ePAPR: Convert hcall header to uapi (round 2) · 63a19091

Committed by Alexander Graf on Oct 31, 2012

The new uapi framework splits kernel internal and user space exported
bits of header files more cleanly. Adjust the ePAPR header accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>

Merge commit 'origin/queue' into for-queue · 0588000e

Committed by Alexander Graf on Oct 31, 2012

Conflicts:
	arch/powerpc/include/asm/Kbuild
	arch/powerpc/include/uapi/asm/Kbuild

30 Oct, 2012 14 commits

KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte() · 8b5869ad

Committed by Paul Mackerras on Oct 15, 2012

This fixes an error in the inline asm in try_lock_hpte() where we
were erroneously using a register number as an immediate operand.
The bug only affects an error path, and in fact the code will still
work as long as the compiler chooses some register other than r0
for the "bits" variable.  Nevertheless it should still be fixed.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0 · 9f8c8c78

Committed by Paul Mackerras on Oct 15, 2012

Commit 55b665b0 ("KVM: PPC: Book3S HV: Provide a way for userspace
to get/set per-vCPU areas") includes a check on the length of the
dispatch trace log (DTL) to make sure the buffer is at least one entry
long.  This is appropriate when registering a buffer, but the
interface also allows for any existing buffer to be unregistered by
specifying a zero address.  In this case the length check is not
appropriate.  This makes the check conditional on the address being
non-zero.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
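
The conditional check described in this commit message can be sketched as follows. The function name and the 48-byte entry size are assumptions for illustration only.

```c
#include <errno.h>
#include <stdint.h>

#define DTL_ENTRY_BYTES 48u /* assumed size of one dispatch trace log entry */

/*
 * Hypothetical validation helper: a zero address means "unregister the
 * existing buffer", so the minimum-length check applies only when a buffer
 * is actually being registered.
 */
static int check_dtl(uint64_t addr, uint64_t len)
{
	if (addr != 0 && len < DTL_ENTRY_BYTES)
		return -EINVAL;
	return 0;
}
```

Before the change, the equivalent of `check_dtl(0, 0)` would have been rejected, which made it impossible to unregister a buffer with a zero length.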

KVM: PPC: Book3S HV: Fix accounting of stolen time · c7b67670

Committed by Paul Mackerras on Oct 15, 2012

Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.

In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis.  Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we need to count the time when a vcpu task
is preempted while it is not running as part of a vcore as stolen also.

To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state.  Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu->arch.tbacct_lock.  This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped.  Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run · 8455d79e

Committed by Paul Mackerras on Oct 15, 2012

Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
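
The difference between the old and the relaxed policy can be shown with two predicates. This is a simplified model with hypothetical names; the two-value state enum stands in for the real vcpu states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical vcpu states for illustration. */
enum vcpu_state { VCPU_NOTREADY, VCPU_RUNNABLE };

/* Old policy: every active vcpu in the virtual core must be ready. */
static bool vcore_can_run_old(const enum vcpu_state *vcpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (vcpus[i] != VCPU_RUNNABLE)
			return false;
	return n > 0;
}

/* Relaxed policy: one runnable vcpu is enough to run the virtual core. */
static bool vcore_can_run_new(const enum vcpu_state *vcpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (vcpus[i] == VCPU_RUNNABLE)
			return true;
	return false;
}
```

In the reset case from the commit message, where only vcpu 0 is left runnable, the old predicate is false and the relaxed predicate is true.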

KVM: PPC: Book3S HV: Fixes for late-joining threads · 2f12f034

Committed by Paul Mackerras on Oct 15, 2012

If a thread in a virtual core becomes runnable while other threads
in the same virtual core are already running in the guest, it is
possible for the latecomer to join the others on the core without
first pulling them all out of the guest.  Currently this only happens
rarely, when a vcpu is first started.  This fixes some bugs and
omissions in the code in this case.

First, we need to check for VPA updates for the latecomer and make
a DTL entry for it.  Secondly, if it comes along while the master
vcpu is doing a VPA update, we don't need to do anything since the
master will pick it up in kvmppc_run_core.  To handle this correctly
we introduce a new vcore state, VCORE_STARTING.  Thirdly, there is
a race because we currently clear the hardware thread's hwthread_req
before waiting to see it get to nap.  A latecomer thread could have
its hwthread_req cleared before it gets to test it, and therefore
never increment the nap_count, leading to messages about wait_for_nap
timeouts.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock · 913d3ff9

Committed by Paul Mackerras on Oct 15, 2012

There were a few places where we were traversing the list of runnable
threads in a virtual core, i.e. vc->runnable_threads, without holding
the vcore spinlock.  This extends the places where we hold the vcore
spinlock to cover everywhere that we traverse that list.

Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
this moves the call of it from kvmppc_handle_exit out to
kvmppc_vcpu_run, where we don't hold the vcore lock.

In kvmppc_vcore_blocked, we don't actually need to check whether
all vcpus are ceded and don't have any pending exceptions, since the
caller has already done that.  The caller (kvmppc_run_vcpu) wasn't
actually checking for pending exceptions, so we add that.

The change of if to while in kvmppc_run_vcpu is to make sure that we
never call kvmppc_remove_runnable() when the vcore state is RUNNING or
EXITING.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Fix some races in starting secondary threads · 7b444c67

Committed by Paul Mackerras on Oct 15, 2012

Subsequent patches implementing in-kernel XICS emulation will make it
possible for IPIs to arrive at secondary threads at arbitrary times.
This fixes some races in how we start the secondary threads, which
if not fixed could lead to occasional crashes of the host kernel.

This makes sure that (a) we have grabbed all the secondary threads,
and verified that they are no longer in the kernel, before we start
any thread, (b) that the secondary thread loads its vcpu pointer
after clearing the IPI that woke it up (so we don't miss a wakeup),
and (c) that the secondary thread clears its vcpu pointer before
incrementing the nap count. It also removes unnecessary setting
of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online · 512691d4

Committed by Paul Mackerras on Oct 15, 2012

When a Book3S HV KVM guest is running, we need the host to be in
single-thread mode, that is, all of the cores (or at least all of
the cores where the KVM guest could run) to be running only one
active hardware thread. This is because of the hardware restriction
in POWER processors that all of the hardware threads in the core
must be in the same logical partition. Complying with this restriction
is much easier if, from the host kernel's point of view, only one
hardware thread is active.

This adds two hooks in the SMP hotplug code to allow the KVM code to
make sure that secondary threads (i.e. hardware threads other than
thread 0) cannot come online while any KVM guest exists. The KVM
code still has to check that any core where it runs a guest has the
secondary threads offline, but having done that check it can now be
sure that they will not come online while the guest is running.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

PPC: ePAPR: Convert header to uapi · c99ec973

Committed by Alexander Graf on Oct 27, 2012

The new uapi framework splits kernel internal and user space exported
bits of header files more cleanly. Adjust the ePAPR header accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Move mtspr/mfspr emulation into own functions · 388cf9ee

Committed by Alexander Graf on Oct 06, 2012

The mtspr/mfspr emulation code became quite big over time. Move it
into its own function so things stay more readable.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: Documentation: Fix reentry-to-be-consistent paragraph · 686de182

Committed by Alexander Graf on Oct 07, 2012

All user space offloaded instruction emulation needs to reenter kvm
to produce consistent state again. Fix the section in the documentation
to mention all of them.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: 44x: fix DCR read/write · e43a0287

Committed by Alexander Graf on Oct 06, 2012

When remembering the direction of a DCR transaction, we should write
to the same variable that we interpret later when doing vcpu_run
again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Cc: stable@vger.kernel.org

KVM: do not treat noslot pfn as a error pfn · 81c52c56

Committed by Xiao Guangrong on Oct 16, 2012

This patch filters noslot pfns out of the error pfns, based on Marcelo's comment:
a noslot pfn is not an error pfn.

After this patch,
- is_noslot_pfn indicates that the gfn is not in a slot
- is_error_pfn indicates that the gfn is in a slot but an error occurred
  when translating the gfn to a pfn
- is_error_noslot_pfn indicates that the pfn is either an error pfn or a
  noslot pfn
is_invalid_pfn can then be removed, which makes the code cleaner.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
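
The three predicates listed in this commit message can be sketched with a bit encoding. The masks below are an illustrative assumption (high bits of a 64-bit pfn mark the special values) and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pfn_t;

/* Illustrative encoding (an assumption): high bits mark special pfns. */
#define PFN_ERR_MASK        (0x7ffULL << 52) /* translation errors */
#define PFN_NOSLOT          (0x1ULL << 63)   /* gfn is not in any slot */
#define PFN_ERR_NOSLOT_MASK (0xfffULL << 52) /* either of the above */

/* The gfn is in a slot, but translating it to a pfn failed. */
static bool is_error_pfn(pfn_t pfn)
{
	return (pfn & PFN_ERR_MASK) != 0;
}

/* The gfn is not in a slot; this is not treated as an error. */
static bool is_noslot_pfn(pfn_t pfn)
{
	return pfn == PFN_NOSLOT;
}

/* The pfn is either an error pfn or the noslot pfn. */
static bool is_error_noslot_pfn(pfn_t pfn)
{
	return (pfn & PFN_ERR_NOSLOT_MASK) != 0;
}
```

Because the noslot marker lives in a bit outside the error mask, callers that only care about real translation errors can use `is_error_pfn`, while callers that want to reject both cases use `is_error_noslot_pfn`.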

Merge remote-tracking branch 'master' into queue · 19bf7f8a

Committed by Marcelo Tosatti on Oct 29, 2012

Merge reason: development work has dependency on kvm patches merged
upstream.

Conflicts:
	arch/powerpc/include/asm/Kbuild
	arch/powerpc/include/asm/kvm_para.h
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

29 Oct, 2012 1 commit

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 35fd3dc5

Committed by Linus Torvalds on Oct 29, 2012

Pull Ceph fixes from Sage Weil:
 "There are two fixes in the messenger code, one for a bug that can trigger
  a NULL dereference, and one for an error in refcounting (extra put).  There
  is also a trivial fix in the fs client code for a bug that is triggered by
  NFS reexport."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix dentry reference leak in encode_fh()
  libceph: avoid NULL kref_put when osd reset races with alloc_msg
  rbd: reset BACKOFF if unable to re-queue
