- 01 12月, 2012 2 次提交
-
-
由 Will Auld 提交于
CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control. However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process performs a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem. Signed-off-by: NWill Auld <will.auld@intel.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Will Auld 提交于
In order to track who initiated the call (host or guest) to modify an msr value I have changed function call parameters along the call path. The specific change is to add a struct pointer parameter that points to (index, data, caller) information rather than having this information passed as individual parameters. The initial use for this capability is for updating the IA32_TSC_ADJUST msr while setting the tsc value. It is anticipated that this capability is useful for other tasks. Signed-off-by: NWill Auld <will.auld@intel.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 30 11月, 2012 1 次提交
-
-
由 Xiao Guangrong 提交于
vmcs->cpu indicates whether it exists on the target cpu, -1 means the vmcs does not exist on any vcpu If vcpu load vmcs with vmcs.cpu = -1, it can be directly added to cpu's percpu list. The list can be corrupted if the cpu prefetch the vmcs's list before reading vmcs->cpu. Meanwhile, we should remove vmcs from the list before making vmcs->vcpu == -1 be visible Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 29 11月, 2012 2 次提交
-
-
由 Xiao Guangrong 提交于
In loaded_vmcs_clear, loaded_vmcs->cpu is the fist parameter passed to smp_call_function_single, if the target cpu is downing (doing cpu hot remove), loaded_vmcs->cpu can become -1 then -1 is passed to smp_call_function_single It can be triggered when vcpu is being destroyed, loaded_vmcs_clear is called in the preemptionable context Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Gleb Natapov 提交于
As Frederic pointed idle_cpu() may return false even if async fault happened in the idle task if wake up is pending. In this case the code will try to put idle task to sleep. Fix this by using is_idle_task() to check for idle task. Reported-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 28 11月, 2012 16 次提交
-
-
由 Marcelo Tosatti 提交于
As requested by Glauber, do not update kvmclock area on vcpu->pcpu migration, in case the host has stable TSC. This is to reduce cacheline bouncing. Acked-by: NGlauber Costa <glommer@parallels.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
With master clock, a pvclock clock read calculates: ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ] Where 'rdtsc' is the host TSC. system_timestamp and tsc_timestamp are unique, one tuple per VM: the "master clock". Given a host with synchronized TSCs, its obvious that guest TSC must be matched for the above to guarantee monotonicity. Allow master clock usage only if guest TSCs are synchronized. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
TSC initialization will soon make use of online_vcpus. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
KVM added a global variable to guarantee monotonicity in the guest. One of the reasons for that is that the time between 1. ktime_get_ts(×pec); 2. rdtscll(tsc); Is variable. That is, given a host with stable TSC, suppose that two VCPUs read the same time via ktime_get_ts() above. The time required to execute 2. is not the same on those two instances executing in different VCPUS (cache misses, interrupts...). If the TSC value that is used by the host to interpolate when calculating the monotonic time is the same value used to calculate the tsc_timestamp value stored in the pvclock data structure, and a single <system_timestamp, tsc_timestamp> tuple is visible to all vcpus simultaneously, this problem disappears. See comment on top of pvclock_update_vm_gtod_copy for details. Monotonicity is then guaranteed by synchronicity of the host TSCs and guest TSCs. Set TSC stable pvclock flag in that case, allowing the guest to read clock from userspace. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Register a notifier for clocksource change event. In case the host switches to clock other than TSC, disable master clock usage. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc(). Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Improve performance of time system calls when using Linux pvclock, by reading time info from fixmap visible copy of pvclock data. Originally from Jeremy Fitzhardinge. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Hook into generic pvclock vsyscall code, with the aim to allow userspace to have visibility into pvclock data. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Originally from Jeremy Fitzhardinge. Introduce generic, non hypervisor specific, pvclock initialization routines. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
As noted by Gleb, not advertising SSE2 support implies no RDTSC barriers. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Acked-by: NGlauber Costa <glommer@parallels.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Originally from Jeremy Fitzhardinge. So code can be reused. Acked-by: NGlauber Costa <glommer@parallels.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Originally from Jeremy Fitzhardinge. We can copy the information directly from "struct pvclock_vcpu_time_info", remove pvclock_shadow_time. Reviewed-by: NGlauber Costa <glommer@parallels.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Originally from Jeremy Fitzhardinge. pvclock_get_time_values, which contains the memory barriers will be removed by next patch. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
We want to expose the pvclock shared memory areas, which the hypervisor periodically updates, to userspace. For a linear mapping from userspace, it is necessary that entire page sized regions are used for array of pvclock structures. There is no such guarantee with per cpu areas, therefore move to memblock_alloc based allocation. Acked-by: NGlauber Costa <glommer@parallels.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
Otherwise its possible for an unrelated KVM_REQ_UPDATE_CLOCK (such as due to CPU migration) to clear the bit. Noticed by Paolo Bonzini. Reviewed-by: NGleb Natapov <gleb@redhat.com> Reviewed-by: NGlauber Costa <glommer@parallels.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 14 11月, 2012 3 次提交
-
-
由 Guo Chao 提交于
No need to check return value before breaking switch. Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Guo Chao 提交于
Return value of this function will be that of ioctl(). #include <stdio.h> #include <linux/kvm.h> int main () { int fd; fd = open ("/dev/kvm", 0); fd = ioctl (fd, KVM_CREATE_VM, 0); ioctl (fd, KVM_SET_TSS_ADDR, 0xfffff000); perror (""); return 0; } Output is "Operation not permitted". That's not what we want. Return -EINVAL in this case. Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Guo Chao 提交于
We should avoid kfree()ing error pointer in kvm_vcpu_ioctl() and kvm_arch_vcpu_ioctl(). Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 31 10月, 2012 1 次提交
-
-
由 Alexander Graf 提交于
The new uapi framework splits kernel internal and user space exported bits of header files more cleanly. Adjust the ePAPR header accordingly. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 30 10月, 2012 12 次提交
-
-
由 Paul Mackerras 提交于
This fixes an error in the inline asm in try_lock_hpte() where we were erroneously using a register number as an immediate operand. The bug only affects an error path, and in fact the code will still work as long as the compiler chooses some register other than r0 for the "bits" variable. Nevertheless it should still be fixed. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Commit 55b665b0 ("KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas") includes a check on the length of the dispatch trace log (DTL) to make sure the buffer is at least one entry long. This is appropriate when registering a buffer, but the interface also allows for any existing buffer to be unregistered by specifying a zero address. In this case the length check is not appropriate. This makes the check conditional on the address being non-zero. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Currently the code that accounts stolen time tends to overestimate the stolen time, and will sometimes report more stolen time in a DTL (dispatch trace log) entry than has elapsed since the last DTL entry. This can cause guests to underflow the user or system time measured for some tasks, leading to ridiculous CPU percentages and total runtimes being reported by top and other utilities. In addition, the current code was designed for the previous policy where a vcore would only run when all the vcpus in it were runnable, and so only counted stolen time on a per-vcore basis. Now that a vcore can run while some of the vcpus in it are doing other things in the kernel (e.g. handling a page fault), we need to count the time when a vcpu task is preempted while it is not running as part of a vcore as stolen also. To do this, we bring back the BUSY_IN_HOST vcpu state and extend the vcpu_load/put functions to count preemption time while the vcpu is in that state. Handling the transitions between the RUNNING and BUSY_IN_HOST states requires checking and updating two variables (accumulated time stolen and time last preempted), so we add a new spinlock, vcpu->arch.tbacct_lock. This protects both the per-vcpu stolen/preempt-time variables, and the per-vcore variables while this vcpu is running the vcore. Finally, we now don't count time spent in userspace as stolen time. The task could be executing in userspace on behalf of the vcpu, or it could be preempted, or the vcpu could be genuinely stopped. Since we have no way of dividing up the time between these cases, we don't count any of it as stolen. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Currently the Book3S HV code implements a policy on multi-threaded processors (i.e. POWER7) that requires all of the active vcpus in a virtual core to be ready to run before we run the virtual core. However, that causes problems on reset, because reset stops all vcpus except vcpu 0, and can also reduce throughput since all four threads in a virtual core have to wait whenever any one of them hits a hypervisor page fault. This relaxes the policy, allowing the virtual core to run as soon as any vcpu in it is runnable. With this, the KVMPPC_VCPU_STOPPED state and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish between them. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
If a thread in a virtual core becomes runnable while other threads in the same virtual core are already running in the guest, it is possible for the latecomer to join the others on the core without first pulling them all out of the guest. Currently this only happens rarely, when a vcpu is first started. This fixes some bugs and omissions in the code in this case. First, we need to check for VPA updates for the latecomer and make a DTL entry for it. Secondly, if it comes along while the master vcpu is doing a VPA update, we don't need to do anything since the master will pick it up in kvmppc_run_core. To handle this correctly we introduce a new vcore state, VCORE_STARTING. Thirdly, there is a race because we currently clear the hardware thread's hwthread_req before waiting to see it get to nap. A latecomer thread could have its hwthread_req cleared before it gets to test it, and therefore never increment the nap_count, leading to messages about wait_for_nap timeouts. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
There were a few places where we were traversing the list of runnable threads in a virtual core, i.e. vc->runnable_threads, without holding the vcore spinlock. This extends the places where we hold the vcore spinlock to cover everywhere that we traverse that list. Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault, this moves the call of it from kvmppc_handle_exit out to kvmppc_vcpu_run, where we don't hold the vcore lock. In kvmppc_vcore_blocked, we don't actually need to check whether all vcpus are ceded and don't have any pending exceptions, since the caller has already done that. The caller (kvmppc_run_vcpu) wasn't actually checking for pending exceptions, so we add that. The change of if to while in kvmppc_run_vcpu is to make sure that we never call kvmppc_remove_runnable() when the vcore state is RUNNING or EXITING. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Subsequent patches implementing in-kernel XICS emulation will make it possible for IPIs to arrive at secondary threads at arbitrary times. This fixes some races in how we start the secondary threads, which if not fixed could lead to occasional crashes of the host kernel. This makes sure that (a) we have grabbed all the secondary threads, and verified that they are no longer in the kernel, before we start any thread, (b) that the secondary thread loads its vcpu pointer after clearing the IPI that woke it up (so we don't miss a wakeup), and (c) that the secondary thread clears its vcpu pointer before incrementing the nap count. It also removes unnecessary setting of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
When a Book3S HV KVM guest is running, we need the host to be in single-thread mode, that is, all of the cores (or at least all of the cores where the KVM guest could run) to be running only one active hardware thread. This is because of the hardware restriction in POWER processors that all of the hardware threads in the core must be in the same logical partition. Complying with this restriction is much easier if, from the host kernel's point of view, only one hardware thread is active. This adds two hooks in the SMP hotplug code to allow the KVM code to make sure that secondary threads (i.e. hardware threads other than thread 0) cannot come online while any KVM guest exists. The KVM code still has to check that any core where it runs a guest has the secondary threads offline, but having done that check it can now be sure that they will not come online while the guest is running. Signed-off-by: NPaul Mackerras <paulus@samba.org> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
The new uapi framework splits kernel internal and user space exported bits of header files more cleanly. Adjust the ePAPR header accordingly. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
The mtspr/mfspr emulation code became quite big over time. Move it into its own function so things stay more readable. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
When remembering the direction of a DCR transaction, we should write to the same variable that we interpret on later when doing vcpu_run again. Signed-off-by: NAlexander Graf <agraf@suse.de> Cc: stable@vger.kernel.org
-
由 Xiao Guangrong 提交于
This patch filters noslot pfn out from error pfns based on Marcelo comment: noslot pfn is not a error pfn After this patch, - is_noslot_pfn indicates that the gfn is not in slot - is_error_pfn indicates that the gfn is in slot but the error is occurred when translate the gfn to pfn - is_error_noslot_pfn indicates that the pfn either it is error pfns or it is noslot pfn And is_invalid_pfn can be removed, it makes the code more clean Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 27 10月, 2012 3 次提交
-
-
由 Arnd Bergmann 提交于
This reverts commit 92063cee, it was applied prematurely, causing this build error for imx_v4_v5_defconfig: arch/arm/mach-imx/clk-imx25.c: In function 'mx25_clocks_init': arch/arm/mach-imx/clk-imx25.c:206:26: error: 'pwm_ipg_per' undeclared (first use in this function) arch/arm/mach-imx/clk-imx25.c:206:26: note: each undeclared identifier is reported only once for each function it appears in Sascha Hauer explains: > There are several gates missing in clk-imx25.c. I have a patch which > adds support for them and I seem to have missed that the above depends > on it. Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
由 Arnd Bergmann 提交于
With the introduction of CONFIG_ARCH_MULTIPLATFORM, versatile is no longer the default platform, so we need to enable CONFIG_ARCH_VERSATILE explicitly in order for that to be selected rather than the multiplatform configuration. Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
由 Thomas Petazzoni 提交于
The split of 370 and XP into two Kconfig options and the multiplatform kernel support has changed a few Kconfig symbols, so let's update the mvebu_defconfig file with the latest changes. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-