Commits · ba904635d498fea43fc3610983f9dc430ac324e4 · openeuler / Kernel

01 Dec, 2012 2 commits

KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635

Committed by Will Auld on Nov 29, 2012

CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported

Basic design is to emulate the MSR by allowing reads and writes to a guest
vcpu specific location to store the value of the emulated MSR while adding
the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
is of course as long as the "use TSC counter offsetting" VM-execution control
is enabled as well as the IA32_TSC_ADJUST control.

However, because hardware will only return the TSC + IA32_TSC_ADJUST +
vmcs tsc_offset for a guest process when it does an rdtsc (with the correct
settings), the value of our virtualized IA32_TSC_ADJUST must be stored in one
of these three locations. The argument against storing it in the actual MSR
is performance. This is likely to be seldom used while the save/restore is
required on every transition. IA32_TSC_ADJUST was created as a way to solve
some issues with writing TSC itself so that is not an option either.

The remaining option, defined above as our solution has the problem of
returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
done here) as mentioned above. However, more problematic is that storing the
data in vmcs tsc_offset will have a different semantic effect on the system
than does using the actual MSR. This is illustrated in the following example:

The hypervisor sets the IA32_TSC_ADJUST, then the guest sets it and a guest
process performs a rdtsc. In this case the guest process will get
TSC + IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset including
IA32_TSC_ADJUST_guest. While the total system semantics changed, the semantics
as seen by the guest do not and hence this will not cause a problem.
Signed-off-by: Will Auld <will.auld@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ba904635
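
The basic design described in this message (keep the emulated MSR value in a per-vcpu location and fold it into the TSC offset) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct sketch_vcpu` and the `sketch_*` functions are hypothetical names, and `tsc_offset` is a plain field standing in for the vmcs tsc_offset rather than real VMX state.

```c
#include <stdint.h>

/* Hypothetical, simplified per-vcpu state; not the kernel's real layout. */
struct sketch_vcpu {
	int64_t  tsc_adjust;  /* emulated IA32_TSC_ADJUST value */
	uint64_t tsc_offset;  /* stands in for the vmcs tsc_offset field */
};

/* Guest write to the emulated MSR: fold only the change into tsc_offset. */
static void sketch_write_tsc_adjust(struct sketch_vcpu *v, int64_t value)
{
	/* Unsigned arithmetic so that wraparound is well defined. */
	uint64_t delta = (uint64_t)value - (uint64_t)v->tsc_adjust;

	v->tsc_offset += delta;
	v->tsc_adjust = value;
}

/* Guest read of the emulated MSR returns the stored value. */
static int64_t sketch_read_tsc_adjust(const struct sketch_vcpu *v)
{
	return v->tsc_adjust;
}

/* What the guest would observe from rdtsc with TSC offsetting enabled. */
static uint64_t sketch_guest_rdtsc(const struct sketch_vcpu *v, uint64_t host_tsc)
{
	return host_tsc + v->tsc_offset;
}
```

With an initial offset of 1000, writing 50 to the emulated MSR makes a guest read at host TSC 5000 return 6050, while reading the MSR back returns 50.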

KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46

Committed by Will Auld on Nov 29, 2012

In order to track who initiated the call (host or guest) to modify an msr
value I have changed function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than having this information passed as
individual parameters.

The initial use for this capability is for updating the IA32_TSC_ADJUST msr
while setting the tsc value. It is anticipated that this capability is
useful for other tasks.
Signed-off-by: Will Auld <will.auld@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8fe8ab46
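
To make the parameter change concrete, here is a minimal sketch of passing (index, data, caller) as one struct instead of separate arguments. All names (`sketch_msr_data`, `sketch_set_msr`, the `SKETCH_*` constants) are hypothetical, and the policy in the TSC case, where only a guest-initiated write adjusts the TSC_ADJUST value, is an assumption made for illustration rather than a copy of the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSR_IA32_TSC        0x10u
#define SKETCH_MSR_IA32_TSC_ADJUST 0x3bu

/* One struct carries the index, the data and who initiated the write. */
struct sketch_msr_data {
	bool     host_initiated; /* true: host side; false: guest wrmsr */
	uint32_t index;
	uint64_t data;
};

/* Hypothetical per-vcpu MSR state for this sketch. */
struct sketch_vcpu_msrs {
	uint64_t tsc;
	int64_t  tsc_adjust;
};

/* Returns 0 on success, 1 for an MSR this sketch does not handle. */
static int sketch_set_msr(struct sketch_vcpu_msrs *v,
			  const struct sketch_msr_data *msr)
{
	switch (msr->index) {
	case SKETCH_MSR_IA32_TSC:
		/* Assumed policy: a guest write to TSC also moves TSC_ADJUST. */
		if (!msr->host_initiated)
			v->tsc_adjust += (int64_t)(msr->data - v->tsc);
		v->tsc = msr->data;
		return 0;
	case SKETCH_MSR_IA32_TSC_ADJUST:
		v->tsc_adjust = (int64_t)msr->data;
		return 0;
	default:
		return 1;
	}
}
```

Because the origin travels with the request, a handler deep in the call path can distinguish the two cases without any extra function parameters.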

30 Nov, 2012 1 commit

KVM: VMX: fix memory order between loading vmcs and clearing vmcs · 5a560f8b

Committed by Xiao Guangrong on Nov 28, 2012

vmcs->cpu indicates whether it exists on the target cpu; -1 means the vmcs
does not exist on any vcpu.

If a vcpu loads a vmcs with vmcs->cpu == -1, it can be directly added to the
cpu's percpu list. The list can be corrupted if the cpu prefetches the vmcs's
list before reading vmcs->cpu. Meanwhile, we should remove the vmcs from the
list before making vmcs->cpu == -1 visible.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5a560f8b
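
The ordering requirement in this message can be sketched with C11 atomics. This is an analogy, not the kernel code: the kernel uses its own barrier primitives, whereas the sketch below substitutes release/acquire operations, and `struct sketch_vmcs` with its `on_list` flag is a hypothetical stand-in for membership in the per-cpu list.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_vmcs {
	atomic_int cpu;  /* -1: not loaded on any cpu */
	bool on_list;    /* stands in for per-cpu list membership */
};

/* Clear side: take the vmcs off the list before publishing cpu == -1. */
static void sketch_vmcs_clear(struct sketch_vmcs *v)
{
	v->on_list = false;
	/* Release: the list removal is visible before cpu == -1 is. */
	atomic_store_explicit(&v->cpu, -1, memory_order_release);
}

/* Load side: read cpu first, and only then touch the list state. */
static bool sketch_vmcs_load(struct sketch_vmcs *v, int cpu)
{
	/* Acquire: list accesses below cannot be hoisted above this read. */
	if (atomic_load_explicit(&v->cpu, memory_order_acquire) != -1)
		return false;

	v->on_list = true;
	atomic_store_explicit(&v->cpu, cpu, memory_order_relaxed);
	return true;
}
```

The pairing matters in both directions: without the release store a loader could see cpu == -1 while the vmcs is still linked, and without the acquire load it could read list state from before the check.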

29 Nov, 2012 1 commit

KVM: VMX: fix invalid cpu passed to smp_call_function_single · e6c7d321

Committed by Xiao Guangrong on Nov 28, 2012

In loaded_vmcs_clear, loaded_vmcs->cpu is the first parameter passed to
smp_call_function_single. If the target cpu is going down (doing cpu hot
remove), loaded_vmcs->cpu can become -1, and then -1 is passed to
smp_call_function_single.

It can be triggered when a vcpu is being destroyed, since loaded_vmcs_clear is
called in a preemptible context.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e6c7d321

28 Nov, 2012 7 commits

KVM: x86: update pvclock area conditionally, on cpu migration · d98d07ca

Committed by Marcelo Tosatti on Nov 27, 2012

As requested by Glauber, do not update kvmclock area on vcpu->pcpu
migration, in case the host has stable TSC.

This is to reduce cacheline bouncing.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d98d07ca

KVM: x86: require matched TSC offsets for master clock · b48aa97e

Committed by Marcelo Tosatti on Nov 27, 2012

With master clock, a pvclock clock read calculates:

ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]

Where 'rdtsc' is the host TSC.

system_timestamp and tsc_timestamp are unique, one tuple
per VM: the "master clock".

Given a host with synchronized TSCs, it's obvious that
guest TSC must be matched for the above to guarantee monotonicity.

Allow master clock usage only if guest TSCs are synchronized.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b48aa97e
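
The clock-read formula in this message can be checked with a few numbers. The sketch below implements it literally; `struct sketch_master_clock` and `sketch_pvclock_read` are hypothetical names, and the scaling of TSC ticks to nanoseconds that a real pvclock read performs is deliberately left out.

```c
#include <stdint.h>

/* One <system_timestamp, tsc_timestamp> tuple per VM: the "master clock". */
struct sketch_master_clock {
	uint64_t system_timestamp;
	uint64_t tsc_timestamp;
};

/* ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ] */
static uint64_t sketch_pvclock_read(const struct sketch_master_clock *mc,
				    uint64_t host_tsc, uint64_t tsc_offset)
{
	return mc->system_timestamp +
	       ((host_tsc + tsc_offset) - mc->tsc_timestamp);
}
```

Take the tuple (1000, 500). A vcpu with offset 100 reading at host TSC 600 gets 1200. A second vcpu reading one tick later gets 1201 if its offset is also 100, but only 1151 if its offset is 50, so time appears to run backwards across vcpus. That is the case the matched-offset requirement rules out.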

KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization · 42897d86

Committed by Marcelo Tosatti on Nov 27, 2012

TSC initialization will soon make use of online_vcpus.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

42897d86

KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag · d828199e

Committed by Marcelo Tosatti on Nov 27, 2012

KVM added a global variable to guarantee monotonicity in the guest.
One of the reasons for that is that the time between

	1. ktime_get_ts(&timespec);
	2. rdtscll(tsc);

Is variable. That is, given a host with stable TSC, suppose that
two VCPUs read the same time via ktime_get_ts() above.

The time required to execute 2. is not the same on those two instances
executing in different VCPUS (cache misses, interrupts...).

If the TSC value that is used by the host to interpolate when
calculating the monotonic time is the same value used to calculate
the tsc_timestamp value stored in the pvclock data structure, and
a single <system_timestamp, tsc_timestamp> tuple is visible to all
vcpus simultaneously, this problem disappears. See comment on top
of pvclock_update_vm_gtod_copy for details.

Monotonicity is then guaranteed by synchronicity of the host TSCs
and guest TSCs.

Set TSC stable pvclock flag in that case, allowing the guest to read
clock from userspace.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d828199e

KVM: x86: notifier for clocksource changes · 16e8d74d

Committed by Marcelo Tosatti on Nov 27, 2012

Register a notifier for clocksource change event. In case
the host switches to a clock other than TSC, disable master
clock usage.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

16e8d74d

KVM: x86: pass host_tsc to read_l1_tsc · 886b470c

Committed by Marcelo Tosatti on Nov 27, 2012

Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

886b470c

KVM: x86: retain pvclock guest stopped bit in guest memory · 78c0337a

Committed by Marcelo Tosatti on Nov 27, 2012

Otherwise it's possible for an unrelated KVM_REQ_UPDATE_CLOCK (such as due to CPU
migration) to clear the bit.

Noticed by Paolo Bonzini.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

78c0337a

14 Nov, 2012 3 commits

KVM: remove unnecessary return value check · 807f12e5

Committed by Guo Chao on Nov 02, 2012

No need to check return value before breaking switch.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

807f12e5

KVM: x86: fix return value of kvm_vm_ioctl_set_tss_addr() · 951179ce

Committed by Guo Chao on Nov 02, 2012

Return value of this function will be that of ioctl().

#include <stdio.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main () {
	int fd;
	fd = open ("/dev/kvm", 0);
	fd = ioctl (fd, KVM_CREATE_VM, 0);
	ioctl (fd, KVM_SET_TSS_ADDR, 0xfffff000);
	perror ("");
	return 0;
}

Output is "Operation not permitted". That's not what
we want.

Return -EINVAL in this case.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

951179ce

KVM: do not kfree error pointer · 18595411

Committed by Guo Chao on Nov 02, 2012

We should avoid kfree()ing an error pointer in kvm_vcpu_ioctl() and
kvm_arch_vcpu_ioctl().
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

18595411
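
The bug class named in this message can be shown in user space. The sketch below imitates the kernel's convention of encoding a negative errno in a pointer; every `sketch_*` name is hypothetical, `malloc`/`free` stand in for the kernel allocator, and the handler is an invented example rather than the code of either ioctl function.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_ERRNO 4095

/* User-space imitation of error pointers: the top 4095 addresses are errors. */
static void *sketch_err_ptr(long err) { return (void *)err; }
static long sketch_ptr_err(const void *p) { return (long)p; }
static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Returns a heap copy of src, or an error pointer (never NULL). */
static void *sketch_memdup(const void *src, size_t len)
{
	void *p;

	if (src == NULL)
		return sketch_err_ptr(-EFAULT);
	p = malloc(len);
	if (p == NULL)
		return sketch_err_ptr(-ENOMEM);
	memcpy(p, src, len);
	return p;
}

static long sketch_ioctl_handler(const void *user_arg, size_t len)
{
	void *buf = sketch_memdup(user_arg, len);

	/* Return before any cleanup: an error pointer must never be freed. */
	if (sketch_is_err(buf))
		return sketch_ptr_err(buf);

	/* ... a real handler would use buf here ... */
	free(buf);
	return 0;
}
```

The hazard arises when a shared cleanup label frees the buffer unconditionally, because an error pointer is non-NULL and so passes a plain NULL check.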

30 Oct, 2012 1 commit

KVM: do not treat noslot pfn as a error pfn · 81c52c56

Committed by Xiao Guangrong on Oct 16, 2012

This patch filters the noslot pfn out of the error pfns, based on Marcelo's
comment: a noslot pfn is not an error pfn.

After this patch,
- is_noslot_pfn indicates that the gfn is not in a slot
- is_error_pfn indicates that the gfn is in a slot but an error occurred
  when translating the gfn to a pfn
- is_error_noslot_pfn indicates that the pfn is either an error pfn or a
  noslot pfn
And is_invalid_pfn can be removed, which makes the code cleaner.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

81c52c56
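
The three predicates listed in this message can be sketched as bit tests. The encoding below is an assumption chosen for illustration: error codes occupy bits 52 to 62, and the noslot marker is bit 63 alone. The `sketch_*` names and `SKETCH_*` masks are hypothetical and are not claimed to match the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sketch_pfn_t;

/* Assumed encoding: bits 52..62 flag an error, bit 63 alone means "no slot". */
#define SKETCH_PFN_ERR_MASK        (0x7ffULL << 52)
#define SKETCH_PFN_ERR_NOSLOT_MASK (0xfffULL << 52)
#define SKETCH_PFN_NOSLOT          (0x1ULL << 63)

/* gfn is in a slot, but translating it to a pfn failed. */
static bool sketch_is_error_pfn(sketch_pfn_t pfn)
{
	return (pfn & SKETCH_PFN_ERR_MASK) != 0;
}

/* Either an error pfn or the noslot pfn. */
static bool sketch_is_error_noslot_pfn(sketch_pfn_t pfn)
{
	return (pfn & SKETCH_PFN_ERR_NOSLOT_MASK) != 0;
}

/* gfn is not backed by any slot. */
static bool sketch_is_noslot_pfn(sketch_pfn_t pfn)
{
	return pfn == SKETCH_PFN_NOSLOT;
}
```

Under this encoding the noslot value fails `sketch_is_error_pfn` but passes `sketch_is_error_noslot_pfn`, which is the separation the message describes.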

23 Oct, 2012 3 commits

KVM: Take kvm instead of vcpu to mmu_notifier_retry · 8ca40a70

Committed by Christoffer Dall on Oct 14, 2012

The mmu_notifier_retry is not specific to any vcpu (and never will be)
so only take struct kvm as a parameter.

The motivation is the ARM mmu code that needs to call this from
a place where we have long since let go of the vcpu pointer.
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

8ca40a70

KVM: apic: fix LDR calculation in x2apic mode · 7f46ddbd

Committed by Gleb Natapov on Oct 14, 2012

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

7f46ddbd

KVM: MMU: fix release noslot pfn · f3ac1a4b

Committed by Xiao Guangrong on Oct 16, 2012

We cannot directly call kvm_release_pfn_clean to release the pfn,
since we may encounter a noslot pfn, which is used to cache mmio info
in the spte.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

f3ac1a4b

22 Oct, 2012 1 commit

KVM: SVM: Cleanup error statements · 1f5b77f5

Committed by Borislav Petkov on Oct 20, 2012

Use __func__ instead of the function name in svm_hardware_enable since
those things tend to get out of sync. This also slims down printk line
length in conjunction with using pr_err.

No functionality change.

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

1f5b77f5

18 Oct, 2012 3 commits

KVM: VMX: report internal error for MMIO #PF due to delivery event · bf4ca23e

Committed by Xiao Guangrong on Oct 17, 2012

A #PF with PFEC.RSV = 1 indicates that the guest is accessing MMIO; we
cannot fix it if it is caused by a delivery event. Report an internal error
in this case.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

bf4ca23e

KVM: VMX: report internal error for the unhandleable event · b9bf6882

Committed by Xiao Guangrong on Oct 17, 2012

A VM exit during event delivery is really unexpected if it is not caused
by Exceptions/EPT-VIOLATION/TASK_SWITCH. We had better report an internal
error and freeze the guest, so that the VMM has a chance to check the guest.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b9bf6882

KVM: do not de-cache cr4 bits needlessly · 471842ec

Committed by Gleb Natapov on Oct 15, 2012

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

471842ec

17 Oct, 2012 4 commits

KVM: MMU: introduce FNAME(prefetch_gpte) · bd6360cc

Committed by Xiao Guangrong on Oct 16, 2012

The only difference between FNAME(update_pte) and FNAME(pte_prefetch)
is that the former is allowed to prefetch a gfn from a dirty-logged slot,
so introduce a common function to prefetch the spte.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

bd6360cc

KVM: MMU: move prefetch_invalid_gpte out of pagaing_tmp.h · a052b42b

Committed by Xiao Guangrong on Oct 16, 2012

The function does not depend on the guest mmu mode; move it out of
paging_tmpl.h.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a052b42b

KVM: MMU: cleanup FNAME(page_fault) · d4878f24

Committed by Xiao Guangrong on Oct 16, 2012

Let it return emulate state instead of spte like __direct_map
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d4878f24

KVM: MMU: remove mmu_is_invalid · bd660776

Committed by Xiao Guangrong on Oct 16, 2012

Remove mmu_is_invalid and use is_invalid_pfn instead
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

bd660776

09 Oct, 2012 2 commits

KVM: x86: Make emulator_fix_hypercall static · b6785def

Committed by Jan Kiszka on Sep 20, 2012

No users outside of kvm/x86.c.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b6785def

KVM: x86: Convert kvm_arch_vcpu_reset into private kvm_vcpu_reset · 8b6e4547

Committed by Jan Kiszka on Sep 20, 2012

There are no external callers of this function as there is no concept of
resetting a vcpu from generic code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8b6e4547

23 Sep, 2012 2 commits

KVM: x86: Fix guest debug across vcpu INIT reset · c8639010

Committed by Jan Kiszka on Sep 21, 2012

If we reset a vcpu on INIT, we so far overwrote dr7 as provided by
KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally.

Fix this by saving the dr7 used for guest debugging and calculating the
effective register value as well as switch_db_regs on any potential
change. This will change the focus of the set_guest_debug vendor op to
update_db_bp_intercept.

Found while trying to stop on start_secondary.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c8639010
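
The fix described in this message (keep the debugger's dr7 separately and recompute the effective value whenever something changes) can be modelled as follows. The struct, the function names and the flag value `SKETCH_GUESTDBG_USE_HW_BP` are hypothetical; the sketch assumes that the low eight bits of DR7 are the breakpoint enable bits and that DR7 resets to 0x400.

```c
#include <stdint.h>

#define SKETCH_GUESTDBG_USE_HW_BP 0x1u     /* hypothetical flag value */
#define SKETCH_DR7_BP_EN_MASK     0xffULL  /* L0..L3 / G0..G3 enable bits */

struct sketch_dbg_vcpu {
	uint32_t guest_debug;      /* debug flags set by the host debugger */
	uint64_t dr7;              /* value owned by the guest */
	uint64_t guest_debug_dr7;  /* value supplied for guest debugging */
	uint64_t effective_dr7;    /* what would be loaded into hardware */
	uint64_t switch_db_regs;   /* non-zero if debug registers are live */
};

/* Recompute the derived state; call after any potential change. */
static void sketch_update_dr7(struct sketch_dbg_vcpu *v)
{
	uint64_t dr7 = (v->guest_debug & SKETCH_GUESTDBG_USE_HW_BP)
			? v->guest_debug_dr7 : v->dr7;

	v->effective_dr7 = dr7;
	v->switch_db_regs = dr7 & SKETCH_DR7_BP_EN_MASK;
}

/* INIT reset only touches the guest-owned value, then recomputes. */
static void sketch_init_reset(struct sketch_dbg_vcpu *v)
{
	v->dr7 = 0x400;
	sketch_update_dr7(v);
}
```

Because the reset path no longer writes the effective value directly, a breakpoint installed by the debugger survives the reset.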

KVM: Add resampling irqfds for level triggered interrupts · 7a84428a

Committed by Alex Williamson on Sep 21, 2012

To emulate level triggered interrupts, add a resample option to
KVM_IRQFD.  When specified, a new resamplefd is provided that notifies
the user when the irqchip has been resampled by the VM.  This may, for
instance, indicate an EOI.  Also in this mode, posting of an interrupt
through an irqfd only asserts the interrupt.  On resampling, the
interrupt is automatically de-asserted prior to user notification.
This enables level triggered interrupts to be posted and re-enabled
from vfio with no userspace intervention.

All resampling irqfds can make use of a single irq source ID, so we
reserve a new one for this interface.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

7a84428a

22 Sep, 2012 1 commit

x86, kvm: fix kvm's usage of kernel_fpu_begin/end() · b1a74bf8

Committed by Suresh Siddha on Sep 20, 2012

Preemption is disabled between kernel_fpu_begin/end() and as such
it is not a good idea to use these routines in kvm_load/put_guest_fpu()
which can be very far apart.

kvm_load/put_guest_fpu() routines are already called with
preemption disabled and KVM already uses the preempt notifier to save
the guest fpu state using kvm_put_guest_fpu().

So introduce __kernel_fpu_begin/end() routines which don't touch
preemption and use them instead of kernel_fpu_begin/end()
for KVM's use model of saving/restoring guest FPU state.

Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit
state in the case of VMX. For eagerFPU case, host cr0.TS is always clear.
So no need to worry about it. For the traditional lazyFPU restore case,
change the cr0.TS bit for the host state during vm-exit to be always clear
and cr0.TS bit is set in the __vmx_load_host_state() when the FPU
(guest FPU or the host task's FPU) state is not active. This ensures
that the host/guest FPU state is properly saved, restored
during context-switch and with interrupts (using irq_fpu_usable()) not
stomping on the active FPU state.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1348164109.26695.338.camel@sbsiddha-desk.sc.intel.com
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

b1a74bf8

21 Sep, 2012 1 commit

KVM: x86: Export svm/vmx exit code and vector code to userspace · 26bf264e

Committed by Xiao Guangrong on Sep 17, 2012

Exporting KVM exit information to userspace to be consumed by perf.
Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
[ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on acme's git tree ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1347870675-31495-2-git-send-email-haodong@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

26bf264e

20 Sep, 2012 8 commits

KVM: optimize apic interrupt delivery · 1e08ec4a

Committed by Gleb Natapov on Sep 13, 2012

Most interrupts are delivered to only one vcpu. Use pre-built tables to
find the interrupt destination instead of looping through all vcpus. In
logical mode, loop only through the vcpus in the logical cluster the irq is
sent to.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

1e08ec4a
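
The table-lookup idea can be sketched for the simplest case, physical destination mode with 8-bit APIC IDs. Every name here (`sketch_apic_map`, `sketch_rebuild_map`, `sketch_find_dest`) is hypothetical, and the logical-cluster tables mentioned in the message are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_APIC_ID 256

struct sketch_vcpu {
	int     id;
	uint8_t apic_id;
};

/* Pre-built table: APIC ID -> vcpu, rebuilt whenever an APIC ID changes. */
struct sketch_apic_map {
	struct sketch_vcpu *phys_map[SKETCH_MAX_APIC_ID];
};

static void sketch_rebuild_map(struct sketch_apic_map *map,
			       struct sketch_vcpu *vcpus, size_t n)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_APIC_ID; ++i)
		map->phys_map[i] = NULL;
	for (i = 0; i < n; ++i)
		map->phys_map[vcpus[i].apic_id] = &vcpus[i];
}

/* Constant-time lookup instead of scanning every vcpu. */
static struct sketch_vcpu *sketch_find_dest(const struct sketch_apic_map *map,
					    uint8_t dest_id)
{
	return map->phys_map[dest_id];
}
```

The rebuild cost is paid only on the rare configuration change, while every interrupt delivery becomes a single array index.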

KVM: MMU: Eliminate pointless temporary 'ac' · c5421519

Committed by Avi Kivity on Sep 19, 2012

'ac' essentially reconstructs the 'access' variable we already
have, except for the PFERR_PRESENT_MASK and PFERR_RSVD_MASK.  As
these are not used by callees, just use 'access' directly.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c5421519

KVM: MMU: Avoid access/dirty update loop if all is well · b514c30f

Committed by Avi Kivity on Sep 16, 2012

Keep track of accessed/dirty bits; if they are all set, do not
enter the accessed/dirty update loop.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b514c30f

KVM: MMU: Eliminate eperm temporary · 71331a1d

Committed by Avi Kivity on Sep 16, 2012

'eperm' is no longer used in the walker loop, so we can eliminate it.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

71331a1d

KVM: MMU: Optimize is_last_gpte() · 6fd01b71

Committed by Avi Kivity on Sep 12, 2012

Instead of branchy code depending on level, gpte.ps, and mmu configuration,
prepare everything in a bitmap during mode changes and look it up during
runtime.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

6fd01b71
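
A simplified model of this bitmap lookup follows. The bitmap index combines the page-table level with the PS bit, and the bitmap is built once per mode change. The names and the single `max_large_level` parameter are hypothetical; the real set of conditions that decide which levels allow large pages is reduced here to that one number.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PS_INDEX_BIT (1u << 2)  /* index bit 2 carries gpte.ps */

/*
 * Build the bitmap once per mode change. max_large_level is the highest
 * level at which a set PS bit ends the walk (1 means no large pages).
 */
static uint8_t sketch_build_last_pte_bitmap(unsigned max_large_level)
{
	uint8_t map = 0;
	unsigned level;

	/* Level 1 always terminates the walk, whatever PS says. */
	map |= (uint8_t)(1u << 0);
	map |= (uint8_t)(1u << SKETCH_PS_INDEX_BIT);

	for (level = 2; level <= max_large_level && level <= 4; ++level)
		map |= (uint8_t)(1u << (SKETCH_PS_INDEX_BIT | (level - 1)));
	return map;
}

/* Runtime check: one shift and one mask, with no branches on mode. */
static bool sketch_is_last_gpte(uint8_t bitmap, unsigned level, bool ps)
{
	unsigned index = (level - 1) | (ps ? SKETCH_PS_INDEX_BIT : 0u);

	return (bitmap >> index) & 1u;
}
```

With levels 1 to 4 the index ranges over 0 to 7, so a single byte holds every combination.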

KVM: MMU: Simplify walk_addr_generic() loop · 13d22b6a

Committed by Avi Kivity on Sep 12, 2012

The page table walk is coded as an infinite loop, with a special
case on the last pte.

Code it as an ordinary loop with a termination condition on the last
pte (large page or walk length exhausted), and put the last pte handling
code after the loop where it belongs.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

13d22b6a

KVM: MMU: Optimize pte permission checks · 97d64b78

Committed by Avi Kivity on Sep 12, 2012

walk_addr_generic() permission checks are a maze of branchy code, which is
performed four times per lookup.  It depends on the type of access, efer.nxe,
cr0.wp, cr4.smep, and in the near future, cr4.smap.

Optimize this away by precalculating all variants and storing them in a
bitmap.  The bitmap is recalculated when rarely-changing variables change
(cr0, cr4) and is indexed by the often-changing variables (page fault error
code, pte access permissions).

The permission check is moved to the end of the loop, otherwise an SMEP
fault could be reported as a false positive, when PDE.U=1 but PTE.U=0.
Noted by Xiao Guangrong.

The result is short, branch-free code.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

97d64b78
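
The precalculation described in this message can be sketched as a table with one byte per page-fault error code, where each bit answers "does this combination of pte access rights fault?". The bit positions, the `SK_*` constants and the function names below are assumptions made for illustration, and cr4.smap is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page fault error code bits. */
#define SK_PFERR_WRITE (1u << 1)
#define SK_PFERR_USER  (1u << 2)
#define SK_PFERR_FETCH (1u << 4)

/* Assumed pte access bits, used as the bit index within each byte. */
#define SK_ACC_EXEC  1u
#define SK_ACC_WRITE 2u
#define SK_ACC_USER  4u

struct sketch_mmu {
	uint8_t permissions[16];  /* indexed by pfec >> 1 */
};

/* Recompute when the rarely-changing inputs (nx, cr0.wp, cr4.smep) change. */
static void sketch_update_permissions(struct sketch_mmu *mmu,
				      bool nx, bool wp, bool smep)
{
	for (unsigned byte = 0; byte < 16; ++byte) {
		unsigned pfec = byte << 1;
		bool wf = pfec & SK_PFERR_WRITE;
		bool uf = pfec & SK_PFERR_USER;
		bool ff = pfec & SK_PFERR_FETCH;
		uint8_t map = 0;

		for (unsigned bit = 0; bit < 8; ++bit) {
			bool x = bit & SK_ACC_EXEC;
			bool w = bit & SK_ACC_WRITE;
			bool u = bit & SK_ACC_USER;

			if (!nx)
				x = true;   /* without NX everything is executable */
			if (!wp && !uf)
				w = true;   /* supervisor writes allowed if !cr0.wp */
			if (smep && u && !uf)
				x = false;  /* no supervisor fetch from user pages */

			bool fault = (ff && !x) || (uf && !u) || (wf && !w);
			map |= (uint8_t)(fault << bit);
		}
		mmu->permissions[byte] = map;
	}
}

/* Runtime check: pfec must fit in five bits. */
static bool sketch_permission_fault(const struct sketch_mmu *mmu,
				    unsigned pte_access, unsigned pfec)
{
	return (mmu->permissions[pfec >> 1] >> pte_access) & 1u;
}
```

The table is indexed by the often-changing inputs (error code and pte access rights) and rebuilt only when the control-register state changes, so the per-lookup cost is two array and shift operations.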

KVM: MMU: Update accessed and dirty bits after guest pagetable walk · 8cbc7069

Committed by Avi Kivity on Sep 16, 2012

While unspecified, the behaviour of Intel processors is to first
perform the page table walk, then, if the walk was successful, to
atomically update the accessed and dirty bits of walked paging elements.

While we are not required to follow this exactly, doing so will allow us
to perform the access permissions check after the walk is complete, rather
than after each walk step.

(the tricky case is SMEP: a zero in any pte's U bit makes the referenced
page a supervisor page, so we can't fault on a one bit during the walk
itself).
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

8cbc7069

openeuler / Kernel synced successfully almost 2 years ago