1. 18 July 2013, 2 commits
  2. 27 June 2013, 2 commits
    • kvm: Add a tracepoint write_tsc_offset · 489223ed
      Authored by Yoshihiro YUNOMAE
      Add a tracepoint write_tsc_offset for tracing TSC offset changes.
      We want to merge ftrace trace data from guest OSes and the host OS in
      chronological order, using the TSC as the timestamp. For that we need
      the "TSC offset" of each guest, because a guest's TSC value is always
      the host TSC plus the guest's TSC offset. Given the TSC offset, we can
      compute the host TSC value for each guest event from the event's TSC
      value, and those host TSC values let us merge the guest and host trace
      data in chronological order.
      (Note: the trace_clock of both the host and the guest must be set to
      x86-tsc in this case.)
      
      This tracepoint also records the vcpu_id, which is needed to merge
      trace data for SMP guests: a merge tool reads the TSC offset of each
      vcpu and converts that vcpu's guest TSC values to host TSC values.
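      
      As a rough illustration, the per-event conversion such a tool might
      perform looks like this (a minimal sketch; the function name is
      hypothetical, only the arithmetic follows from the relation above):
      
          #include <stdint.h>
          
          /*
           * Hypothetical merge-tool helper: since guest_tsc = host_tsc +
           * tsc_offset, the host timestamp is recovered by subtracting the
           * vcpu's offset (read from its kvm_write_tsc_offset event).
           */
          static inline uint64_t guest_to_host_tsc(uint64_t guest_tsc,
                                                   uint64_t tsc_offset)
          {
                  return guest_tsc - tsc_offset;
          }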
      
      The TSC offset is stored in the VMCS by vmx_write_tsc_offset() or
      vmx_adjust_tsc_offset(); KVM executes the former when a guest boots
      and the latter when the kvm clock is updated. Only the host can read
      the TSC offset from the VMCS, so the host needs to emit the TSC
      offset value whenever it changes.
      
      Since the TSC offset changes rarely, its record could be overwritten
      by other, more frequent events while tracing. To avoid that, I
      recommend using a dedicated tracing instance for this event:
      
      1. Set up an instance before booting a guest:
       # cd /sys/kernel/debug/tracing/instances
       # mkdir tsc_offset
       # cd tsc_offset
       # echo x86-tsc > trace_clock
       # echo 1 > events/kvm/kvm_write_tsc_offset/enable
      
      2. Boot the guest.
      Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
    • KVM: MMU: fast invalidate all mmio sptes · f8f55942
      Authored by Xiao Guangrong
      This patch introduces a very simple and scalable way to invalidate
      all mmio sptes: it need not walk any shadow pages or hold the
      mmu-lock.
      
      KVM maintains a global mmio generation number, stored in
      kvm->memslots.generation, and every mmio spte stores the current
      global generation number in its available bits when it is created.
      
      When KVM needs to zap all mmio sptes, it simply increments the global
      generation number. When a guest performs an mmio access, KVM
      intercepts the MMIO #PF, walks the shadow page table, and fetches the
      mmio spte. If the generation number on the spte does not equal the
      global generation number, the fault is sent to the normal #PF
      handler, which updates the mmio spte, as in the sketch below.
      
      Since only 19 bits are available to store the generation number in an
      mmio spte, we zap all mmio sptes when the number wraps around.
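      
      A minimal sketch of the staleness check described above (helper
      names are illustrative, not necessarily those of the patch):
      
          /*
           * Sketch only: an mmio spte is stale when the generation number
           * cached in its available bits no longer matches the global one;
           * a stale spte sends the fault to the normal #PF handler, which
           * rewrites the spte with the current generation.
           */
          static bool mmio_spte_is_stale(struct kvm *kvm, u64 spte)
          {
                  return get_mmio_spte_generation(spte) !=
                         kvm_current_mmio_generation(kvm);
          }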
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Reviewed-by: Gleb Natapov <gleb@redhat.com>
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 26 June 2013, 1 commit
  4. 21 June 2013, 1 commit
  5. 18 June 2013, 1 commit
  6. 12 June 2013, 1 commit
  7. 05 June 2013, 4 commits
  8. 16 May 2013, 1 commit
  9. 08 May 2013, 1 commit
  10. 03 May 2013, 1 commit
  11. 30 April 2013, 1 commit
  12. 28 April 2013, 2 commits
  13. 27 April 2013, 1 commit
  14. 22 April 2013, 3 commits
  15. 17 April 2013, 4 commits
  16. 16 April 2013, 1 commit
  17. 14 April 2013, 1 commit
  18. 08 April 2013, 1 commit
  19. 07 April 2013, 1 commit
  20. 02 April 2013, 1 commit
    • pmu: prepare for migration support · afd80d85
      Authored by Paolo Bonzini
      In order to migrate the PMU state correctly, we need to restore the
      values of MSR_CORE_PERF_GLOBAL_STATUS (a read-only register) and
      MSR_CORE_PERF_GLOBAL_OVF_CTRL (which has side effects when written).
      We also need to write the full 40-bit value of the performance counter,
      which would only be possible with a v3 architectural PMU's full-width
      counter MSRs.
      
      To distinguish host-initiated writes from the guest's own, pass the
      full struct msr_data to kvm_pmu_set_msr, as sketched below.
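      
      The structure in question is roughly the following (a sketch
      consistent with the description above, not a verbatim copy of the
      patch):
      
          /*
           * Sketch of struct msr_data: bundling the flag with the MSR
           * index and value lets kvm_pmu_set_msr tell a host-initiated
           * (migration restore) write from one performed by the guest.
           */
          struct msr_data {
                  bool host_initiated; /* write from userspace, not the guest */
                  u32 index;           /* MSR number, e.g. MSR_CORE_PERF_GLOBAL_OVF_CTRL */
                  u64 data;            /* value being written */
          };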
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  21. 20 March 2013, 2 commits
  22. 19 March 2013, 1 commit
  23. 14 March 2013, 1 commit
    • KVM: x86: Optimize mmio spte zapping when creating/moving memslot · 982b3394
      Authored by Takuya Yoshikawa
      When we create or move a memory slot, we need to zap mmio sptes.
      Currently, zap_all() is used for this, which causes two problems:
       - extra page faults after zapping mmu pages
       - long mmu_lock hold time while zapping mmu pages
      
      For the latter, Marcelo reported a disastrous mmu_lock hold time during
      hot-plug, which made the guest unresponsive for a long time.
      
      This patch takes a simple approach to fixing these problems: do not
      zap mmu pages unless they are marked mmio cached.  On our test box,
      this took only 50us for a 4GB guest, and we no longer saw
      millisecond-scale mmu_lock hold times.
      
      Note that we still need zap_all() for other cases, so further work is
      also needed; Xiao's work may be the one.  A sketch of the selective
      zap follows.
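      
      As a sketch of the selective zap (assuming an mmio_cached flag on
      each shadow page; the helper names follow common KVM MMU conventions
      but are illustrative here):
      
          /*
           * Sketch only: walk the active shadow pages and zap just the
           * ones that cache mmio sptes, instead of calling zap_all().
           */
          static void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
          {
                  struct kvm_mmu_page *sp, *node;
                  LIST_HEAD(invalid_list);
          
                  spin_lock(&kvm->mmu_lock);
                  list_for_each_entry_safe(sp, node,
                                           &kvm->arch.active_mmu_pages, link)
                          if (sp->mmio_cached)
                                  kvm_mmu_prepare_zap_page(kvm, sp,
                                                           &invalid_list);
                  kvm_mmu_commit_zap_page(kvm, &invalid_list);
                  spin_unlock(&kvm->mmu_lock);
          }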
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  24. 13 March 2013, 1 commit
    • KVM: x86: Rework INIT and SIPI handling · 66450a21
      Authored by Jan Kiszka
      A VCPU sending INIT or SIPI to some other VCPU races with the setting
      of the remote VCPU's mp_state. When we are unlucky,
      KVM_MP_STATE_INIT_RECEIVED is overwritten by kvm_emulate_halt and
      thus gets lost.
      
      This patch introduces APIC events for those two signals, keeping them
      in the kvm_apic until kvm_apic_accept_events is run in the target
      vcpu's context. kvm_apic_has_events reports to kvm_arch_vcpu_runnable
      whether there are pending events, and thus whether vcpu blocking
      should end.
      
      The patch comes with the side effect of effectively obsoleting
      KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
      immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
      The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
      state. That also means we no longer exit to user space after receiving a
      SIPI event.
      
      Furthermore, we already reset the VCPU on INIT, only fixing up the
      code segment later on when the SIPI arrives. Moreover, we fix INIT
      handling for the BSP: it never enters wait-for-SIPI but directly
      starts over on INIT.  A condensed sketch of the event-accept path
      follows.
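      
      A condensed sketch of the accept path (simplified; the real function
      also performs the vcpu reset on INIT and the SIPI vector delivery
      described above):
      
          /*
           * Sketch only: pending INIT/SIPI are latched as bits in the
           * local APIC and consumed in the target vcpu's own context,
           * so no sender can race on mp_state.
           */
          void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
          {
                  struct kvm_lapic *apic = vcpu->arch.apic;
          
                  if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events))
                          vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
          
                  if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
                      vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED)
                          vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
          }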
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  25. 12 March 2013, 2 commits
  26. 05 March 2013, 2 commits