Commits · 2dea4c84bc936731668b5a7a9fba5b436a422668 · OpenHarmony / kernel_linux

10 Jun, 2009 · 1 commit

KVM: x86: silence preempt warning on kvm_write_guest_time · 2dea4c84

Committed by Matt T. Yourst on Feb 24, 2009

This issue just appeared in kvm-84 when running on 2.6.28.7 (x86-64)
with PREEMPT enabled.

We're getting syslog warnings like this many (but not all) times qemu
tells KVM to run the VCPU:

BUG: using smp_processor_id() in preemptible [00000000] code:
qemu-system-x86/28938
caller is kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
Pid: 28938, comm: qemu-system-x86 2.6.28.7-mtyrel-64bit
Call Trace:
debug_smp_processor_id+0xf7/0x100
kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
? __wake_up+0x4e/0x70
? wake_futex+0x27/0x40
kvm_vcpu_ioctl+0x2e9/0x5a0 [kvm]
enqueue_hrtimer+0x8a/0x110
_spin_unlock_irqrestore+0x27/0x50
vfs_ioctl+0x31/0xa0
do_vfs_ioctl+0x74/0x480
sys_futex+0xb4/0x140
sys_ioctl+0x99/0xa0
system_call_fastpath+0x16/0x1b

As it turns out, the call trace is messed up due to gcc's inlining, but
I isolated the problem anyway: kvm_write_guest_time() is being used in a
non-thread-safe manner on preemptible kernels.

Basically kvm_write_guest_time()'s body needs to be surrounded by
preempt_disable() and preempt_enable(), since the kernel won't let us
query any per-CPU data (indirectly using smp_processor_id()) without
preemption disabled. The attached patch fixes this issue by disabling
preemption inside kvm_write_guest_time().

[marcelo: surround only __get_cpu_var calls since the warning
is harmless]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

26 May, 2009 · 1 commit

KVM: Fix PDPTR reloading on CR4 writes · a2edf57f

Committed by Avi Kivity on May 24, 2009

The processor is documented to reload the PDPTRs while in PAE mode if any
of the CR4 bits PSE, PGE, or PAE change.  Linux relies on this
behaviour when zapping the low mappings of PAE kernels during boot.

The code already handled changes to CR4.PAE; augment it to also notice changes
to PSE and PGE.

This triggered while booting an F11 PAE kernel; the futex initialization code
runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem
ended up uninitialized, killing PI futexes and pulseaudio which uses them.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
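The reload rule above fits in a small predicate over the old and new CR4 values. This is a minimal illustrative sketch, not KVM's code. The helper name `cr4_write_reloads_pdptrs` and its `paging`/`long_mode` parameters are hypothetical. The bit positions are the architectural ones: PSE is bit 4, PAE is bit 5, PGE is bit 7. PDPTRs exist only in legacy PAE paging, so the sketch requires paging on, CR4.PAE set, and not long mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions. */
#define CR4_PSE (UINT64_C(1) << 4)
#define CR4_PAE (UINT64_C(1) << 5)
#define CR4_PGE (UINT64_C(1) << 7)

/* Changing any of these while in PAE paging mode reloads the PDPTRs. */
#define CR4_PDPTR_BITS (CR4_PSE | CR4_PAE | CR4_PGE)

/*
 * Hypothetical helper: true if writing new_cr4 over old_cr4 should make
 * the virtual CPU reload its PDPTRs. Before the fix, only a change to
 * CR4.PAE was noticed; PSE and PGE changes were missed.
 */
static bool cr4_write_reloads_pdptrs(uint64_t old_cr4, uint64_t new_cr4,
                                     bool paging, bool long_mode)
{
    if (!paging || long_mode || !(new_cr4 & CR4_PAE))
        return false;                     /* not in legacy PAE paging */
    return ((old_cr4 ^ new_cr4) & CR4_PDPTR_BITS) != 0;
}
```

A Linux PAE guest that toggles CR4.PGE to flush global mappings hits the PGE branch. The old PAE-only check never triggered on that write.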

11 May, 2009 · 2 commits

KVM: Make EFER reads safe when EFER does not exist · e286e86e

Committed by Avi Kivity on May 03, 2009

Some processors don't have EFER; don't oops if userspace wants us to
read EFER when we check NX.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fix NX support reporting · 334b8ad7

Committed by Avi Kivity on May 03, 2009

NX support is bit 20, not bit 1.
Signed-off-by: Avi Kivity <avi@redhat.com>
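The fix concerns which bit is tested. This sketch assumes the relevant word is EDX of CPUID leaf 0x80000001, where the architectural NX (No-Execute) flag sits at bit 20. The function name `cpuid_reports_nx` is illustrative, not KVM's.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.80000001H:EDX bit 20 is the NX (execute-disable) capability. */
#define CPUID_80000001_EDX_NX (UINT32_C(1) << 20)

/* Hypothetical helper: does this EDX value advertise NX? */
static bool cpuid_reports_nx(uint32_t edx_80000001)
{
    /* Testing (1 << 1) here instead would look at an unrelated flag. */
    return (edx_80000001 & CPUID_80000001_EDX_NX) != 0;
}
```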

22 Apr, 2009 · 2 commits

KVM: Unregister cpufreq notifier on unload · 888d256e

Committed by Jan Kiszka on Apr 17, 2009

Properly unregister cpufreq notifier on unload if it was registered
during init.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: release time_page on vcpu destruction · 7f1ea208

Committed by Joerg Roedel on Feb 25, 2009

Not releasing the time_page causes a leak of that page or the compound
page it is situated in.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

24 Mar, 2009 · 19 commits

KVM: fix sparse warnings: Should it be static? · cded19f3

Committed by Hannes Eder on Feb 21, 2009

Impact: Make symbols static.

Fix these sparse warnings:
arch/x86/kvm/mmu.c:992:5: warning: symbol 'mmu_pages_add' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1124:5: warning: symbol 'mmu_pages_next' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1144:6: warning: symbol 'mmu_pages_clear_parents' was not declared. Should it be static?
arch/x86/kvm/x86.c:2037:5: warning: symbol 'kvm_read_guest_virt' was not declared. Should it be static?
arch/x86/kvm/x86.c:2067:5: warning: symbol 'kvm_write_guest_virt' was not declared. Should it be static?
virt/kvm/irq_comm.c:220:5: warning: symbol 'setup_routing_entry' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Report IRQ injection status to userspace. · 4925663a

Committed by Gleb Natapov on Feb 04, 2009

IRQ injection status is either -1 (if no CPU was found that should
accept the interrupt, because the IRQ was masked, the ioapic was
misconfigured, or ...) or >= 0, in which case the number indicates
how many CPUs the interrupt was injected into.
If the value is 0, the interrupt was coalesced
and probably should be reinjected.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
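A userspace caller might map the returned status onto three outcomes. This is a minimal sketch under that assumption. The enum and the function `classify_irq_status` are hypothetical names, not part of the KVM API. The sketch treats any negative value as "not delivered", although the commit only mentions -1.

```c
/* Hypothetical outcome classes for the IRQ injection status. */
enum irq_outcome {
    IRQ_NOT_DELIVERED,  /* status < 0: no CPU accepted the interrupt */
    IRQ_COALESCED,      /* status == 0: merged with a pending one */
    IRQ_DELIVERED       /* status > 0: injected into that many CPUs */
};

static enum irq_outcome classify_irq_status(int status)
{
    if (status < 0)
        return IRQ_NOT_DELIVERED;
    if (status == 0)
        return IRQ_COALESCED;   /* candidate for reinjection */
    return IRQ_DELIVERED;
}
```

A timer emulation, for example, could count `IRQ_COALESCED` results and reinject those ticks later.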

KVM: Fix kvmclock on !constant_tsc boxes · c8076604

Committed by Gerd Hoffmann on Feb 04, 2009

kvmclock currently falls apart on machines without constant tsc.
This patch fixes it.  Changes:

  * keep tsc frequency in a per-cpu variable.
  * handle kvmclock update using a new request flag, thus checking
    whether we need an update each time we enter guest context.
  * use a cpufreq notifier to track frequency changes and force
    kvmclock updates.
  * send IPIs to kick the cpu out of guest context if needed, to make
    sure the guest doesn't see stale values.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add FFXSR support · 1b2fd70c

Committed by Alexander Graf on Feb 02, 2009

AMD K10 CPUs implement the FFXSR feature that gets enabled using
EFER. Let's check if the virtual CPU description includes that
CPUID feature bit and allow enabling it then.

This is required for Windows Server 2008 in Hyper-V mode.

v2 adds CPUID capability exposure
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
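The gating described above can be sketched as a predicate on a guest EFER write. The bit positions are architectural: EFER.FFXSR is bit 14, and FFXSR support is reported in CPUID Fn8000_0001 EDX bit 25. The function `efer_ffxsr_write_ok` and its parameters are hypothetical, not KVM's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_FFXSR               (UINT64_C(1) << 14)  /* fast FXSAVE/FXRSTOR */
#define CPUID_80000001_EDX_FFXSR (UINT32_C(1) << 25)  /* FFXSR supported */

/*
 * Hypothetical check: setting EFER.FFXSR is accepted only when the
 * virtual CPU's CPUID leaf 0x80000001 advertises the feature in EDX.
 */
static bool efer_ffxsr_write_ok(uint64_t new_efer, uint32_t guest_cpuid_edx)
{
    if (!(new_efer & EFER_FFXSR))
        return true;   /* not enabling FFXSR: nothing to check */
    return (guest_cpuid_edx & CPUID_80000001_EDX_FFXSR) != 0;
}
```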

KVM: Userspace controlled irq routing · 399ec807

Committed by Avi Kivity on Nov 19, 2008

Currently KVM has a static routing from GSI numbers to interrupts (namely,
0-15 are mapped 1:1 to both PIC and IOAPIC, and 16-23 are mapped 1:1 to
the IOAPIC).  This is insufficient for several reasons:

- HPET requires a non-1:1 mapping for the timer interrupt
- MSIs need a new method to assign interrupt numbers and dispatch them
- ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the
  ioapics

This patch implements an interrupt routing table (as a linked list, but this
can be easily changed) and a userspace interface to replace the table.  The
routing table is initialized according to the current hardwired mapping.
Signed-off-by: Avi Kivity <avi@redhat.com>
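The initial routing table can be sketched as an array of GSI-to-irqchip entries that reproduces the hardwired mapping. This is an illustrative assumption, not the patch's data structure: the patch uses a linked list, and the struct, enum and `build_default_routes` names here are hypothetical. The sketch also assumes the conventional PIC split, with GSIs 0-7 on the master and 8-15 on the slave, pin = GSI mod 8.

```c
#include <stddef.h>

/* Hypothetical irqchip identifiers. */
enum irqchip { CHIP_PIC_MASTER, CHIP_PIC_SLAVE, CHIP_IOAPIC };

struct route {
    unsigned gsi;       /* global system interrupt number */
    enum irqchip chip;  /* which interrupt controller receives it */
    unsigned pin;       /* input pin on that controller */
};

#define NR_PIC_GSIS       16  /* GSIs 0-15: PIC and IOAPIC */
#define NR_DEFAULT_GSIS   24  /* GSIs 0-23 */
#define NR_DEFAULT_ROUTES (NR_PIC_GSIS + NR_DEFAULT_GSIS)

/*
 * Fill `out` with the hardwired default mapping. Returns the number of
 * entries written, or 0 if `cap` is too small.
 */
static size_t build_default_routes(struct route *out, size_t cap)
{
    size_t n = 0;

    if (cap < NR_DEFAULT_ROUTES)
        return 0;
    for (unsigned gsi = 0; gsi < NR_DEFAULT_GSIS; gsi++) {
        if (gsi < NR_PIC_GSIS)  /* legacy IRQs also go to the PIC */
            out[n++] = (struct route){ .gsi = gsi,
                .chip = gsi < 8 ? CHIP_PIC_MASTER : CHIP_PIC_SLAVE,
                .pin = gsi % 8 };
        out[n++] = (struct route){ .gsi = gsi, .chip = CHIP_IOAPIC,
                                   .pin = gsi };
    }
    return n;
}
```

A GSI can appear more than once, once per destination chip. That is what lets userspace later replace the table with non-1:1 entries, such as an HPET timer routed to a different IOAPIC pin.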

KVM: x86: Fix typos and whitespace errors · 19355475

Committed by Amit Shah on Jan 14, 2009

Some typos, comments, whitespace errors corrected in the cpuid code
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Only enable cr4_pge role in shadow mode · 5a41accd

Committed by Avi Kivity on Jan 11, 2009

Two-dimensional paging is only confused by it.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Properly lock PIT creation · 269e05e4

Committed by Avi Kivity on Jan 05, 2009

Otherwise, two threads can create a PIT in parallel and cause a memory leak.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PIT: provide an option to disable interrupt reinjection · 52d939a0

Committed by Marcelo Tosatti on Dec 30, 2008

Certain clocks (such as TSC) in older 2.6 guests overaccount for lost
ticks, causing severe time drift. Interrupt reinjection magnifies the
problem.

Provide an option to disable it.

[avi: allow room for expansion in case we want to disable reinjection
      of other timers]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fallback support for MSR_VM_HSAVE_PA · 61a6bd67

Committed by Avi Kivity on Dec 29, 2008

Since we advertise MSR_VM_HSAVE_PA, userspace will attempt to read it
even on Intel.  Implement fake support for this MSR to avoid the
warnings.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: remove the vmap usage · 0f346074

Committed by Izik Eidus on Dec 29, 2008

vmap() on guest pages hides those pages from the Linux mm for an extended
(userspace determined) amount of time.  Get rid of it.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: introduce kvm_read_guest_virt, kvm_write_guest_virt · 77c2002e

Committed by Izik Eidus on Dec 29, 2008

This commit renames emulator_read_std to kvm_read_guest_virt and adds a
new function, kvm_write_guest_virt, that allows writing to a guest
virtual address.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: initialize TSC offset relative to vm creation time · 53f658b3

Committed by Marcelo Tosatti on Dec 11, 2008

VMX initializes the TSC offset for each vcpu at different times, and
also reinitializes it for vcpus other than 0 on APIC SIPI message.

This bug causes the TSCs to appear unsynchronized in the guest, even when
the host TSCs are fine.

Older Linux kernels don't handle the situation very well, so
gettimeofday is likely to go backwards in time:

http://www.mail-archive.com/kvm@vger.kernel.org/msg02955.html
http://sourceforge.net/tracker/index.php?func=detail&aid=2025534&group_id=180599&atid=893831

Fix it by initializing the offset of each vcpu relative to vm creation
time, and moving it from vmx_vcpu_reset to vmx_vcpu_setup, out of the
APIC MP init path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
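The fix can be modeled with a few lines of arithmetic. The guest reads host TSC plus offset, and every vcpu derives its offset from one host TSC sample taken at VM creation. This is a simplified model, not VMX code. The struct and function names are hypothetical, and unsigned 64-bit wraparound stands in for the hardware's modular addition.

```c
#include <stdint.h>

/* Hypothetical VM state: one host TSC sample taken at VM creation. */
struct vm_model {
    uint64_t creation_tsc;
};

/* Every vcpu uses the same offset, so the guest TSC starts near 0. */
static uint64_t vcpu_tsc_offset(const struct vm_model *vm)
{
    return (uint64_t)0 - vm->creation_tsc;   /* i.e. -creation_tsc mod 2^64 */
}

/* What the guest observes when the host TSC reads host_tsc. */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t offset)
{
    return host_tsc + offset;                /* wraps modulo 2^64 */
}
```

With `creation_tsc = 1000`, every vcpu reads 4000 when the host TSC is 5000. Suppose instead a vcpu computed its offset from its own setup time, say host TSC 3000. It would read 2000 at that same instant, which is exactly the kind of skew that makes gettimeofday go backwards.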

KVM: MMU: Segregate mmu pages created with different cr4.pge settings · 2f0b3d60

Committed by Avi Kivity on Dec 21, 2008

Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
global bit set (the global bit has no effect if !cr4.pge).

This can only occur on smp with different cr4.pge settings for different
vcpus (since a cr4 change will resync the shadow ptes), but there's no
cost to being correct here.
Signed-off-by: Avi Kivity <avi@redhat.com>
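Segregation works by making CR4.PGE part of the shadow page's role, so a lookup only matches pages created under the same setting. This is a heavily simplified sketch. The real role carries many more fields, and `struct sp_role`, `struct shadow_page` and `sp_matches` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down shadow page role. */
struct sp_role {
    unsigned level;   /* paging level of the shadowed table */
    bool cr4_pge;     /* CR4.PGE at the time the page was created */
};

struct shadow_page {
    unsigned long gfn;   /* guest frame being shadowed */
    struct sp_role role;
};

/* A cached shadow page is reusable only if gfn and the full role match. */
static bool sp_matches(const struct shadow_page *sp, unsigned long gfn,
                       struct sp_role want)
{
    return sp->gfn == gfn &&
           sp->role.level == want.level &&
           sp->role.cr4_pge == want.cr4_pge;
}
```

With this check, a vcpu running with CR4.PGE clear cannot pick up a page built while PGE was set. On that page, global PTEs might have been skipped during a CR3-switch sync.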

KVM: x86: Wire-up hardware breakpoints for guest debugging · ae675ef0

Committed by Jan Kiszka on Dec 15, 2008

Add the remaining bits to make use of debug registers also for guest
debugging, thus enabling the use of hardware breakpoints and
watchpoints.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Virtualize debug registers · 42dbaa5a

Committed by Jan Kiszka on Dec 15, 2008

So far KVM only had basic x86 debug register support, once introduced to
realize guest debugging that way. The guest itself was not able to use
those registers.

This patch now adds (almost) full support for guest self-debugging via
hardware registers. It refactors the code, moving generic parts out of
SVM (VMX was already cleaned up by the KVM_SET_GUEST_DEBUG patches), and
it ensures that the registers are properly switched between host and
guest.

This patch also prepares debug register usage by the host. The latter
will (once wired-up by the following patch) allow for hardware
breakpoints/watchpoints in guest code. If this is enabled, the guest
will only see faked debug registers without functionality, but with
content reflecting the guest's modifications.

Tested on Intel only, but SVM /should/ work as well, but who knows...

Known limitations: Trapping on tss switch won't work - most probably on
Intel.

Credits also go to Joerg Roedel - I used his once posted debugging
series as platform for this patch.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: New guest debug interface · d0bfb940

Committed by Jan Kiszka on Dec 15, 2008

This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
part, controlling the "main switch" and the single-step feature. The
arch specific part adds an x86 interface for intercepting both types of
debug exceptions separately and re-injecting them when the host was not
interested. Moreover, the foundation for guest debugging via debug
registers is laid.

To signal breakpoint events properly back to userland, an arch-specific
data block is now returned along with KVM_EXIT_DEBUG. For x86, the arch block
contains the PC, the debug exception, and relevant debug registers to
tell debug events properly apart.

The availability of this new interface is signaled by
KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
provided.

Note that both SVM and VTX are supported, but only the latter has been tested
so far. Based on the experience with all those VTX corner cases, I would be
fairly surprised if SVM worked out of the box.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set · d8017474

Committed by Alexander Graf on Nov 25, 2008

Userspace has to tell the kernel module somehow that nested SVM should be used.
The easiest way that doesn't break anything I could think of is to implement

if (cpuid & svm)
    allow write to efer
else
    deny write to efer

Old userspaces mask the SVM capability bit, so they don't break.
In order to find out that the SVM capability is set, I had to split the
kvm_emulate_cpuid into a finding and an emulating part.

(introduced in v6)
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
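The pseudocode in the commit message can be written as a compilable C predicate. The bit positions are architectural: EFER.SVME is bit 12, and SVM support is CPUID Fn8000_0001 ECX bit 2. The function `efer_svme_write_ok` and the choice to pass the guest's CPUID ECX word directly are illustrative assumptions, not KVM's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_SVME              (UINT64_C(1) << 12)  /* EFER.SVME */
#define CPUID_80000001_ECX_SVM (UINT32_C(1) << 2)   /* SVM supported */

/*
 * "if (cpuid & svm) allow write to efer else deny write to efer":
 * setting EFER.SVME is allowed only if the guest's CPUID exposes SVM.
 */
static bool efer_svme_write_ok(uint64_t new_efer, uint32_t guest_cpuid_ecx)
{
    if (!(new_efer & EFER_SVME))
        return true;   /* write does not enable SVM: no restriction here */
    return (guest_cpuid_ecx & CPUID_80000001_ECX_SVM) != 0;
}
```

Because old userspace masks the SVM CPUID bit, such a check denies them SVME automatically, and nothing else changes for them.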

KVM: SVM: Implement hsave · b286d5d8

Committed by Alexander Graf on Nov 25, 2008

Implement the hsave MSR, that gives the VCPU a GPA to save the
old guest state in.

v2 allows userspace to save/restore hsave
v4 dummies out the hsave MSR, so we use a host page
v6 remembers the guest's hsave and exports the MSR
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

15 Feb, 2009 · 3 commits

KVM: x86: disable kvmclock on non constant TSC hosts · abe6655d

Committed by Marcelo Tosatti on Feb 10, 2009

This is better.

Currently, this code path is posing us big troubles,
and we won't have a decent patch in time. So, temporarily
disable it.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fix racy in kvm_free_assigned_irq · ba4cef31

Committed by Sheng Yang on Jan 06, 2009

In the past, kvm_get_kvm() and kvm_put_kvm() were called in the assigned device
irq handler and in interrupt_work, in order to prevent cancel_work_sync() in
kvm_free_assigned_irq from seeing an illegal state while waiting for
interrupt_work to finish. But this is tricky and still has two problems:

1. A bug ignored two conditions in which cancel_work_sync() returning true
would result in an additional kvm_put_kvm().

2. If the interrupt type is MSI, there is a window between cancel_work_sync()
and free_irq() in which the interrupt could be injected again...

This patch discards the reference count used for the irq handler and
interrupt_work, and ensures a legal state by moving the free function to the
very beginning of kvm_destroy_vm(). It fixes the second bug by disabling the
irq before cancel_work_sync(). That may result in a nested disable of the irq,
which is OK because we are going to free it.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add kvm_arch_sync_events to sync with asynchronize events · ad8ba2cd

Committed by Sheng Yang on Jan 06, 2009

kvm_arch_sync_events is introduced to quiet down all other events that may happen
concurrently with the VM destroy process, such as the IRQ handler and work struct
for an assigned device.

Because kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(),
the state of KVM at that point is legal and provides an environment in which other
events can be quieted down.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

03 Jan, 2009 · 1 commit

KVM: change KVM to use IOMMU API · 19de40a8

Committed by Joerg Roedel on Dec 03, 2008

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

31 Dec, 2008 · 11 commits

KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI · 4531220b

Committed by Jan Kiszka on Dec 11, 2008

There is no point in doing the ready_for_nmi_injection/
request_nmi_window dance with user space. First, we don't do this for
in-kernel irqchip anyway, while the code path is the same as for user
space irqchip mode. And second, there is nothing to lose if a pending
NMI is overwritten by another one (in contrast to IRQs where we have to
save the number). Actually, there is even the risk of raising spurious
NMIs this way because the reason for the held-back NMI might already be
handled while processing the first one.

Therefore this patch creates a simplified user space NMI injection
interface, exporting it under KVM_CAP_USER_NMI and dropping the old
KVM_CAP_NMI capability. And this time we also take care to provide the
interface only on archs supporting NMIs via KVM (right now only x86).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Consolidate userspace memory capability reporting into common code · ca9edaee

Committed by Avi Kivity on Dec 08, 2008

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: prepopulate the shadow on invlpg · ad218f85

Committed by Marcelo Tosatti on Dec 01, 2008

If the guest executes invlpg, peek into the pagetable and attempt to
prepopulate the shadow entry.

Also stop dirty fault updates from interfering with the fork detector.

2% improvement on RHEL3/AIM7.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: skip global pgtables on sync due to cr3 switch · 6cffe8ca

Committed by Marcelo Tosatti on Dec 01, 2008

Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
important for Linux 32-bit guests with PAE, where the kmap page is
marked as global.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fix cpuid iteration on multiple leaves per eac · 0fdf8e59

Committed by Nitin A Kamble on Nov 05, 2008

The code to traverse the cpuid data array list for counting type of leaves is
currently broken.

This patch fixes two things in it.

 1. Set the first counting entry's flag KVM_CPUID_FLAG_STATE_READ_NEXT. Without
    it the code will never find a valid entry.

 2. The stop condition in the for loop that looks for the next unflagged
    entry is also broken. It needs to stop when it finds one matching entry;
    in the case of a count of 1, that will be the same entry found in this
    iteration.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
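The fixed iteration can be sketched as a circular scan that always terminates. This is not KVM's code. `struct cpuid_entry` is trimmed down and hypothetical, and `FLAG_READ_NEXT` stands in for KVM_CPUID_FLAG_STATE_READ_NEXT. The sketch assumes the first entry of each stateful leaf starts with the flag set (point 1).

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG_READ_NEXT 1u  /* stands in for KVM_CPUID_FLAG_STATE_READ_NEXT */

/* Hypothetical, trimmed-down cpuid entry. */
struct cpuid_entry {
    uint32_t function;
    uint32_t flags;
};

/*
 * After entry i of a stateful leaf has been read, move READ_NEXT to the
 * next entry with the same function number, scanning circularly. The scan
 * stops at the first match; if no other entry matches, it wraps back to i
 * itself (the count-of-1 case from point 2). Requires i < nent.
 */
static size_t advance_stateful_entry(struct cpuid_entry *e, size_t nent,
                                     size_t i)
{
    e[i].flags &= ~FLAG_READ_NEXT;
    for (size_t step = 1; step <= nent; step++) {
        size_t j = (i + step) % nent;
        if (e[j].function == e[i].function) {
            e[j].flags |= FLAG_READ_NEXT;
            return j;
        }
    }
    return i;  /* not reached: step == nent lands on i, which matches */
}
```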

KVM: Fix cpuid leaf 0xb loop termination · 0853d2c1

Committed by Nitin A Kamble on Nov 05, 2008

For cpuid leaf 0xb, bits 8-15 of the ECX register define the end of the
counting leaf. The previous code was using bits 0-7 for this purpose, which
is a bug.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
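Per the Intel SDM, CPUID leaf 0xb returns the level number in ECX bits 7:0 and the level type in ECX bits 15:8. A level type of 0 marks the end of the enumeration. A minimal sketch of the corrected termination test follows; the function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ECX[7:0] echoes the level number; ECX[15:8] is the level type
 * (1 = SMT, 2 = core, 0 = invalid, i.e. no more levels).
 */
static bool leaf_0xb_is_last(uint32_t ecx)
{
    return ((ecx >> 8) & 0xff) == 0;   /* test the type, not the number */
}
```

Sub-leaf 0 always has ECX[7:0] equal to 0, so a loop keyed on the low byte reads the level number rather than the type.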

KVM: Enable Function Level Reset for assigned device · 6eb55818

Committed by Sheng Yang on Oct 31, 2008

Ideally, every assigned device should be in a clean state before and after
assignment, so that the device's former state won't affect later work.
Some devices provide a mechanism named Function Level Reset, which is
defined in the PCI/PCI-e specification. We should execute it before and after
device assignment.

(But sadly, the feature is new, and most devices on the market now don't
support it. We are considering using D0/D3hot transitions to emulate it later,
but that is not as elegant or reliable as FLR itself.)

[Update: Reminded by Xiantao, execute FLR after we ensure that the device can
be assigned to the guest.]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: allow emulator to adjust rip for emulated pio instructions · e93f36bc

Committed by Guillaume Thouvenin on Oct 28, 2008

If we call the emulator we shouldn't call skip_emulated_instruction()
in the first place, since the emulator already computes the next rip
for us. Thus we move ->skip_emulated_instruction() out of
kvm_emulate_pio() and into handle_io() (and the svm equivalent). We
also replaced "return 0" by "break" in the "do_io:" case because now
the shadow register state needs to be committed. Otherwise eip will never
be updated.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Fix typo in function name · b8222ad2

Committed by Amit Shah on Oct 22, 2008

get_segment_descritptor_dtable() contains an obvious typo.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Enable MTRR for EPT · 64d4d521

Committed by Sheng Yang on Oct 09, 2008

The effective memory type of EPT is the mixture of MSR_IA32_CR_PAT and memory
type field of EPT entry.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Add PAT support for EPT · 468d472f

Committed by Sheng Yang on Oct 09, 2008

GUEST_PAT support is a new feature introduced by the Intel Core i7 architecture.
With it, the CPU saves/loads the guest and host PAT automatically, which matters
because the EPT memory type in the guest depends on MSR_IA32_CR_PAT.

Also add save/restore for MSR_IA32_CR_PAT.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
