Commits · 20c466b56168ddccf034c136510d73e4a0e18605 · Linux-御风守护者 / linux

10 Jun, 2009 · 9 commits

KVM: Use rsvd_bits_mask in load_pdptrs() · 20c466b5

Authored by Dong, Eddie on Mar 31, 2009

Also remove bits 5-6 from rsvd_bits_mask per latest SDM.
Signed-off-by: Eddie Dong <Eddie.Dong@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: remove pointless conditional before kfree() in lapic initialization · 7a6ce84c

Authored by Wei Yongjun on Mar 31, 2009

Remove pointless conditional before kfree().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Use different shadows when EFER.NXE changes · 9645bb56

Authored by Avi Kivity on Mar 31, 2009

A pte that is shadowed when the guest EFER.NXE=1 is not valid when
EFER.NXE=0; if bit 63 is set, the pte should cause a fault, and since the
shadow EFER always has NX enabled, this won't happen.

Fix by using a different shadow page table for different EFER.NXE bits.  This
allows vcpus to run correctly with different values of EFER.NXE, and for
transitions on this bit to be handled correctly without requiring a full
flush.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Emulate #PF error code of reserved bits violation · 82725b20

Authored by Dong, Eddie on Mar 30, 2009

Detect, indicate, and propagate page faults where reserved bits are set.
Take care to handle the different paging modes, each of which has different
sets of reserved bits.

[avi: fix pte reserved bits for efer.nxe=0]
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
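As a rough illustration of the idea in 82725b20 (this is not the KVM code itself), the sketch below sets the RSVD bit of an x86 page-fault error code when a present pte has any bit set in a reserved-bits mask. The error-code bit positions are architectural; the helper names `rsvd_bits` and `pf_error_code`, their parameters, and the choice of mask are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (architectural). */
#define PFERR_PRESENT (1u << 0) /* fault on a present page */
#define PFERR_WRITE   (1u << 1) /* access was a write */
#define PFERR_USER    (1u << 2) /* access came from user mode */
#define PFERR_RSVD    (1u << 3) /* reserved bit set in a paging entry */

/* Hypothetical helper: mask with bits s..e (inclusive) set, e - s < 63. */
static uint64_t rsvd_bits(int s, int e)
{
    return ((1ULL << (e - s + 1)) - 1) << s;
}

/*
 * Hypothetical helper: build the error code for a faulting access.
 * The reserved mask depends on paging mode and level; the caller picks it.
 */
static uint32_t pf_error_code(uint64_t pte, uint64_t rsvd_mask,
                              bool write, bool user)
{
    bool present = (pte & 1) != 0; /* bit 0 of a pte is the present bit */
    uint32_t err = 0;

    if (present)
        err |= PFERR_PRESENT;
    if (write)
        err |= PFERR_WRITE;
    if (user)
        err |= PFERR_USER;
    /* Reserved-bit violations are only reported for present entries. */
    if (present && (pte & rsvd_mask))
        err |= PFERR_RSVD;
    return err;
}
```

Since each paging mode reserves a different set of bits, a caller would presumably keep one mask per mode and level and pass the matching one in.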

KVM: Fix interrupt unhalting a vcpu when it shouldn't · 78646121

Authored by Gleb Natapov on Mar 23, 2009

kvm_vcpu_block() unhalts the vcpu on an interrupt/timer without checking
whether the interrupt window is actually open.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Timer event should not unconditionally unhalt vcpu. · 09cec754

Authored by Gleb Natapov on Mar 23, 2009

Currently timer events are processed before entering guest mode. Move it
to main vcpu event loop since timer events should be processed even while
vcpu is halted.  Timer may cause interrupt/nmi to be injected and only then
vcpu will be unhalted.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Ignore reads to EVNTSEL MSRs · 7fe29e0f

Authored by Amit Shah on Mar 20, 2009

We ignore writes to the performance counters and performance event
selector registers already. Kaspersky antivirus reads the eventsel
MSR causing it to crash with the current behaviour.

Return 0 as data when the eventsel registers are read to stop the
crash.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
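The behaviour described in 7fe29e0f can be pictured with a small sketch: a read of an event select MSR succeeds and yields 0 instead of being treated as an error. The two MSR indices are the architectural ones for the first two event select registers; the handler name `sketch_get_msr` and its return convention are assumptions for illustration, not the KVM interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR indices of the first two event select registers. */
#define MSR_P6_EVNTSEL0 0x186u
#define MSR_P6_EVNTSEL1 0x187u

/*
 * Hypothetical read handler: returns true and stores the value when the
 * MSR is handled, false when the caller should treat the read as a fault.
 */
static bool sketch_get_msr(uint32_t msr, uint64_t *data)
{
    switch (msr) {
    case MSR_P6_EVNTSEL0:
    case MSR_P6_EVNTSEL1:
        *data = 0;    /* report that no event is programmed */
        return true;
    default:
        return false; /* not handled in this sketch */
    }
}
```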

KVM: Device assignment framework rework · e56d532f

Authored by Sheng Yang on Mar 12, 2009

After discussion with Marcelo, we decided to rework the device assignment
framework together. The old problem is that the kernel logic is unnecessarily
complex, so Marcelo suggested splitting it in a more elegant way:

1. Split host IRQ assignment and guest IRQ assignment, and let userspace
determine the combination. Also discard the msi2intx parameter; userspace can
specify KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to
enable MSI to INTx conversion.

2. Split assign IRQ and deassign IRQ. Introduce two new ioctls:
KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.

This patch also fixes the reversed _IOR vs _IOW in the definition (by
deprecating the old interface).

[avi: replace homemade bitcount() by hweight_long()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: silence preempt warning on kvm_write_guest_time · 2dea4c84

Authored by Matt T. Yourst on Feb 24, 2009

This issue just appeared in kvm-84 when running on 2.6.28.7 (x86-64)
with PREEMPT enabled.

We're getting syslog warnings like this many (but not all) times qemu
tells KVM to run the VCPU:

BUG: using smp_processor_id() in preemptible [00000000] code:
qemu-system-x86/28938
caller is kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
Pid: 28938, comm: qemu-system-x86 2.6.28.7-mtyrel-64bit
Call Trace:
debug_smp_processor_id+0xf7/0x100
kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
? __wake_up+0x4e/0x70
? wake_futex+0x27/0x40
kvm_vcpu_ioctl+0x2e9/0x5a0 [kvm]
enqueue_hrtimer+0x8a/0x110
_spin_unlock_irqrestore+0x27/0x50
vfs_ioctl+0x31/0xa0
do_vfs_ioctl+0x74/0x480
sys_futex+0xb4/0x140
sys_ioctl+0x99/0xa0
system_call_fastpath+0x16/0x1b

As it turns out, the call trace is messed up due to gcc's inlining, but
I isolated the problem anyway: kvm_write_guest_time() is being used in a
non-thread-safe manner on preemptable kernels.

Basically kvm_write_guest_time()'s body needs to be surrounded by
preempt_disable() and preempt_enable(), since the kernel won't let us
query any per-CPU data (indirectly using smp_processor_id()) without
preemption disabled. The attached patch fixes this issue by disabling
preemption inside kvm_write_guest_time().

[marcelo: surround only __get_cpu_var calls since the warning
is harmless]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

26 May, 2009 · 1 commit

KVM: Fix PDPTR reloading on CR4 writes · a2edf57f

Authored by Avi Kivity on May 24, 2009

The processor is documented to reload the PDPTRs while in PAE mode if any
of the CR4 bits PSE, PGE, or PAE change.  Linux relies on this
behaviour when zapping the low mappings of PAE kernels during boot.

The code already handled changes to CR4.PAE; augment it to also notice changes
to PSE and PGE.

This triggered while booting an F11 PAE kernel; the futex initialization code
runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem
ended up uninitialized, killing PI futexes and pulseaudio which uses them.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
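The condition described in a2edf57f can be sketched as a predicate over the old and new CR4 values. The CR4 bit positions are architectural; the function name `pdptrs_need_reload` and its parameters are hypothetical, and the sketch assumes the guest is not in long mode (where PDPTRs are not used).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions. */
#define X86_CR4_PSE (1u << 4)
#define X86_CR4_PAE (1u << 5)
#define X86_CR4_PGE (1u << 7)

/*
 * Hypothetical predicate: should a CR4 write trigger a PDPTR reload?
 * Assumes a guest that is not in long mode.
 */
static bool pdptrs_need_reload(uint32_t old_cr4, uint32_t new_cr4,
                               bool paging_enabled)
{
    const uint32_t pdptr_bits = X86_CR4_PSE | X86_CR4_PGE | X86_CR4_PAE;

    /* PDPTRs only matter while PAE paging is active after the write. */
    if (!paging_enabled || !(new_cr4 & X86_CR4_PAE))
        return false;

    /* Reload when any of PSE, PGE, or PAE changed. */
    return ((old_cr4 ^ new_cr4) & pdptr_bits) != 0;
}
```

Checking only a change in PAE, as the commit message says the earlier code did, would miss the case where a PAE guest toggles PGE or PSE.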

11 May, 2009 · 2 commits

KVM: Make EFER reads safe when EFER does not exist · e286e86e

Authored by Avi Kivity on May 03, 2009

Some processors don't have EFER; don't oops if userspace wants us to
read EFER when we check NX.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fix NX support reporting · 334b8ad7

Authored by Avi Kivity on May 03, 2009

NX support is bit 20, not bit 1.
Signed-off-by: Avi Kivity <avi@redhat.com>
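For reference, the NX (execute-disable) feature flag is bit 20 of EDX in CPUID leaf 0x80000001. A minimal sketch of such a check follows; the macro and function names are hypothetical, and the EDX value is passed in rather than read from the CPU so the example stays portable.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x80000001, EDX bit 20: execute-disable (NX) support. */
#define CPUID_EXT_EDX_NX (1u << 20)

/* Hypothetical helper: test the NX flag in an already-fetched EDX value. */
static bool cpuid_has_nx(uint32_t ext_edx)
{
    return (ext_edx & CPUID_EXT_EDX_NX) != 0;
}
```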

22 Apr, 2009 · 2 commits

KVM: Unregister cpufreq notifier on unload · 888d256e

Authored by Jan Kiszka on Apr 17, 2009

Properly unregister cpufreq notifier on unload if it was registered
during init.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: release time_page on vcpu destruction · 7f1ea208

Authored by Joerg Roedel on Feb 25, 2009

Not releasing the time_page causes a leak of that page or the compound
page it is situated in.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

24 Mar, 2009 · 19 commits

KVM: fix sparse warnings: Should it be static? · cded19f3

Authored by Hannes Eder on Feb 21, 2009

Impact: Make symbols static.

Fix these sparse warnings:
arch/x86/kvm/mmu.c:992:5: warning: symbol 'mmu_pages_add' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1124:5: warning: symbol 'mmu_pages_next' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1144:6: warning: symbol 'mmu_pages_clear_parents' was not declared. Should it be static?
arch/x86/kvm/x86.c:2037:5: warning: symbol 'kvm_read_guest_virt' was not declared. Should it be static?
arch/x86/kvm/x86.c:2067:5: warning: symbol 'kvm_write_guest_virt' was not declared. Should it be static?
virt/kvm/irq_comm.c:220:5: warning: symbol 'setup_routing_entry' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Report IRQ injection status to userspace. · 4925663a

Authored by Gleb Natapov on Feb 04, 2009

IRQ injection status is either -1 (if no CPU was found that should
accept the interrupt, because the IRQ was masked or the ioapic was
misconfigured or ...) or >= 0, in which case the number indicates
how many CPUs the interrupt was injected into.
A value of 0 means that the interrupt was coalesced
and probably should be reinjected.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
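A caller could interpret the status value described in 4925663a roughly as follows. The enum and function names are hypothetical and only encode the three cases the commit message lists.

```c
/* Hypothetical classification of the injection status value. */
enum irq_injection_result {
    IRQ_NOT_DELIVERED, /* status < 0: masked, misconfigured ioapic, ... */
    IRQ_COALESCED,     /* status == 0: candidate for reinjection */
    IRQ_DELIVERED      /* status > 0: injected into that many CPUs */
};

static enum irq_injection_result classify_irq_status(int status)
{
    if (status < 0)
        return IRQ_NOT_DELIVERED;
    if (status == 0)
        return IRQ_COALESCED;
    return IRQ_DELIVERED;
}
```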

KVM: Fix kvmclock on !constant_tsc boxes · c8076604

Authored by Gerd Hoffmann on Feb 04, 2009

kvmclock currently falls apart on machines without constant tsc.
This patch fixes it.  Changes:

  * keep tsc frequency in a per-cpu variable.
  * handle kvmclock update using a new request flag, thus checking
    whenever we need an update each time we enter guest context.
  * use a cpufreq notifier to track frequency changes and force
    kvmclock updates.
  * send ipis to kick cpu out of guest context if needed to make
    sure the guest doesn't see stale values.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add FFXSR support · 1b2fd70c

Authored by Alexander Graf on Feb 02, 2009

AMD K10 CPUs implement the FFXSR feature that gets enabled using
EFER. Let's check if the virtual CPU description includes that
CPUID feature bit and allow enabling it then.

This is required for Windows Server 2008 in Hyper-V mode.

v2 adds CPUID capability exposure
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Userspace controlled irq routing · 399ec807

Authored by Avi Kivity on Nov 19, 2008

Currently KVM has a static routing from GSI numbers to interrupts (namely,
0-15 are mapped 1:1 to both PIC and IOAPIC, and 16:23 are mapped 1:1 to
the IOAPIC).  This is insufficient for several reasons:

- HPET requires non 1:1 mapping for the timer interrupt
- MSIs need a new method to assign interrupt numbers and dispatch them
- ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the
  ioapics

This patch implements an interrupt routing table (as a linked list, but this
can be easily changed) and a userspace interface to replace the table.  The
routing table is initialized according to the current hardwired mapping.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Fix typos and whitespace errors · 19355475

Authored by Amit Shah on Jan 14, 2009

Some typos, comments, whitespace errors corrected in the cpuid code
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Only enable cr4_pge role in shadow mode · 5a41accd

Authored by Avi Kivity on Jan 11, 2009

Two dimensional paging is only confused by it.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Properly lock PIT creation · 269e05e4

Authored by Avi Kivity on Jan 05, 2009

Otherwise, two threads can create a PIT in parallel and cause a memory leak.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PIT: provide an option to disable interrupt reinjection · 52d939a0

Authored by Marcelo Tosatti on Dec 30, 2008

Certain clocks (such as TSC) in older 2.6 guests overaccount for lost
ticks, causing severe time drift. Interrupt reinjection magnifies the
problem.

Provide an option to disable it.

[avi: allow room for expansion in case we want to disable reinjection
      of other timers]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fallback support for MSR_VM_HSAVE_PA · 61a6bd67

Authored by Avi Kivity on Dec 29, 2008

Since we advertise MSR_VM_HSAVE_PA, userspace will attempt to read it
even on Intel.  Implement fake support for this MSR to avoid the
warnings.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: remove the vmap usage · 0f346074

Authored by Izik Eidus on Dec 29, 2008

vmap() on guest pages hides those pages from the Linux mm for an extended
(userspace determined) amount of time.  Get rid of it.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: introduce kvm_read_guest_virt, kvm_write_guest_virt · 77c2002e

Authored by Izik Eidus on Dec 29, 2008

This commit renames emulator_read_std to kvm_read_guest_virt, and adds
a new function, kvm_write_guest_virt, that allows writing to a
guest virtual address.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: initialize TSC offset relative to vm creation time · 53f658b3

Authored by Marcelo Tosatti on Dec 11, 2008

VMX initializes the TSC offset for each vcpu at different times, and
also reinitializes it for vcpus other than 0 on APIC SIPI message.

This bug causes the TSC's to appear unsynchronized in the guest, even if
the host is good.

Older Linux kernels don't handle the situation very well, so
gettimeofday is likely to go backwards in time:

http://www.mail-archive.com/kvm@vger.kernel.org/msg02955.html
http://sourceforge.net/tracker/index.php?func=detail&aid=2025534&group_id=180599&atid=893831

Fix it by initializing the offset of each vcpu relative to vm creation
time, and moving it from vmx_vcpu_reset to vmx_vcpu_setup, out of the
APIC MP init path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Segregate mmu pages created with different cr4.pge settings · 2f0b3d60

Authored by Avi Kivity on Dec 21, 2008

Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
global bit set (the global bit has no effect if !cr4.pge).

This can only occur on smp with different cr4.pge settings for different
vcpus (since a cr4 change will resync the shadow ptes), but there's no
cost to being correct here.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Wire-up hardware breakpoints for guest debugging · ae675ef0

Authored by Jan Kiszka on Dec 15, 2008

Add the remaining bits to make use of debug registers also for guest
debugging, thus enabling the use of hardware breakpoints and
watchpoints.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Virtualize debug registers · 42dbaa5a

Authored by Jan Kiszka on Dec 15, 2008

So far KVM only had basic x86 debug register support, once introduced to
realize guest debugging that way. The guest itself was not able to use
those registers.

This patch now adds (almost) full support for guest self-debugging via
hardware registers. It refactors the code, moving generic parts out of
SVM (VMX was already cleaned up by the KVM_SET_GUEST_DEBUG patches), and
it ensures that the registers are properly switched between host and
guest.

This patch also prepares debug register usage by the host. The latter
will (once wired-up by the following patch) allow for hardware
breakpoints/watchpoints in guest code. If this is enabled, the guest
will only see faked debug registers without functionality, but with
content reflecting the guest's modifications.

Tested on Intel only, but SVM /should/ work as well, but who knows...

Known limitations: Trapping on tss switch won't work - most probably on
Intel.

Credits also go to Joerg Roedel - I used his once posted debugging
series as platform for this patch.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: New guest debug interface · d0bfb940

Authored by Jan Kiszka on Dec 15, 2008

This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
part, controlling the "main switch" and the single-step feature. The
arch specific part adds an x86 interface for intercepting both types of
debug exceptions separately and re-injecting them when the host was not
interested. Moreover, the foundation for guest debugging via debug
registers is laid.

To signal breakpoint events properly back to userland, an arch-specific
data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block
contains the PC, the debug exception, and relevant debug registers to
tell debug events properly apart.

The availability of this new interface is signaled by
KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
provided.

Note that both SVM and VTX are supported, but only the latter has been
tested so far. Based on the experience with all those VTX corner cases, I
would be fairly surprised if SVM works out of the box.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set · d8017474

Authored by Alexander Graf on Nov 25, 2008

Userspace has to tell the kernel module somehow that nested SVM should be used.
The easiest way that doesn't break anything I could think of is to implement

if (cpuid & svm)
    allow write to efer
else
    deny write to efer

Old userspaces mask the SVM capability bit, so they don't break.
In order to find out that the SVM capability is set, I had to split the
kvm_emulate_cpuid into a finding and an emulating part.

(introduced in v6)
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
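Following the commit title of d8017474, the check can be sketched as: a write to EFER that sets SVME is only accepted when the guest's CPUID advertises SVM. EFER.SVME is bit 12; the function name `efer_write_allowed` and the boolean parameter are hypothetical stand-ins for the actual CPUID lookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* EFER bit 12: Secure Virtual Machine Enable. */
#define EFER_SVME (1ULL << 12)

/*
 * Hypothetical check: reject an EFER value that sets SVME unless the
 * guest CPUID (as configured by userspace) exposes the SVM feature.
 */
static bool efer_write_allowed(uint64_t new_efer, bool guest_cpuid_has_svm)
{
    if ((new_efer & EFER_SVME) && !guest_cpuid_has_svm)
        return false;
    return true;
}
```

This is why old userspaces keep working, as the message notes: they mask the SVM capability bit, so a guest trying to set SVME is refused as before.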

KVM: SVM: Implement hsave · b286d5d8

Authored by Alexander Graf on Nov 25, 2008

Implement the hsave MSR, that gives the VCPU a GPA to save the
old guest state in.

v2 allows userspace to save/restore hsave
v4 dummys out the hsave MSR, so we use a host page
v6 remembers the guest's hsave and exports the MSR
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

15 Feb, 2009 · 3 commits

KVM: x86: disable kvmclock on non constant TSC hosts · abe6655d

Authored by Marcelo Tosatti on Feb 10, 2009

This is better.

Currently, this code path is posing us big troubles,
and we won't have a decent patch in time. So, temporarily
disable it.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fix racy in kvm_free_assigned_irq · ba4cef31

Authored by Sheng Yang on Jan 06, 2009

In the past, kvm_get_kvm() and kvm_put_kvm() were called in the assigned device
irq handler and interrupt_work, in order to prevent cancel_work_sync() in
kvm_free_assigned_irq from hitting an illegal state while waiting for
interrupt_work to finish. But this is tricky and still has two problems:

1. A bug: ignoring two conditions in which cancel_work_sync() would return true
results in an additional kvm_put_kvm().

2. If the interrupt type is MSI, there is a window between cancel_work_sync()
and free_irq() in which the interrupt could be injected again...

This patch discards the reference count used for the irq handler and
interrupt_work, and ensures the legal state by moving the free function to the
very beginning of kvm_destroy_vm(). The patch fixes the second bug by disabling
the irq before cancel_work_sync(), which may result in a nested disable of the
irq, but that is OK because we are going to free it.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add kvm_arch_sync_events to sync with asynchronize events · ad8ba2cd

Authored by Sheng Yang on Jan 06, 2009

kvm_arch_sync_events is introduced to quiet down all other events that may
happen concurrently with the VM destroy process, like the IRQ handler and work
struct for an assigned device.

Because kvm_arch_sync_events is called at the very beginning of
kvm_destroy_vm(), the state of KVM here is legal and can provide an environment
to quiet down other events.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

03 Jan, 2009 · 1 commit

KVM: change KVM to use IOMMU API · 19de40a8

Authored by Joerg Roedel on Dec 03, 2008

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

31 Dec, 2008 · 3 commits

KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI · 4531220b

Authored by Jan Kiszka on Dec 11, 2008

There is no point in doing the ready_for_nmi_injection/
request_nmi_window dance with user space. First, we don't do this for
in-kernel irqchip anyway, while the code path is the same as for user
space irqchip mode. And second, there is nothing to lose if a pending
NMI is overwritten by another one (in contrast to IRQs where we have to
save the number). Actually, there is even the risk of raising spurious
NMIs this way because the reason for the held-back NMI might already be
handled while processing the first one.

Therefore this patch creates a simplified user space NMI injection
interface, exporting it under KVM_CAP_USER_NMI and dropping the old
KVM_CAP_NMI capability. And this time we also take care to provide the
interface only on archs supporting NMIs via KVM (right now only x86).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Consolidate userspace memory capability reporting into common code · ca9edaee

Authored by Avi Kivity on Dec 08, 2008

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: prepopulate the shadow on invlpg · ad218f85

Authored by Marcelo Tosatti on Dec 01, 2008

If the guest executes invlpg, peek into the pagetable and attempt to
prepopulate the shadow entry.

Also stop dirty fault updates from interfering with the fork detector.

2% improvement on RHEL3/AIM7.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
