Commits · ffde22ac53b6d6b1d7206f1172176a667eead778 · openanolis / cloud-kernel

03 Dec, 2009 40 commits

KVM: Xen PV-on-HVM guest support · ffde22ac

Committed by Ed Swierk on Oct 15, 2009

Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files.  When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.
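
The MSR handling described above can be sketched in plain C as follows. This is a minimal userspace model, not the kernel code: `struct xen_hvm_cfg`, `xen_msr_write` and the flat `guest_mem` buffer are hypothetical names, and the encoding of the written value (page index in the low 12 bits, guest address in the rest) is an assumption made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical stand-in for the settings userspace passes via KVM_XEN_HVM_CONFIG. */
struct xen_hvm_cfg {
    uint32_t msr;            /* MSR the guest will write to */
    const uint8_t *blob_32;  /* hypercall blob for 32-bit guests */
    const uint8_t *blob_64;  /* hypercall blob for 64-bit guests */
    uint8_t pages_32;        /* blob sizes, counted in pages */
    uint8_t pages_64;
};

/*
 * Handle a guest write to the configured MSR.
 * Assumption: the low 12 bits of the value select the blob page and the
 * remaining bits give the page-aligned guest address to copy it to.
 * Returns 1 if a page was copied, 0 if the MSR is not ours, -1 on a bad request.
 */
static int xen_msr_write(const struct xen_hvm_cfg *cfg, uint32_t msr,
                         uint64_t data, int long_mode,
                         uint8_t *guest_mem, size_t guest_size)
{
    const uint8_t *blob = long_mode ? cfg->blob_64 : cfg->blob_32;
    uint8_t pages = long_mode ? cfg->pages_64 : cfg->pages_32;
    uint64_t page_num = data & (PAGE_SIZE - 1);
    uint64_t gpa = data & ~(uint64_t)(PAGE_SIZE - 1);

    if (msr != cfg->msr)
        return 0;
    if (page_num >= pages)
        return -1;
    if (gpa > guest_size || guest_size - gpa < PAGE_SIZE)
        return -1;

    /* Copy exactly one page of the selected blob into guest memory. */
    memcpy(guest_mem + gpa, blob + page_num * PAGE_SIZE, PAGE_SIZE);
    return 1;
}
```

Keeping the 32-bit and 64-bit blobs separate mirrors the description: the guest's current mode decides which blob the page comes from.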

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check · 94c30d9c

Committed by Jan Kiszka on Oct 12, 2009

This (broken) check dates back to the days when this code was shared
across architectures. x86 has IOMEM, so drop it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: fix handle_pause declaration · 9fb41ba8

Committed by Marcelo Tosatti on Oct 12, 2009

There's no kvm_run argument anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Harden against cpufreq · 6b7d7e76

Committed by Zachary Amsden on Oct 09, 2009

If cpufreq can't determine the CPU khz, or cpufreq is not compiled in,
we should fall back to the measured TSC khz.
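
As a minimal sketch of that fallback (the function and parameter names are hypothetical, and a value of 0 is assumed to mean "cpufreq could not tell"):

```c
/*
 * cpufreq_khz: what cpufreq reports; assumed to be 0 when it cannot
 * determine the frequency or is not compiled in.
 * measured_tsc_khz: the TSC frequency measured at boot.
 */
static unsigned long pick_cpu_khz(unsigned long cpufreq_khz,
                                  unsigned long measured_tsc_khz)
{
    return cpufreq_khz ? cpufreq_khz : measured_tsc_khz;
}
```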
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Support Pause Filter in AMD processors · 565d0998

Committed by Mark Langsdorf on Oct 06, 2009

New AMD processors (Family 0x10 models 8+) support the Pause
Filter Feature.  This feature creates a new field in the VMCB
called Pause Filter Count.  If Pause Filter Count is greater
than 0 and intercepting PAUSEs is enabled, the processor will
increment an internal counter when a PAUSE instruction occurs
instead of intercepting.  When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.
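
The counting behaviour described above can be modelled with a few lines of C. This is a sketch of the description only, not of the hardware or of the patch: `struct pause_filter` and `on_pause` are hypothetical names, and resetting the counter after an intercept is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

struct pause_filter {
    uint16_t filter_count;  /* Pause Filter Count; 0 means no filtering */
    uint16_t seen;          /* internal counter of PAUSEs since the last intercept */
    bool intercept_pause;   /* is the PAUSE intercept enabled at all? */
};

/* Returns true when this PAUSE instruction causes an intercept. */
static bool on_pause(struct pause_filter *pf)
{
    if (!pf->intercept_pause)
        return false;
    if (pf->filter_count == 0)
        return true;            /* unfiltered: every PAUSE intercepts */
    if (++pf->seen < pf->filter_count)
        return false;           /* counted, but not yet intercepted */
    pf->seen = 0;               /* assumption: counting restarts after an intercept */
    return true;
}
```

With `filter_count` set to 3000, a short spin never reaches the hypervisor, while a long one intercepts on its 3000th PAUSE.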

This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.

Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand.  Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.

Processor support for this feature is indicated by a CPUID
bit.

On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 3-5% when combined
with a scheduler algorithm that caused the VCPU to
sleep for a brief period. Further performance improvement
may be possible with a more sophisticated yield algorithm.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Add support for Pause-Loop Exiting · 4b8d54f9

Committed by Zhai, Edwin on Oct 09, 2009

New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop

If the time between this execution of PAUSE and the previous one exceeds
PLE_Gap, the processor considers this PAUSE to belong to a new loop.
Otherwise, the processor determines the total execution time of this loop
(since the first PAUSE in this loop) and triggers a VM exit if the total time
exceeds PLE_Window.
* Refer SDM volume 3b section 21.6.13 & 22.1.3.
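
The PLE_Gap / PLE_Window rule can be written out as a small state machine. The sketch below models only the rule as stated in this message; `struct ple_state` and `ple_on_pause` are hypothetical names, time is an abstract tick count, and starting a fresh loop after a VM exit is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

struct ple_state {
    uint64_t gap;      /* PLE_Gap: max time between two PAUSEs of one loop */
    uint64_t window;   /* PLE_Window: max time allowed inside one PAUSE loop */
    uint64_t first;    /* time of the first PAUSE in the current loop */
    uint64_t last;     /* time of the most recent PAUSE */
    bool in_loop;      /* has any PAUSE been seen for the current loop? */
};

/* Called for every PAUSE at time `now`; returns true if a VM exit is triggered. */
static bool ple_on_pause(struct ple_state *s, uint64_t now)
{
    bool vm_exit = false;

    if (!s->in_loop || now - s->last > s->gap) {
        /* Too long since the previous PAUSE: this one starts a new loop. */
        s->first = now;
        s->in_loop = true;
    } else if (now - s->first > s->window) {
        /* Same loop, and it has been spinning for longer than the window. */
        vm_exit = true;
        s->in_loop = false;  /* assumption: the next PAUSE starts a new loop */
    }
    s->last = now;
    return vm_exit;
}
```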

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is scheduled out while holding a spinlock and other VPs waiting for the same
lock are then scheduled in, wasting CPU time.

Our tests indicate that most spinlocks are held for less than 2^12 cycles.
Performance tests show that with 2X LP over-commitment we can get +2% perf
improvement for kernel build (even more perf gain with more LPs).
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: introduce kvm_vcpu_on_spin · d255f4f2

Committed by Zhai, Edwin on Oct 09, 2009

Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
once the cpu detects pause-based looping.
Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Remove nsvm_printk debugging code · d36f19e9

Committed by Joerg Roedel on Oct 09, 2009

With all important information now delivered through
tracepoints we can safely remove the nsvm_printk debugging
code for nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Add tracepoint for skinit instruction · 532a46b9

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a tracepoint for the event that the guest
executed the SKINIT instruction. This information is
important because SKINIT is an SVM extension not yet
implemented by nested SVM and we may need this information
for debugging hypervisors that do not yet run on nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Add tracepoint for invlpga instruction · ec1ff790

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a tracepoint for the event that the guest
executed the INVLPGA instruction.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Add tracepoint for #vmexit because intr pending · 236649de

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a special tracepoint for the event that a
nested #vmexit is injected because kvm wants to inject an
interrupt into the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Add tracepoint for injected #vmexit · 17897f36

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a tracepoint for a nested #vmexit that gets
re-injected to the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Add tracepoint for nested #vmexit · d8cabddf

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a tracepoint for every #vmexit we get from a
nested guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Add tracepoint for nested vmrun · 0ac406de

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a dedicated kvm tracepoint for a nested
vmrun.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Move INTR vmexit out of atomic code · cd3ff653

Committed by Joerg Roedel on Oct 09, 2009

The nested SVM code emulates a #vmexit caused by a request
to open the irq window right in the request function. This
is a bug because the request function runs with preemption
and interrupts disabled but the #vmexit emulation might
sleep. This can cause a schedule()-while-atomic bug and is
fixed with this patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Notify nested hypervisor of lost event injections · 8d23c466

Committed by Alexander Graf on Oct 09, 2009

If event_inj is valid on a #vmexit the host CPU would write
the contents to exit_int_info, so the hypervisor knows that
the event wasn't injected.

Nested SVM does not do this yet, which is a bug; this patch
fixes it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: include pvclock MSRs in msrs_to_save · e3267cbb

Committed by Glauber Costa on Oct 06, 2009

For a while now, we have been issuing a rdmsr instruction to find out which
MSRs in our save list are really supported by the underlying machine.
However, it fails to account for KVM-specific MSRs, such as the pvclock
ones.

This patch moves them to the beginning of the list and skips testing them.
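
A minimal sketch of that filtering, assuming the KVM-specific entries occupy the first `SAVE_MSRS_BEGIN` slots of the list. `SAVE_MSRS_BEGIN`, `filter_msr_list` and `fake_probe` are hypothetical names; `fake_probe` stands in for the rdmsr probe and simply pretends that only two MSRs exist on the host.

```c
#include <stddef.h>
#include <stdint.h>

#define SAVE_MSRS_BEGIN 2   /* leading KVM-specific entries that are never probed */

/* Stand-in for the hardware probe: pretend only MSRs 0x174 and 0x175 exist. */
static int fake_probe(uint32_t msr)
{
    return msr == 0x174 || msr == 0x175;
}

/*
 * Compact the list in place. The first SAVE_MSRS_BEGIN entries are kept
 * unconditionally; every later entry is kept only if host_has() accepts it.
 * Returns the new number of entries.
 */
static size_t filter_msr_list(uint32_t *msrs, size_t n,
                              int (*host_has)(uint32_t msr))
{
    size_t i, j;

    if (n < SAVE_MSRS_BEGIN)
        return n;
    for (i = j = SAVE_MSRS_BEGIN; i < n; i++) {
        if (!host_has(msrs[i]))
            continue;           /* not supported by this machine: drop it */
        msrs[j++] = msrs[i];
    }
    return j;
}
```

Starting the loop at `SAVE_MSRS_BEGIN` is the whole trick: entries the hardware would reject, but which KVM itself implements, are never handed to the probe.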

Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Rework guest single-step flag injection and filtering · 91586a3b

Committed by Jan Kiszka on Oct 05, 2009

Push TF and RF injection and filtering on guest single-stepping into the
vendor get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: disable paravirt mmu reporting · a68a6a72

Committed by Marcelo Tosatti on Oct 01, 2009

Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Refactor guest debug IOCTL handling · 355be0b9

Committed by Jan Kiszka on Oct 03, 2009

Much of the so-far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part wrt processing KVM_GUESTDBG_ENABLE.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: remove pre_task_link setting in save_state_to_tss16 · 201d945b

Committed by Juan Quintela on Sep 30, 2009

Now, also remove pre_task_link setting in save_state_to_tss16.

  commit b237ac37
  Author: Gleb Natapov <gleb@redhat.com>
  Date:   Mon Mar 30 16:03:24 2009 +0300

    KVM: Fix task switch back link handling.

CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Fix hotplug of CPUs · 3230bb47

Committed by Zachary Amsden on Sep 29, 2009

Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, for only online cpus.

The backend was not allocating enough structures for all possible CPUs, so
new CPUs coming online could not be hardware enabled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Fix printk name error in svm.c · e6732a5a

Committed by Zachary Amsden on Sep 29, 2009

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Kill the confusing tsc_ref_khz and ref_freq variables · 0cca7907

Committed by Zachary Amsden on Sep 29, 2009

They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.

Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery.  Not only is it
always right, it is also perfectly accurate, as no error prone measurement
is required.

On such machines, when a new CPU is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Separate timer intialization into an indepedent function · b820cc0c

Committed by Zachary Amsden on Sep 29, 2009

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: fix lock imbalance in kvm_*_irq_source_id() · 0c6ddceb

Committed by Jiri Slaby on Sep 25, 2009

Stanse found 2 lock imbalances in kvm_request_irq_source_id and
kvm_free_irq_source_id. They omit to unlock kvm->irq_lock on fail paths.

Fix that by adding unlock labels at the end of the functions and jump
there from the fail paths.
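
The unlock-label pattern mentioned above looks roughly like this. The sketch uses a toy lock and a hypothetical `request_id` helper rather than the real kvm functions, and the `-EBUSY` error code is chosen for illustration only.

```c
#include <errno.h>

#define MAX_IDS 4   /* hypothetical pool size */

struct id_pool {
    int lock_depth;        /* toy lock: 1 while held, 0 otherwise */
    unsigned long bitmap;  /* one bit per allocated id */
};

static void pool_lock(struct id_pool *p)   { p->lock_depth++; }
static void pool_unlock(struct id_pool *p) { p->lock_depth--; }

/* Returns a free id, or -EBUSY when none is left. The lock is always released. */
static int request_id(struct id_pool *p)
{
    int id;

    pool_lock(p);
    for (id = 0; id < MAX_IDS; id++)
        if (!(p->bitmap & (1UL << id)))
            break;
    if (id >= MAX_IDS) {
        id = -EBUSY;
        goto unlock;       /* fail path: jump to the label instead of returning */
    }
    p->bitmap |= 1UL << id;
unlock:
    pool_unlock(p);
    return id;
}
```

Routing every exit through the single `unlock` label makes it hard to add a new fail path that forgets to drop the lock.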
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Remove remaining occurences of rdtscll · e935d48e

Committed by Joerg Roedel on Sep 16, 2009

This patch replaces them with native_read_tsc() which can
also be used in expressions and saves a variable on the
stack in this case.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: don't copy exit_int_info on nested vmrun · 33527ad7

Committed by Joerg Roedel on Sep 16, 2009

The exit_int_info field is only written by the hardware and
never read. So it does not need to be copied on a vmrun
emulation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: reorganize svm_interrupt_allowed · 7fcdb510

Committed by Joerg Roedel on Sep 16, 2009

This patch reorganizes the logic in svm_interrupt_allowed to
make it easier to read. This is important because the logic
is a lot more complicated with Nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: remove duplicated #include · bfc33bea

Committed by Huang Weiyi on Sep 16, 2009

Remove duplicated #include('s) in
  arch/x86/kvm/lapic.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Activate Virtualization On Demand · 10474ae8

Committed by Alexander Graf on Sep 15, 2009

X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on-demand
activation of virtualization. This means that virtualization is instead
enabled on creation of the first virtual machine and disabled on
destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
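
The first-VM / last-VM rule amounts to a usage count. A minimal sketch, with hypothetical names (`struct hw_virt`, `vm_create`, `vm_destroy`) and without the locking or per-CPU work a real implementation would need:

```c
struct hw_virt {
    int usage_count;  /* number of virtual machines currently alive */
    int enabled;      /* 1 while the virtualization extensions are switched on */
};

/* Creating the first VM switches the extensions on. */
static void vm_create(struct hw_virt *hw)
{
    if (hw->usage_count++ == 0)
        hw->enabled = 1;
}

/* Destroying the last VM switches them off again. */
static void vm_destroy(struct hw_virt *hw)
{
    if (--hw->usage_count == 0)
        hw->enabled = 0;
}
```

Merely loading the module leaves `enabled` at 0, which is what keeps other hypervisors usable.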
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: remove needless mmap_sem acquision from nested_svm_map · e8b3433a

Committed by Marcelo Tosatti on Sep 08, 2009

nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Enhance invalid guest state emulation · 80ced186

Committed by Mohammed Gamal on Sep 01, 2009

- Change handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate instructions that have already failed
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: Add pusha and popa instructions · abcf14b5

Committed by Mohammed Gamal on Sep 01, 2009

This adds the pusha and popa instructions (opcodes 0x60-0x61), which enables
booting MINIX with invalid guest state emulation on.
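
For reference, the architectural behaviour of the 16-bit forms can be sketched as below. This is a toy model, not the emulator code from this patch: `struct cpu16` and its word-indexed stack are hypothetical, and segment handling and faults are ignored.

```c
#include <stdint.h>

struct cpu16 {
    uint16_t ax, cx, dx, bx, sp, bp, si, di;
    uint16_t stack[32];   /* toy 64-byte stack; sp is a byte offset into it */
};

static void push16(struct cpu16 *c, uint16_t v)
{
    c->sp -= 2;
    c->stack[c->sp / 2] = v;
}

static uint16_t pop16(struct cpu16 *c)
{
    uint16_t v = c->stack[c->sp / 2];
    c->sp += 2;
    return v;
}

/* PUSHA: push AX, CX, DX, BX, the original SP, BP, SI, DI, in that order. */
static void pusha(struct cpu16 *c)
{
    uint16_t old_sp = c->sp;

    push16(c, c->ax); push16(c, c->cx); push16(c, c->dx); push16(c, c->bx);
    push16(c, old_sp);
    push16(c, c->bp); push16(c, c->si); push16(c, c->di);
}

/* POPA: pop in reverse order, discarding the saved SP slot. */
static void popa(struct cpu16 *c)
{
    c->di = pop16(c); c->si = pop16(c); c->bp = pop16(c);
    c->sp += 2;   /* skip the saved SP value */
    c->bx = pop16(c); c->dx = pop16(c); c->cx = pop16(c); c->ax = pop16(c);
}
```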

[marcelo: remove unused variable]
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: Add missing decoder flags for 'or' instructions · 94677e61

Committed by Mohammed Gamal on Aug 28, 2009

Add missing decoder flags for 'or' instructions (0xc-0xd).
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Move assigned device code to own file · bfd99ff5

Committed by Avi Kivity on Aug 26, 2009

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Return -ENOTTY on unrecognized ioctls · 367e1319

Committed by Avi Kivity on Aug 26, 2009

Not the incorrect -EINVAL.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Drop kvm->irq_lock lock from irq injection path · 680b3648

Committed by Gleb Natapov on Aug 24, 2009

The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now with kvm->irq_lock in place access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Move IO APIC to its own lock · eba0226b

Committed by Gleb Natapov on Aug 24, 2009

This allows removal of irq_lock from the injection path.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Convert irq notifiers lists to RCU locking · 280aa177

Committed by Gleb Natapov on Aug 24, 2009

Use RCU locking for mask/ack notifiers lists.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

openanolis / cloud-kernel synced successfully more than 1 year ago
