Commits · 453423dce2785b8e22077e3b3eeecb4f60fe3470 · openeuler / raspberrypi-kernel

Apr 27, 2008 · 40 commits

KVM: s390: intercepts for privileged instructions · 453423dc

Committed by Christian Borntraeger on Mar 25, 2008

This patch introduces in-kernel handling of some intercepts for privileged
instructions:

handle_set_prefix()        sets the prefix register of the local cpu
handle_store_prefix()      stores the content of the prefix register to memory
handle_store_cpu_address() stores the cpu number of the current cpu to memory
handle_skey()              just decrements the instruction address and retries
handle_stsch()             delivers condition code 3 "operation not supported"
handle_chsc()              same here
handle_stfl()              stores the facility list which contains the
                           capabilities of the cpu
handle_stidp()             stores cpu type/model/revision and such
handle_stsi()              stores information about the system topology
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

453423dc
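
The handlers listed in this commit are selected by the opcode of the intercepted instruction. A minimal sketch of such a table-driven dispatch follows; the `struct vcpu` layout, the `handle_priv()` entry point and the 0/-1 return convention are hypothetical, and the opcode assignments (low byte of the 0xB2xx instructions) are given from memory and should be checked against the Principles of Operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced vcpu state for this sketch. */
struct vcpu {
	uint64_t psw_addr;   /* guest instruction address */
	uint32_t prefix;     /* prefix register */
};

/* Assumed convention: 0 = handled in kernel, -1 = let userspace handle it. */
typedef int (*intercept_handler_t)(struct vcpu *vcpu, uint32_t operand);

static int handle_set_prefix(struct vcpu *vcpu, uint32_t operand)
{
	/* the prefix area is 8 KiB aligned, so the low 13 bits are ignored */
	vcpu->prefix = operand & 0x7fffe000u;
	return 0;
}

static int handle_skey(struct vcpu *vcpu, uint32_t operand)
{
	(void)operand;
	/* rewind over the 4-byte instruction so the guest executes it again */
	vcpu->psw_addr -= 4;
	return 0;
}

/* Indexed by the low byte of the opcode (assumed encodings). */
static intercept_handler_t priv_handlers[256] = {
	[0x10] = handle_set_prefix,  /* SPX  (0xB210) */
	[0x29] = handle_skey,        /* ISKE (0xB229) */
	[0x2a] = handle_skey,        /* RRBE (0xB22A) */
	[0x2b] = handle_skey,        /* SSKE (0xB22B) */
};

/* Dispatch a 0xB2xx opcode; anything unknown is punted to userspace. */
int handle_priv(struct vcpu *vcpu, uint16_t opcode, uint32_t operand)
{
	intercept_handler_t h;

	if ((opcode >> 8) != 0xb2)
		return -1;
	h = priv_handlers[opcode & 0xff];
	return h ? h(vcpu, operand) : -1;
}
```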

KVM: s390: interrupt subsystem, cpu timer, waitpsw · ba5c1e9b

Committed by Carsten Otte on Mar 25, 2008

This patch contains the s390 interrupt subsystem (similar to the in-kernel APIC),
including timer interrupts (similar to the in-kernel PIT) and enabled wait
(similar to in-kernel HLT handling).

In order to achieve that, this patch also introduces intercept handling
for instruction intercepts, and it implements load control instructions.

This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
the vm file descriptors and the vcpu file descriptors. In case this ioctl is
issued against a vm file descriptor, the interrupt is considered floating.
Floating interrupts may be delivered to any virtual cpu in the configuration.

The following interrupts are supported:
SIGP STOP       - interprocessor signal that stops a remote cpu
SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
                  (stopped) remote cpu
INT EMERGENCY   - interprocessor interrupt, usually used to signal need_resched
                  and for smp_call_function() in the guest.
PROGRAM INT     - exception during program execution such as page fault, illegal
                  instruction and friends
RESTART         - interprocessor signal that starts a stopped cpu
INT VIRTIO      - floating interrupt for virtio signalisation
INT SERVICE     - floating interrupt for signalisations from the system
                  service processor

struct kvm_s390_interrupt, which is passed as the ioctl parameter when injecting
an interrupt, carries the interrupt type along with its parameter data.
Interrupts on s390 usually carry state that represents the current operation or
identifies which device caused the interruption.
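
A sketch of the floating/local split: interrupts injected through the vm file descriptor go to one queue shared by all vcpus, while those injected through a vcpu file descriptor go to that vcpu's own queue. The fixed-size ring, the `inject_vm()`/`inject_vcpu()`/`deliver_next()` names and the local-before-floating order are assumptions for illustration; real s390 delivery follows the architected interrupt priorities.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16  /* arbitrary bound for this sketch */

/* Hypothetical interrupt record: type plus a type-specific parameter. */
struct irq {
	uint32_t type;
	uint64_t parm;
};

struct irq_queue {
	struct irq items[MAX_PENDING];
	size_t head, count;
};

static bool irq_push(struct irq_queue *q, struct irq irq)
{
	if (q->count == MAX_PENDING)
		return false;
	q->items[(q->head + q->count) % MAX_PENDING] = irq;
	q->count++;
	return true;
}

static bool irq_pop(struct irq_queue *q, struct irq *out)
{
	if (q->count == 0)
		return false;
	*out = q->items[q->head];
	q->head = (q->head + 1) % MAX_PENDING;
	q->count--;
	return true;
}

/* One floating queue per VM, one local queue per vcpu. */
struct vm { struct irq_queue floating; };
struct vcpu { struct vm *vm; struct irq_queue local; };

/* vm fd -> floating, vcpu fd -> local */
bool inject_vm(struct vm *vm, struct irq irq) { return irq_push(&vm->floating, irq); }
bool inject_vcpu(struct vcpu *vcpu, struct irq irq) { return irq_push(&vcpu->local, irq); }

/* Any vcpu may take a floating interrupt; this sketch prefers local ones. */
bool deliver_next(struct vcpu *vcpu, struct irq *out)
{
	return irq_pop(&vcpu->local, out) || irq_pop(&vcpu->vm->floating, out);
}
```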

kvm_s390_handle_wait() handles waitpsw in two flavors: in case of a disabled
wait (that is, disabled for interrupts), we exit to userspace. In case of an
enabled wait, we set up a timer that expires at the cpu clock comparator value
and sleep on a wait queue.
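
The enabled-wait case has to turn the distance to the clock comparator into a host timer duration. The sketch assumes TOD clock units, where bit 51 advances once per microsecond, so one unit is 1/4096 microsecond; `handle_wait()` and `enum wait_action` are hypothetical names, and a real handler would also check for already pending interrupts before sleeping.

```c
#include <stdbool.h>
#include <stdint.h>

enum wait_action { WAIT_EXIT_TO_USER, WAIT_DELIVER_NOW, WAIT_SLEEP };

/*
 * TOD delta to nanoseconds: ns = delta * 1000 / 4096 = delta * 125 / 512.
 * Overflow for very large deltas is ignored in this sketch.
 */
static uint64_t tod_to_ns(uint64_t delta)
{
	return (delta * 125) >> 9;
}

enum wait_action handle_wait(bool irqs_enabled, uint64_t ckc, uint64_t now,
			     uint64_t *sleep_ns)
{
	if (!irqs_enabled)
		return WAIT_EXIT_TO_USER;   /* disabled wait: nothing can wake us */
	if (ckc <= now)
		return WAIT_DELIVER_NOW;    /* comparator already expired */
	*sleep_ns = tod_to_ns(ckc - now);
	return WAIT_SLEEP;              /* arm timer, sleep on the wait queue */
}
```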

[christian: change virtio interrupt to 0x2603]
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

ba5c1e9b

KVM: s390: sie intercept handling · 8f2abe6a

Committed by Christian Borntraeger on Mar 25, 2008

This patch introduces handling of sie intercepts in three flavors. Intercepts
are either handled completely in-kernel by kvm_handle_sie_intercept(), or
passed to userspace with the corresponding data in struct kvm_run when
kvm_handle_sie_intercept() returns -ENOTSUPP. In case of partial execution in
the kernel that still needs userspace support, kvm_handle_sie_intercept() may
set up struct kvm_run itself and return -EREMOTE.

The trivial intercept reasons are handled in this patch:
handle_noop() just does nothing for intercepts that don't require our support
  at all
handle_stop() is called when a cpu enters stopped state, and it drops out to
  userland after updating our vcpu state
handle_validity() faults in the cpu lowcore if needed, or passes the request
  to userland
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

8f2abe6a
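
Here is a sketch of how a caller might act on the three return conventions described in this commit. A return of 0 means handled; -ENOTSUPP means the caller fills struct kvm_run from the intercept data; -EREMOTE means the handler already prepared struct kvm_run. The enum and `classify_intercept_rc()` are hypothetical. ENOTSUPP is a kernel-internal errno, so the sketch defines it when userspace headers lack it.

```c
#include <errno.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524   /* kernel-internal value, not exported to userspace */
#endif

enum exit_path {
	HANDLED_IN_KERNEL,      /* re-enter SIE */
	EXIT_TO_USER_RAW,       /* caller copies intercept data into kvm_run */
	EXIT_TO_USER_PREPARED,  /* handler already set up kvm_run */
	EXIT_ERROR
};

enum exit_path classify_intercept_rc(int rc)
{
	switch (rc) {
	case 0:
		return HANDLED_IN_KERNEL;
	case -ENOTSUPP:
		return EXIT_TO_USER_RAW;
	case -EREMOTE:
		return EXIT_TO_USER_PREPARED;
	default:
		return EXIT_ERROR;
	}
}
```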

KVM: s390: arch backend for the kvm kernel module · b0c632db

Committed by Heiko Carstens on Mar 25, 2008

This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
(aka s390x, mainframe) architecture. It uses the mainframe's virtualization
instruction SIE to run virtual machines with up to 64 virtual CPUs each.
This port is only usable on 64bit host kernels, and can only run 64bit guest
kernels. However, running 31bit applications in guest userspace is possible.

The following source files are introduced by this patch:
arch/s390/kvm/kvm-s390.c     similar to arch/x86/kvm/x86.c, this implements all
                             arch callbacks for kvm. __vcpu_run calls back into
                             sie64a to enter the guest machine context
arch/s390/kvm/sie64a.S       assembler function sie64a, which enters guest
                             context via SIE, and switches world before and
                             after that
include/asm-s390/kvm_host.h  contains all vital data structures needed to run
                             virtual machines on the mainframe
include/asm-s390/kvm.h       defines kvm_regs and friends for user access to
                             guest register content
arch/s390/kvm/gaccess.h      functions similar to uaccess to access guest memory
arch/s390/kvm/kvm-s390.h     header file for kvm-s390 internals, extended by
                             later patches
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

b0c632db

s390: KVM preparation: address of the 64bit extint parm in lowcore · 8a88ac61

Committed by Christian Borntraeger on Mar 25, 2008

The address 0x11b8 is used by z/VM for pfault and diag 250 I/O to
provide a 64 bit extint parameter. virtio uses the same address, so
it's time to update the lowcore structure.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

8a88ac61

s390: KVM preparation: host memory management changes for s390 kvm · 5b7baf05

Committed by Christian Borntraeger on Mar 25, 2008

This patch changes the s390 memory management definitions to use the pgste field
for dirty and reference bit tracking of host and guest code. Usually on s390,
dirty and referenced are tracked in storage keys, which belong to the physical
page. This changes with virtualization: The guest and host dirty/reference bits
are defined to be the logical OR of the values for the mapping and the physical
page. This patch implements the necessary changes in pgtable.h for s390.

There is a common code change in mm/rmap.c: the call to
page_test_and_clear_young must be moved. This is a no-op for all
architectures except s390. page_referenced checks the referenced bits for
the physical page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.

Without pgstes (the current implementation on Linux s390), the physical page
check is implemented but the mapping callbacks are no-ops, because dirty
and referenced are not tracked in the s390 page tables. The pgstes introduce
guest and host dirty and reference bits for s390 in the host mapping. These
mappings must be checked before page_test_and_clear_young resets the reference
bit.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

5b7baf05
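
The ordering constraint in page_referenced can be illustrated with a deliberately simplified toy model, not the s390 implementation. The model has a storage-key R bit per physical page, a guest R bit in the PGSTE, and a guest view defined as their logical OR. In this toy the mapping check saves the key's R bit into the PGSTE before the host consumes it. All names are hypothetical.

```c
#include <stdbool.h>

/* Toy: one physical page and one mapping with a PGSTE guest R bit. */
struct page { bool key_ref; };
struct mapping { struct page *page; bool pgste_guest_ref; };

/* Guest view of "referenced" = key R bit OR PGSTE guest R bit. */
bool guest_referenced(const struct mapping *m)
{
	return m->page->key_ref || m->pgste_guest_ref;
}

/* Mapping check: preserve the guest's copy of the key bit in the PGSTE.
 * In this toy the host's young state comes only from the key. */
static bool mapping_test_and_clear_young(struct mapping *m)
{
	m->pgste_guest_ref |= m->page->key_ref;
	return false;
}

/* Physical check: test-and-clear the storage key R bit. */
static bool page_test_and_clear_young(struct page *p)
{
	bool was = p->key_ref;

	p->key_ref = false;
	return was;
}

/* Order required by the patch: mappings first, then the physical page. */
bool page_referenced_fixed(struct mapping *m)
{
	bool young = mapping_test_and_clear_young(m);

	return page_test_and_clear_young(m->page) || young;
}

/* Old order: the key is reset before the mapping can save it. */
bool page_referenced_old(struct mapping *m)
{
	bool young = page_test_and_clear_young(m->page);

	return mapping_test_and_clear_young(m) || young;
}
```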

s390: KVM preparation: provide hook to enable pgstes in user pagetable · 402b0862

Committed by Carsten Otte on Mar 25, 2008

The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed,
and the process now has a page status extended field after every page table.

Code that wants to exploit pgstes should select CONFIG_PGSTE.

This patch has a small common code hit, namely making dup_mm non-static.

Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. The prototype for dup_mm is now in include/linux/sched.h.
Following Martin's suggestion, s390_enable_sie() now calls task_lock() to
prevent races against ptrace modification of mm_users.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>

402b0862
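
A userspace toy of the control flow described above: refuse unless the mm has a single user, replace it with a copy that has room for PGSTEs, then free the old one. `struct mm`, `struct task`, `dup_mm_with_pgste()` and the no-op `task_lock()`/`task_unlock()` stubs are hypothetical stand-ins for the kernel objects; no page tables are actually copied.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct mm { int users; bool has_pgste; };
struct task { struct mm *mm; };

/* No-op stand-ins; in the kernel task_lock() serializes against ptrace
 * changing mm_users while we decide. */
static void task_lock(struct task *t) { (void)t; }
static void task_unlock(struct task *t) { (void)t; }

static struct mm *dup_mm_with_pgste(const struct mm *old_mm)
{
	struct mm *new_mm = malloc(sizeof(*new_mm));

	if (!new_mm)
		return NULL;
	*new_mm = *old_mm;          /* a real dup_mm also copies page tables */
	new_mm->users = 1;
	new_mm->has_pgste = true;   /* room for a PGSTE after each page table */
	return new_mm;
}

int enable_sie(struct task *t)
{
	struct mm *old_mm, *new_mm;

	task_lock(t);
	old_mm = t->mm;
	if (old_mm->has_pgste) {        /* already converted */
		task_unlock(t);
		return 0;
	}
	if (old_mm->users != 1) {       /* must be single threaded */
		task_unlock(t);
		return -EINVAL;
	}
	new_mm = dup_mm_with_pgste(old_mm);
	if (!new_mm) {
		task_unlock(t);
		return -ENOMEM;
	}
	t->mm = new_mm;
	task_unlock(t);
	free(old_mm);                   /* the old mm is freed */
	return 0;
}
```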

KVM: x86: hardware task switching support · 37817f29

Committed by Izik Eidus on Mar 24, 2008

This emulates the x86 hardware task switch mechanism in software, as it is
supported by neither vmx nor svm. It allows operating systems that use it,
like FreeDOS, to run as kvm guests.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

37817f29

KVM: x86: add functions to get the cpl of vcpu · 2e4d2653

Committed by Izik Eidus on Mar 24, 2008

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

2e4d2653
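
The commit message gives no details, so the following is only a sketch of the usual x86 rule for deriving the current privilege level from guest state. CPL is ring 0 in real mode, ring 3 in virtual-8086 mode, and otherwise the low two bits of the CS selector. `struct guest_state` and `guest_cpl()` are hypothetical; a VMX or SVM backend reads these fields from its own control structures.

```c
#include <stdint.h>

#define X86_CR0_PE     (1ul << 0)    /* protected mode enable */
#define X86_EFLAGS_VM  (1ul << 17)   /* virtual-8086 mode */

/* Hypothetical snapshot of the guest state the check needs. */
struct guest_state {
	uint64_t cr0;
	uint64_t rflags;
	uint16_t cs_selector;
};

int guest_cpl(const struct guest_state *s)
{
	if (!(s->cr0 & X86_CR0_PE))
		return 0;                /* real mode runs at ring 0 */
	if (s->rflags & X86_EFLAGS_VM)
		return 3;                /* virtual-8086 code runs at ring 3 */
	return s->cs_selector & 3;   /* CS.RPL tracks CPL in protected mode */
}
```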

KVM: VMX: Add module option to disable flexpriority · 4c9fc8ef

Committed by Avi Kivity on Mar 24, 2008

Useful for debugging.
Signed-off-by: Avi Kivity <avi@qumranet.com>

4c9fc8ef

KVM: no longer EXPERIMENTAL · 268fe02a

Committed by Avi Kivity on Mar 23, 2008

Long overdue.
Signed-off-by: Avi Kivity <avi@qumranet.com>

268fe02a

KVM: MMU: Introduce and use spte_to_page() · 0b49ea86

Committed by Avi Kivity on Mar 23, 2008

Encapsulate the pte mask'n'shift in a function.
Signed-off-by: Avi Kivity <avi@qumranet.com>

0b49ea86
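
The "mask'n'shift" amounts to clearing the flag bits of a 64-bit shadow pte and shifting the physical address down to a page frame number. In the sketch below, the mask (address bits 12 to 51) and `spte_to_pfn()` are assumptions. The KVM helper named in the commit returns a `struct page *` rather than a raw frame number.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
/* Assumed mask: physical address bits 12..51 of a 64-bit pte. */
#define PT64_BASE_ADDR_MASK \
	(((1ull << 52) - 1) & ~((1ull << PAGE_SHIFT) - 1))

/* Drop flag bits (present, writable, NX, ...) and shift to a frame number. */
static inline uint64_t spte_to_pfn(uint64_t spte)
{
	return (spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
}
```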

KVM: MMU: fix dirty bit setting when removing write permissions · 855149aa

Committed by Izik Eidus on Mar 20, 2008

When mmu_set_spte() checks whether a page related to an spte should be released
as dirty or clean, it checks whether the shadow pte was writable. But when
rmap_write_protect() is called, shadow ptes that were writable can become
read-only, and therefore mmu_set_spte() will release the pages as clean.

This patch fixes the issue by marking the page as dirty inside
rmap_write_protect().
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

855149aa
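
The fix can be modelled in a few lines. When write access is removed from a shadow pte that was writable, the page is marked dirty at that moment, so a later release that only looks at the current writable bit cannot lose the information. The toy `struct page`/`struct spte` types and `write_protect_spte()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_WRITABLE_MASK (1ull << 1)   /* x86 pte R/W bit */

/* Toy page descriptor standing in for struct page's dirty state. */
struct page { bool dirty; };

/* Toy shadow pte: the entry bits plus the page it maps. */
struct spte { uint64_t bits; struct page *page; };

/* Remove write access; if the pte was writable, the guest may have
 * written the page, so record it as dirty before the bit disappears. */
void write_protect_spte(struct spte *s)
{
	if (s->bits & PT_WRITABLE_MASK) {
		s->page->dirty = true;
		s->bits &= ~PT_WRITABLE_MASK;
	}
}
```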

KVM: Move some x86 specific constants and structures to include/asm-x86 · 69a9f69b

Committed by Avi Kivity on Mar 21, 2008

Signed-off-by: Avi Kivity <avi@qumranet.com>

69a9f69b

KVM: MMU: Set the accessed bit on non-speculative shadow ptes · 947da538

Committed by Avi Kivity on Mar 18, 2008

If we populate a shadow pte due to a fault (and not speculatively due to a
pte write) then we can set the accessed bit on it, as we know it will be
set immediately on the next guest instruction.  This saves a read-modify-write
operation.
Signed-off-by: Avi Kivity <avi@qumranet.com>

947da538
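
A sketch of the idea, assuming the standard x86 pte bit positions (present bit 0, writable bit 1, accessed bit 5). `make_spte()` and its `speculative` flag are hypothetical names, not KVM's interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_PRESENT_MASK  (1ull << 0)
#define PT_WRITABLE_MASK (1ull << 1)
#define PT_ACCESSED_MASK (1ull << 5)

uint64_t make_spte(uint64_t pfn, bool writable, bool speculative)
{
	uint64_t spte = PT_PRESENT_MASK | (pfn << 12);

	if (writable)
		spte |= PT_WRITABLE_MASK;
	/* On a real fault the guest touches the page on its very next
	 * instruction, so set A now instead of letting the CPU do a
	 * read-modify-write of the pte later. */
	if (!speculative)
		spte |= PT_ACCESSED_MASK;
	return spte;
}
```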

KVM: kvm.h: __user requires compiler.h · 97646202

Committed by Christian Borntraeger on Mar 12, 2008

include/linux/kvm.h defines struct kvm_dirty_log to
	[...]
	union {
		void __user *dirty_bitmap; /* one bit per page */
		__u64 padding;
	};

__user requires compiler.h to compile. Currently, this works on x86
only coincidentally due to other include files. This patch makes
kvm.h compile in all cases.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

97646202

x86: KVM guest: disable clock before rebooting. · 1e977aa1

Committed by Glauber Costa on Mar 17, 2008

This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.

Without it, the host keeps updating the previously registered system time
area, so a memory location that is effectively random to the new kernel
gets written when the guest comes back.

It overrides the machine_ops functions shutdown, used in the path of
kernel_kexec() (sys.c), and crash_shutdown, used in the path of
crash_kexec() (kexec.c).
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

1e977aa1

x86: make native_machine_shutdown non-static · 3c62c625

Committed by Glauber Costa on Mar 17, 2008

This allows external users to call it. It is mainly
useful for routines that override a machine_ops
field for their own special purposes but want to call the
normal shutdown routine after they're done.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

3c62c625
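
The machine_ops commits in this range (1e977aa1, 3c62c625, ed23dc6f) combine into one pattern: a guest overrides a `machine_ops` hook, does its own cleanup, and then calls the now non-static native routine. The sketch reduces `struct machine_ops` to two members and replaces the `wrmsr()` to the system time MSR with a write to a plain variable; the counters and `kvm_guest_install_shutdown()` are hypothetical.

```c
#include <stdint.h>

/* Reduced stand-in for x86 struct machine_ops. */
struct machine_ops {
	void (*shutdown)(void);
	void (*crash_shutdown)(void *regs);
};

static int native_calls;                          /* sketch bookkeeping */
static uint64_t fake_system_time_msr = 0x1001;    /* LSB set: clock enabled */

/* Non-static so that overriding hooks can fall back to it. */
void native_machine_shutdown(void) { native_calls++; }

static void native_machine_crash_shutdown(void *regs)
{
	(void)regs;
	native_calls++;
}

struct machine_ops machine_ops = {
	.shutdown = native_machine_shutdown,
	.crash_shutdown = native_machine_crash_shutdown,
};

/* Guest override: disable the clock (clear the LSB), then do the normal work. */
static void kvm_shutdown(void)
{
	fake_system_time_msr = 0;   /* stands in for wrmsr(MSR, 0, 0) */
	native_machine_shutdown();
}

void kvm_guest_install_shutdown(void)
{
	machine_ops.shutdown = kvm_shutdown;
}
```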

x86: allow machine_crash_shutdown to be replaced · ed23dc6f

Committed by Glauber Costa on Mar 17, 2008

This patch allows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

ed23dc6f

x86: KVM guest: hypercall batching · 096d14a3

Committed by Marcelo Tosatti on Feb 22, 2008

Batch pte updates and tlb flushes in lazy MMU mode.

[avi:
 - adjust to mmu_op
 - helper for getting para_state without debug warnings]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

096d14a3
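
Lazy MMU batching can be sketched as a small queue. Outside lazy mode each pte update is issued immediately. Inside it, updates accumulate and are sent in one hypercall when the buffer fills or lazy mode ends. The buffer size, `struct mmu_op` and the counter standing in for the hypercall are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MMU_QUEUE_SIZE 4   /* arbitrary, small for the sketch */

/* Hypothetical queued operation: "write val to the pte at gpa". */
struct mmu_op { uint64_t gpa; uint64_t val; };

struct mmu_batch {
	struct mmu_op ops[MMU_QUEUE_SIZE];
	size_t len;
	int lazy;         /* inside lazy MMU mode? */
	int hypercalls;   /* number of exits to the host */
};

/* Stand-in for one hypercall carrying every queued op. */
static void flush(struct mmu_batch *b)
{
	if (b->len == 0)
		return;
	b->hypercalls++;
	b->len = 0;
}

void pte_update(struct mmu_batch *b, uint64_t gpa, uint64_t val)
{
	b->ops[b->len++] = (struct mmu_op){ gpa, val };
	/* outside lazy mode, or when the buffer is full, issue it now */
	if (!b->lazy || b->len == MMU_QUEUE_SIZE)
		flush(b);
}

void enter_lazy_mmu(struct mmu_batch *b) { b->lazy = 1; }
void leave_lazy_mmu(struct mmu_batch *b) { flush(b); b->lazy = 0; }
```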

x86: KVM guest: hypercall based pte updates and TLB flushes · 1da8a77b

Committed by Marcelo Tosatti on Feb 22, 2008

Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - guest/host split
 - fix 32-bit truncation issues
 - adjust to mmu_op
 - adjust to ->release_*() renamed
 - add ->release_pud()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

1da8a77b

KVM: MMU: hypercall based pte updates and TLB flushes · 2f333bcb

Committed by Marcelo Tosatti on Feb 22, 2008

Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - one mmu_op hypercall instead of one per op
 - allow 64-bit gpa on hypercall
 - don't pass host errors (-ENOMEM) to guest]

[akpm: warning fix on i386]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>

2f333bcb
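
One of the changes noted above allows a 64-bit gpa on the hypercall. On a 32-bit guest, hypercall arguments are 32-bit registers. One plausible encoding, assumed here rather than taken from the patch, passes the address as a low/high pair that the host reassembles.

```c
#include <stdint.h>

/* Guest side: split a 64-bit guest physical address into two registers. */
static inline void split_gpa(uint64_t gpa, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)gpa;
	*hi = (uint32_t)(gpa >> 32);
}

/* Host side: rebuild the address from the two register values. */
static inline uint64_t join_gpa(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```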

KVM: Provide unlocked version of emulator_write_phys() · 9f811285

Committed by Avi Kivity on Mar 02, 2008

Signed-off-by: Avi Kivity <avi@qumranet.com>

9f811285

x86: KVM guest: add basic paravirt support · 0cf1bfd2

Committed by Marcelo Tosatti on Feb 22, 2008

Add basic KVM paravirt support. Avoid vm-exits on IO delays.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

0cf1bfd2

KVM: add basic paravirt support · a28e4f5a

Committed by Marcelo Tosatti on Feb 22, 2008

Add basic KVM paravirt support. Avoid vm-exits on IO delays.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

a28e4f5a

KVM: Add reset support for in kernel PIT · 308b0f23

Committed by Sheng Yang on Mar 13, 2008

Separate the reset part and prepare for reset support.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

308b0f23

KVM: Add save/restore supporting of in kernel PIT · e0f63cb9

Committed by Sheng Yang on Mar 04, 2008

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

e0f63cb9

KVM: In kernel PIT model · 7837699f

Committed by Sheng Yang on Jan 28, 2008

The patch moves the PIT model from userspace to kernel, and increases
the timer accuracy greatly.

[marcelo: make last_injected_time per-guest]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

7837699f

KVM: Remove pointless desc_ptr #ifdef · 4fcaa982

Committed by Avi Kivity on Mar 05, 2008

The desc_struct changes left an unnecessary #ifdef; remove it.
Signed-off-by: Avi Kivity <avi@qumranet.com>

4fcaa982

KVM: VMX: Don't adjust tsc offset forward · 019960ae

Committed by Avi Kivity on Mar 04, 2008

Most Intel hosts have a stable tsc, and playing with the offset only
reduces accuracy. By limiting tsc offset adjustment only to forward updates,
we effectively disable tsc offset adjustment on these hosts.
Signed-off-by: Avi Kivity <avi@qumranet.com>

019960ae
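
The policy reads as follows: compensate only when the host TSC seen on vcpu load is behind the value recorded when the vcpu last ran, and only by increasing the offset. The sketch uses hypothetical names (`struct vcpu_tsc`, `vcpu_load_tsc()`). Updating `last_host_tsc` when the vcpu is put is left out.

```c
#include <stdint.h>

/* Hypothetical per-vcpu record of the host TSC seen when it last ran. */
struct vcpu_tsc {
	uint64_t last_host_tsc;
	int64_t tsc_offset;
};

/* On a host with a stable, synchronized TSC the condition never holds,
 * so the offset is effectively never touched. */
void vcpu_load_tsc(struct vcpu_tsc *v, uint64_t host_tsc_now)
{
	if (host_tsc_now < v->last_host_tsc)
		v->tsc_offset += (int64_t)(v->last_host_tsc - host_tsc_now);
}
```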

KVM: replace remaining __FUNCTION__ occurances · b8688d51

Committed by Harvey Harrison on Mar 03, 2008

__FUNCTION__ is gcc-specific; use the standard __func__ instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

b8688d51
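
For illustration, here is a debug helper that stamps messages with the enclosing function name using the standard identifier. `log_prefixed()` and `vcpu_reset()` are hypothetical.

```c
#include <stdio.h>

/* Format "<function>: <msg>" into buf, the way a debug printk might. */
static int log_prefixed(char *buf, size_t len, const char *fn, const char *msg)
{
	return snprintf(buf, len, "%s: %s", fn, msg);
}

/* __func__ is predefined by C99 inside every function body, so this builds
 * with any conforming compiler, unlike the GCC-specific __FUNCTION__. */
int vcpu_reset(char *buf, size_t len)
{
	return log_prefixed(buf, len, __func__, "resetting");
}
```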

KVM: detect if VCPU triple faults · 71c4dfaf

Committed by Joerg Roedel on Feb 26, 2008

In the current inject_page_fault path, KVM only checks whether another PF is
pending and then injects a DF. But it also has to check for a pending DF to
detect a shutdown condition in the VCPU. If this is not detected, the VCPU goes
into a PF -> DF -> PF loop when it should triple fault. This patch detects this
condition and handles it with a KVM_SHUTDOWN exit to userspace.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

71c4dfaf
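
The escalation rule can be sketched as a small state machine over the pending exception. A #PF while #PF is pending becomes #DF, and a #PF while #DF is pending becomes a shutdown (triple fault). This simplification, assumed here, ignores the other contributory exceptions. The vector numbers 14 (#PF) and 8 (#DF) are the architectural ones, and `struct vcpu_exc` is hypothetical.

```c
enum { PF_VECTOR = 14, DF_VECTOR = 8, NO_EXCEPTION = -1 };

/* Hypothetical pending-exception state for one vcpu. */
struct vcpu_exc {
	int pending;
	int shutdown;
};

void queue_page_fault(struct vcpu_exc *v)
{
	if (v->pending == DF_VECTOR) {
		v->shutdown = 1;            /* fault while delivering #DF: triple fault */
		v->pending = NO_EXCEPTION;
	} else if (v->pending == PF_VECTOR) {
		v->pending = DF_VECTOR;     /* #PF on #PF escalates to #DF */
	} else {
		v->pending = PF_VECTOR;
	}
}
```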

KVM: Use kzalloc to avoid allocating kvm_regs from kernel stack · 3e4bb3ac

Committed by Xiantao Zhang on Feb 25, 2008

Since kvm_regs is too big to allocate on the kernel stack on ia64,
use kzalloc to allocate it.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

3e4bb3ac

KVM: Prefix control register accessors with kvm_ to avoid namespace pollution · 2d3ad1f4

Committed by Avi Kivity on Feb 24, 2008

Names like 'set_cr3()' look dangerously close to affecting the host.
Signed-off-by: Avi Kivity <avi@qumranet.com>

2d3ad1f4

KVM: MMU: large page support · 05da4558

Committed by Marcelo Tosatti on Feb 23, 2008

Create large page mappings if the guest PTEs are marked as such and
the underlying memory is hugetlbfs backed. If the large page contains
write-protected pages, a large pte is not used.

Gives a consistent 2% improvement for data copies on a RAM-mounted
filesystem, without NPT/EPT.

Anthony measures a 4% improvement on 4-way kernbench, with NPT.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

05da4558
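
The decision can be sketched as a predicate with four conditions. The guest pte must be large, the slot must be hugetlbfs backed, the aligned 2 MiB range must lie within the slot, and no 4 KiB page in it may be write protected. `struct memslot`, the per-large-page counter array and `can_map_large()` are assumptions modelled loosely on the description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_LARGE 512   /* 2 MiB / 4 KiB on x86-64 */

/* Hypothetical slot: one write-protect counter per large frame. */
struct memslot {
	uint64_t base_gfn;
	size_t npages;
	bool hugetlbfs_backed;
	const int *write_protected;   /* indexed by large-page number in slot */
};

bool can_map_large(const struct memslot *s, uint64_t gfn, bool guest_pte_large)
{
	uint64_t start = gfn & ~(uint64_t)(PAGES_PER_LARGE - 1);
	size_t idx;

	if (!guest_pte_large || !s->hugetlbfs_backed)
		return false;
	/* the whole aligned 2 MiB range must lie inside the slot */
	if (start < s->base_gfn || start + PAGES_PER_LARGE > s->base_gfn + s->npages)
		return false;
	idx = (start - s->base_gfn) / PAGES_PER_LARGE;
	return s->write_protected[idx] == 0;   /* shadowed page -> use 4 KiB ptes */
}
```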

KVM: MMU: ignore zapped root pagetables · 2e53d63a

Committed by Marcelo Tosatti on Feb 20, 2008

Mark zapped root pagetables as invalid and ignore such pages during lookup.

This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

2e53d63a
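
Here is a toy version of the fix. A zapped root that may still be in use by a vcpu is only flagged invalid, and lookup skips flagged pages so that a stale root is never reused. `struct shadow_page`, `zap_root()` and `lookup()` are hypothetical and use a linear scan instead of KVM's hash table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy shadow page: which guest table it shadows and whether it was zapped. */
struct shadow_page { uint64_t gfn; bool invalid; };

/* A root may still be loaded on a vcpu, so it is only marked invalid. */
void zap_root(struct shadow_page *sp) { sp->invalid = true; }

/* Lookup skips invalid pages, so a stale root is never handed out again. */
struct shadow_page *lookup(struct shadow_page *pages, size_t n, uint64_t gfn)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i].gfn == gfn && !pages[i].invalid)
			return &pages[i];
	return NULL;
}
```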

KVM: Implement dummy values for MSR_PERF_STATUS · 847f0ad8

Committed by Alexander Graf on Feb 21, 2008

Darwin relies on this and ceases to work without it.
Signed-off-by: Alexander Graf <alex@csgraf.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>

847f0ad8

KVM: sparse fixes for kvm/x86.c · 14af3f3c

Committed by Harvey Harrison on Feb 19, 2008

In two case statements, use the ever popular 'i' instead of index:
arch/x86/kvm/x86.c:1063:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here
arch/x86/kvm/x86.c:1079:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here

Make it static.
arch/x86/kvm/x86.c:1945:24: warning: symbol 'emulate_ops' was not declared. Should it be static?

Drop the return statements.
arch/x86/kvm/x86.c:2878:2: warning: returning void-valued expression
arch/x86/kvm/x86.c:2944:2: warning: returning void-valued expression
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

14af3f3c

KVM: SVM: make iopm_base static · 4866d5e3

Committed by Harvey Harrison on Feb 19, 2008

Fixes sparse warning as well.
arch/x86/kvm/svm.c:69:15: warning: symbol 'iopm_base' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

4866d5e3

KVM: x86 emulator: fix sparse warnings in x86_emulate.c · 77cd337f

Committed by Harvey Harrison on Feb 19, 2008

Nesting __emulate_2op_nobyte inside __emulate_2op produces many shadowed
variable warnings on the internal variable _tmp used by both macros.

Change the outer macro to use __tmp.

Avoids a sparse warning like the following at every call site of __emulate_2op:
arch/x86/kvm/x86_emulate.c:1091:3: warning: symbol '_tmp' shadows an earlier one
arch/x86/kvm/x86_emulate.c:1091:3: originally declared here
[18 more warnings suppressed]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

77cd337f
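
Two nested statement macros show why the rename matters. `ADD_NOBYTE()` and `ADD_TWICE()` are hypothetical stand-ins for __emulate_2op_nobyte and __emulate_2op. Suppose the outer macro also named its scratch variable `_tmp`. Then the inner declaration `unsigned long _tmp = (_tmp);` would not only shadow it but also read the new, uninitialized variable, because in C a declaration's scope begins right after its own declarator.

```c
/* Inner helper macro with its own scratch variable. */
#define ADD_NOBYTE(dst, src) do {          \
	unsigned long _tmp = (src);        \
	(dst) += _tmp;                     \
} while (0)

/* Outer macro: a distinct scratch name (__tmp) keeps the inner
 * declaration from shadowing it at every expansion. */
#define ADD_TWICE(dst, src) do {           \
	unsigned long __tmp = (src);       \
	ADD_NOBYTE(dst, __tmp);            \
	ADD_NOBYTE(dst, __tmp);            \
} while (0)

unsigned long add_twice(unsigned long a, unsigned long b)
{
	ADD_TWICE(a, b);
	return a;
}
```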