提交 · 4a157d61b48c7cdb8d751001442a14ebac80229f · openanolis / cloud-kernel

17 12月, 2014 3 次提交

KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register · 4a157d61

由 Paul Mackerras 提交于 12月 03, 2014

There are two ways in which a guest instruction can be obtained from
the guest in the guest exit code in book3s_hv_rmhandlers.S.  If the
exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
instruction), the offending instruction is in the HEIR register
(Hypervisor Emulation Instruction Register).  If the exit was caused
by a load or store to an emulated MMIO device, we load the instruction
from the guest by turning data relocation on and loading the instruction
with an lwz instruction.

Unfortunately, in the case where the guest has opposite endianness to
the host, these two methods give results of different endianness, but
both get put into vcpu->arch.last_inst.  The HEIR value has been loaded
using guest endianness, whereas the lwz will load the instruction using
host endianness.  The rest of the code that uses vcpu->arch.last_inst
assumes it was loaded using host endianness.

To fix this, we define a new vcpu field to store the HEIR value.  Then,
in kvmppc_handle_exit_hv(), we transfer the value from this new field to
vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
differ.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

4a157d61

KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf

由 Paul Mackerras 提交于 12月 03, 2014

This removes the code that was added to enable HV KVM to work
on PPC970 processors.  The PPC970 is an old CPU that doesn't
support virtualizing guest memory.  Removing PPC970 support also
lets us remove the code for allocating and managing contiguous
real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
case, the code for pinning pages of guest memory when first
accessed and keeping track of which pages have been pinned, and
the code for handling H_ENTER hypercalls in virtual mode.

Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
The KVM_CAP_PPC_RMA capability now always returns 0.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

c17b98cf

KVM: PPC: Book3S HV: Simplify locking around stolen time calculations · 2711e248

由 Paul Mackerras 提交于 12月 04, 2014

Currently the calculations of stolen time for PPC Book3S HV guests
uses fields in both the vcpu struct and the kvmppc_vcore struct.  The
fields in the kvmppc_vcore struct are protected by the
vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for
running the virtual core.  This works correctly but confuses lockdep,
because it sees that the code takes the tbacct_lock for a vcpu in
kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in
vcore_stolen_time(), and it thinks there is a possibility of deadlock,
causing it to print reports like this:

=============================================
[ INFO: possible recursive locking detected ]
3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted
---------------------------------------------
qemu-system-ppc/6188 is trying to acquire lock:
 (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv]

but task is already holding lock:
 (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&vcpu->arch.tbacct_lock)->rlock);
  lock(&(&vcpu->arch.tbacct_lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by qemu-system-ppc/6188:
 #0:  (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f98>] .vcpu_load+0x28/0xe0 [kvm]
 #1:  (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecb41b0>] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv]
 #2:  (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]

stack backtrace:
CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89
Call Trace:
[c000000b2754f3f0] [c000000000b31b6c] .dump_stack+0x88/0xb4 (unreliable)
[c000000b2754f470] [c0000000000faeb8] .__lock_acquire+0x1878/0x2190
[c000000b2754f600] [c0000000000fbf0c] .lock_acquire+0xcc/0x1a0
[c000000b2754f6d0] [c000000000b2954c] ._raw_spin_lock_irq+0x4c/0x70
[c000000b2754f760] [d00000000ecb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
[c000000b2754f7f0] [d00000000ecb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv]
[c000000b2754f880] [d00000000ecb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv]
[c000000b2754f9f0] [d00000000eb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
[c000000b2754fa60] [d00000000eb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
[c000000b2754faf0] [d00000000eb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm]
[c000000b2754fcb0] [c000000000267eb4] .do_vfs_ioctl+0x444/0x770
[c000000b2754fd90] [c0000000002682a4] .SyS_ioctl+0xc4/0xe0
[c000000b2754fe30] [c0000000000092e4] syscall_exit+0x0/0x98

In order to make the locking easier to analyse, we change the code to
use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and
preempt_tb fields.  This lock needs to be an irq-safe lock since it is
used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv()
functions, which are called with the scheduler rq lock held, which is
an irq-safe lock.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

2711e248

24 9月, 2014 2 次提交

kvm: Add arch specific mmu notifier for page invalidation · fe71557a

由 Tang Chen 提交于 9月 24, 2014

This will be used to let the guest run while the APIC access page is
not pinned.  Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe71557a

kvm: Fix page ageing bugs · 57128468

由 Andres Lagar-Cavilla 提交于 9月 22, 2014

1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.

2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.

3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.
Signed-off-by: NAndres Lagar-Cavilla <andreslc@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

57128468

22 9月, 2014 3 次提交

KVM: PPC: Remove the tasklet used by the hrtimer · d02d4d15

由 Mihai Caraman 提交于 9月 01, 2014

Powerpc timer implementation is a copycat version of s390. Now that they removed
the tasklet with commit ea74c0ea follow this
optimization.
Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: NBogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

d02d4d15

KVM: PPC: BOOKE: Guest and hardware visible debug registers are same · 348ba710

由 Bharat Bhushan 提交于 8月 06, 2014

Guest visible debug register and hardware visible debug registers are
same, so ther is no need to have arch->shadow_dbg_reg, instead use
arch->dbg_reg.
Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

348ba710

KVM: PPC: BOOKE : Emulate rfdi instruction · c8ca97ca

由 Bharat Bhushan 提交于 8月 06, 2014

This patch adds "rfdi" instruction emulation which is required for
guest debug hander on BOOKE-HV
Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

c8ca97ca

29 8月, 2014 3 次提交

KVM: remove garbage arg to *hardware_{en,dis}able · 13a34e06

由 Radim Krčmář 提交于 8月 28, 2014

In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.

Remove unnecessary arguments that stem from this.
Signed-off-by: NRadim KrÄmÃ¡Å™ <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

13a34e06

KVM: static inline empty kvm_arch functions · 0865e636

由 Radim Krčmář 提交于 8月 28, 2014

Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)

This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
  2df72e9b KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
  e790d9ef KVM: add kvm_arch_sched_in
Signed-off-by: NRadim KrÄmÃ¡Å™ <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0865e636

KVM: forward declare structs in kvm_types.h · 65647300

由 Paolo Bonzini 提交于 8月 29, 2014

Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).

Move them from individual files to a generic place in linux/kvm_types.h.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

65647300

29 7月, 2014 1 次提交

KVM: PPC: Remove DCR handling · ce91ddc4

由 Alexander Graf 提交于 7月 28, 2014

DCR handling was only needed for 440 KVM. Since we removed it, we can also
remove handling of DCR accesses.
Signed-off-by: NAlexander Graf <agraf@suse.de>

ce91ddc4

28 7月, 2014 9 次提交

KVM: PPC: Move kvmppc_ld/st to common code · 35c4a733

由 Alexander Graf 提交于 6月 20, 2014

We have enough common infrastructure now to resolve GVA->GPA mappings at
runtime. With this we can move our book3s specific helpers to load / store
in guest virtual address space to common code as well.
Signed-off-by: NAlexander Graf <agraf@suse.de>

35c4a733

Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 · 9678cdaa

由 Stewart Smith 提交于 7月 18, 2014

The POWER8 processor has a Micro Partition Prefetch Engine, which is
a fancy way of saying "has way to store and load contents of L2 or
L2+MRU way of L3 cache". We initiate the storing of the log (list of
addresses) using the logmpp instruction and start restore by writing
to a SPR.

The logmpp instruction takes parameters in a single 64bit register:
- starting address of the table to store log of L2/L2+L3 cache contents
  - 32kb for L2
  - 128kb for L2+L3
  - Aligned relative to maximum size of the table (32kb or 128kb)
- Log control (no-op, L2 only, L2 and L3, abort logout)

We should abort any ongoing logging before initiating one.

To initiate restore, we write to the MPPR SPR. The format of what to write
to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower. about every 8, 16, 24 or
  32 cycles)

The idea behind loading and storing the contents of L2/L3 cache is to
reduce memory latency in a system that is frequently swapping vcores on
a physical CPU.

The best case scenario for doing this is when some vcores are doing very
cache heavy workloads. The worst case is when they have about 0 cache hits,
so we just generate needless memory operations.

This implementation just does L2 store/load. In my benchmarks this proves
to be useful.

Benchmark 1:
 - 16 core POWER8
 - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
 - No split core/SMT
 - two guests running sysbench memory test.
   sysbench --test=memory --num-threads=8 run
 - one guest running apache bench (of default HTML page)
   ab -n 490000 -c 400 http://localhost/

This benchmark aims to measure performance of real world application (apache)
where other guests are cache hot with their own workloads. The sysbench memory
benchmark does pointer sized writes to a (small) memory buffer in a loop.

In this benchmark with this patch I can see an improvement both in requests
per second (~5%) and in mean and median response times (again, about 5%).
The spread of minimum and maximum response times were largely unchanged.

benchmark 2:
 - Same VM config as benchmark 1
 - all three guests running sysbench memory benchmark

This benchmark aims to see if there is a positive or negative affect to this
cache heavy benchmark. Although due to the nature of the benchmark (stores) we
may not see a difference in performance, but rather hopefully an improvement
in consistency of performance (when vcore switched in, don't have to wait
many times for cachelines to be pulled in)

The results of this benchmark are improvements in consistency of performance
rather than performance itself. With this patch, the few outliers in duration
go away and we get more consistent performance in each guest.

benchmark 3:
 - same 3 guests and CPU configuration as benchmark 1 and 2.
 - two idle guests
 - 1 guest running STREAM benchmark

This scenario also saw performance improvement with this patch. On Copy and
Scale workloads from STREAM, I got 5-6% improvement with this patch. For
Add and triad, it was around 10% (or more).

benchmark 4:
 - same 3 guests as previous benchmarks
 - two guests running sysbench --memory, distinctly different cache heavy
   workload
 - one guest running STREAM benchmark.

Similar improvements to benchmark 3.

benchmark 5:
 - 1 guest, 8 VCPUs, Ubuntu 14.04
 - Host configured with split core (SMT8, subcores-per-core=4)
 - STREAM benchmark

In this benchmark, we see a 10-20% performance improvement across the board
of STREAM benchmark results with this patch.

Based on preliminary investigation and microbenchmarks
by Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

9678cdaa

KVM: PPC: Remove 440 support · b2677b8d

由 Alexander Graf 提交于 7月 25, 2014

The 440 target hasn't been properly functioning for a few releases and
before I was the only one who fixes a very serious bug that indicates to
me that nobody used it before either.

Furthermore KVM on 440 is slow to the extent of unusable.

We don't have to carry along completely unused code. Remove 440 and give
us one less thing to worry about.
Signed-off-by: NAlexander Graf <agraf@suse.de>

b2677b8d

kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit · 99e99d19

由 Bharat Bhushan 提交于 7月 21, 2014

SPRN_SPRG is used by debug interrupt handler, so this is required for
debug support.
Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

99e99d19

KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct · 1287cb3f

由 Alexander Graf 提交于 7月 04, 2014

When building KVM with a lot of vcores (NR_CPUS is big), we can potentially
get out of the ld immediate range for dereferences inside that struct.

Move the array to the end of our kvm_arch struct. This fixes compilation
issues with NR_CPUS=2048 for me.
Signed-off-by: NAlexander Graf <agraf@suse.de>

1287cb3f

KVM: PPC: e500: Emulate power management control SPR · debf27d6

由 Mihai Caraman 提交于 7月 04, 2014

For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0)
to enable idle power down for cores and devices by setting up the idle count
period at boot time. With the host already controlling the power management
configuration the guest could simply benefit from it, so emulate guest request
as a general store.
Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

debf27d6

KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling · 699a0ea0

由 Paul Mackerras 提交于 6月 02, 2014

This provides a way for userspace controls which sPAPR hcalls get
handled in the kernel.  Each hcall can be individually enabled or
disabled for in-kernel handling, except for H_RTAS.  The exception
for H_RTAS is because userspace can already control whether
individual RTAS functions are handled in-kernel or not via the
KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
H_RTAS is out of the normal sequence of hcall numbers.

Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
The args field of the struct kvm_enable_cap specifies the hcall number
in args[0] and the enable/disable flag in args[1]; 0 means disable
in-kernel handling (so that the hcall will always cause an exit to
userspace) and 1 means enable.  Enabling or disabling in-kernel
handling of an hcall is effective across the whole VM.

The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
on PowerPC is new, added by this commit.  The KVM_CAP_ENABLE_CAP_VM
capability advertises that this ability exists.

When a VM is created, an initial set of hcalls are enabled for
in-kernel handling.  The set that is enabled is the set that have
an in-kernel implementation at this point.  Any new hcall
implementations from this point onwards should not be added to the
default set without a good reason.

No distinction is made between real-mode and virtual-mode hcall
implementations; the one setting controls them both.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

699a0ea0

KVM: PPC: BOOK3S: PR: Emulate instruction counter · 06da28e7

由 Aneesh Kumar K.V 提交于 6月 05, 2014

Writing to IC is not allowed in the privileged mode.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

06da28e7

KVM: PPC: BOOK3S: PR: Emulate virtual timebase register · 8f42ab27

由 Aneesh Kumar K.V 提交于 6月 05, 2014

virtual time base register is a per VM, per cpu register that needs
to be saved and restored on vm exit and entry. Writing to VTB is not
allowed in the privileged mode.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix compile error]
Signed-off-by: NAlexander Graf <agraf@suse.de>

8f42ab27

06 7月, 2014 1 次提交

KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation · 3cd60e31

由 Aneesh Kumar K.V 提交于 6月 04, 2014

We use time base for PURR and SPURR emulation with PR KVM since we
are emulating a single threaded core. When using time base
we need to make sure that we don't accumulate time spent in the host
in PURR and SPURR value.

Also we don't need to emulate mtspr because both the registers are
hypervisor resource.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

3cd60e31

30 5月, 2014 5 次提交

KVM: PPC: Disable NX for old magic page using guests · f3383cf8

由 Alexander Graf 提交于 5月 12, 2014

Old guests try to use the magic page, but map their trampoline code inside
of an NX region.

Since we can't fix those old kernels, try to detect whether the guest is sane
or not. If not, just disable NX functionality in KVM so that old guests at
least work at all. For newer guests, add a bit that we can set to keep NX
functionality available.
Signed-off-by: NAlexander Graf <agraf@suse.de>

f3383cf8

KVM: PPC: Book3S PR: Expose TAR facility to guest · e14e7a1e

由 Alexander Graf 提交于 4月 22, 2014

POWER8 implements a new register called TAR. This register has to be
enabled in FSCR and then from KVM's point of view is mere storage.

This patch enables the guest to use TAR.
Signed-off-by: NAlexander Graf <agraf@suse.de>

e14e7a1e

KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR · 616dff86

由 Alexander Graf 提交于 4月 29, 2014

POWER8 introduced a new interrupt type called "Facility unavailable interrupt"
which contains its status message in a new register called FSCR.

Handle these exits and try to emulate instructions for unhandled facilities.
Follow-on patches enable KVM to expose specific facilities into the guest.
Signed-off-by: NAlexander Graf <agraf@suse.de>

616dff86

KVM: PPC: Make shared struct aka magic page guest endian · 5deb8e7a

由 Alexander Graf 提交于 4月 24, 2014

The shared (magic) page is a data structure that contains often used
supervisor privileged SPRs accessible via memory to the user to reduce
the number of exits we have to take to read/write them.

When we actually share this structure with the guest we have to maintain
it in guest endianness, because some of the patch tricks only work with
native endian load/store operations.

Since we only share the structure with either host or guest in little
endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.

For booke, the shared struct stays big endian. For book3s_64 hv we maintain
the struct in host native endian, since it never gets shared with the guest.

For book3s_64 pr we introduce a variable that tells us which endianness the
shared struct is in and route every access to it through helper inline
functions that evaluate this variable.
Signed-off-by: NAlexander Graf <agraf@suse.de>

5deb8e7a

KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest · e5ee5422

由 Aneesh Kumar K.V 提交于 5月 05, 2014

This patch make sure we inherit the LE bit correctly in different case
so that we can run Little Endian distro in PR mode
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

e5ee5422

27 1月, 2014 5 次提交

KVM: PPC: Book3S HV: Add new state for transactional memory · 7b490411

由 Michael Neuling 提交于 1月 08, 2014

Add new state for transactional memory (TM) to kvm_vcpu_arch.  Also add
asm-offset bits that are going to be required.

This also moves the existing TFHAR, TFIAR and TEXASR SPRs into a
CONFIG_PPC_TRANSACTIONAL_MEM section.  This requires some code changes to
ensure we still compile with CONFIG_PPC_TRANSACTIONAL_MEM=N.  Much of the added
the added #ifdefs are removed in a later patch when the bulk of the TM code is
added.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
[agraf: fix merge conflict]
Signed-off-by: NAlexander Graf <agraf@suse.de>

7b490411

KVM: PPC: Book3S HV: Basic little-endian guest support · d682916a

由 Anton Blanchard 提交于 1月 08, 2014

We create a guest MSR from scratch when delivering exceptions in
a few places.  Instead of extracting LPCR[ILE] and inserting it
into MSR_LE each time, we simply create a new variable intr_msr which
contains the entire MSR to use.  For a little-endian guest, userspace
needs to set the ILE (interrupt little-endian) bit in the LPCR for
each vcpu (or at least one vcpu in each virtual core).

[paulus@samba.org - removed H_SET_MODE implementation from original
version of the patch, and made kvmppc_set_lpcr update vcpu->arch.intr_msr.]
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

d682916a

KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 · 8563bf52

由 Paul Mackerras 提交于 1月 08, 2014

The DABRX (DABR extension) register on POWER7 processors provides finer
control over which accesses cause a data breakpoint interrupt. It
contains 3 bits which indicate whether to enable accesses in user,
kernel and hypervisor modes respectively to cause data breakpoint
interrupts, plus one bit that enables both real mode and virtual mode
accesses to cause interrupts. Currently, KVM sets DABRX to allow
both kernel and user accesses to cause interrupts while in the guest.

This adds support for the guest to specify other values for DABRX.
PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
and DABRX with one call. This adds a real-mode implementation of
H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
implementation. To support this, we add a per-vcpu field to store the
DABRX value plus code to get and set it via the ONE_REG interface.

For Linux guests to use this new hcall, userspace needs to add
"hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
property in the device tree. If userspace does this and then migrates
the guest to a host where the kernel doesn't include this patch, then
userspace will need to implement H_SET_XDABR by writing the specified
DABR value to the DABR using the ONE_REG interface. In that case, the
old kernel will set DABRX to DABRX_USER | DABRX_KERNEL. That should
still work correctly, at least for Linux guests, since Linux guests
cope with getting data breakpoint interrupts in modes that weren't
requested by just ignoring the interrupt, and Linux guests never set
DABRX_BTI.

The other thing this does is to make H_SET_DABR and H_SET_XDABR work
on POWER8, which has the DAWR and DAWRX instead of DABR/X. Guests that
know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
For them, this adds the logic to convert DABR/X values into DAWR/X values
on POWER8.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

8563bf52

KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs · b005255e

由 Michael Neuling 提交于 1月 08, 2014

This adds fields to the struct kvm_vcpu_arch to store the new
guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
functions to allow userspace to access this state, and adds code to
the guest entry and exit to context-switch these SPRs between host
and guest.

Note that DPDES (Directed Privileged Doorbell Exception State) is
shared between threads on a core; hence we store it in struct
kvmppc_vcore and have the master thread save and restore it.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

b005255e

KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers · e0b7ec05

由 Paul Mackerras 提交于 1月 08, 2014

On a threaded processor such as POWER7, we group VCPUs into virtual
cores and arrange that the VCPUs in a virtual core run on the same
physical core.  Currently we don't enforce any correspondence between
virtual thread numbers within a virtual core and physical thread
numbers.  Physical threads are allocated starting at 0 on a first-come
first-served basis to runnable virtual threads (VCPUs).

POWER8 implements a new "msgsndp" instruction which guest kernels can
use to interrupt other threads in the same core or sub-core.  Since
the instruction takes the destination physical thread ID as a parameter,
it becomes necessary to align the physical thread IDs with the virtual
thread IDs, that is, to make sure virtual thread N within a virtual
core always runs on physical thread N.

This means that it's possible that thread 0, which is where we call
__kvmppc_vcore_entry, may end up running some other vcpu than the
one whose task called kvmppc_run_core(), or it may end up running
no vcpu at all, if for example thread 0 of the virtual core is
currently executing in userspace.  However, we do need thread 0
to be responsible for switching the MMU -- a previous version of
this patch that had other threads switching the MMU was found to
be responsible for occasional memory corruption and machine check
interrupts in the guest on POWER7 machines.

To accommodate this, we no longer pass the vcpu pointer to
__kvmppc_vcore_entry, but instead let the assembly code load it from
the PACA.  Since the assembly code will need to know the kvm pointer
and the thread ID for threads which don't have a vcpu, we move the
thread ID into the PACA and we add a kvm pointer to the virtual core
structure.

In the case where thread 0 has no vcpu to run, it still calls into
kvmppc_hv_entry in order to do the MMU switch, and then naps until
either its vcpu is ready to run in the guest, or some other thread
needs to exit the guest.  In the latter case, thread 0 jumps to the
code that switches the MMU back to the host.  This control flow means
that now we switch the MMU before loading any guest vcpu state.
Similarly, on guest exit we now save all the guest vcpu state before
switching the MMU back to the host.  This has required substantial
code movement, making the diff rather large.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

e0b7ec05

09 1月, 2014 2 次提交

kvm: powerpc: use caching attributes as per linux pte · 08c9a188

由 Bharat Bhushan 提交于 11月 18, 2013

KVM uses same WIM tlb attributes as the corresponding qemu pte.
For this we now search the linux pte for the requested page and
get these cache caching/coherency attributes from pte.
Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

08c9a188

KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures · efff1912

由 Paul Mackerras 提交于 10月 15, 2013

This uses struct thread_fp_state and struct thread_vr_state to store
the floating-point, VMX/Altivec and VSX state, rather than flat arrays.
This makes transferring the state to/from the thread_struct simpler
and allows us to unify the get/set_one_reg implementations for the
VSX registers.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

efff1912

18 10月, 2013 1 次提交

kvm: powerpc: book3s: Allow the HV and PR selection per virtual machine · cbbc58d4

由 Aneesh Kumar K.V 提交于 10月 07, 2013

This moves the kvmppc_ops callbacks to be a per VM entity. This
enables us to select HV and PR mode when creating a VM. We also
allow both kvm-hv and kvm-pr kernel module to be loaded. To
achieve this we move /dev/kvm ownership to kvm.ko module. Depending on
which KVM mode we select during VM creation we take a reference
count on respective module
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix coding style]
Signed-off-by: NAlexander Graf <agraf@suse.de>

cbbc58d4

17 10月, 2013 5 次提交

kvm: powerpc: book3s: Add a new config variable CONFIG_KVM_BOOK3S_HV_POSSIBLE · 9975f5e3

由 Aneesh Kumar K.V 提交于 10月 07, 2013

This help ups to select the relevant code in the kernel code
when we later move HV and PR bits as seperate modules. The patch
also makes the config options for PR KVM selectable
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

9975f5e3

kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLE · 7aa79938

由 Aneesh Kumar K.V 提交于 10月 07, 2013

With later patches supporting PR kvm as a kernel module, the changes
that has to be built into the main kernel binary to enable PR KVM module
is now selected via KVM_BOOK3S_PR_POSSIBLE
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

7aa79938

KVM: PPC: E500: Add userspace debug stub support · ce11e48b

由 Bharat Bhushan 提交于 7月 04, 2013

This patch adds the debug stub support on booke/bookehv.
Now QEMU debug stub can use hw breakpoint, watchpoint and
software breakpoint to debug guest.

This is how we save/restore debug register context when switching
between guest, userspace and kernel user-process:

When QEMU is running
 -> thread->debug_reg == QEMU debug register context.
 -> Kernel will handle switching the debug register on context switch.
 -> no vcpu_load() called

QEMU makes ioctls (except RUN)
 -> This will call vcpu_load()
 -> should not change context.
 -> Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs

QEMU Makes RUN ioctl
 -> Save thread->debug_reg on STACK
 -> Store thread->debug_reg == vcpu->debug_reg
 -> load thread->debug_reg
 -> RUN VCPU ( So thread points to vcpu context )

Context switch happens When VCPU running
 -> makes vcpu_load() should not load any context
 -> kernel loads the vcpu context as thread->debug_regs points to vcpu context.

On heavyweight_exit
 -> Load the context saved on stack in thread->debug_reg

Currently we do not support debug resource emulation to guest,
On debug exception, always exit to user space irrespective of
user space is expecting the debug exception or not. If this is
unexpected exception (breakpoint/watchpoint event not set by
userspace) then let us leave the action on user space. This
is similar to what it was before, only thing is that now we
have proper exit state available to user space.
Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

ce11e48b

KVM: PPC: E500: Using "struct debug_reg" · 547465ef

由 Bharat Bhushan 提交于 7月 04, 2013

For KVM also use the "struct debug_reg" defined in asm/processor.h
Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

547465ef

KVM: PPC: Book3S PR: Better handling of host-side read-only pages · 93b159b4

由 Paul Mackerras 提交于 9月 20, 2013

Currently we request write access to all pages that get mapped into the
guest, even if the guest is only loading from the page. This reduces
the effectiveness of KSM because it means that we unshare every page we
access. Also, we always set the changed (C) bit in the guest HPTE if
it allows writing, even for a guest load.

This fixes both these problems. We pass an 'iswrite' flag to the
mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
the access is a load or a store. The mmu.xlate() functions now only
set C for stores. kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
instead of gfn_to_pfn() so that it can indicate whether we need write
access to the page, and get back a 'writable' flag to indicate whether
the page is writable or not. If that 'writable' flag is clear, we then
make the host HPTE read-only even if the guest HPTE allowed writing.

This means that we can get a protection fault when the guest writes to a
page that it has mapped read-write but which is read-only on the host
side (perhaps due to KSM having merged the page). Thus we now call
kvmppc_handle_pagefault() for protection faults as well as HPTE not found
faults. In kvmppc_handle_pagefault(), if the access was allowed by the
guest HPTE and we thus need to install a new host HPTE, we then need to
remove the old host HPTE if there is one. This is done with a new
function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
find and remove the old host HPTE.

Since the memslot-related functions require the KVM SRCU read lock to
be held, this adds srcu_read_lock/unlock pairs around the calls to
kvmppc_handle_pagefault().

Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore
guest HPTEs that don't permit access, and to return -EPERM for accesses
that are not permitted by the page protections.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

93b159b4

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功