Commits · 29577fc00ba40a89fc824f030bcc80c583259346 · openanolis / cloud-kernel

30 Jul, 2014 2 commits

KVM: PPC: HV: Remove generic instruction emulation · 29577fc0

Authored by Alexander Graf on Jul 30, 2014

Now that we have properly split load/store instruction emulation and generic
instruction emulation, we can move the generic one from kvm.ko to kvm-pr.ko
on book3s_64.

This reduces the attack surface and amount of code loaded on HV KVM kernels.
Signed-off-by: Alexander Graf <agraf@suse.de>

29577fc0

KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr · 5a484c7c

Authored by Bharat Bhushan on Jul 30, 2014

These are not specific to e500hv but apply to bookehv in general
(as per the comment from Scott Wood on my patch
"kvm: ppc: bookehv: Added wrapper macros for shadow registers").
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

5a484c7c

29 Jul, 2014 3 commits

KVM: PPC: Remove DCR handling · ce91ddc4

Authored by Alexander Graf on Jul 28, 2014

DCR handling was only needed for 440 KVM. Since we removed it, we can also
remove handling of DCR accesses.
Signed-off-by: Alexander Graf <agraf@suse.de>

ce91ddc4

KVM: PPC: Expose helper functions for data/inst faults · 8de12015

Authored by Alexander Graf on Jun 18, 2014

We're going to implement guest code interpretation in KVM for some rare
corner cases. This code needs to be able to inject data and instruction
faults into the guest when it encounters them.

Expose generic APIs to do this in a reasonably subarch agnostic fashion.
Signed-off-by: Alexander Graf <agraf@suse.de>

8de12015

KVM: PPC: Separate loadstore emulation from priv emulation · d69614a2

Authored by Alexander Graf on Jun 18, 2014

Today the instruction emulator can get called via 2 separate code paths. It
can either be called by MMIO emulation detection code or by privileged
instruction traps.

This is bad, as both code paths prepare the environment differently. For MMIO
emulation we already know the virtual address we faulted on, so instructions
there don't have to actually fetch that information.

Split out the two separate use cases into separate files.
Signed-off-by: Alexander Graf <agraf@suse.de>

d69614a2

28 Jul, 2014 35 commits

KVM: PPC: Handle magic page in kvmppc_ld/st · c12fb43c

Authored by Alexander Graf on Jun 20, 2014

We use kvmppc_ld and kvmppc_st to emulate load/store instructions that may
also access the magic page. Special-case it so that we can access it properly.
Signed-off-by: Alexander Graf <agraf@suse.de>

c12fb43c

KVM: PPC: Use kvm_read_guest in kvmppc_ld · c45c5514

Authored by Alexander Graf on Jun 20, 2014

We have a nice and handy helper to read from guest physical address space,
so we should make use of it in kvmppc_ld as we already do for its counterpart
in kvmppc_st.
Signed-off-by: Alexander Graf <agraf@suse.de>

c45c5514

KVM: PPC: Remove kvmppc_bad_hva() · 9897e88a

Authored by Alexander Graf on Jun 20, 2014

We have a proper define for invalid HVA numbers. Use it instead of the
ppc-specific kvmppc_bad_hva().
Signed-off-by: Alexander Graf <agraf@suse.de>

9897e88a

KVM: PPC: Move kvmppc_ld/st to common code · 35c4a733

Authored by Alexander Graf on Jun 20, 2014

We have enough common infrastructure now to resolve GVA->GPA mappings at
runtime. With this we can move our book3s-specific helpers to load / store
in guest virtual address space to common code as well.
Signed-off-by: Alexander Graf <agraf@suse.de>

35c4a733

KVM: PPC: Implement kvmppc_xlate for all targets · 7d15c06f

Authored by Alexander Graf on Jun 20, 2014

We have a nice API to find the translated GPAs of a GVA including protection
flags. So far we only use it on Book3S, but there's no reason the same shouldn't
be used on BookE as well.

Implement a kvmppc_xlate() version for BookE and clean it up to make it more
readable in general.
Signed-off-by: Alexander Graf <agraf@suse.de>

7d15c06f

KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page · 63fff5c1

Authored by Aneesh Kumar K.V on Jun 29, 2014

When calculating the lower bits of the AVA field, use the shift
count based on the base page size. Also add the missing segment
size and remove a stale comment.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

63fff5c1

KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode · 7a58777a

Authored by Alexander Graf on Jul 14, 2014

With Book3S KVM we can create both PR and HV VMs in parallel on the same
machine. That gives us new challenges on the CAPs we return - both have
different capabilities.

When we get asked about CAPs on the kvm fd, there's nothing we can do. We
can try to be smart and assume we're running HV if HV is available, PR
otherwise. However with the newly added VM CHECK_EXTENSION we can now ask
for capabilities directly on a VM which knows whether it's PR or HV.

With this patch I can successfully expose KVM PVINFO data to user space
in the PR case, fixing magic page mapping for PAPR guests.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

7a58777a
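
The choice described in this commit message can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the kernel's code: `struct vm`, its `is_hv` field, and `cap_scope_is_hv()` are hypothetical names introduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM object that knows its own flavor. */
struct vm {
    bool is_hv;
};

/*
 * Decide whether capabilities should be reported for HV or for PR.
 * vm == NULL models a query on the /dev/kvm fd, where no VM exists yet,
 * so the best guess is "HV if the host offers it, PR otherwise".
 * A non-NULL vm models a query on a VM fd, which can answer exactly.
 */
static bool cap_scope_is_hv(const struct vm *vm, bool hv_available)
{
    if (vm != NULL)
        return vm->is_hv;
    return hv_available;
}
```

The point of the design is that only the second path is reliable: a PR VM on an HV-capable host gets the wrong answer from the guess, but the right one when asked directly.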

KVM: Allow KVM_CHECK_EXTENSION on the vm fd · 92b591a4

Authored by Alexander Graf on Jul 14, 2014

The KVM_CHECK_EXTENSION is only available on the kvm fd today. Unfortunately
on PPC some of the capabilities change depending on the way a VM was created.

So instead we need a way to expose capabilities as VM ioctl, so that we can
see which VM type we're using (HV or PR). To enable this, add the
KVM_CHECK_EXTENSION ioctl to our vm ioctl portfolio.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

92b591a4
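
From user space, the change means the same ioctl can be issued on either file descriptor. A minimal sketch, assuming a Linux system with `<linux/kvm.h>` installed; the helper name `kvm_check_cap()` is made up for this example:

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Query a capability on a KVM file descriptor.
 * fd may be the /dev/kvm fd or, on kernels that include this change,
 * a VM fd returned by KVM_CREATE_VM.
 * Returns the raw ioctl result: 0 if unsupported, a positive value if
 * supported, or -1 with errno set on error.
 */
static int kvm_check_cap(int fd, long cap)
{
    return ioctl(fd, KVM_CHECK_EXTENSION, cap);
}
```

A caller would typically check `KVM_CAP_CHECK_EXTENSION_VM` on the `/dev/kvm` fd first, and only then ask the VM fd about a mode-dependent capability such as `KVM_CAP_PPC_GET_PVINFO`.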

KVM: Rename and add argument to check_extension · 784aa3d7

Authored by Alexander Graf on Jul 14, 2014

In preparation to make the check_extension function available to VM scope
we add a struct kvm * argument to the function header and rename the function
accordingly. It will still be called from the /dev/kvm fd, but with a NULL
argument for struct kvm *.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

784aa3d7

Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 · 9678cdaa

Authored by Stewart Smith on Jul 18, 2014

The POWER8 processor has a Micro Partition Prefetch Engine, which is
a fancy way of saying "has way to store and load contents of L2 or
L2+MRU way of L3 cache". We initiate the storing of the log (list of
addresses) using the logmpp instruction and start restore by writing
to a SPR.

The logmpp instruction takes parameters in a single 64bit register:
- starting address of the table to store log of L2/L2+L3 cache contents
  - 32kb for L2
  - 128kb for L2+L3
  - Aligned relative to maximum size of the table (32kb or 128kb)
- Log control (no-op, L2 only, L2 and L3, abort logout)

We should abort any ongoing logging before initiating one.

To initiate restore, we write to the MPPR SPR. The format of what to write
to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower. about every 8, 16, 24 or
  32 cycles)

The idea behind loading and storing the contents of L2/L3 cache is to
reduce memory latency in a system that is frequently swapping vcores on
a physical CPU.

The best case scenario for doing this is when some vcores are doing very
cache heavy workloads. The worst case is when they have about 0 cache hits,
so we just generate needless memory operations.

This implementation just does L2 store/load. In my benchmarks this proves
to be useful.

Benchmark 1:
 - 16 core POWER8
 - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
 - No split core/SMT
 - two guests running sysbench memory test.
   sysbench --test=memory --num-threads=8 run
 - one guest running apache bench (of default HTML page)
   ab -n 490000 -c 400 http://localhost/

This benchmark aims to measure performance of real world application (apache)
where other guests are cache hot with their own workloads. The sysbench memory
benchmark does pointer sized writes to a (small) memory buffer in a loop.

In this benchmark with this patch I can see an improvement both in requests
per second (~5%) and in mean and median response times (again, about 5%).
The spread of minimum and maximum response times was largely unchanged.

benchmark 2:
 - Same VM config as benchmark 1
 - all three guests running sysbench memory benchmark

This benchmark aims to see if there is a positive or negative effect on this
cache heavy benchmark. Due to the nature of the benchmark (stores) we
may not see a difference in performance, but rather, hopefully, an improvement
in consistency of performance (when a vcore is switched in, it doesn't have to
wait many times for cachelines to be pulled in).

The results of this benchmark are improvements in consistency of performance
rather than performance itself. With this patch, the few outliers in duration
go away and we get more consistent performance in each guest.

benchmark 3:
 - same 3 guests and CPU configuration as benchmark 1 and 2.
 - two idle guests
 - 1 guest running STREAM benchmark

This scenario also saw performance improvement with this patch. On Copy and
Scale workloads from STREAM, I got 5-6% improvement with this patch. For
Add and triad, it was around 10% (or more).

benchmark 4:
 - same 3 guests as previous benchmarks
 - two guests running sysbench --memory, distinctly different cache heavy
   workload
 - one guest running STREAM benchmark.

Similar improvements to benchmark 3.

benchmark 5:
 - 1 guest, 8 VCPUs, Ubuntu 14.04
 - Host configured with split core (SMT8, subcores-per-core=4)
 - STREAM benchmark

In this benchmark, we see a 10-20% performance improvement across the board
of STREAM benchmark results with this patch.

Based on preliminary investigation and microbenchmarks
by Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

9678cdaa
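
The alignment rule for the log table (aligned relative to the maximum table size, 32kb or 128kb) can be checked with a one-line mask test. The sketch below only covers that rule; the macro and function names are invented for illustration, and the actual bit layout of the logmpp operand is deliberately not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Table sizes taken from the commit message. */
#define MPP_TABLE_L2    (32u * 1024u)   /* L2 only */
#define MPP_TABLE_L2L3  (128u * 1024u)  /* L2 plus the MRU way of L3 */

/*
 * Check that a table address is aligned to the size of the table it
 * will hold. table_size must be a power of two.
 */
static bool mpp_table_aligned(uint64_t addr, uint32_t table_size)
{
    return (addr & (uint64_t)(table_size - 1)) == 0;
}
```

A buffer at 0x8000 is therefore acceptable for an L2-only log but not for an L2+L3 log, which needs a 128kb boundary.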

Split out struct kvmppc_vcore creation to separate function · de9bdd1a

Authored by Stewart Smith on Jul 18, 2014

No code changes, just split it out to a function so that the addition
of micro partition prefetch buffer allocation (in a subsequent patch) looks
neater and doesn't require excessive indentation.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

de9bdd1a

KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication · 1b2e33b0

Authored by Paul Mackerras on Jul 19, 2014

At present, kvmppc_ld calls kvmppc_xlate, and if kvmppc_xlate returns
any error indication, it returns -ENOENT, which is taken to mean an
HPTE not found error.  However, the error could have been a segment
not found (no SLB entry) or a permission error.  Similarly,
kvmppc_pte_to_hva currently does permission checking, but any error
from it is taken by kvmppc_ld to mean that the access is an emulated
MMIO access.  Also, kvmppc_ld does no execute permission checking.

This fixes these problems by (a) returning any error from kvmppc_xlate
directly, (b) moving the permission check from kvmppc_pte_to_hva
into kvmppc_ld, and (c) adding an execute permission check to kvmppc_ld.

This is similar to what was done for kvmppc_st() by commit 82ff911317c3
("KVM: PPC: Deflect page write faults properly in kvmppc_st").
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

1b2e33b0
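
The three fixes (a) to (c) amount to a short chain of checks after translation. The following is a simplified sketch, not the kernel function: `struct xlate_result` and `guest_load_check()` are hypothetical, and the specific errno values chosen for the permission failures are an assumption of this example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a translated guest PTE. */
struct xlate_result {
    int  err;          /* 0 on success, negative errno from translation */
    bool may_read;
    bool may_execute;
};

/*
 * Checks a guest load performs once translation has run:
 * (a) pass a translation error through unchanged,
 * (b) check read permission here rather than in a later helper,
 * (c) check execute permission for instruction fetches.
 */
static int guest_load_check(const struct xlate_result *x, bool is_fetch)
{
    if (x->err)
        return x->err;
    if (!x->may_read)
        return -EPERM;
    if (is_fetch && !x->may_execute)
        return -ENOEXEC;
    return 0;
}
```

Because each failure keeps its own code, the caller can tell a missing translation from a permission problem instead of treating everything as "HPTE not found" or as MMIO.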

KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call · ef1af2e2

Authored by Paul Mackerras on Jul 19, 2014

This does for PR KVM what c9438092 ("KVM: PPC: Book3S HV: Take SRCU
read lock around kvm_read_guest() call") did for HV KVM, that is,
eliminate a "suspicious rcu_dereference_check() usage!" warning by
taking the SRCU lock around the call to kvmppc_rtas_hcall().

It also fixes a return of RESUME_HOST to return EMULATE_FAIL instead,
since kvmppc_h_pr() is supposed to return EMULATE_* values.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>

ef1af2e2

KVM: PPC: Book3S: Fix LPCR one_reg interface · a0840240

Authored by Alexey Kardashevskiy on Jul 19, 2014

Unfortunately, the LPCR got defined as a 32-bit register in the
one_reg interface.  This is unfortunate because KVM allows userspace
to control the DPFD (default prefetch depth) field, which is in the
upper 32 bits.  The result is that DPFD always gets set to 0, which
reduces performance in the guest.

We can't just change KVM_REG_PPC_LPCR to be a 64-bit register ID,
since that would break existing userspace binaries.  Instead we define
a new KVM_REG_PPC_LPCR_64 id which is 64-bit.  Userspace can still use
the old KVM_REG_PPC_LPCR id, but it now only modifies those fields in
the bottom 32 bits that userspace can modify (ILE, TC and AIL).
If userspace uses the new KVM_REG_PPC_LPCR_64 id, it can modify DPFD
as well.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>

a0840240
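
The difference between the two register IDs comes down to which bits the write mask covers. A minimal sketch of that masking, with loud assumptions: the bit positions below are illustrative only (what matters is that ILE, TC and AIL sit in the low 32 bits and DPFD in the high 32 bits), and `lpcr_update()` is a name invented here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; do not treat these as authoritative. */
#define LPCR_ILE   0x02000000ull
#define LPCR_AIL   0x01800000ull
#define LPCR_TC    0x00000200ull
#define LPCR_DPFD  (7ull << 52)

/*
 * Merge a user-space request into the current LPCR value.
 * preserve_top32 == true models the old 32-bit ID: only fields in the
 * bottom 32 bits may change, so DPFD keeps its current value.
 * preserve_top32 == false models the new 64-bit ID: DPFD may change too.
 */
static uint64_t lpcr_update(uint64_t cur, uint64_t req, bool preserve_top32)
{
    uint64_t mask = LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_DPFD;

    if (preserve_top32)
        mask &= 0xffffffffull;
    return (cur & ~mask) | (req & mask);
}
```

With the old ID, a request that carries zeros in the upper half no longer clears DPFD, which is the behaviour the commit is after.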

KVM: PPC: Remove 440 support · b2677b8d

Authored by Alexander Graf on Jul 25, 2014

The 440 target hasn't been functioning properly for a few releases, and
before that I was the only one who fixed a very serious bug, which indicates
to me that nobody used it before either.

Furthermore, KVM on 440 is slow to the point of being unusable.

We don't have to carry along completely unused code. Remove 440 and give
us one less thing to worry about.
Signed-off-by: Alexander Graf <agraf@suse.de>

b2677b8d

KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer · 8c95ead6

Authored by Bharat Bhushan on Jul 25, 2014

Scott Wood pointed out that we are no longer using SPRG1 for the vcpu pointer,
but SPRN_SPRG_THREAD <=> SPRG3 (thread->vcpu). So this comment
is no longer valid.

Note: SPRN_SPRG3R is not supported (do not see any need as of now),
and if we want to support this in future then we have to shift to using
SPRG1 for VCPU pointer.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

8c95ead6

KVM: PPC: Booke-hv: Add one reg interface for SPRG9 · 28d2f421

Authored by Bharat Bhushan on Jul 25, 2014

We now support SPRG9 for the guest, so also add a one_reg interface for it.
Note: Changes are in bookehv code only, as we do not have SPRG9 on booke-pr.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

28d2f421

kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit · 99e99d19

Authored by Bharat Bhushan on Jul 21, 2014

SPRN_SPRG9 is used by the debug interrupt handler, so this is required for
debug support.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

99e99d19

KVM: PPC: Bookehv: Get vcpu's last instruction for emulation · f5250471

Authored by Mihai Caraman on Jul 23, 2014

On book3e, KVM uses the dedicated load external pid (lwepx) instruction to read
the guest's last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI
and LRAT), generated by loading a guest address, need to be handled by KVM.
These exceptions are generated in a substituted guest translation context
(EPLC[EGS] = 1) from host context (MSR[GS] = 0).

Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1),
doing minimal checks on the fast path to avoid host performance degradation.
lwepx exceptions originate from host state (MSR[GS] = 0) which implies
additional checks in DO_KVM macro (beside the current MSR[GS] = 1) by looking
at the Exception Syndrome Register (ESR[EPID]) and the External PID Load Context
Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obviously
too intrusive for the host.

Read the guest's last instruction from kvmppc_load_last_inst() by searching for
the physical address and kmapping it. This addresses the TODO for TLB eviction
and execute-but-not-read entries, and allows us to get rid of lwepx until we are
able to handle failures.

A simple stress benchmark shows a 1% sys performance degradation compared with
previous approach (lwepx without failure handling):

time for i in `seq 1 10000`; do /bin/echo > /dev/null; done

real    0m 8.85s
user    0m 4.34s
sys     0m 4.48s

vs

real    0m 8.84s
user    0m 4.36s
sys     0m 4.44s

A solution that uses lwepx and handles its exceptions in KVM would be to temporarily
hijack the interrupt vector from the host. This imposes additional synchronization
for cores like the FSL e6500 that share host IVOR registers between hardware threads.
This optimized solution can be developed later on top of this patch.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

f5250471

KVM: PPC: Allow kvmppc_get_last_inst() to fail · 51f04726

Authored by Mihai Caraman on Jul 23, 2014

On book3e, the guest's last instruction is read on the exit path using the
dedicated load external pid (lwepx) instruction. This load operation may fail
due to TLB eviction and execute-but-not-read entries.

This patch lays down the path for an alternative solution to read the guest's
last instruction, by allowing the kvmppc_get_last_inst() function to fail.
Architecture-specific implementations of kvmppc_load_last_inst() may read the
last guest instruction and instruct the emulation layer to re-execute the
guest in case of failure.

Make kvmppc_get_last_inst() definition common between architectures.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

51f04726

KVM: PPC: Book3s: Remove kvmppc_read_inst() function · 9a26af64

Authored by Mihai Caraman on Jul 23, 2014

In the context of replacing kvmppc_ld() function calls with a version of
kvmppc_get_last_inst() which is allowed to fail, Alex Graf suggested this:

"If we get EMULATE_AGAIN, we just have to make sure we go back into the guest.
No need to inject an ISI into the guest - it'll do that all by itself.
With an error returning kvmppc_get_last_inst we can just use completely
get rid of kvmppc_read_inst() and only use kvmppc_get_last_inst() instead."

As an intermediate step, get rid of kvmppc_read_inst() and only use kvmppc_ld()
instead.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

9a26af64

KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 · 9c0d4e0d

Authored by Mihai Caraman on Jul 23, 2014

Add missing defines MAS0_GET_TLBSEL() and MAS1_GET_TSIZE() for Book3E.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

9c0d4e0d

KVM: PPC: e500mc: Revert "add load inst fixup" · b5741bb3

Authored by Mihai Caraman on Jul 23, 2014

The commit 1d628af7 "add load inst fixup" made an attempt to handle
failures generated by reading the guest current instruction. The fixup
code that was added works only by chance, hiding the real issue.

The load external pid (lwepx) instruction, used by KVM to read guest
instructions, is executed in a substituted guest translation context
(EPLC[EGS] = 1). As a consequence, lwepx's TLB error and data storage
interrupts need to be handled by KVM, even though these interrupts
are generated from host context (MSR[GS] = 0) where lwepx is executed.

Currently, KVM hooks only interrupts generated from guest context
(MSR[GS] = 1), doing minimal checks on the fast path to avoid host
performance degradation. As a result, the host kernel handles lwepx
faults searching the faulting guest data address (loaded in DEAR) in
its own Logical Partition ID (LPID) 0 context. In case a host translation
is found the execution returns to the lwepx instruction instead of the
fixup, the host ending up in an infinite loop.

Revert the commit "add load inst fixup". lwepx issue will be addressed
in a subsequent patch without needing fixup code.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

b5741bb3

kvm: ppc: Add SPRN_EPR get helper function · 34f754b9

Authored by Bharat Bhushan on Jul 17, 2014

kvmppc_set_epr() is already defined in asm/kvm_ppc.h, so
rename the get_epr helper function and move it to the same file.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
[agraf: remove duplicate return]
Signed-off-by: Alexander Graf <agraf@suse.de>

34f754b9

kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7 · c1b8a01b

Authored by Bharat Bhushan on Jul 17, 2014

Use kvmppc_set_sprg[0-7]() and kvmppc_get_sprg[0-7]() helper
functions.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

c1b8a01b

kvm: ppc: booke: Add shared struct helpers of SPRN_ESR · dc168549

Authored by Bharat Bhushan on Jul 17, 2014

Add and use kvmppc_set_esr() and kvmppc_get_esr() helper functions.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

dc168549

kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR · a5414d4b

Authored by Bharat Bhushan on Jul 17, 2014

Use kvmppc_set_dar() and kvmppc_get_dar() helper functions.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

a5414d4b

kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1 · 31579eea

Authored by Bharat Bhushan on Jul 17, 2014

Use kvmppc_set_srr0/srr1() and kvmppc_get_srr0/srr1() helper functions.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

31579eea

kvm: ppc: bookehv: Added wrapper macros for shadow registers · 1dc0c5b8

Authored by Bharat Bhushan on Jul 17, 2014

There are shadow registers such as GSPRG[0-3], GSRR0 and GSRR1 on
BOOKE-HV, and these shadow registers are guest accessible.
So these shadow registers need to be updated on BOOKE-HV.
This patch adds new macros for the get/set helpers of shadow registers.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

1dc0c5b8

KVM: PPC: Book3S: Make magic page properly 4k mappable · 89b68c96

Authored by Alexander Graf on Jul 13, 2014

The magic page is defined as a 4k page of per-vCPU data that is shared
between the guest and the host to accelerate accesses to privileged
registers.

However, when the host is using 64k page size granularity we weren't quite
as strict about that rule anymore. Instead, we partially treated all of the
upper 64k as magic page and mapped only the uppermost 4k with the actual
magic contents.

This works well enough for Linux which doesn't use any memory in kernel
space in the upper 64k, but Mac OS X got upset. So this patch makes magic
page actually stay in a 4k range even on 64k page size hosts.

This patch fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE
hosts for me.
Signed-off-by: Alexander Graf <agraf@suse.de>

89b68c96

KVM: PPC: Book3S: Add hack for split real mode · c01e3f66

Authored by Alexander Graf on Jul 11, 2014

Today we handle split real mode by mapping both instruction and data faults
into a special virtual address space that only exists during the split mode
phase.

This is good enough to catch 32bit Linux guests that use split real mode for
copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our
instruction pointer and can map the user space process freely below there.

However, that approach fails when we're running KVM inside of KVM. Here the 1st
level last_inst reader may well be in the same virtual page as a 2nd level
interrupt handler.

It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a
kernel copy_from/to_user implementation can easily overlap with user space
addresses.

The architecturally correct way to fix this would be to implement an instruction
interpreter in KVM that kicks in whenever we go into split real mode. This
interpreter however would not receive a great amount of testing and be a lot of
bloat for a reasonably isolated corner case.

So I went back to the drawing board and tried to come up with a way to make
split real mode work with a single flat address space. And then I realized that
we could get away with the same trick that makes it work for Linux:

Whenever we see an instruction address during split real mode that may collide,
we just move it higher up the virtual address space to a place that hopefully
does not collide (keep your fingers crossed!).

That approach does work surprisingly well. I am able to successfully run
Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I
apply a tiny timing probe hack to QEMU. I'd say this is a win over even more
broken split real mode :).
Signed-off-by: Alexander Graf <agraf@suse.de>

c01e3f66
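
The relocation trick can be pictured as a single conditional OR on the instruction address. The sketch below is an assumption-laden illustration: the mask and offset values are chosen for the example and should not be read as the kernel's constants, and `split_real_fixup()` is a hypothetical name.

```c
#include <stdint.h>

/* Illustrative values; treat both as assumptions. */
#define SPLIT_MASK 0xff000000u  /* addresses with any of these bits set are left alone */
#define SPLIT_OFFS 0xfb000000u  /* where low instruction addresses are moved to */

/*
 * Move a low instruction address that might collide with data
 * addresses to a higher region of the virtual address space.
 */
static uint32_t split_real_fixup(uint32_t pc)
{
    if (pc & SPLIT_MASK)
        return pc;           /* already high enough, e.g. a 0xc0000000 kernel */
    return pc | SPLIT_OFFS;  /* relocate and hope nothing else lives there */
}
```

The reverse mapping has to be applied when the guest leaves split real mode, which this sketch does not show.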

KVM: PPC: Book3S: Stop PTE lookup on write errors · 2e27ecc9

Authored by Alexander Graf on Jul 10, 2014

When a page lookup failed because we're not allowed to write to the page, we
should not overwrite that value with another lookup on the second PTEG which
will return "page not found". Instead, we should just tell the caller that we
had a permission problem.

This fixes Mac OS X guests looping endlessly in page lookup code for me.
Signed-off-by: Alexander Graf <agraf@suse.de>

2e27ecc9
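
The rule is that the secondary PTEG is searched only when the primary search found nothing at all. A small sketch under stated assumptions: the callback type and all function names are hypothetical, and the two stub callbacks exist only to exercise the logic.

```c
#include <errno.h>

/* Hypothetical per-PTEG lookup: returns 0, -ENOENT or -EPERM. */
typedef int (*pteg_lookup_fn)(int secondary);

/*
 * Search the primary PTEG first. Fall back to the secondary PTEG only
 * when nothing was found, so a permission error is not overwritten by
 * a later "page not found".
 */
static int lookup_pte(pteg_lookup_fn lookup)
{
    int r = lookup(0);

    if (r == -ENOENT)
        r = lookup(1);
    return r;
}

/* Stub: primary PTEG has the page but forbids the write. */
static int primary_denies_write(int secondary)
{
    return secondary ? -ENOENT : -EPERM;
}

/* Stub: page is only present in the secondary PTEG. */
static int primary_misses(int secondary)
{
    return secondary ? 0 : -ENOENT;
}
```

`lookup_pte(primary_denies_write)` reports the permission problem to the caller, while `lookup_pte(primary_misses)` still succeeds through the secondary PTEG.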

KVM: PPC: Deflect page write faults properly in kvmppc_st · 17824b5a

Authored by Alexander Graf on Jul 10, 2014

When we have a page that we're not allowed to write to, xlate() will already
tell us -EPERM on lookup of that page. With the code as is we change it into
a "page missing" error which a guest may get confused about. Instead, just
tell the caller about the -EPERM directly.

This fixes Mac OS X guests when run with DCBZ32 emulation.
Signed-off-by: Alexander Graf <agraf@suse.de>

17824b5a

KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct · 1287cb3f

Authored by Alexander Graf on Jul 04, 2014

When building KVM with a lot of vcores (NR_CPUS is big), we can potentially
get out of the ld immediate range for dereferences inside that struct.

Move the array to the end of our kvm_arch struct. This fixes compilation
issues with NR_CPUS=2048 for me.
Signed-off-by: Alexander Graf <agraf@suse.de>

1287cb3f
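
Why moving the array helps can be seen from member offsets alone. The sketch below assumes the limit in question is the 16-bit signed displacement of PowerPC load instructions; the struct layouts, `N_VCORES`, and `reachable()` are hypothetical and far simpler than the real kvm_arch struct.

```c
#include <stddef.h>
#include <stdint.h>

#define N_VCORES   8192    /* hypothetical; large enough to matter */
#define DISP_MAX   32767   /* largest positive 16-bit signed displacement */

/* Array first: every later member sits beyond the displacement range. */
struct arch_array_first {
    void    *vcores[N_VCORES];
    uint64_t hot_field;
};

/* Array last: the other members keep small offsets. */
struct arch_array_last {
    uint64_t hot_field;
    void    *vcores[N_VCORES];
};

/* Can a single load with an immediate displacement reach this offset? */
static int reachable(size_t offset)
{
    return offset <= DISP_MAX;
}
```

`offsetof(struct arch_array_first, hot_field)` is out of range here, while the same field in `struct arch_array_last` sits at offset 0.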

KVM: PPC: e500: Emulate power management control SPR · debf27d6

Authored by Mihai Caraman on Jul 04, 2014

For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0)
to enable idle power down for cores and devices by setting up the idle count
period at boot time. With the host already controlling the power management
configuration the guest could simply benefit from it, so emulate guest request
as a general store.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

debf27d6
