Commits · 66c9897d9d7675bfb8f4cc4d57ceb00b6a12a2e8 · openanolis / cloud-kernel

11 Jul, 2012 · 6 commits

KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests · 66c9897d

Authored by Mihai Caraman on Jun 25, 2012

tlbilxva emulation was using a u32 variable for the guest effective address.
Replace it with the gva_t type to handle 64-bit guests.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

66c9897d
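The truncation described in this commit can be illustrated outside the kernel. The following is a minimal sketch, assuming the effective address is formed as the sum of two 64-bit register values; `gva_sketch_t` and both function names are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's gva_t; assumed to be 64 bits wide. */
typedef uint64_t gva_sketch_t;

/* Buggy pattern: the effective address passes through a 32-bit variable. */
static uint64_t ea_via_u32(uint64_t ra_val, uint64_t rb_val)
{
    uint32_t ea = (uint32_t)(ra_val + rb_val); /* upper 32 bits are lost */
    return ea;
}

/* Fixed pattern: the variable is wide enough for a 64-bit guest address. */
static uint64_t ea_via_gva(uint64_t ra_val, uint64_t rb_val)
{
    gva_sketch_t ea = ra_val + rb_val;
    return ea;
}
```

With a guest address above 4 GB, the first function silently returns an address in the low 4 GB, so an invalidation would target the wrong page.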

KVM: PPC64: booke: Set interrupt computation mode for 64-bit host · c7ba7771

Authored by Mihai Caraman on Jun 25, 2012

A 64-bit host needs to remain in 64-bit mode when an exception takes place.
Set the interrupt computation mode in the EPCR register.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

c7ba7771

KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt · 9997782e

Authored by Mihai Caraman on Jun 22, 2012

The ESR register is required by the Data Storage Interrupt handling code.
Add the specific flag to the interrupt handler.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

9997782e

KVM: PPC: bookehv64: Add support for std/ld emulation. · 6c5cb739

Authored by Varun Sethi on Jun 18, 2012

Add support for std/ld emulation.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

6c5cb739

booke: Added crit/mc exception handler for e500v2 · 75c44bbb

Authored by Bharat Bhushan on Jun 20, 2012

Watchdog is taken at critical exception level. So this patch
is tested with host watchdog exception happening when guest
is running.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

75c44bbb

booke/bookehv: Add host crit-watchdog exception support · 6328e593

Authored by Bharat Bhushan on Jun 20, 2012

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

6328e593

30 May, 2012 · 2 commits

KVM: PPC: booke: Added DECAR support · 21bd000a

Authored by Bharat Bhushan on May 20, 2012

Added the decrementer auto-reload support. DECAR is readable
on e500v2/e500mc and later CPUs.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

21bd000a

KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281

Authored by Paul Mackerras on May 04, 2012

This adds a new ioctl to enable userspace to control the size of the guest
hashed page table (HPT) and to clear it out when resetting the guest.
The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
a pointer to a u32 containing the desired order of the HPT (log base 2
of the size in bytes), which is updated on successful return to the
actual order of the HPT which was allocated.

There must be no vcpus running at the time of this ioctl.  To enforce
this, we now keep a count of the number of vcpus running in
kvm->arch.vcpus_running.

If the ioctl is called when a HPT has already been allocated, we don't
reallocate the HPT but just clear it out.  We first clear the
kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
the kvm->lock mutex, it will prevent any vcpus from starting to run until
we're done, and (b) it means that the first vcpu to run after we're done
will re-establish the VRMA if necessary.

If userspace doesn't call this ioctl before running the first vcpu, the
kernel will allocate a default-sized HPT at that point.  We do it then
rather than when creating the VM, as the code did previously, so that
userspace has a chance to do the ioctl if it wants.

When allocating the HPT, we can allocate either from the kernel page
allocator, or from the preallocated pool.  If userspace is asking for
a different size from the preallocated HPTs, we first try to allocate
using the kernel page allocator.  Then we try to allocate from the
preallocated pool, and then if that fails, we try allocating decreasing
sizes from the kernel page allocator, down to the minimum size allowed
(256kB).  Note that the kernel page allocator limits allocations to
1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
16MB (on 64-bit powerpc, at least).
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix module compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>

32fad281
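The order arithmetic and the "try decreasing sizes" fallback from this commit message can be sketched as follows. This is a simplified illustration, not the kernel's allocation code: it omits the preallocated pool step, and `try_alloc_fn`, `alloc_hpt_order` and the `fits_in_16mb` stub (which mimics the 16MB page-allocator limit mentioned above) are hypothetical names.

```c
#include <stdint.h>

/* Minimum HPT size stated in the commit message: 256kB = 2^18 bytes. */
#define HPT_MIN_ORDER 18u

/* Size in bytes of an HPT of the given order (log base 2 of the size). */
static uint64_t hpt_bytes(uint32_t order)
{
    return (uint64_t)1 << order;
}

/* Hypothetical allocator hook: nonzero means an HPT of this order was obtained. */
typedef int (*try_alloc_fn)(uint32_t order);

/* Hypothetical stub standing in for an allocator capped at 16MB (order 24). */
static int fits_in_16mb(uint32_t order)
{
    return order <= 24;
}

/*
 * Try the requested order first, then decreasing orders down to the
 * minimum. Returns the order actually allocated, or 0 if nothing fits.
 */
static uint32_t alloc_hpt_order(uint32_t requested, try_alloc_fn try_alloc)
{
    for (uint32_t order = requested; order >= HPT_MIN_ORDER; order--) {
        if (try_alloc(order))
            return order;
    }
    return 0;
}
```

Returning the order that was actually obtained mirrors the ioctl's behaviour of updating the caller's u32 on success, so userspace can see when it received a smaller table than it asked for.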

16 May, 2012 · 5 commits

KVM: PPC: Book3S HV: Fix bug leading to deadlock in guest HPT updates · 51bfd299

Authored by Paul Mackerras on May 09, 2012

When handling the H_BULK_REMOVE hypercall, we were forgetting to
invalidate and unlock the hashed page table entry (HPTE) in the case
where the page had been paged out.  This fixes it by clearing the
first doubleword of the HPTE in that case.

This fixes a regression introduced in commit a92bce95 ("KVM: PPC:
Book3S HV: Keep HPTE locked when invalidating").  The effect of the
regression is that the host kernel will sometimes hang when under
memory pressure.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

51bfd299

powerpc/kvm: Fix VSID usage in 64-bit "PR" KVM · ffe36492

Authored by Benjamin Herrenschmidt on Mar 23, 2012

The code forgot to scramble the VSIDs the way we normally do
and was basically using the "proto VSID" directly with the MMU.

This means that in practice, KVM used random VSIDs that could
collide with segments used by other user space programs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: simplify ppc32 case]
Signed-off-by: Alexander Graf <agraf@suse.de>

ffe36492

KVM: PPC: Book3S: PR: Fix hsrr code · 32c7dbfd

Authored by Alexander Graf on May 10, 2012

When jumping back into the kernel to code that knows that it would be
using HSRR registers instead of SRR registers, we need to make sure we
pass it all information on where to jump to in HSRR registers.

Unfortunately, we used r10 to store the information to distinguish between
the HSRR and SRR case. That register got clobbered in between though,
rendering the later comparison invalid.

Instead, let's use cr1 to store this information. That way we don't
need yet another register and everyone's happy.

This fixes PR KVM on POWER7 bare metal for me.
Signed-off-by: Alexander Graf <agraf@suse.de>

32c7dbfd

KVM: PPC: Fix PR KVM on POWER7 bare metal · 56e13dba

Authored by Alexander Graf on Apr 27, 2012

When running on a system that is HV capable, some interrupts use HSRR
SPRs instead of the normal SRR SPRs. These are also used in the Linux
handlers to jump back to code after an interrupt got processed.

Unfortunately, in our "jump back to the real host handler after we've
done the context switch" code, we were only setting the SRR SPRs,
causing Linux to jump back to some invalid IP after it's processed
the interrupt.

This fixes random crashes on p7 opal mode with PR KVM for me.
Signed-off-by: Alexander Graf <agraf@suse.de>

56e13dba

KVM: PPC: Book3S: PR: Handle EMUL_ASSIST · 7ef4e985

Authored by Alexander Graf on May 10, 2012

In addition to normal "privileged instruction" traps, we can also receive
"emulation assist" traps on newer hardware that has the HV bit set.

Handle that one the same way as a privileged instruction, including the
instruction fetching. That way we don't execute old instructions that we
happen to still leave in that field when an emul assist trap comes.

This fixes -M mac99 / -M g3beige on p7 bare metal for me.
Signed-off-by: Alexander Graf <agraf@suse.de>

7ef4e985

08 May, 2012 · 1 commit

KVM: PPC: Book3S HV: Fix refcounting of hugepages · de6c0b02

Authored by David Gibson on May 08, 2012

The H_REGISTER_VPA hcall implementation in HV Power KVM needs to pin some
guest memory pages into host memory so that they can be safely accessed
from usermode.  It does this using get_user_pages_fast().  When the VPA is
unregistered, or the VCPUs are cleaned up, these pages are released using
put_page().

However, the get_user_pages() is invoked on the specific memory area of the
VPA which could lie within hugepages.  In case the pinned page is huge,
we explicitly find the head page of the compound page before calling
put_page() on it.

At least with the latest kernel, this is not correct.  put_page() already
handles finding the correct head page of a compound, and also deals with
various counts on the individual tail page which are important for
transparent huge pages.  We don't support transparent hugepages on Power,
but even so, bypassing this count maintenance can lead (when the VM ends)
to a hugepage being released back to the pool with a non-zero mapcount on
one of the tail pages.  This can then lead to a bad_page() when the page
is released from the hugepage pool.

This removes the explicit compound_head() call to correct this bug.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

de6c0b02

06 May, 2012 · 15 commits

KVM: PPC: Emulator: clean up SPR reads and writes · 54771e62

Authored by Alexander Graf on May 04, 2012

When reading and writing SPRs, every SPR emulation piece had to read
or write the respective GPR the value was read from or stored in itself.

This approach is pretty prone to failure. What if we accidentally
implement mfspr emulation where we just do "break" and nothing else?
Suddenly we would get a random value in the return register - which is
always a bad idea.

So let's consolidate the generic code paths and only give the core
specific SPR handling code readily made variables to read/write from/to.

Functionally, this patch doesn't change anything, but it increases the
readability of the code and makes it less prone to bugs.
Signed-off-by: Alexander Graf <agraf@suse.de>

54771e62
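The consolidation described in this commit can be sketched on a toy model. Every name below (`toy_vcpu`, `core_mfspr`, `emulate_mfspr`, the `*_SKETCH` constants) is hypothetical; the point is only the shape: the generic path owns the GPR access and hands the core-specific handler a ready-made variable, so a handler that does nothing cannot leave a random value in the destination register.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch only. */
enum { EMULATE_DONE_SKETCH = 0, EMULATE_FAIL_SKETCH = 1 };

#define SPRN_SPRG0_SKETCH 272 /* SPRG0 is SPR number 0x110 */

struct toy_vcpu {
    uint64_t gpr[32];
    uint64_t sprg0;
};

/* Core-specific part: only fills in *spr_val, never touches GPRs. */
static int core_mfspr(struct toy_vcpu *vcpu, int sprn, uint64_t *spr_val)
{
    switch (sprn) {
    case SPRN_SPRG0_SKETCH:
        *spr_val = vcpu->sprg0;
        return EMULATE_DONE_SKETCH;
    default:
        return EMULATE_FAIL_SKETCH; /* unknown SPR in this toy model */
    }
}

/* Generic part: owns the register write-back for every SPR. */
static int emulate_mfspr(struct toy_vcpu *vcpu, int sprn, int rt)
{
    uint64_t spr_val = 0; /* well-defined default instead of garbage */
    int ret = core_mfspr(vcpu, sprn, &spr_val);

    if (ret == EMULATE_DONE_SKETCH)
        vcpu->gpr[rt] = spr_val;
    return ret;
}
```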

KVM: PPC: Emulator: clean up instruction parsing · c46dc9a8

Authored by Alexander Graf on May 04, 2012

Instructions on PPC are pretty similarly encoded. So instead of
every instruction emulation code decoding the instruction fields
itself, we can move that code to more generic places and rely on
the compiler to optimize the unused bits away.

This has 2 advantages. It makes the code smaller and it makes the
code less error prone, as the instruction fields are always
available, so accidental misusage is reduced.

Functionally, this patch doesn't change anything.
Signed-off-by: Alexander Graf <agraf@suse.de>

c46dc9a8

kvm/powerpc: Add new ioctl to retreive server MMU infos · 5b74716e

Authored by Benjamin Herrenschmidt on Apr 26, 2012

This is necessary for qemu to be able to pass the right information
to the guest, such as the supported page sizes and corresponding
encodings in the SLB and hash table, which can vary depending
on the processor type, the type of KVM used (PR vs HV) and the
version of KVM.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix compilation on hv, adjust for newer ioctl numbers]
Signed-off-by: Alexander Graf <agraf@suse.de>

5b74716e

kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM · f31e65e1

Authored by Benjamin Herrenschmidt on Mar 15, 2012

There is nothing in the code for emulating TCE tables in the kernel
that prevents it from working on "PR" KVM... other than ifdef's and
location of the code.

This fixes that and moves the bulk of the code there to a new file called
book3s_64_vio.c.

This speeds things up a bit on my G5.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix for hv kvm, 32bit, whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>

f31e65e1

KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler · 4444aa5f

Authored by Mihai Caraman on Apr 16, 2012

Guest r8 register is held in the scratch register and stored correctly,
so remove the instruction that clobbers it. Guest r13 was missing from vcpu,
store it there.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

4444aa5f

KVM: PPC: Book3S: Enable IRQs during exit handling · 3b1d9d7d

Authored by Alexander Graf on Apr 30, 2012

While handling an exit, we should listen for interrupts and make sure to
receive them when they arrive, to keep our latencies low.
Signed-off-by: Alexander Graf <agraf@suse.de>

3b1d9d7d

KVM: PPC: Fix PR KVM on POWER7 bare metal · 11f7d6c2

Authored by Alexander Graf on Apr 27, 2012

When running on a system that is HV capable, some interrupts use HSRR
SPRs instead of the normal SRR SPRs. These are also used in the Linux
handlers to jump back to code after an interrupt got processed.

Unfortunately, in our "jump back to the real host handler after we've
done the context switch" code, we were only setting the SRR SPRs,
causing Linux to jump back to some invalid IP after it's processed
the interrupt.

This fixes random crashes on p7 opal mode with PR KVM for me.
Signed-off-by: Alexander Graf <agraf@suse.de>

11f7d6c2

KVM: PPC: Fix stbux emulation · 978b4fae

Authored by Alexander Graf on Apr 27, 2012

Stbux writes the address it's operating on to the register specified in ra,
not into the data source register.
Signed-off-by: Alexander Graf <agraf@suse.de>

978b4fae
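The update-form semantics behind this fix can be shown on a toy machine. This is a minimal sketch, not the kernel's emulation code: `toy_cpu` and `emulate_stbux` are hypothetical, and guest memory is reduced to a small byte array indexed modulo its size.

```c
#include <stdint.h>

/* Hypothetical toy machine: 32 GPRs and a tiny flat memory. */
struct toy_cpu {
    uint64_t gpr[32];
    uint8_t mem[256];
};

/*
 * stbux rs, ra, rb: store the low byte of rs at EA = ra + rb,
 * then write EA back into ra (the "update" part).
 */
static void emulate_stbux(struct toy_cpu *cpu, int rs, int ra, int rb)
{
    uint64_t ea = cpu->gpr[ra] + cpu->gpr[rb];

    cpu->mem[ea % sizeof(cpu->mem)] = (uint8_t)cpu->gpr[rs];
    cpu->gpr[ra] = ea; /* the update goes to ra; rs must stay untouched */
}
```

The bug class fixed here corresponds to writing `ea` into `gpr[rs]` on the last line, which would both corrupt the data register and leave the base register stale.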

KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields · 518f040c

Authored by Mihai Caraman on Apr 16, 2012

Interrupt code used PPC_LL/PPC_STL macros to load/store some u32 fields,
which led to memory overflow on 64-bit. Use lwz/stw instead.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

518f040c

KVM: PPC: Book3S: PR: No isync in slbie path · af415087

Authored by Alexander Graf on Apr 25, 2012

While messing around with the SLBs we're running in real mode. The
entry to guest space goes through rfid, which is context synchronizing,
so there's no need to manually synchronize anything through isync.

With this patch and a simple privileged SPR access loop guest, I get
a speed bump from 2035607 to 2181301 exits per second.
Signed-off-by: Alexander Graf <agraf@suse.de>

af415087

KVM: PPC: Book3S: PR: Optimize entry path · 8c2d0be7

Authored by Alexander Graf on Apr 25, 2012

By shuffling a few instructions around we can execute more memory
loads in parallel, giving us a small performance boost.

With this patch and a simple privileged SPR access loop guest, I get
a speed bump from 2013052 to 2035607 exits per second.
Signed-off-by: Alexander Graf <agraf@suse.de>

8c2d0be7

KVM: PPC: booke(hv): Fix save/restore of guest accessible SPRGs. · 30124906

Authored by Varun Sethi on Apr 25, 2012

For guest-accessible SPRGs 4-7, save/restore must be handled differently for the 64-bit and
non-64-bit cases. Use the PPC_STD/PPC_LD macros for saving/restoring to/from these registers.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

30124906

KVM: PPC: bookehv: Use a Macro for saving/restoring guest registers to/from their 64 bit copies. · 185e4188

Authored by Varun Sethi on Apr 25, 2012

Introduced PPC_STD/PPC_LD macros for saving/restoring guest registers to/from their 64-bit copies.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

185e4188

KVM: PPC: Use clockevent multiplier and shifter for decrementer · 6e35994d

Authored by Bharat Bhushan on Apr 18, 2012

The time for which the hrtimer is started for decrementer emulation is calculated
using tb_ticks_per_usec, while the hrtimer uses the clockevent for DEC
reprogramming (if needed), which calculates timebase ticks using the
multiplier and shifter mechanism implemented within the clockevent layer.

It was observed that this conversion (timebase->time->timebase) is not
correct because the two mechanisms are not consistent.
In our setup it adds 2% jitter.

With this patch, the clockevent multiplier and shifter mechanism is used when
starting the hrtimer for decrementer emulation. Now the jitter is < 0.5%.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

6e35994d
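The multiplier-and-shifter conversion this commit relies on can be sketched in plain C. This is an illustration of the general technique, assuming a scale factor of the form mult = (frequency << shift) / 10^9; the `clock_scale` struct and both function names are hypothetical, and overflow handling for very large inputs is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical scale factors, in the style of a clockevent device. */
struct clock_scale {
    uint32_t mult;
    uint32_t shift;
};

/* Nanoseconds to timebase ticks: ticks = (ns * mult) >> shift. */
static uint64_t ns_to_ticks(const struct clock_scale *cs, uint64_t ns)
{
    return (ns * cs->mult) >> cs->shift;
}

/* Timebase ticks to nanoseconds, the inverse: ns = (ticks << shift) / mult. */
static uint64_t ticks_to_ns(const struct clock_scale *cs, uint64_t ticks)
{
    return (ticks << cs->shift) / cs->mult;
}
```

Because both directions use the same `mult` and `shift`, a ticks-to-time-to-ticks round trip stays consistent, which is the property the commit message says was lost when one direction used tb_ticks_per_usec instead. For example, a 500 MHz timebase with shift 32 gives mult 2^31, and 500 ticks convert to 1000 ns and back to 500 ticks.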

KVM: Use minimum and maximum address mapped by TLB1 · cc902ad4

Authored by Bharat Bhushan on Mar 22, 2012

Keep track of the minimum and maximum address mapped by tlb1.
This helps TLBMISS handling in KVM to quickly check whether the address lies in the mapped range.
If the address does not lie in this range, there is no need to look at each tlb1 entry of the tlb1 array.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

cc902ad4
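The range check described in this commit can be sketched as follows. The data layout is an assumption made for illustration (`tlb1_entry`, `tlb1_range` and both functions are hypothetical, with inclusive end addresses); the real TLB1 array stores entries in hardware MAS format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified TLB1 entry: an inclusive address range. */
struct tlb1_entry {
    uint64_t start;
    uint64_t end;
    bool valid;
};

/* Cached bounds over all valid entries. */
struct tlb1_range {
    uint64_t min_eaddr;
    uint64_t max_eaddr;
};

/* Recompute the bounds; an empty TLB yields a range that matches nothing. */
static struct tlb1_range tlb1_compute_range(const struct tlb1_entry *e, size_t n)
{
    struct tlb1_range r = { UINT64_MAX, 0 };

    for (size_t i = 0; i < n; i++) {
        if (!e[i].valid)
            continue;
        if (e[i].start < r.min_eaddr)
            r.min_eaddr = e[i].start;
        if (e[i].end > r.max_eaddr)
            r.max_eaddr = e[i].end;
    }
    return r;
}

/* Quick reject: false means no entry can map eaddr, so skip the full search. */
static bool tlb1_may_map(const struct tlb1_range *r, uint64_t eaddr)
{
    return eaddr >= r->min_eaddr && eaddr <= r->max_eaddr;
}
```

A `true` result is only a hint: an address can fall in a hole between two mappings, in which case the per-entry search is still needed.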

08 Apr, 2012 · 11 commits

powerpc/kvm: Fix magic page vs. 32-bit RTAS on ppc64 · bbcc9c06

Authored by Benjamin Herrenschmidt on Mar 13, 2012

When the kernel calls into RTAS, it switches to 32-bit mode. The
magic page is no longer accessible in that case, causing the
patched instructions in the RTAS call wrapper to crash.

This fixes it by making available a 32-bit mapping of the magic
page in that case. This mapping is flushed whenever we switch
the kernel back to 64-bit mode.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: add a check if the magic page is mapped]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

bbcc9c06

KVM: PPC: Ignore unhalt request from kvm_vcpu_block · 966cd0f3

Authored by Alexander Graf on Mar 14, 2012

When running kvm_vcpu_block and it realizes that the CPU is actually good
to run, we get a request bit set for KVM_REQ_UNHALT. Right now, there's
nothing we can do with that bit, so let's unset it right after the call
again so we don't get confused in our later checks for pending work.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

966cd0f3

KVM: PPC: Book3s: PR: Add HV traps so we can run in HV=1 mode on p7 · 4f225ae0

Authored by Alexander Graf on Mar 13, 2012

When running PR KVM on a p7 system in bare metal, we get HV exits instead
of normal supervisor traps. Semantically they are identical though and the
HSRR vs SRR difference is already taken care of in the exit code.

So all we need to do is handle them in addition to our normal exits.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

4f225ae0

KVM: PPC: Emulate tw and td instructions · 6df79df5

Authored by Alexander Graf on Mar 13, 2012

There are 4 conditional trapping instructions: tw, twi, td, tdi. The
ones with an i take an immediate comparison, the others compare two
registers. All of them arrive in the emulator when the condition to
trap was successfully fulfilled.

Unfortunately, we were only implementing the i versions so far, so
let's also add support for the other two.

This fixes kernel booting with recent book3s_32 guest kernels.
Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

6df79df5
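For readers unfamiliar with the trap instructions, the condition they test can be sketched as below. This illustrates the architectural semantics of the 5-bit TO field rather than the emulator change itself (per the commit message, the emulator only sees these instructions once the condition has already been met); the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* TO field bits, most significant first. */
#define TO_LT  0x10 /* a <  b, signed   */
#define TO_GT  0x08 /* a >  b, signed   */
#define TO_EQ  0x04 /* a == b           */
#define TO_LTU 0x02 /* a <  b, unsigned */
#define TO_GTU 0x01 /* a >  b, unsigned */

/* Evaluate the trap condition on two 64-bit operands. */
static bool trap_cond64(unsigned to, int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;

    return ((to & TO_LT)  && a < b)   ||
           ((to & TO_GT)  && a > b)   ||
           ((to & TO_EQ)  && a == b)  ||
           ((to & TO_LTU) && ua < ub) ||
           ((to & TO_GTU) && ua > ub);
}

/* td compares the full 64-bit registers. */
static bool td_traps(unsigned to, uint64_t ra, uint64_t rb)
{
    return trap_cond64(to, (int64_t)ra, (int64_t)rb);
}

/* tw compares only the low 32 bits of each register, sign-extended. */
static bool tw_traps(unsigned to, uint64_t ra, uint64_t rb)
{
    return trap_cond64(to, (int32_t)(uint32_t)ra, (int32_t)(uint32_t)rb);
}
```

The immediate forms (twi, tdi) would use the same helper with the second operand replaced by the sign-extended 16-bit immediate.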

KVM: PPC: Pass EA to updating emulation ops · 6020c0f6

Authored by Alexander Graf on Mar 12, 2012

When emulating updating load/store instructions (lwzu, stwu, ...) we need to
write the effective address of the load/store into a register.

Currently, we write the physical address in there, which is very wrong. So
instead let's save off where the virtual fault was on MMIO and use that
information as value to put into the register.

While at it, also move the XOP variants of the above instructions to the new
scheme of using the already known vaddr instead of calculating it themselves.
Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

6020c0f6

KVM: PPC: Work around POWER7 DABR corruption problem · 8943633c

Authored by Paul Mackerras on Mar 02, 2012

It turns out that on POWER7, writing to the DABR can cause a corrupted
value to be written if the PMU is active and updating SDAR in continuous
sampling mode.  To work around this, we make sure that the PMU is inactive
and SDAR updates are disabled (via MMCRA) when we are context-switching
DABR.

When the guest sets DABR via the H_SET_DABR hypercall, we use a slightly
different workaround, which is to read back the DABR and write it again
if it got corrupted.

While we are at it, make it consistent that the saving and restoring
of the guest's non-volatile GPRs and the FPRs are done with the guest
setup of the PMU active.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

8943633c

Restore guest CR after exit timing calculation · c0fe7b09

Authored by Bharat Bhushan on Mar 05, 2012

No instruction which can change the Condition Register (CR) should be executed after
the guest CR is loaded. So the guest CR is restored after the Exit Timing in
lightweight_exit executes cmpw, which can clobber CR.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

c0fe7b09

KVM: PPC: Book3S HV: Report stolen time to guest through dispatch trace log · 0456ec4f

Authored by Paul Mackerras on Feb 03, 2012

This adds code to measure "stolen" time per virtual core in units of
timebase ticks, and to report the stolen time to the guest using the
dispatch trace log (DTL).  The guest can register an area of memory
for the DTL for a given vcpu.  The DTL is a ring buffer where KVM
fills in one entry every time it enters the guest for that vcpu.

Stolen time is measured as time when the virtual core is not running,
either because the vcore is not runnable (e.g. some of its vcpus are
executing elsewhere in the kernel or in userspace), or when the vcpu
thread that is running the vcore is preempted.  This includes time
when all the vcpus are idle (i.e. have executed the H_CEDE hypercall),
which is OK because the guest accounts stolen time while idle as idle
time.

Each vcpu keeps a record of how much stolen time has been reported to
the guest for that vcpu so far.  When we are about to enter the guest,
we create a new DTL entry (if the guest vcpu has a DTL) and report the
difference between total stolen time for the vcore and stolen time
reported so far for the vcpu as the "enqueue to dispatch" time in the
DTL entry.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

0456ec4f
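The per-vcpu bookkeeping in the last paragraph of this commit message amounts to a running difference. A minimal sketch, assuming stolen time is kept as a monotonically increasing timebase count per virtual core; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-vcore state: total stolen time in timebase ticks. */
struct vcore_sketch {
    uint64_t stolen_tb;
};

/* Hypothetical per-vcpu state: stolen time already reported to the guest. */
struct vcpu_sketch {
    uint64_t stolen_logged;
};

/*
 * Called when about to enter the guest: returns the stolen time not yet
 * reported for this vcpu (the "enqueue to dispatch" value for the new
 * DTL entry) and records it as reported.
 */
static uint64_t dtl_stolen_delta(const struct vcore_sketch *vc,
                                 struct vcpu_sketch *vcpu)
{
    uint64_t delta = vc->stolen_tb - vcpu->stolen_logged;

    vcpu->stolen_logged += delta;
    return delta;
}
```

Keeping the reported total per vcpu means each vcpu sees every tick of its vcore's stolen time exactly once, regardless of how often it enters the guest.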

KVM: PPC: Book3S HV: Make virtual processor area registration more robust · 2e25aa5f

Authored by Paul Mackerras on Feb 19, 2012

The PAPR API allows three sorts of per-virtual-processor areas to be
registered (VPA, SLB shadow buffer, and dispatch trace log), and
furthermore, these can be registered and unregistered for another
virtual CPU.  Currently we just update the vcpu fields pointing to
these areas at the time of registration or unregistration.  If this
is done on another vcpu, there is the possibility that the target vcpu
is using those fields at the time and could end up using a bogus
pointer and corrupting memory.

This fixes the race by making the target cpu itself do the update, so
we can be sure that the update happens at a time when the fields
aren't being used.  Each area now has a struct kvmppc_vpa which is
used to manage these updates.  There is also a spinlock which protects
access to all of the kvmppc_vpa structs, other than to the pinned_addr
fields.  (We could have just taken the spinlock when using the vpa,
slb_shadow or dtl fields, but that would mean taking the spinlock on
every guest entry and exit.)

This also changes 'struct dtl' (which was undefined) to 'struct dtl_entry',
which is what the rest of the kernel uses.

Thanks to Michael Ellerman <michael@ellerman.id.au> for pointing out
the need to initialize vcpu->arch.vpa_update_lock.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

2e25aa5f

KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs · f0888f70

Authored by Paul Mackerras on Feb 03, 2012

Currently on POWER7, if we are running the guest on a core and we don't
need all the hardware threads, we do nothing to ensure that the unused
threads aren't executing in the kernel (other than checking that they
are offline). We just assume they're napping and we don't do anything
to stop them trying to enter the kernel while the guest is running.
This means that a stray IPI can wake up the hardware thread and it will
then try to enter the kernel, but since the core is in guest context,
it will execute code from the guest in hypervisor mode once it turns the
MMU on, which tends to lead to crashes or hangs in the host.

This fixes the problem by adding two new one-byte flags in the
kvmppc_host_state structure in the PACA which are used to interlock
between the primary thread and the unused secondary threads when entering
the guest. With these flags, the primary thread can ensure that the
unused secondaries are not already in kernel mode (i.e. handling a stray
IPI) and then indicate that they should not try to enter the kernel
if they do get woken for any reason. Instead they will go into KVM code,
find that there is no vcpu to run, acknowledge and clear the IPI and go
back to nap mode.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

f0888f70

KVM: PPC: Save/Restore CR over vcpu_run · f6127716

Authored by Alexander Graf on Mar 05, 2012

On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls.
We didn't respect that for any architecture until Paul spotted it in his
patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

f6127716

openanolis / cloud-kernel · synced successfully more than 1 year ago
