提交 · 8563bf52d509213e746295341ab52896b562ca5e · openeuler / raspberrypi-kernel

27 1月, 2014 8 次提交

KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 · 8563bf52

由 Paul Mackerras 提交于 1月 08, 2014

The DABRX (DABR extension) register on POWER7 processors provides finer
control over which accesses cause a data breakpoint interrupt. It
contains 3 bits which indicate whether to enable accesses in user,
kernel and hypervisor modes respectively to cause data breakpoint
interrupts, plus one bit that enables both real mode and virtual mode
accesses to cause interrupts. Currently, KVM sets DABRX to allow
both kernel and user accesses to cause interrupts while in the guest.

This adds support for the guest to specify other values for DABRX.
PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
and DABRX with one call. This adds a real-mode implementation of
H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
implementation. To support this, we add a per-vcpu field to store the
DABRX value plus code to get and set it via the ONE_REG interface.

For Linux guests to use this new hcall, userspace needs to add
"hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
property in the device tree. If userspace does this and then migrates
the guest to a host where the kernel doesn't include this patch, then
userspace will need to implement H_SET_XDABR by writing the specified
DABR value to the DABR using the ONE_REG interface. In that case, the
old kernel will set DABRX to DABRX_USER | DABRX_KERNEL. That should
still work correctly, at least for Linux guests, since Linux guests
cope with getting data breakpoint interrupts in modes that weren't
requested by just ignoring the interrupt, and Linux guests never set
DABRX_BTI.

The other thing this does is to make H_SET_DABR and H_SET_XDABR work
on POWER8, which has the DAWR and DAWRX instead of DABR/X. Guests that
know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
For them, this adds the logic to convert DABR/X values into DAWR/X values
on POWER8.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

8563bf52

KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells · 5d00f66b

由 Paul Mackerras 提交于 1月 08, 2014

POWER8 has support for hypervisor doorbell interrupts.  Though the
kernel doesn't use them for IPIs on the powernv platform yet, it
probably will in future, so this makes KVM cope gracefully if a
hypervisor doorbell interrupt arrives while in a guest.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

5d00f66b

KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs · aa31e843

由 Paul Mackerras 提交于 1月 08, 2014

* SRR1 wake reason field for system reset interrupt on wakeup from nap
  is now a 4-bit field on P8, compared to 3 bits on P7.

* Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells
  will wake us up.

* Waking up from nap because of a guest doorbell interrupt is not a
  reason to exit the guest.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

aa31e843

KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap · e3bbbbfa

由 Paul Mackerras 提交于 1月 08, 2014

Currently in book3s_hv_rmhandlers.S we have three places where we
have woken up from nap mode and we check the reason field in SRR1
to see what event woke us up. This consolidates them into a new
function, kvmppc_check_wake_reason. It looks at the wake reason
field in SRR1, and if it indicates that an external interrupt caused
the wakeup, calls kvmppc_read_intr to check what sort of interrupt
it was.

This also consolidates the two places where we synthesize an external
interrupt (0x500 vector) for the guest. Now, if the guest exit code
finds that there was an external interrupt which has been handled
(i.e. it was an IPI indicating that there is now an interrupt pending
for the guest), it jumps to deliver_guest_interrupt, which is in the
last part of the guest entry code, where we synthesize guest external
and decrementer interrupts. That code has been streamlined a little
and now clears LPCR[MER] when appropriate as well as setting it.

The extra clearing of any pending IPI on a secondary, offline CPU
thread before going back to nap mode has been removed. It is no longer
necessary now that we have code to read and acknowledge IPIs in the
guest exit path.

This fixes a minor bug in the H_CEDE real-mode handling - previously,
if we found that other threads were already exiting the guest when we
were about to go to nap mode, we would branch to the cede wakeup path
and end up looking in SRR1 for a wakeup reason. Now we branch to a
point after we have checked the wakeup reason.

This also fixes a minor bug in kvmppc_read_intr - previously it could
return 0xff rather than 1, in the case where we find that a host IPI
is pending after we have cleared the IPI. Now it returns 1.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

e3bbbbfa

KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8 · ca252055

由 Paul Mackerras 提交于 1月 08, 2014

POWER8 has 512 sets in the TLB, compared to 128 for POWER7, so we need
to do more tlbiel instructions when flushing the TLB on POWER8.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

ca252055

KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs · b005255e

由 Michael Neuling 提交于 1月 08, 2014

This adds fields to the struct kvm_vcpu_arch to store the new
guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
functions to allow userspace to access this state, and adds code to
the guest entry and exit to context-switch these SPRs between host
and guest.

Note that DPDES (Directed Privileged Doorbell Exception State) is
shared between threads on a core; hence we store it in struct
kvmppc_vcore and have the master thread save and restore it.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

b005255e

KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers · e0b7ec05

由 Paul Mackerras 提交于 1月 08, 2014

On a threaded processor such as POWER7, we group VCPUs into virtual
cores and arrange that the VCPUs in a virtual core run on the same
physical core.  Currently we don't enforce any correspondence between
virtual thread numbers within a virtual core and physical thread
numbers.  Physical threads are allocated starting at 0 on a first-come
first-served basis to runnable virtual threads (VCPUs).

POWER8 implements a new "msgsndp" instruction which guest kernels can
use to interrupt other threads in the same core or sub-core.  Since
the instruction takes the destination physical thread ID as a parameter,
it becomes necessary to align the physical thread IDs with the virtual
thread IDs, that is, to make sure virtual thread N within a virtual
core always runs on physical thread N.

This means that it's possible that thread 0, which is where we call
__kvmppc_vcore_entry, may end up running some other vcpu than the
one whose task called kvmppc_run_core(), or it may end up running
no vcpu at all, if for example thread 0 of the virtual core is
currently executing in userspace.  However, we do need thread 0
to be responsible for switching the MMU -- a previous version of
this patch that had other threads switching the MMU was found to
be responsible for occasional memory corruption and machine check
interrupts in the guest on POWER7 machines.

To accommodate this, we no longer pass the vcpu pointer to
__kvmppc_vcore_entry, but instead let the assembly code load it from
the PACA.  Since the assembly code will need to know the kvm pointer
and the thread ID for threads which don't have a vcpu, we move the
thread ID into the PACA and we add a kvm pointer to the virtual core
structure.

In the case where thread 0 has no vcpu to run, it still calls into
kvmppc_hv_entry in order to do the MMU switch, and then naps until
either its vcpu is ready to run in the guest, or some other thread
needs to exit the guest.  In the latter case, thread 0 jumps to the
code that switches the MMU back to the host.  This control flow means
that now we switch the MMU before loading any guest vcpu state.
Similarly, on guest exit we now save all the guest vcpu state before
switching the MMU back to the host.  This has required substantial
code movement, making the diff rather large.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

e0b7ec05

KVM: PPC: Book3S HV: Don't set DABR on POWER8 · eee7ff9d

由 Michael Neuling 提交于 1月 08, 2014

POWER8 doesn't have the DABR and DABRX registers; instead it has
new DAWR/DAWRX registers, which will be handled in a later patch.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

eee7ff9d

09 1月, 2014 2 次提交

KVM: PPC: Book3S HV: Use load/store_fp_state functions in HV guest entry/exit · 595e4f7e

由 Paul Mackerras 提交于 10月 15, 2013

This modifies kvmppc_load_fp and kvmppc_save_fp to use the generic
FP/VSX and VMX load/store functions instead of open-coding the
FP/VSX/VMX load/store instructions.  Since kvmppc_load/save_fp don't
follow C calling conventions, we make them private symbols within
book3s_hv_rmhandlers.S.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

595e4f7e

KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures · efff1912

由 Paul Mackerras 提交于 10月 15, 2013

This uses struct thread_fp_state and struct thread_vr_state to store
the floating-point, VMX/Altivec and VSX state, rather than flat arrays.
This makes transferring the state to/from the thread_struct simpler
and allows us to unify the get/set_one_reg implementations for the
VSX registers.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

efff1912

21 11月, 2013 1 次提交

powerpc: kvm: optimize "sc 1" as fast return · 27025a60

由 Liu Ping Fan 提交于 11月 19, 2013

In some scene, e.g openstack CI, PR guest can trigger "sc 1" frequently,
this patch optimizes the path by directly delivering BOOK3S_INTERRUPT_SYSCALL
to HV guest, so powernv can return to HV guest without heavy exit, i.e,
no need to swap TLB, HTAB,.. etc
Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

27025a60

17 10月, 2013 11 次提交

kvm: powerpc: book3s: Cleanup interrupt handling code · dd96b2c2

由 Aneesh Kumar K.V 提交于 10月 07, 2013

With this patch if HV is included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest. This helps
in enabling both HV and PR, which we do in later patch
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

dd96b2c2

KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode · 44a3add8

由 Paul Mackerras 提交于 10月 04, 2013

When an interrupt or exception happens in the guest that comes to the
host, the CPU goes to hypervisor real mode (MMU off) to handle the
exception but doesn't change the MMU context. After saving a few
registers, we then clear the "in guest" flag. If, for any reason,
we get an exception in the real-mode code, that then gets handled
by the normal kernel exception handlers, which turn the MMU on. This
is disastrous if the MMU is still set to the guest context, since we
end up executing instructions from random places in the guest kernel
with hypervisor privilege.

In order to catch this situation, we define a new value for the "in guest"
flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
mode with guest MMU context. If the "in guest" flag is set to this value,
we branch off to an emergency handler. For the moment, this just does
a branch to self to stop the CPU from doing anything further.

While we're here, we define another new flag value to indicate that we
are in a HV guest, as distinct from a PR guest. This will be useful
when we have a kernel that can support both PR and HV guests concurrently.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

44a3add8

KVM: PPC: Book3S: Move skip-interrupt handlers to common code · 4f6c11db

由 Paul Mackerras 提交于 9月 20, 2013

Both PR and HV KVM have separate, identical copies of the
kvmppc_skip_interrupt and kvmppc_skip_Hinterrupt handlers that are
used for the situation where an interrupt happens when loading the
instruction that caused an exit from the guest.  To eliminate this
duplication and make it easier to compile in both PR and HV KVM,
this moves this code to arch/powerpc/kernel/exceptions-64s.S along
with other kernel interrupt handler code.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

4f6c11db

KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7 · 388cc6e1

由 Paul Mackerras 提交于 9月 21, 2013

This enables us to use the Processor Compatibility Register (PCR) on
POWER7 to put the processor into architecture 2.05 compatibility mode
when running a guest.  In this mode the new instructions and registers
that were introduced on POWER7 are disabled in user mode.  This
includes all the VSX facilities plus several other instructions such
as ldbrx, stdbrx, popcntw, popcntd, etc.

To select this mode, we have a new register accessible through the
set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
this to zero gives the full set of capabilities of the processor.
Setting it to one of the "logical" PVR values defined in PAPR puts
the vcpu into the compatibility mode for the corresponding
architecture level.  The supported values are:

0x0f000002	Architecture 2.05 (POWER6)
0x0f000003	Architecture 2.06 (POWER7)
0x0f100003	Architecture 2.06+ (POWER7+)

Since the PCR is per-core, the architecture compatibility level and
the corresponding PCR value are stored in the struct kvmppc_vcore, and
are therefore shared between all vcpus in a virtual core.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
[agraf: squash in fix to add missing break statements and documentation]
Signed-off-by: NAlexander Graf <agraf@suse.de>

388cc6e1

KVM: PPC: Book3S HV: Add support for guest Program Priority Register · 4b8473c9

由 Paul Mackerras 提交于 9月 20, 2013

POWER7 and later IBM server processors have a register called the
Program Priority Register (PPR), which controls the priority of
each hardware CPU SMT thread, and affects how fast it runs compared
to other SMT threads.  This priority can be controlled by writing to
the PPR or by use of a set of instructions of the form or rN,rN,rN
which are otherwise no-ops but have been defined to set the priority
to particular levels.

This adds code to context switch the PPR when entering and exiting
guests and to make the PPR value accessible through the SET/GET_ONE_REG
interface.  When entering the guest, we set the PPR as late as
possible, because if we are setting a low thread priority it will
make the code run slowly from that point on.  Similarly, the
first-level interrupt handlers save the PPR value in the PACA very
early on, and set the thread priority to the medium level, so that
the interrupt handling code runs at a reasonable speed.
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

4b8473c9

KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a

由 Paul Mackerras 提交于 9月 20, 2013

This adds the ability to have a separate LPCR (Logical Partitioning
Control Register) value relating to a guest for each virtual core,
rather than only having a single value for the whole VM.  This
corresponds to what real POWER hardware does, where there is a LPCR
per CPU thread but most of the fields are required to have the same
value on all active threads in a core.

The per-virtual-core LPCR can be read and written using the
GET/SET_ONE_REG interface.  Userspace can can only modify the
following fields of the LPCR value:

DPFD	Default prefetch depth
ILE	Interrupt little-endian
TC	Translation control (secondary HPT hash group search disable)

We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
contains bits relating to memory management, i.e. the Virtualized
Partition Memory (VPM) bits and the bits relating to guest real mode.
When this default value is updated, the update needs to be propagated
to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
that.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
[agraf: fix whitespace]
Signed-off-by: NAlexander Graf <agraf@suse.de>

a0144e2a

KVM: PPC: Book3S HV: Avoid unbalanced increments of VPA yield count · 8c2dbb79

由 Paul Mackerras 提交于 9月 06, 2013

The yield count in the VPA is supposed to be incremented every time
we enter the guest, and every time we exit the guest, so that its
value is even when the vcpu is running in the guest and odd when it
isn't. However, it's currently possible that we increment the yield
count on the way into the guest but then find that other CPU threads
are already exiting the guest, so we go back to nap mode via the
secondary_too_late label. In this situation we don't increment the
yield count again, breaking the relationship between the LSB of the
count and whether the vcpu is in the guest.

To fix this, we move the increment of the yield count to a point
after we have checked whether other CPU threads are exiting.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

8c2dbb79

KVM: PPC: Book3S HV: Pull out interrupt-reading code into a subroutine · c934243c

由 Paul Mackerras 提交于 9月 06, 2013

This moves the code in book3s_hv_rmhandlers.S that reads any pending
interrupt from the XICS interrupt controller, and works out whether
it is an IPI for the guest, an IPI for the host, or a device interrupt,
into a new function called kvmppc_read_intr.  Later patches will
need this.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

c934243c

KVM: PPC: Book3S HV: Restructure kvmppc_hv_entry to be a subroutine · 218309b7

由 Paul Mackerras 提交于 9月 06, 2013

We have two paths into and out of the low-level guest entry and exit
code: from a vcpu task via kvmppc_hv_entry_trampoline, and from the
system reset vector for an offline secondary thread on POWER7 via
kvm_start_guest. Currently both just branch to kvmppc_hv_entry to
enter the guest, and on guest exit, we test the vcpu physical thread
ID to detect which way we came in and thus whether we should return
to the vcpu task or go back to nap mode.

In order to make the code flow clearer, and to keep the code relating
to each flow together, this turns kvmppc_hv_entry into a subroutine
that follows the normal conventions for call and return. This means
that kvmppc_hv_entry_trampoline() and kvmppc_hv_entry() now establish
normal stack frames, and we use the normal stack slots for saving
return addresses rather than local_paca->kvm_hstate.vmhandler. Apart
from that this is mostly moving code around unchanged.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

218309b7

KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc

由 Paul Mackerras 提交于 9月 06, 2013

This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg
interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
defaults to 0 and is not modified by KVM.  On entering the guest, this
value is added onto the timebase, and on exiting the guest, it is
subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register.  Writing to the TBU40 register only
alters the upper 40 bits of the timebase, leaving the lower 24 bits
unchanged.  This provides a way to modify the timebase for guest
migration without disturbing the synchronization of the timebase
registers across CPU cores.  The kernel rounds up the value given
to a multiple of 2^24.

Timebase values stored in KVM structures (struct kvm_vcpu, struct
kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
values in the dispatch trace log need to be guest timebase values,
however, since that is read directly by the guest.  This moves the
setting of vcpu->arch.dec_expires on guest exit to a point after we
have restored the host timebase so that vcpu->arch.dec_expires is a
host timebase value.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

93b0f4dc

KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers · 14941789

由 Paul Mackerras 提交于 9月 06, 2013

Currently we are not saving and restoring the SIAR and SDAR registers in
the PMU (performance monitor unit) on guest entry and exit.  The result
is that performance monitoring tools in the guest could get false
information about where a program was executing and what data it was
accessing at the time of a performance monitor interrupt.  This fixes
it by saving and restoring these registers along with the other PMU
registers on guest entry/exit.

This also provides a way for userspace to access these values for a
vcpu via the one_reg interface.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

14941789

10 10月, 2013 1 次提交

KVM: PPC: Book3S HV: Fix typo in saving DSCR · cfc86025

由 Paul Mackerras 提交于 9月 21, 2013

This fixes a typo in the code that saves the guest DSCR (Data Stream
Control Register) into the kvm_vcpu_arch struct on guest exit.  The
effect of the typo was that the DSCR value was saved in the wrong place,
so changes to the DSCR by the guest didn't persist across guest exit
and entry, and some host kernel memory got corrupted.

Cc: stable@vger.kernel.org [v3.1+]
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Acked-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cfc86025

14 8月, 2013 2 次提交

powerpc: Make rwlocks endian safe · 54bb7f4b

由 Anton Blanchard 提交于 8月 07, 2013

Our ppc64 spinlocks and rwlocks use a trick where a lock token and
the paca index are placed in the lock with a single store. Since we
are using two u16s they need adjusting for little endian.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

54bb7f4b

powerpc: Fix little endian lppaca, slb_shadow and dtl_entry · 7ffcf8ec

由 Anton Blanchard 提交于 8月 07, 2013

The lppaca, slb_shadow and dtl_entry hypervisor structures are
big endian, so we have to byte swap them in little endian builds.

LE KVM hosts will also need to be fixed but for now add an #error
to remind us.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7ffcf8ec

10 7月, 2013 1 次提交

KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers · 4baa1d87

由 Paul Mackerras 提交于 7月 08, 2013

The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S
can contain negative values, if some of the handlers end up before the
table in the vmlinux binary. Thus we need to use a sign-extending load
to read the values in the table rather than a zero-extending load.
Without this, the host crashes when the guest does one of the hcalls
with negative offsets, due to jumping to a bogus address.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

4baa1d87

27 4月, 2013 4 次提交

KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts · 4619ac88

由 Paul Mackerras 提交于 4月 17, 2013

This streamlines our handling of external interrupts that come in
while we're in the guest. First, when waking up a hardware thread
that was napping, we split off the "napping due to H_CEDE" case
earlier, and use the code that handles an external interrupt (0x500)
in the guest to handle that too. Secondly, the code that handles
those external interrupts now checks if any other thread is exiting
to the host before bouncing an external interrupt to the guest, and
also checks that there is actually an external interrupt pending for
the guest before setting the LPCR MER bit (mediated external request).

This also makes sure that we clear the "ceded" flag when we handle a
wakeup from cede in real mode, and fixes a potential infinite loop
in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
flag set but MSR[EE] off.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

4619ac88

KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation · e7d26f28

由 Benjamin Herrenschmidt 提交于 4月 17, 2013

This adds an implementation of the XICS hypercalls in real mode for HV
KVM, which allows us to avoid exiting the guest MMU context on all
threads for a variety of operations such as fetching a pending
interrupt, EOI of messages, IPIs, etc.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

e7d26f28

KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM · 54695c30

由 Benjamin Herrenschmidt 提交于 4月 17, 2013

Currently, we wake up a CPU by sending a host IPI with
smp_send_reschedule() to thread 0 of that core, which will take all
threads out of the guest, and cause them to re-evaluate their
interrupt status on the way back in.

This adds a mechanism to differentiate real host IPIs from IPIs sent
by KVM for guest threads to poke each other, in order to target the
guest threads precisely when possible and avoid that global switch of
the core to host state.

We then use this new facility in the in-kernel XICS code.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

54695c30

KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty map · c35635ef

由 Paul Mackerras 提交于 4月 18, 2013

At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications
done by the host to the virtual processor areas (VPAs) and dispatch
trace logs (DTLs) registered by the guest. This is because those
modifications are done either in real mode or in the host kernel
context, and in neither case does the access go through the guest's
HPT, and thus no change (C) bit gets set in the guest's HPT.

However, the changes done by the host do need to be tracked so that
the modified pages get transferred when doing live migration. In
order to track these modifications, this adds a dirty flag to the
struct representing the VPA/DTL areas, and arranges to set the flag
when the VPA/DTL gets modified by the host. Then, when we are
collecting the dirty log, we also check the dirty flags for the
VPA and DTL for each vcpu and set the relevant bit in the dirty log
if necessary. Doing this also means we now need to keep track of
the guest physical address of the VPA/DTL areas.

So as not to lose track of modifications to a VPA/DTL area when it gets
unregistered, or when a new area gets registered in its place, we need
to transfer the dirty state to the rmap chain. This adds code to
kvmppc_unpin_guest_page() to do that if the area was dirty. To simplify
that code, we now require that all VPA, DTL and SLB shadow buffer areas
fit within a single host page. Guests already comply with this
requirement because pHyp requires that these areas not cross a 4k
boundary.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

c35635ef

15 2月, 2013 1 次提交

powerpc/kvm/book3s_hv: Preserve guest CFAR register value · 0acb9111

由 Paul Mackerras 提交于 2月 04, 2013

The CFAR (Come-From Address Register) is a useful debugging aid that
exists on POWER7 processors.  Currently HV KVM doesn't save or restore
the CFAR register for guest vcpus, making the CFAR of limited use in
guests.

This adds the necessary code to capture the CFAR value saved in the
early exception entry code (it has to be saved before any branch is
executed), save it in the vcpu.arch struct, and restore it on entry
to the guest.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0acb9111

06 12月, 2012 2 次提交

KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking · b4072df4

由 Paul Mackerras 提交于 11月 23, 2012

Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.

To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.

If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.

Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: NAlexander Graf <agraf@suse.de>

b4072df4

KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations · 1b400ba0

由 Paul Mackerras 提交于 11月 21, 2012

When we change or remove a HPT (hashed page table) entry, we can do
either a global TLB invalidation (tlbie) that works across the whole
machine, or a local invalidation (tlbiel) that only affects this core.
Currently we do local invalidations if the VM has only one vcpu or if
the guest requests it with the H_LOCAL flag, though the guest Linux
kernel currently doesn't ever use H_LOCAL.  Then, to cope with the
possibility that vcpus moving around to different physical cores might
expose stale TLB entries, there is some code in kvmppc_hv_entry to
flush the whole TLB of entries for this VM if either this vcpu is now
running on a different physical core from where it last ran, or if this
physical core last ran a different vcpu.

There are a number of problems on POWER7 with this as it stands:

- The TLB invalidation is done per thread, whereas it only needs to be
  done per core, since the TLB is shared between the threads.
- With the possibility of the host paging out guest pages, the use of
  H_LOCAL by an SMP guest is dangerous since the guest could possibly
  retain and use a stale TLB entry pointing to a page that had been
  removed from the guest.
- The TLB invalidations that we do when a vcpu moves from one physical
  core to another are unnecessary in the case of an SMP guest that isn't
  using H_LOCAL.
- The optimization of using local invalidations rather than global should
  apply to guests with one virtual core, not just one vcpu.

(None of this applies on PPC970, since there we always have to
invalidate the whole TLB when entering and leaving the guest, and we
can't support paging out guest memory.)

To fix these problems and simplify the code, we now maintain a simple
cpumask of which cpus need to flush the TLB on entry to the guest.
(This is indexed by cpu, though we only ever use the bits for thread
0 of each core.)  Whenever we do a local TLB invalidation, we set the
bits for every cpu except the bit for thread 0 of the core that we're
currently running on.  Whenever we enter a guest, we test and clear the
bit for our core, and flush the TLB if it was set.

On initial startup of the VM, and when resetting the HPT, we set all the
bits in the need_tlb_flush cpumask, since any core could potentially have
stale TLB entries from the previous VM to use the same LPID, or the
previous contents of the HPT.

Then, we maintain a count of the number of online virtual cores, and use
that when deciding whether to use a local invalidation rather than the
number of online vcpus.  The code to make that decision is extracted out
into a new function, global_invalidates().  For multi-core guests on
POWER7 (i.e. when we are using mmu notifiers), we now never do local
invalidations regardless of the H_LOCAL flag.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

1b400ba0

30 10月, 2012 1 次提交

KVM: PPC: Book3S HV: Fix some races in starting secondary threads · 7b444c67

由 Paul Mackerras 提交于 10月 15, 2012

Subsequent patches implementing in-kernel XICS emulation will make it
possible for IPIs to arrive at secondary threads at arbitrary times.
This fixes some races in how we start the secondary threads, which
if not fixed could lead to occasional crashes of the host kernel.

This makes sure that (a) we have grabbed all the secondary threads,
and verified that they are no longer in the kernel, before we start
any thread, (b) that the secondary thread loads its vcpu pointer
after clearing the IPI that woke it up (so we don't miss a wakeup),
and (c) that the secondary thread clears its vcpu pointer before
incrementing the nap count. It also removes unnecessary setting
of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

7b444c67

07 9月, 2012 1 次提交

powerpc: Restore VDSO information on critical exception om BookE · 0127262c

由 Mihai Caraman 提交于 9月 06, 2012

Critical exception on 64-bit booke uses user-visible SPRG3 as scratch.
Restore VDSO information in SPRG3 on exception prolog.

Use a common sprg3 field in PACA for all powerpc64 architectures.
Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0127262c

16 8月, 2012 1 次提交

KVM: PPC: Book3S HV: Fix incorrect branch in H_CEDE code · 04f995a5

由 Paul Mackerras 提交于 8月 06, 2012

In handling the H_CEDE hypercall, if this vcpu has already been
prodded (with the H_PROD hypercall, which Linux guests don't in fact
use), we branch to a numeric label '1f'.  Unfortunately there is
another '1:' label before the one that we want to jump to.  This fixes
the problem by using a textual label, 'kvm_cede_prodded'.  It also
changes the label for another longish branch from '2:' to
'kvm_cede_exit' to avoid a possible future problem if code modifications
add another numeric '2:' label in between.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

04f995a5

11 7月, 2012 1 次提交

powerpc: Add VDSO version of getcpu · 18ad51dd

由 Anton Blanchard 提交于 7月 04, 2012

We have a request for a fast method of getting CPU and NUMA node IDs
from userspace. This patch implements a getcpu VDSO function,
similar to x86.

Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be
modified by a KVM guest, so we save the SPRG3 value in the paca and
restore it when transitioning from the guest to the host.

I have a glibc patch that implements sched_getcpu on top of this.
Testing on a POWER7:

baseline: 538 cycles
vdso:      30 cycles
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

18ad51dd

10 7月, 2012 2 次提交

powerpc: Merge VCPU_GPR · d72be892

由 Michael Neuling 提交于 6月 25, 2012

Merge the defines of VCPU_GPR from different places.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d72be892

powerpc: Fix usage of register macros getting ready for %r0 change · c75df6f9

由 Michael Neuling 提交于 6月 25, 2012

Anything that uses a constructed instruction (ie. from ppc-opcode.h),
need to use the new R0 macro, as %r0 is not going to work.

Also convert usages of macros where we are just determining an offset
(usually for a load/store), like:
	std	r14,STK_REG(r14)(r1)
Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since
it's just calculating an offset.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

c75df6f9

02 7月, 2012 1 次提交

powerpc/kvm: sldi should be sld · 2f584a14

由 Michael Neuling 提交于 6月 25, 2012

Since we are taking a registers, this should never have been an sldi.
Talking to paulus offline, this is the correct fix.

Was introduced by:
 commit 19ccb76a
 Author: Paul Mackerras <paulus@samba.org>
 Date:   Sat Jul 23 17:42:46 2011 +1000

Talking to paulus, this shouldn't be a literal.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
CC: <stable@kernel.org> [v3.2+]
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

2f584a14