提交 · db50f280cf5f714e64ff2b134aae138908f07502 · openeuler / qemu

17 10月, 2017 1 次提交

spapr: Correct RAM size calculation for HPT resizing · db50f280

由 David Gibson 提交于 10月 11, 2017

In order to prevent the guest from forcing the allocation of large amounts
of qemu memory (or host kernel memory, in the case of KVM HV), we limit
the size of Hashed Page Table (HPT) it is allowed to allocated, based on
its RAM size.

However, the current calculation is not correct: it only adds up the size
of plugged memory, ignoring the base memory size.  This patch corrects it.

While we're there, use get_plugged_memory_size() instead of directly
calling pc_existing_dimms_capacity().  The only difference is that it
will abort on failure, which is right: a failure here indicates something
wrong within qemu.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Reviewed-by: NLaurent Vivier <lvivier@redhat.com>

db50f280

27 9月, 2017 1 次提交

spapr: fix the value of SDR1 in kvmppc_put_books_sregs() · 1ec26c75

由 Greg Kurz 提交于 9月 25, 2017

When running with KVM PR, if a new HPT is allocated we need to inform
KVM about the HPT address and size. This is currently done by hacking
the value of SDR1 and pushing it to KVM in several places.

Also, migration breaks the guest since it is very unlikely the HPT has
the same address in source and destination, but we push the incoming
value of SDR1 to KVM anyway.

This patch introduces a new virtual hypervisor hook so that the spapr
code can provide the correct value of SDR1 to be pushed to KVM each
time kvmppc_put_books_sregs() is called.

It allows to get rid of all the hacking in the spapr/kvmppc code and
it fixes migration of nested KVM PR.
Suggested-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NGreg Kurz <groug@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

1ec26c75

15 9月, 2017 2 次提交

spapr: fix CAS-generated reset · 30bf9ed1

由 Cédric Le Goater 提交于 9月 08, 2017

The OV5_MMU_RADIX_300 requires special handling in the CAS negotiation
process. It is cleared from the option vector of the guest before
evaluating the changes and re-added later. But, when testing for a
possible CAS reset :

    spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
                                        ov5_cas_old, spapr->ov5_cas);

the bit OV5_MMU_RADIX_300 will each time be seen as removed from the
previous OV5 set, hence generating a reset loop.

Fix this problem by also clearing the same bit in the ov5_cas_old set.
Signed-off-by: NCédric Le Goater <clg@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

30bf9ed1

spapr: only update SDR1 once per-cpu during CAS · 4c563d9d

由 Greg Kurz 提交于 9月 04, 2017

Commit b55d295e added the possibility to support HPT resizing with KVM.
In the case of PR, we need to pass the userspace address of the HPT to KVM
using the SDR1 slot.
This is handled by kvmppc_update_sdr1() which uses CPU_FOREACH() to update
all CPUs. It is hence not needed to call kvmppc_update_sdr1() for each CPU.
Signed-off-by: NGreg Kurz <groug@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

4c563d9d

08 9月, 2017 3 次提交

spapr: fallback to raw mode if best compat mode cannot be set during CAS · cc7b35b1

由 Greg Kurz 提交于 8月 17, 2017

KVM PR doesn't allow to set a compat mode. This causes ppc_set_compat_all()
to fail and we return H_HARDWARE to the guest right away.

This is excessive: even if we favor compat mode since commit 152ef803,
we should at least fallback to raw mode if the guest supports it.

This patch modifies cas_check_pvr() so that it also reports that the real
PVR was found in the table supplied by the guest. Note that this is only
makes sense if raw mode isn't explicitely disabled (ie, the user didn't
set the machine "max-cpu-compat" property). If this is the case, we can
simply ignore ppc_set_compat_all() failures, and let the guest run in raw
mode.
Signed-off-by: NGreg Kurz <groug@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

cc7b35b1

ppc: spapr: Make VCPU ID handling private to SPAPR · 2e886fb3

由 Sam Bobroff 提交于 8月 09, 2017

The concept of a VCPU ID that differs from the CPU's index
(cpu->cpu_index) exists only within SPAPR machines so, move the
functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c
and rename them appropriately.
Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

2e886fb3

ppc: spapr: Rename cpu_dt_id to vcpu_id · 81210c20

由 Sam Bobroff 提交于 8月 03, 2017

This field actually records the VCPU ID used by KVM and, although the
value is also used in the device tree it is primarily the VCPU ID so
rename it as such.
Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Updated comment missed in cpu.h]
Reviewed-by: NGreg Kurz <groug@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

81210c20

09 8月, 2017 1 次提交

spapr: Fix bug in h_signal_sys_reset() · f57467e3

由 Sam Bobroff 提交于 8月 03, 2017

The unicast case in h_signal_sys_reset() seems to be broken:
rather than selecting the target CPU, it looks like it will pick
either the first CPU or fail to find one at all.

Fix it by using the search function rather than open coding the
search.

This was found by inspection; the code appears to be unused because
the Linux kernel only uses the broadcast target.
Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

f57467e3

17 7月, 2017 4 次提交

pseries: Allow HPT resizing with KVM · b55d295e

由 David Gibson 提交于 7月 12, 2017

So far, qemu implements the PAPR Hash Page Table (HPT) resizing extension
with TCG. The same implementation will work with KVM PR, but we don't
currently allow that. For KVM HV we can only implement resizing with the
assistance of the host kernel, which needs a new capability and ioctl()s.

This patch adds support for testing the new KVM capability and implementing
the resize in terms of KVM facilities when necessary. If we're running on
a kernel which doesn't have the new capability flag at all, we fall back to
testing for PR vs. HV KVM using the same hack that we already use in a
number of places for older kernels.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

b55d295e

pseries: Use smaller default hash page tables when guest can resize · 2772cf6b

由 David Gibson 提交于 7月 12, 2017

We've now implemented a PAPR extension allowing PAPR guest to resize
their hash page table (HPT) during runtime.

This patch makes use of that facility to allocate smaller HPTs by default.
Specifically when a guest is aware of the HPT resize facility, qemu sizes
the HPT to the initial memory size, rather than the maximum memory size on
the assumption that the guest will resize its HPT if necessary for hot
plugged memory.

When the initial memory size is much smaller than the maximum memory size
(a common configuration with e.g. oVirt / RHEV) then this can save
significant memory on the HPT.

If the guest does *not* advertise HPT resize awareness when it makes the
ibm,client-architecture-support call, qemu resizes the HPT for maxmimum
memory size (unless it's been configured not to allow such guests at all).

For now we make that reallocation assuming the guest has not yet used the
HPT at all. That's true in practice, but not, strictly, an architectural
or PAPR requirement. If we need to in future we can fix this by having
the client-architecture-support call reboot the guest with the revised
HPT size (the client-architecture-support call is explicitly permitted to
trigger a reboot in this way).
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>

2772cf6b

pseries: Implement HPT resizing · 0b0b8310

由 David Gibson 提交于 5月 12, 2017

This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.

The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.

The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.

The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).

For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).

In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

0b0b8310

pseries: Stubs for HPT resizing · 30f4b05b

由 David Gibson 提交于 5月 12, 2017

This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
extension to allow run time resizing of a guest's hash page table.  It
also adds a new machine property for controlling whether this new
facility is available.

For now we only allow resizing with TCG, allowing it with KVM will require
kernel changes as well.

Finally, it adds a new string to the hypertas property in the device
tree, advertising to the guest the availability of the HPT resizing
hypercalls.  This is a tentative suggested value, and would need to be
standardized by PAPR before being merged.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: NLaurent Vivier <lvivier@redhat.com>

30f4b05b

30 6月, 2017 1 次提交

pseries: Move CPU compatibility property to machine · 7843c0d6

由 David Gibson 提交于 6月 11, 2017

Server class POWER CPUs have a "compat" property, which is used to set the
backwards compatibility mode for the processor.  However, this only makes
sense for machine types which don't give the guest access to hypervisor
privilege - otherwise the compatibility level is under the guest's control.

To reflect this, this removes the CPU 'compat' property and instead
creates a 'max-cpu-compat' property on the pseries machine.  Strictly
speaking this breaks compatibility, but AFAIK the 'compat' option was
never (directly) used with -device or device_add.

The option was used with -cpu.  So, to maintain compatibility, this
patch adds a hack to the cpu option parsing to strip out any compat
options supplied with -cpu and set them on the machine property
instead of the now deprecated cpu property.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Tested-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Tested-by: NGreg Kurz <groug@kaod.org>
Tested-by: NAndrea Bolognani <abologna@redhat.com>

7843c0d6

06 6月, 2017 1 次提交

target/ppc: Fixup set_spr error in h_register_process_table · 60694bc6

由 Suraj Jitindar Singh 提交于 6月 05, 2017

set_spr is used in the function h_register_process_table() to update the
LPCR_GTSE and LPCR_UPRT values based on the flags passed by the guest.
The set_spr function takes the last two arguments mask and value used to
mask and set the value of the spr respectively.

The current call site passes these arguments in the wrong order and thus
bot GTSE and UPRT will be set irrespective, which is obviously
incorrect.

Rearrange the function call so that these arguments are passed in the
correct order and the correct behaviour is exhibited.

It is worth noting that this wasn't detected earlier since these were
always both set in all cases where this H_CALL was made.

Fixes: 6de83307 ("target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE")
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

60694bc6

24 5月, 2017 2 次提交

pseries: Split CAS PVR negotiation out into a separate function · 80c33d34

由 David Gibson 提交于 5月 18, 2017

Guests of the qemu machine type go through a feature negotiation process
known as "client architecture support" (CAS) during early boot.  This does
a number of things, one of which is finding a CPU compatibility mode which
can be supported by both guest and host.

In fact the CPU negotiation is probably the single most complex part of the
CAS process, so this splits it out into a helper function.  We've recently
made some mistakes in maintaining backward compatibility for old machine
types here.  Splitting this out will also make it easier to fix this.

This also adds a possibly useful error message if the negotiation fails
(i.e. if there isn't a CPU mode that's suitable for both guest and host).
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NLaurent Vivier <lvivier@redhat.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>

80c33d34

spapr: Consolidate HPT freeing code into a routine · 06ec79e8

由 Bharata B Rao 提交于 5月 17, 2017

Consolidate the code that frees HPT into a separate routine
spapr_free_hpt() as the same chunk of code is called from two places.
Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

06ec79e8

23 5月, 2017 1 次提交

shutdown: Add source information to SHUTDOWN and RESET · cf83f140

由 Eric Blake 提交于 5月 15, 2017

Time to wire up all the call sites that request a shutdown or
reset to use the enum added in the previous patch.

It would have been less churn to keep the common case with no
arguments as meaning guest-triggered, and only modified the
host-triggered code paths, via a wrapper function, but then we'd
still have to audit that I didn't miss any host-triggered spots;
changing the signature forces us to double-check that I correctly
categorized all callers.

Since command line options can change whether a guest reset request
causes an actual reset vs. a shutdown, it's easy to also add the
information to reset requests.
Signed-off-by: NEric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au> [ppc parts]
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [SPARC part]
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x parts]
Message-Id: <20170515214114.15442-5-eblake@redhat.com>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>

cf83f140

11 5月, 2017 1 次提交

target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE · 6de83307

由 Suraj Jitindar Singh 提交于 5月 02, 2017

The UPRT and GTSE bits are set when a guest calls H_REGISTER_PROCESS_TABLE
to choose determine how address translation is performed. Currently these
bits in the LPCR are only set for the cpu which handles the H_CALL, however
they need to be set for all cpus for that guest as address translation
cannot be performed differently on a per cpu basis.

Update the H_CALL handler to set these bits in the LPCR correctly for all
cpus of the guest.

Note it is the reponsibility of the guest to ensure that any secondary cpus
are suspended when the H_CALL is made and thus we can safely update these
values here.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

6de83307

26 4月, 2017 4 次提交

spapr: Workaround for broken radix guests · e957f6a9

由 Sam Bobroff 提交于 3月 20, 2017

For a little while around 4.9, Linux kernels that saw the radix bit in
ibm,pa-features would attempt to set up the MMU as if they were a
hypervisor, even if they were a guest, which would cause them to
crash.

Work around this by detecting pre-ISA 3.0 guests by their lack of that
bit in option vector 1, and then removing the radix bit from
ibm,pa-features. Note: This now requires regeneration of that node
after CAS negotiation.
Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Fix style nits]
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

e957f6a9

spapr: Enable ISA 3.0 MMU mode selection via CAS · 9fb4541f

由 Sam Bobroff 提交于 3月 23, 2017

Add the new node, /chosen/ibm,arch-vec-5-platform-support to the
device tree. This allows the guest to determine which modes are
supported by the hypervisor.

Update the option vector processing in h_client_architecture_support()
to handle the new MMU bits. This allows guests to request hash or
radix mode and QEMU to create the guest's HPT at this time if it is
necessary but hasn't yet been done.  QEMU will terminate the guest if
it requests an unavailable mode, as required by the architecture.

Extend the ibm,pa-features node with the new ISA 3.0 values
and set the radix bit if KVM supports radix mode. This probably won't
be used directly by guests to determine the availability of radix mode
(that is indicated by the new node added above) but the architecture
requires that it be set when the hardware supports it.

If QEMU is using KVM, and KVM is capable of running in radix mode,
guests can be run in real-mode without allocating a HPT (because KVM
will use a minimal RPT). So in this case, we avoid creating the HPT
at reset time and later (during CAS) create it if it is necessary.

ISA 3.0 guests will now begin to call h_register_process_table(),
which has been added previously.
Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Strip some unneeded prefix from error messages]
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

9fb4541f

target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL · b4db5413

由 Suraj Jitindar Singh 提交于 3月 20, 2017

The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the
hypervisor where in memory its process table is and how translation should
be performed using this process table.

Provide the implementation of this H_CALL for a guest.

We first check for invalid flags, then parse the flags to determine the
operation, and then check the other parameters for valid values based on
the operation (register new table/deregister table/maintain registration).
The process table is then stored in the appropriate location and registered
with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits
are updated as required.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Correct missing prototype and uninitialized variable]
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

b4db5413

target/ppc: Add new H-CALL shells for in memory table translation · d77a98b0

由 Suraj Jitindar Singh 提交于 3月 20, 2017

The use of the new in memory tables introduced in ISAv3.00 for translation,
also referred to as process tables, requires the introduction of 3 new
H-CALLs; H_REGISTER_PROCESS_TABLE, H_CLEAN_SLB, and H_INVALIDATE_PID.

Add shells for each of these and register them as the hypercall handlers.
Currently they all log an unimplemented hypercall and return H_FUNCTION.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Fix style nits]
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

d77a98b0

01 3月, 2017 4 次提交

target/ppc: Manage external HPT via virtual hypervisor · e57ca75c

由 David Gibson 提交于 2月 23, 2017

The pseries machine type implements the behaviour of a PAPR compliant
hypervisor, without actually executing such a hypervisor on the virtual
CPU. To do this we need some hooks in the CPU code to make hypervisor
facilities get redirected to the machine instead of emulated internally.

For hypercalls this is managed through the cpu->vhyp field, which points
to a QOM interface with a method implementing the hypercall.

For the hashed page table (HPT) - also a hypervisor resource - we use an
older hack. CPUPPCState has an 'external_htab' field which when non-NULL
indicates that the HPT is stored in qemu memory, rather than within the
guest's address space.

For consistency - and to make some future extensions easier - this merges
the external HPT mechanism into the vhyp mechanism. Methods are added
to vhyp for the basic operations the core hash MMU code needs: map_hptes()
and unmap_hptes() for reading the HPT, store_hpte() for updating it and
hpt_mask() to retrieve its size.

To match this, the pseries machine now sets these vhyp fields in its
existing vhyp class, rather than reaching into the cpu object to set the
external_htab field.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>

e57ca75c

target/ppc: Eliminate htab_base and htab_mask variables · 36778660

由 David Gibson 提交于 2月 24, 2017

CPUPPCState includes fields htab_base and htab_mask which store the base
address (GPA) and size (as a mask) of the guest's hashed page table (HPT).
These are set when the SDR1 register is updated.

Keeping these in sync with the SDR1 is actually a little bit fiddly, and
probably not useful for performance, since keeping them expands the size of
CPUPPCState.  It also makes some upcoming changes harder to implement.

This patch removes these fields, in favour of calculating them directly
from the SDR1 contents when necessary.

This does make a change to the behaviour of attempting to write a bad value
(invalid HPT size) to the SDR1 with an mtspr instruction.  Previously, the
bad value would be stored in SDR1 and could be retrieved with a later
mfspr, but the HPT size as used by the softmmu would be, clamped to the
allowed values.  Now, writing a bad value is treated as a no-op.  An error
message is printed in both new and old versions.

I'm not sure which behaviour, if either, matches real hardware.  I don't
think it matters that much, since it's pretty clear that if an OS writes
a bad value to SDR1, it's not going to boot.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>

36778660

target/ppc: Cleanup HPTE accessors for 64-bit hash MMU · 7222b94a

由 David Gibson 提交于 2月 27, 2017

Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
   1) Within guest memory - when we're emulating a full guest CPU at the
      hardware level (e.g. powernv, mac99, g3beige)
   2) Within qemu, but outside guest memory - when we're emulating user and
      supervisor instructions within TCG, but instead of emulating
      the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
      (pseries in TCG or KVM-PR)
   3) Within the host kernel - a pseries machine using KVM-HV
      acceleration.  Mostly accesses to the HPT are handled by KVM,
      but there are a few cases where qemu needs to access it via a
      special fd for the purpose.

In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG).  For cases (1) & (2) it just returns an address value.  The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.

This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious.  Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers.  In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.

While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

7222b94a

pseries: Minor cleanups to HPT management hypercalls · c6404ade

由 David Gibson 提交于 2月 21, 2017

 * Standardize on 'ptex' instead of 'pte_index' for HPTE index variables
   for consistency and brevity
 * Avoid variables named 'index'; shadowing index(3) from libc can lead to
   surprising bugs if the variable is removed, because compiler errors
   might not appear for remaining references
 * Clarify index calculations in h_enter() - we have two cases, H_EXACT
   where the exact HPTE slot is given, and !H_EXACT where we search for
   an empty slot within the hash bucket.  Make the calculation more
   consistent between the cases.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>

c6404ade

31 1月, 2017 5 次提交

ppc: Add ppc_set_compat_all() · f6f242c7

由 David Gibson 提交于 11月 10, 2016

Once a compatiblity mode is negotiated with the guest,
h_client_architecture_support() uses run_on_cpu() to update each CPU to
the new mode. We're going to want this logic somewhere else shortly,
so make a helper function to do this global update.

We put it in target-ppc/compat.c - it makes as much sense at the CPU level
as it does at the machine level. We also move the cpu_synchronize_state()
into ppc_set_compat(), since it doesn't really make any sense to call that
without synchronizing state.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

f6f242c7

pseries: Rewrite CAS PVR compatibility logic · 152ef803

由 David Gibson 提交于 11月 16, 2016

During boot, PAPR guests negotiate CPU model support with the
ibm,client-architecture-support mechanism.  The logic to implement this in
qemu is very convoluted.  This cleans it up to be cleaner, using the new
ppc_check_compat() call.

The new logic for choosing a compatibility mode is:
    1. Usually, use the most recent compatibility mode that is
            a) supported by the guest
            b) supported by the CPU
        and c) no later than the maximum allowed (if specified)
    2. If no suitable compatibility mode was found, the guest *does*
       support this CPU explicitly, and no maximum compatibility mode is
       specified, then use "raw" mode for the current CPU
    3. Otherwise, fail the boot.

This differs from the results of the old code: the old code preferred using
"raw" mode to a compatibility mode, whereas the new code prefers a
compatibility mode if available.  Using compatibility mode preferentially
means that we're more likely to be able to migrate the guest to a similar
but not identical host.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

152ef803

ppc/spapr: implement H_SIGNAL_SYS_RESET · 1c7ad77e

由 Nicholas Piggin 提交于 12月 05, 2016

The H_SIGNAL_SYS_RESET hcall allows a guest CPU to raise a system reset
exception on CPUs within the same guest -- all CPUs, all-but-self, or a
specific CPU (including self).

This has not made its way to a PAPR release yet, but we have an hcall
number assigned.

  H_SIGNAL_SYS_RESET = 0x380

  Syntax:
    hcall(uint64 H_SIGNAL_SYS_RESET, int64 target);

  Generate a system reset NMI on the threads indicated by target.

  Values for target:
    -1 = target all online threads including the caller
    -2 = target all online threads except for the caller
    All other negative values: reserved
    Positive values: The thread to be targeted, obtained from the value
    of the "ibm,ppc-interrupt-server#s" property of the CPU in the OF
    device tree.

  Semantics:
    - Invalid target: return H_Parameter.
    - Otherwise: Generate a system reset NMI on target thread(s),
      return H_Success.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

1c7ad77e

ppc: Rename cpu_version to compat_pvr · d6e166c0

由 David Gibson 提交于 10月 28, 2016

The 'cpu_version' field in PowerPCCPU is badly named.  It's named after the
'cpu-version' device tree property where it is advertised, but that meaning
may not be obvious in most places it appears.

Worse, it doesn't even really correspond to that device tree property.  The
property contains either the processor's PVR, or, if the CPU is running in
a compatibility mode, a special "logical PVR" representing which mode.

Rename the cpu_version field, and a number of related variables to
compat_pvr to make this clearer.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NThomas Huth <thuth@redhat.com>

d6e166c0

pseries: Make cpu_update during CAS unconditional · 5b120785

由 David Gibson 提交于 10月 29, 2016

spapr_h_cas_compose_response() includes a cpu_update parameter which
controls whether it includes updated information on the CPUs in the device
tree fragment returned from the ibm,client-architecture-support (CAS) call.

Providing the updated information is essential when CAS has negotiated
compatibility options which require different cpu information to be
presented to the guest. However, it should be safe to provide in other
cases (it will just override the existing data in the device tree with
identical data). This simplifies the code by removing the parameter and
always providing the cpu update information.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>

5b120785

20 1月, 2017 1 次提交

kvm: move cpu synchronization code · b3946626

由 Vincent Palatin 提交于 1月 10, 2017

Move the generic cpu_synchronize_ functions to the common hw_accel.h header,
in order to prepare for the addition of a second hardware accelerator.
Signed-off-by: NStefan Weil <sw@weilnetz.de>
Signed-off-by: NVincent Palatin <vpalatin@chromium.org>
Message-Id: <f5c3cffe8d520011df1c2e5437bb814989b48332.1484045952.git.vpalatin@chromium.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b3946626

31 10月, 2016 1 次提交

*_run_on_cpu: introduce run_on_cpu_data type · 14e6fe12

由 Paolo Bonzini 提交于 10月 31, 2016

This changes the *_run_on_cpu APIs (and helpers) to pass data in a
run_on_cpu_data type instead of a plain void *. This is because we
sometimes want to pass a target address (target_ulong) and this fails on
32 bit hosts emulating 64 bit guests.
Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
Message-Id: <20161027151030.20863-24-alex.bennee@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

14e6fe12

28 10月, 2016 2 次提交

spapr: add option vector handling in CAS-generated resets · 6787d27b

由 Michael Roth 提交于 10月 24, 2016

In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.

We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.

Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:

1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.

2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Signed-off-by: NMichael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

6787d27b

spapr_hcall: use spapr_ovec_* interfaces for CAS options · facdb8b6

由 Michael Roth 提交于 10月 24, 2016

Currently we access individual bytes of an option vector via
ldub_phys() to test for the presence of a particular capability
within that byte. Currently this is only done for the "dynamic
reconfiguration memory" capability bit. If that bit is present,
we pass a boolean value to spapr_h_cas_compose_response()
to generate a modified device tree segment with the additional
properties required to enable this functionality.

As more capability bits are added, will would need to modify the
code to add additional option vector accesses and extend the
param list for spapr_h_cas_compose_response() to include similar
boolean values for these parameters.

Avoid this by switching to spapr_ovec_* helpers so we can do all
the parsing in one shot and then test for these additional bits
within spapr_h_cas_compose_response() directly.

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: NMichael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NBharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

facdb8b6

27 9月, 2016 1 次提交

cpus: pass CPUState to run_on_cpu helpers · e0eeb4a2

由 Alex Bennée 提交于 8月 02, 2016

CPUState is a fairly common pointer to pass to these helpers. This means
if you need other arguments for the async_run_on_cpu case you end up
having to do a g_malloc to stuff additional data into the routine. For
the current users this isn't a massive deal but for MTTCG this gets
cumbersome when the only other parameter is often an address.

This adds the typedef run_on_cpu_func for helper functions which has an
explicit CPUState * passed as the first parameter. All the users of
run_on_cpu and async_run_on_cpu have had their helpers updated to use
CPUState where available.
Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov:
 - eliminate more CPUState in user data;
 - remove unnecessary user data passing;
 - fix target-s390x/kvm.c and target-s390x/misc_helper.c]
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts)
Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e0eeb4a2

23 9月, 2016 2 次提交

target-ppc: tlbie/tlbivax should have global effect · d76ab5e1

由 Nikunj A Dadhania 提交于 9月 20, 2016

tlbie (BookS) and tlbivax (BookE) plus the H_CALLs(pseries) should have
a global effect.

Introduces TLB_NEED_GLOBAL_FLUSH flag. During lazy tlb flush, after
taking care of pending local flushes, check broadcast flush(at context
synchronizing event ptesync/tlbsync, etc) is needed. Depending on the
bitmask state of the tlb_need_flush, tlb is flushed from other cpus if
needed and the flags are cleared.
Suggested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NNikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
[dwg: Use 'true' instead of '1' for call to check_tlb_flush()]
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

d76ab5e1

target-ppc: add flag in check_tlb_flush() · e3cffe6f

由 Nikunj A Dadhania 提交于 9月 20, 2016

We flush the qemu TLB lazily. check_tlb_flush is called whenever we hit
a context synchronizing event or instruction that requires a pending
flush to be performed.

However, we fail to handle broadcast TLB flush operations. In order to
fix that efficiently, we want to differentiate whether check_tlb_flush()
needs to only apply pending local flushes (isync instructions,
interrupts, ...) or also global pending flush operations. The latter is
only needed when executing instructions that are defined architecturally
as synchronizing global TLB flush operations. This in our case is
ptesync on BookS and tlbsync on BookE along with the paravirtualized
hypervisor calls.
Signed-off-by: NNikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[dwg: Changed gen_check_tlb_flush() to also take a bool, and fixed
 some spelling errors in commit message]
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

e3cffe6f

05 7月, 2016 1 次提交

ppc: simplify ppc_hash64_hpte_page_shift_noslb() · 1f0252e6

由 Cédric Le Goater 提交于 7月 01, 2016

The segment page shift parameter is never used. Let's remove it.
Signed-off-by: NCédric Le Goater <clg@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

1f0252e6

22 6月, 2016 1 次提交

powerpc/mm: Update the WIMG check during H_ENTER · c1175907

由 Aneesh Kumar K.V 提交于 6月 17, 2016

Support for 0 value for memeory coherence is optional and with ppc64
we can always enable memory coherence. Linux kernel did that during
the development of 4.7 kernel. But that resulted in failure in Qemu
in H_ENTER hcall due to below check. The mentioned change was reverted
in the kernel and kernel right now enable memory coherence only if
cache inhibited is not set. Nevertheless update qemu WIMG flag check
to cover the case where we enable memory coherence along with cache
inhibited flag.

In order to handle older and newer kernel version consider both Cache
inhibitted and (cache inhibitted | memory conference) as valid values
for wimg flags.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

c1175907