- 17 Oct 2013, 5 commits
-
-
Committed by Paul Mackerras

This enables us to use the Processor Compatibility Register (PCR) on POWER7 to put the processor into architecture 2.05 compatibility mode when running a guest. In this mode the new instructions and registers that were introduced on POWER7 are disabled in user mode. This includes all the VSX facilities plus several other instructions such as ldbrx, stdbrx, popcntw, popcntd, etc.

To select this mode, we have a new register accessible through the set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT. Setting this to zero gives the full set of capabilities of the processor. Setting it to one of the "logical" PVR values defined in PAPR puts the vcpu into the compatibility mode for the corresponding architecture level. The supported values are:

0x0f000002  Architecture 2.05 (POWER6)
0x0f000003  Architecture 2.06 (POWER7)
0x0f100003  Architecture 2.06+ (POWER7+)

Since the PCR is per-core, the architecture compatibility level and the corresponding PCR value are stored in the struct kvmppc_vcore, and are therefore shared between all vcpus in a virtual core.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: squash in fix to add missing break statements and documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>
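As a minimal illustration of the interface described above, userspace could select a compatibility level through the standard one_reg ioctl roughly like this (a sketch; fd setup and error handling are assumed):

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* PAPR "logical" PVR for Architecture 2.05 (POWER6), from the list above. */
#define LOGICAL_PVR_ARCH_205 0x0f000002u

/* Put a vcpu into a compatibility mode; passing 0 instead restores
 * the processor's full capabilities. vcpu_fd is an open vcpu fd. */
static int set_arch_compat(int vcpu_fd, uint32_t logical_pvr)
{
    uint64_t val = logical_pvr;
    struct kvm_one_reg reg = {
        .id   = KVM_REG_PPC_ARCH_COMPAT,
        .addr = (uint64_t)(uintptr_t)&val,
    };

    return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}
```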
-
Committed by Paul Mackerras

POWER7 and later IBM server processors have a register called the Program Priority Register (PPR), which controls the priority of each hardware CPU SMT thread, and affects how fast it runs compared to other SMT threads. This priority can be controlled by writing to the PPR, or by using instructions of the form or rN,rN,rN, which are otherwise no-ops but have been defined to set the priority to particular levels.

This adds code to context-switch the PPR when entering and exiting guests, and to make the PPR value accessible through the SET/GET_ONE_REG interface. When entering the guest, we set the PPR as late as possible, because if we are setting a low thread priority it will make the code run slowly from that point on. Similarly, the first-level interrupt handlers save the PPR value in the PACA very early on, and set the thread priority to the medium level, so that the interrupt handling code runs at a reasonable speed.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
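For reference, the priority-setting no-op encodings are the ones Linux wraps in its HMT_* macros; a sketch of them as C inline assembly:

```c
/* Priority-setting forms of "or rN,rN,rN": architecturally no-ops,
 * but defined to change the SMT thread priority via the PPR. */
#define HMT_VERY_LOW()    asm volatile("or 31,31,31")
#define HMT_LOW()         asm volatile("or 1,1,1")
#define HMT_MEDIUM_LOW()  asm volatile("or 6,6,6")
#define HMT_MEDIUM()      asm volatile("or 2,2,2")
#define HMT_MEDIUM_HIGH() asm volatile("or 5,5,5")
#define HMT_HIGH()        asm volatile("or 3,3,3")
```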
-
Committed by Paul Mackerras

This adds the ability to have a separate LPCR (Logical Partitioning Control Register) value relating to a guest for each virtual core, rather than only having a single value for the whole VM. This corresponds to what real POWER hardware does, where there is an LPCR per CPU thread but most of the fields are required to have the same value on all active threads in a core.

The per-virtual-core LPCR can be read and written using the GET/SET_ONE_REG interface. Userspace can only modify the following fields of the LPCR value:

DPFD  Default prefetch depth
ILE   Interrupt little-endian
TC    Translation control (secondary HPT hash group search disable)

We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which contains bits relating to memory management, i.e. the Virtualized Partition Memory (VPM) bits and the bits relating to guest real mode. When this default value is updated, the update needs to be propagated to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do that.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
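In outline, the propagation helper masks the selected bits into the per-VM default and into each vcore's copy, roughly like this (a simplified sketch; the vcore array layout and locking are glossed over):

```c
/* Simplified sketch of kvmppc_update_lpcr(): apply the bits selected
 * by `mask` to the per-VM default LPCR and every vcore's copy. */
static void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
                               unsigned long mask)
{
    long i;

    kvm->arch.lpcr = (kvm->arch.lpcr & ~mask) | (lpcr & mask);
    for (i = 0; i < KVM_MAX_VCORES; ++i) {
        struct kvmppc_vcore *vc = kvm->arch.vcores[i]; /* assumed layout */

        if (vc)
            vc->lpcr = (vc->lpcr & ~mask) | (lpcr & mask);
    }
}
```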
-
Committed by Paul Mackerras

This allows guests to have a different timebase origin from the host. This is needed for migration, where a guest can migrate from one host to another and the two hosts might have a different timebase origin. However, the timebase seen by the guest must not go backwards, and should go forwards only by a small amount corresponding to the time taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg interface using the new KVM_REG_PPC_TB_OFFSET identifier. This value defaults to 0 and is not modified by KVM. On entering the guest, this value is added onto the timebase, and on exiting the guest, it is subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40 (timebase upper 40 bits) register. Writing to the TBU40 register only alters the upper 40 bits of the timebase, leaving the lower 24 bits unchanged. This provides a way to modify the timebase for guest migration without disturbing the synchronization of the timebase registers across CPU cores. The kernel rounds up the value given to a multiple of 2^24.

Timebase values stored in KVM structures (struct kvm_vcpu, struct kvmppc_vcore, etc.) are stored as host timebase values. The timebase values in the dispatch trace log need to be guest timebase values, however, since that is read directly by the guest. This moves the setting of vcpu->arch.dec_expires on guest exit to a point after we have restored the host timebase, so that vcpu->arch.dec_expires is a host timebase value.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
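Because only the upper 40 bits of the 64-bit timebase can be written through TBU40, an offset is effectively applied at 2^24-tick granularity; a small illustrative helper showing the round-up the text describes:

```c
#include <stdint.h>

/* Illustrative only: round a desired timebase offset up to the
 * 2^24-tick granularity that TBU40 writes can express. */
static uint64_t tb_offset_round_up(uint64_t desired)
{
    const uint64_t gran = 1ULL << 24;

    return (desired + gran - 1) & ~(gran - 1);
}
```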
-
Committed by Paul Mackerras

Currently we are not saving and restoring the SIAR and SDAR registers in the PMU (performance monitor unit) on guest entry and exit. The result is that performance monitoring tools in the guest could get false information about where a program was executing and what data it was accessing at the time of a performance monitor interrupt. This fixes it by saving and restoring these registers along with the other PMU registers on guest entry/exit. This also provides a way for userspace to access these values for a vcpu via the one_reg interface.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 14 Oct 2013, 1 commit
-
-
Committed by Christoffer Dall

Now that the main kvm code relying on these defines has been moved to the x86-specific part of the world, we can get rid of these.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
-
- 08 Jul 2013, 2 commits
-
-
Committed by Aneesh Kumar K.V

Older versions of the Power architecture use the Real Mode Offset register and Real Mode Limit Selector for mapping the guest Real Mode Area (RMA). The guest RMA must be physically contiguous, since the range is used when address translation is not enabled. This patch switches the RMA allocation code to the contiguous memory allocator, and removes the linear allocator, which is no longer used.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Aneesh Kumar K.V

The Powerpc architecture uses a hash-based page table mechanism for mapping virtual addresses to physical addresses, and requires this hash page table to be physically contiguous. KVM on Powerpc currently uses an early reservation mechanism to allocate the guest hash page table, which means we need to reserve a big memory region to ensure we can create a large number of guests simultaneously. Another disadvantage is that the reserved memory is not available to the rest of the subsystems, which limits the total memory available to the host. This patch series switches the guest hash page table allocation to the contiguous memory allocator.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 27 Apr 2013, 8 commits
-
-
Committed by Benjamin Herrenschmidt

This adds in-kernel emulation of the XICS (eXternal Interrupt Controller Specification) interrupt controller specified by PAPR, for both HV and PR KVM guests.

The XICS emulation supports up to 1048560 interrupt sources. Interrupt source numbers below 16 are reserved; 0 is used to mean no interrupt and 2 is used for IPIs. Internally these are represented in blocks of 1024, called ICS (interrupt controller source) entities, but that is not visible to userspace. Each vcpu gets one ICP (interrupt controller presentation) entity, used to store the per-vcpu state such as vcpu priority, pending interrupt state, IPI request, etc.

This does not include any API or any way to connect vcpus to their ICP state; that will be added in later patches.

This is based on an initial implementation by Michael Ellerman <michael@ellerman.id.au>, reworked by Benjamin Herrenschmidt and Paul Mackerras.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix typo, add dependency on !KVM_MPIC]
Signed-off-by: Alexander Graf <agraf@suse.de>
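Given the 1024-entry ICS blocks described above, mapping a global source number to its block and index is simple arithmetic (a sketch; the constant and function names here are illustrative, not the kernel's):

```c
#define ICS_SHIFT 10                  /* 1024 sources per ICS block */
#define ICS_IRQS  (1u << ICS_SHIFT)

/* Illustrative: split a global XICS source number into the number
 * of its ICS block and the index within that block. */
static inline unsigned int ics_block(unsigned int irq)
{
    return irq >> ICS_SHIFT;
}

static inline unsigned int ics_index(unsigned int irq)
{
    return irq & (ICS_IRQS - 1);
}
```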
-
Committed by Michael Ellerman

For pseries machine emulation, in order to move the interrupt controller code to the kernel, we need to intercept some RTAS calls in the kernel itself. This adds an infrastructure to allow in-kernel handlers to be registered for RTAS services by name. A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to associate token values with those service names. Then, when the guest requests an RTAS service with one of those token values, it will be handled by the relevant in-kernel handler rather than being passed up to userspace as at present.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix warning]
Signed-off-by: Alexander Graf <agraf@suse.de>
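A sketch of the userspace side: the argument structure carries the service name and the chosen token value (the token 0x2000 below is an arbitrary example, and error handling is omitted):

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Bind an example guest RTAS token to the in-kernel "ibm,set-xive"
 * handler. vm_fd is an open KVM VM file descriptor (assumed). */
static int define_rtas_token(int vm_fd)
{
    struct kvm_rtas_token_args args;

    memset(&args, 0, sizeof(args));
    strncpy(args.name, "ibm,set-xive", sizeof(args.name) - 1);
    args.token = 0x2000;    /* example token value */
    return ioctl(vm_fd, KVM_PPC_RTAS_DEFINE_TOKEN, &args);
}
```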
-
Committed by Alexander Graf

Now that all the irq routing and irqfd pieces are generic, we can expose real irqchip support to all of KVM's internal helpers. This allows us to use irqfd with the in-kernel MPIC.

Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Scott Wood

Enabling this capability connects the vcpu to the designated in-kernel MPIC. Using explicit connections between vcpus and irqchips allows for flexibility, but the main benefit at the moment is that it simplifies the code -- KVM doesn't need vm-global state to remember which MPIC object is associated with this vm, and it doesn't need to care about ordering between irqchip creation and vcpu creation.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu]
Signed-off-by: Alexander Graf <agraf@suse.de>
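The connection itself is a per-vcpu KVM_ENABLE_CAP call carrying the MPIC device fd and the CPU number; a sketch, assuming the device was created beforehand with KVM_CREATE_DEVICE:

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Connect one vcpu to an in-kernel MPIC. mpic_fd is the device fd
 * returned by KVM_CREATE_DEVICE; cpu is the MPIC CPU number. */
static int connect_vcpu_to_mpic(int vcpu_fd, int mpic_fd, int cpu)
{
    struct kvm_enable_cap cap = {
        .cap  = KVM_CAP_IRQ_MPIC,
        .args = { (unsigned long)mpic_fd, (unsigned long)cpu },
    };

    return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}
```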
-
Committed by Scott Wood

Hook the MPIC code up to the KVM interfaces, add locking, etc.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Paul Mackerras

At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications done by the host to the virtual processor areas (VPAs) and dispatch trace logs (DTLs) registered by the guest. This is because those modifications are done either in real mode or in the host kernel context, and in neither case does the access go through the guest's HPT, and thus no change (C) bit gets set in the guest's HPT. However, the changes done by the host do need to be tracked so that the modified pages get transferred when doing live migration.

In order to track these modifications, this adds a dirty flag to the struct representing the VPA/DTL areas, and arranges to set the flag when the VPA/DTL gets modified by the host. Then, when we are collecting the dirty log, we also check the dirty flags for the VPA and DTL for each vcpu and set the relevant bit in the dirty log if necessary. Doing this also means we now need to keep track of the guest physical address of the VPA/DTL areas.

So as not to lose track of modifications to a VPA/DTL area when it gets unregistered, or when a new area gets registered in its place, we need to transfer the dirty state to the rmap chain. This adds code to kvmppc_unpin_guest_page() to do that if the area was dirty. To simplify that code, we now require that all VPA, DTL and SLB shadow buffer areas fit within a single host page. Guests already comply with this requirement because pHyp requires that these areas not cross a 4k boundary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Mihai Caraman

The EPTCFG register defined by the E.PT category is accessed unconditionally by Linux guests in the presence of MAV 2.0. Emulate it now.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Mihai Caraman

Add support for the TLBnPS registers available in MMU Architecture Version (MAV) 2.0.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 22 Mar 2013, 1 commit
-
-
Committed by Bharat Bhushan

The installed debug handler will be used for guest debug support and debug facility emulation features (patches for these features will follow this patch).

Signed-off-by: Liu Yu <yu.liu@freescale.com>
[bharat.bhushan@freescale.com: Substantial changes]
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 15 Feb 2013, 1 commit
-
-
Committed by Paul Mackerras

The CFAR (Come-From Address Register) is a useful debugging aid that exists on POWER7 processors. Currently HV KVM doesn't save or restore the CFAR register for guest vcpus, making the CFAR of limited use in guests. This adds the necessary code to capture the CFAR value saved in the early exception entry code (it has to be saved before any branch is executed), save it in the vcpu.arch struct, and restore it on entry to the guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 Jan 2013, 1 commit
-
-
Committed by Alexander Graf

The External Proxy Facility in FSL BookE chips allows the interrupt controller to automatically acknowledge an interrupt as soon as a core gets its pending external interrupt delivered. Today, user space implements the interrupt controller, so we need to check on it during such a cycle. This patch implements the logic for user space to enable and disable EPR exiting, as well as EPR exiting itself, so that user space can acknowledge an interrupt when an external interrupt has successfully been delivered into the guest vcpu.

Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 14 Dec 2012, 2 commits
-
-
Committed by Alex Williamson

Seems like everyone copied x86 and defined 4 private memory slots that never actually get used. Even x86 only uses 3 of the 4. These aren't exposed, so there's no need to add padding.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Alex Williamson

It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is the user-accessible slots and the other is user + private. Make this more obvious.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
- 06 Dec 2012, 3 commits
-
-
Committed by Alexander Graf

In BookE, EPCR is defined and valid when either the HV or the 64bit category is implemented. Reflect this in the field definition. Today the only KVM target on 64bit is HV-enabled, so there is no change in actual source code, but this keeps the code closer to the spec and doesn't build up artificial road blocks for a PR KVM on 64bit.

Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Paul Mackerras

When we change or remove an HPT (hashed page table) entry, we can do either a global TLB invalidation (tlbie) that works across the whole machine, or a local invalidation (tlbiel) that only affects this core. Currently we do local invalidations if the VM has only one vcpu or if the guest requests it with the H_LOCAL flag, though the guest Linux kernel currently doesn't ever use H_LOCAL. Then, to cope with the possibility that vcpus moving around to different physical cores might expose stale TLB entries, there is some code in kvmppc_hv_entry to flush the whole TLB of entries for this VM if either this vcpu is now running on a different physical core from where it last ran, or if this physical core last ran a different vcpu.

There are a number of problems on POWER7 with this as it stands:

- The TLB invalidation is done per thread, whereas it only needs to be done per core, since the TLB is shared between the threads.
- With the possibility of the host paging out guest pages, the use of H_LOCAL by an SMP guest is dangerous since the guest could possibly retain and use a stale TLB entry pointing to a page that had been removed from the guest.
- The TLB invalidations that we do when a vcpu moves from one physical core to another are unnecessary in the case of an SMP guest that isn't using H_LOCAL.
- The optimization of using local invalidations rather than global should apply to guests with one virtual core, not just one vcpu.

(None of this applies on PPC970, since there we always have to invalidate the whole TLB when entering and leaving the guest, and we can't support paging out guest memory.)

To fix these problems and simplify the code, we now maintain a simple cpumask of which cpus need to flush the TLB on entry to the guest. (This is indexed by cpu, though we only ever use the bits for thread 0 of each core.) Whenever we do a local TLB invalidation, we set the bits for every cpu except the bit for thread 0 of the core that we're currently running on. Whenever we enter a guest, we test and clear the bit for our core, and flush the TLB if it was set.

On initial startup of the VM, and when resetting the HPT, we set all the bits in the need_tlb_flush cpumask, since any core could potentially have stale TLB entries from the previous VM to use the same LPID, or from the previous contents of the HPT.

Then, we maintain a count of the number of online virtual cores, and use that when deciding whether to use a local invalidation, rather than the number of online vcpus. The code to make that decision is extracted out into a new function, global_invalidates(). For multi-core guests on POWER7 (i.e. when we are using mmu notifiers), we now never do local invalidations regardless of the H_LOCAL flag.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
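In outline, the entry-time check reduces to one test-and-clear per core; a simplified C sketch of the logic described above (the real check lives in the assembly guest-entry path, and flush_guest_tlb() is an illustrative name, not the kernel's):

```c
/* Simplified sketch of the entry-time need_tlb_flush check. */
static void check_need_tlb_flush(struct kvm *kvm, int pcpu)
{
    /* Only the bit for thread 0 of each core is ever used. */
    int core_cpu = cpu_first_thread_sibling(pcpu);

    if (cpumask_test_cpu(core_cpu, &kvm->arch.need_tlb_flush)) {
        flush_guest_tlb(kvm);   /* illustrative helper name */
        cpumask_clear_cpu(core_cpu, &kvm->arch.need_tlb_flush);
    }
}
```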
-
Committed by Paul Mackerras

This uses a bit in our record of the guest view of the HPTE to record when the HPTE gets modified. We use a reserved bit for this, and ensure that this bit is always cleared in HPTE values returned to the guest.

The recording of modified HPTEs is only done if other code indicates its interest by setting kvm->arch.hpte_mod_interest to a non-zero value. The reason for this is that when later commits add facilities for userspace to read the HPT, the first pass of reading the HPT will be quicker if there are no (or very few) HPTEs marked as modified, rather than having most HPTEs marked as modified.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 30 Oct 2012, 3 commits
-
-
Committed by Paul Mackerras

Currently the code that accounts stolen time tends to overestimate the stolen time, and will sometimes report more stolen time in a DTL (dispatch trace log) entry than has elapsed since the last DTL entry. This can cause guests to underflow the user or system time measured for some tasks, leading to ridiculous CPU percentages and total runtimes being reported by top and other utilities.

In addition, the current code was designed for the previous policy where a vcore would only run when all the vcpus in it were runnable, and so only counted stolen time on a per-vcore basis. Now that a vcore can run while some of the vcpus in it are doing other things in the kernel (e.g. handling a page fault), we need to also count the time when a vcpu task is preempted while it is not running as part of a vcore as stolen.

To do this, we bring back the BUSY_IN_HOST vcpu state and extend the vcpu_load/put functions to count preemption time while the vcpu is in that state. Handling the transitions between the RUNNING and BUSY_IN_HOST states requires checking and updating two variables (accumulated time stolen and time last preempted), so we add a new spinlock, vcpu->arch.tbacct_lock. This protects both the per-vcpu stolen/preempt-time variables, and the per-vcore variables while this vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time. The task could be executing in userspace on behalf of the vcpu, or it could be preempted, or the vcpu could be genuinely stopped. Since we have no way of dividing up the time between these cases, we don't count any of it as stolen.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Paul Mackerras

Currently the Book3S HV code implements a policy on multi-threaded processors (i.e. POWER7) that requires all of the active vcpus in a virtual core to be ready to run before we run the virtual core. However, that causes problems on reset, because reset stops all vcpus except vcpu 0, and can also reduce throughput since all four threads in a virtual core have to wait whenever any one of them hits a hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as any vcpu in it is runnable. With this, the KVMPPC_VCPU_STOPPED state and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish between them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Paul Mackerras

If a thread in a virtual core becomes runnable while other threads in the same virtual core are already running in the guest, it is possible for the latecomer to join the others on the core without first pulling them all out of the guest. Currently this only happens rarely, when a vcpu is first started. This fixes some bugs and omissions in the code in this case.

First, we need to check for VPA updates for the latecomer and make a DTL entry for it. Secondly, if it comes along while the master vcpu is doing a VPA update, we don't need to do anything since the master will pick it up in kvmppc_run_core. To handle this correctly we introduce a new vcore state, VCORE_STARTING. Thirdly, there is a race because we currently clear the hardware thread's hwthread_req before waiting to see it get to nap. A latecomer thread could have its hwthread_req cleared before it gets to test it, and therefore never increment the nap_count, leading to messages about wait_for_nap timeouts.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 06 Oct 2012, 5 commits
-
-
Committed by Paul Mackerras

Now that we have an architecture-specific field in the kvm_memory_slot structure, we can use it to store the array of page physical addresses that we need for Book3S HV KVM on PPC970 processors. This reduces the size of struct kvm_arch for Book3S HV, and also reduces the size of struct kvm_arch_memory_slot for other PPC KVM variants, since the fields in it are now only compiled in for Book3S HV.

This necessitates making the kvm_arch_create_memslot and kvm_arch_free_memslot operations specific to each PPC KVM variant. That in turn means that we now don't allocate the rmap arrays on Book3S PR and Book E.

Since we now unpin pages and free the slot_phys array in kvmppc_core_free_memslot, we no longer need to do it in kvmppc_core_destroy_vm, since the generic code takes care to free all the memslots when destroying a VM. We now need the new memslot to be passed in to kvmppc_core_prepare_memory_region, since we need to initialize its arch.slot_phys member on Book3S HV.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Bharat Bhushan

The IAC/DAC registers were defined as 32-bit even though they are 64 bits wide, so a ONE_REG interface is added to set/get them.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Bharat Bhushan

This patch adds watchdog emulation in KVM. The watchdog emulation is enabled by the KVM_ENABLE_CAP(KVM_CAP_PPC_BOOKE_WATCHDOG) ioctl. A kernel timer is used for the watchdog emulation and emulates the h/w watchdog state machine. On watchdog timer expiry, we exit to QEMU if TCR.WRC is non-zero; QEMU can then reset, shut down, etc., depending on how it is configured.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
[bharat.bhushan@freescale.com: reworked patch]
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
[agraf: adjust to new request framework]
Signed-off-by: Alexander Graf <agraf@suse.de>
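Enabling the emulation from userspace is a single capability enable on the vcpu; a sketch (no arguments are needed beyond the capability number):

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Turn on in-kernel watchdog emulation for one vcpu. Watchdog
 * expiry then exits to userspace when TCR.WRC is non-zero. */
static int enable_booke_watchdog(int vcpu_fd)
{
    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_PPC_BOOKE_WATCHDOG,
    };

    return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}
```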
-
Committed by Alexander Graf

Now that we have very simple MMU notifier support for e500 in place, also add the same simple support to book3s. It gets us one step closer to actual fast support.

Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Alexander Graf

The e500 target has lived without mmu notifiers ever since it got introduced, but fails the user space check on them with hugetlbfs. So in order to get that one working, implement mmu notifiers in a reasonably dumb fashion and be happy. On embedded hardware, we almost never end up with mmu notifier calls, since most people don't overcommit.

Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 16 Aug 2012, 1 commit
-
-
Committed by Alexander Graf

When we map a page that wasn't icache-cleared before, do so when first mapping it in KVM, using the same information bits as the Linux mapping logic. That way we are 100% sure that any page we map does not have stale entries in the icache.

Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 06 Aug 2012, 1 commit
-
-
Committed by Takuya Yoshikawa

Two reasons:

- x86 can integrate rmap and rmap_pde and remove heuristics in __gfn_to_rmap().
- Some architectures do not need rmap. Since rmap is one of the most memory-consuming things in KVM, ppc'd better restrict the allocation to Book3S HV.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 19 Jul 2012, 1 commit
-
-
Committed by Takuya Yoshikawa

When we tested KVM under memory pressure, with THP enabled on the host, we noticed that the MMU notifier took a long time to invalidate huge pages. Since the invalidation was done with mmu_lock held, it not only wasted the CPU but also made the host slower to respond. This patch mitigates this by using kvm_handle_hva_range().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Alexander Graf <agraf@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
- 30 May 2012, 2 commits
-
-
Committed by Bharat Bhushan

Added decrementer auto-reload support. DECAR is readable on e500v2/e500mc and later CPUs.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
Committed by Paul Mackerras

This adds a new ioctl to enable userspace to control the size of the guest hashed page table (HPT) and to clear it out when resetting the guest. The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter a pointer to a u32 containing the desired order of the HPT (log base 2 of the size in bytes), which is updated on successful return to the actual order of the HPT which was allocated.

There must be no vcpus running at the time of this ioctl. To enforce this, we now keep a count of the number of vcpus running in kvm->arch.vcpus_running.

If the ioctl is called when an HPT has already been allocated, we don't reallocate the HPT but just clear it out. We first clear the kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold the kvm->lock mutex, it will prevent any vcpus from starting to run until we're done, and (b) it means that the first vcpu to run after we're done will re-establish the VRMA if necessary.

If userspace doesn't call this ioctl before running the first vcpu, the kernel will allocate a default-sized HPT at that point. We do it then rather than when creating the VM, as the code did previously, so that userspace has a chance to do the ioctl if it wants.

When allocating the HPT, we can allocate either from the kernel page allocator, or from the preallocated pool. If userspace is asking for a different size from the preallocated HPTs, we first try to allocate using the kernel page allocator. Then we try to allocate from the preallocated pool, and then if that fails, we try allocating decreasing sizes from the kernel page allocator, down to the minimum size allowed (256kB). Note that the kernel page allocator limits allocations to 1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to 16MB (on 64-bit powerpc, at least).

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix module compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>
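The ioctl usage is straightforward (a sketch; error handling omitted): pass a pointer to the desired order, then read back the order that was actually allocated:

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Ask for a 16 MiB HPT (order 24, i.e. log2 of the size in bytes).
 * On success the kernel writes back the order it actually used. */
static int allocate_htab(int vm_fd, uint32_t *order)
{
    *order = 24;
    return ioctl(vm_fd, KVM_PPC_ALLOCATE_HTAB, order);
}
```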
-
- 06 May 2012, 1 commit
-
-
Committed by Benjamin Herrenschmidt

There is nothing in the code for emulating TCE tables in the kernel that prevents it from working on "PR" KVM... other than ifdefs and the location of the code. This enables it there and moves the bulk of the code to a new file called book3s_64_vio.c. This speeds things up a bit on my G5.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix for hv kvm, 32bit, whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 08 Apr 2012, 2 commits
-
-
Committed by Alexander Graf

When emulating updating load/store instructions (lwzu, stwu, ...) we need to write the effective address of the load/store into a register. Currently, we write the physical address in there, which is very wrong. So instead let's save off where the virtual fault was on MMIO and use that information as the value to put into the register.

While at it, also move the XOP variants of the above instructions to the new scheme of using the already known vaddr instead of calculating it themselves.

Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Paul Mackerras

This adds code to measure "stolen" time per virtual core in units of timebase ticks, and to report the stolen time to the guest using the dispatch trace log (DTL). The guest can register an area of memory for the DTL for a given vcpu. The DTL is a ring buffer where KVM fills in one entry every time it enters the guest for that vcpu.

Stolen time is measured as time when the virtual core is not running, either because the vcore is not runnable (e.g. some of its vcpus are executing elsewhere in the kernel or in userspace), or when the vcpu thread that is running the vcore is preempted. This includes time when all the vcpus are idle (i.e. have executed the H_CEDE hypercall), which is OK because the guest accounts stolen time while idle as idle time.

Each vcpu keeps a record of how much stolen time has been reported to the guest for that vcpu so far. When we are about to enter the guest, we create a new DTL entry (if the guest vcpu has a DTL) and report the difference between total stolen time for the vcore and stolen time reported so far for the vcpu as the "enqueue to dispatch" time in the DTL entry.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
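Conceptually, the per-entry report is a running delta; a sketch of the bookkeeping described above (field names are illustrative of the per-vcore/per-vcpu counters, not a definitive rendering of the HV code):

```c
/* Illustrative: report only the stolen time accumulated since the
 * last DTL entry written for this vcpu. */
static uint64_t stolen_since_last_entry(struct kvmppc_vcore *vc,
                                        struct kvm_vcpu *vcpu)
{
    uint64_t delta = vc->stolen_tb - vcpu->arch.stolen_logged;

    vcpu->arch.stolen_logged = vc->stolen_tb; /* remember what we reported */
    return delta;   /* goes into the DTL "enqueue to dispatch" field */
}
```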
-