提交 · 2e23f544135e7b5fc2f0bcb6fa935c4b4f5058b2 · openeuler / raspberrypi-kernel

30 5月, 2014 4 次提交

KVM: PPC: Book3S PR: Expose EBB registers · 2e23f544

由 Alexander Graf 提交于 4月 29, 2014

POWER8 introduces a new facility called the "Event Based Branch" facility.
It contains of a few registers that indicate where a guest should branch to
when a defined event occurs and it's in PR mode.

We don't want to really enable EBB as it will create a big mess with !PR guest
mode while hardware is in PR and we don't really emulate the PMU anyway.

So instead, let's just leave it at emulation of all its registers.
Signed-off-by: NAlexander Graf <agraf@suse.de>

2e23f544

KVM: PPC: Book3S PR: Expose TAR facility to guest · e14e7a1e

由 Alexander Graf 提交于 4月 22, 2014

POWER8 implements a new register called TAR. This register has to be
enabled in FSCR and then from KVM's point of view is mere storage.

This patch enables the guest to use TAR.
Signed-off-by: NAlexander Graf <agraf@suse.de>

e14e7a1e

KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR · 616dff86

由 Alexander Graf 提交于 4月 29, 2014

POWER8 introduced a new interrupt type called "Facility unavailable interrupt"
which contains its status message in a new register called FSCR.

Handle these exits and try to emulate instructions for unhandled facilities.
Follow-on patches enable KVM to expose specific facilities into the guest.
Signed-off-by: NAlexander Graf <agraf@suse.de>

616dff86

KVM: PPC: Make shared struct aka magic page guest endian · 5deb8e7a

由 Alexander Graf 提交于 4月 24, 2014

The shared (magic) page is a data structure that contains often used
supervisor privileged SPRs accessible via memory to the user to reduce
the number of exits we have to take to read/write them.

When we actually share this structure with the guest we have to maintain
it in guest endianness, because some of the patch tricks only work with
native endian load/store operations.

Since we only share the structure with either host or guest in little
endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.

For booke, the shared struct stays big endian. For book3s_64 hv we maintain
the struct in host native endian, since it never gets shared with the guest.

For book3s_64 pr we introduce a variable that tells us which endianness the
shared struct is in and route every access to it through helper inline
functions that evaluate this variable.
Signed-off-by: NAlexander Graf <agraf@suse.de>

5deb8e7a

29 3月, 2014 2 次提交

KVM: PPC: Book3S HV: Return ENODEV error rather than EIO · 739e2425

由 Paul Mackerras 提交于 3月 25, 2014

If an attempt is made to load the kvm-hv module on a machine which
doesn't have hypervisor mode available, return an ENODEV error,
which is the conventional thing to return to indicate that this
module is not applicable to the hardware of the current machine,
rather than EIO, which causes a warning to be printed.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Acked-by: NScott Wood <scottwood@freescale.com>

739e2425

KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state · a7d80d01

由 Michael Neuling 提交于 3月 25, 2014

This adds code to get/set_one_reg to read and write the new transactional
memory (TM) state.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Acked-by: NScott Wood <scottwood@freescale.com>

a7d80d01

26 3月, 2014 2 次提交

KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n · 7505258c

由 Anton Blanchard 提交于 3月 25, 2014

I noticed KVM is broken when KVM in-kernel XICS emulation
(CONFIG_KVM_XICS) is disabled.

The problem was introduced in 48eaef05 (KVM: PPC: Book3S HV: use
xics_wake_cpu only when defined). It used CONFIG_KVM_XICS to wrap
xics_wake_cpu, where CONFIG_PPC_ICP_NATIVE should have been
used.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Acked-by: NScott Wood <scottwood@freescale.com>

7505258c

KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write · e59d24e6

由 Greg Kurz 提交于 2月 06, 2014

When the guest does an MMIO write which is handled successfully by an
ioeventfd, ioeventfd_write() returns 0 (success) and
kvmppc_handle_store() returns EMULATE_DONE.  Then
kvmppc_emulate_mmio() converts EMULATE_DONE to RESUME_GUEST_NV and
this causes an exit from the loop in kvmppc_vcpu_run_hv(), causing an
exit back to userspace with a bogus exit reason code, typically
causing userspace (e.g. qemu) to crash with a message about an unknown
exit code.

This adds handling of RESUME_GUEST_NV in kvmppc_vcpu_run_hv() in order
to fix that.  For generality, we define a helper to check for either
of the return-to-guest codes we use, RESUME_GUEST and RESUME_GUEST_NV,
to make it easy to check for either and provide one place to update if
any other return-to-guest code gets defined in future.

Since it only affects Book3S HV for now, the helper is added to
the kvm_book3s.h header file.

We use the helper in two places in kvmppc_run_core() as well for
future-proofing, though we don't see RESUME_GUEST_NV in either place
at present.

[paulus@samba.org - combined 4 patches into one, rewrote description]
Suggested-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

e59d24e6

27 1月, 2014 10 次提交

KVM: PPC: Book3S HV: Add new state for transactional memory · 7b490411

由 Michael Neuling 提交于 1月 08, 2014

Add new state for transactional memory (TM) to kvm_vcpu_arch.  Also add
asm-offset bits that are going to be required.

This also moves the existing TFHAR, TFIAR and TEXASR SPRs into a
CONFIG_PPC_TRANSACTIONAL_MEM section.  This requires some code changes to
ensure we still compile with CONFIG_PPC_TRANSACTIONAL_MEM=N.  Much of the added
the added #ifdefs are removed in a later patch when the bulk of the TM code is
added.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
[agraf: fix merge conflict]
Signed-off-by: NAlexander Graf <agraf@suse.de>

7b490411

KVM: PPC: Book3S HV: Basic little-endian guest support · d682916a

由 Anton Blanchard 提交于 1月 08, 2014

We create a guest MSR from scratch when delivering exceptions in
a few places.  Instead of extracting LPCR[ILE] and inserting it
into MSR_LE each time, we simply create a new variable intr_msr which
contains the entire MSR to use.  For a little-endian guest, userspace
needs to set the ILE (interrupt little-endian) bit in the LPCR for
each vcpu (or at least one vcpu in each virtual core).

[paulus@samba.org - removed H_SET_MODE implementation from original
version of the patch, and made kvmppc_set_lpcr update vcpu->arch.intr_msr.]
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

d682916a

KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 · 8563bf52

由 Paul Mackerras 提交于 1月 08, 2014

The DABRX (DABR extension) register on POWER7 processors provides finer
control over which accesses cause a data breakpoint interrupt. It
contains 3 bits which indicate whether to enable accesses in user,
kernel and hypervisor modes respectively to cause data breakpoint
interrupts, plus one bit that enables both real mode and virtual mode
accesses to cause interrupts. Currently, KVM sets DABRX to allow
both kernel and user accesses to cause interrupts while in the guest.

This adds support for the guest to specify other values for DABRX.
PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
and DABRX with one call. This adds a real-mode implementation of
H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
implementation. To support this, we add a per-vcpu field to store the
DABRX value plus code to get and set it via the ONE_REG interface.

For Linux guests to use this new hcall, userspace needs to add
"hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
property in the device tree. If userspace does this and then migrates
the guest to a host where the kernel doesn't include this patch, then
userspace will need to implement H_SET_XDABR by writing the specified
DABR value to the DABR using the ONE_REG interface. In that case, the
old kernel will set DABRX to DABRX_USER | DABRX_KERNEL. That should
still work correctly, at least for Linux guests, since Linux guests
cope with getting data breakpoint interrupts in modes that weren't
requested by just ignoring the interrupt, and Linux guests never set
DABRX_BTI.

The other thing this does is to make H_SET_DABR and H_SET_XDABR work
on POWER8, which has the DAWR and DAWRX instead of DABR/X. Guests that
know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
For them, this adds the logic to convert DABR/X values into DAWR/X values
on POWER8.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

8563bf52

KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells · 5d00f66b

由 Paul Mackerras 提交于 1月 08, 2014

POWER8 has support for hypervisor doorbell interrupts.  Though the
kernel doesn't use them for IPIs on the powernv platform yet, it
probably will in future, so this makes KVM cope gracefully if a
hypervisor doorbell interrupt arrives while in a guest.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

5d00f66b

KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8 · e0622bd9

由 Paul Mackerras 提交于 1月 08, 2014

POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR
registers to count when in the guest.  Set this bit.

POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location)
which is used to enable relocation-on interrupts.  Allow userspace to
set this field.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

e0622bd9

KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8 · 5557ae0e

由 Paul Mackerras 提交于 1月 08, 2014

This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7)
compatibility modes on a POWER8 processor.  (Note that transactional
memory is disabled for usermode if either or both of the PCR_TM_DIS
and PCR_ARCH_206 bits are set.)
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

5557ae0e

KVM: PPC: Book3S HV: Add handler for HV facility unavailable · bd3048b8

由 Michael Ellerman 提交于 1月 08, 2014

At present this should never happen, since the host kernel sets
HFSCR to allow access to all facilities.  It's better to be prepared
to handle it cleanly if it does ever happen, though.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

bd3048b8

KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs · b005255e

由 Michael Neuling 提交于 1月 08, 2014

This adds fields to the struct kvm_vcpu_arch to store the new
guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
functions to allow userspace to access this state, and adds code to
the guest entry and exit to context-switch these SPRs between host
and guest.

Note that DPDES (Directed Privileged Doorbell Exception State) is
shared between threads on a core; hence we store it in struct
kvmppc_vcore and have the master thread save and restore it.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

b005255e

KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers · e0b7ec05

由 Paul Mackerras 提交于 1月 08, 2014

On a threaded processor such as POWER7, we group VCPUs into virtual
cores and arrange that the VCPUs in a virtual core run on the same
physical core.  Currently we don't enforce any correspondence between
virtual thread numbers within a virtual core and physical thread
numbers.  Physical threads are allocated starting at 0 on a first-come
first-served basis to runnable virtual threads (VCPUs).

POWER8 implements a new "msgsndp" instruction which guest kernels can
use to interrupt other threads in the same core or sub-core.  Since
the instruction takes the destination physical thread ID as a parameter,
it becomes necessary to align the physical thread IDs with the virtual
thread IDs, that is, to make sure virtual thread N within a virtual
core always runs on physical thread N.

This means that it's possible that thread 0, which is where we call
__kvmppc_vcore_entry, may end up running some other vcpu than the
one whose task called kvmppc_run_core(), or it may end up running
no vcpu at all, if for example thread 0 of the virtual core is
currently executing in userspace.  However, we do need thread 0
to be responsible for switching the MMU -- a previous version of
this patch that had other threads switching the MMU was found to
be responsible for occasional memory corruption and machine check
interrupts in the guest on POWER7 machines.

To accommodate this, we no longer pass the vcpu pointer to
__kvmppc_vcore_entry, but instead let the assembly code load it from
the PACA.  Since the assembly code will need to know the kvm pointer
and the thread ID for threads which don't have a vcpu, we move the
thread ID into the PACA and we add a kvm pointer to the virtual core
structure.

In the case where thread 0 has no vcpu to run, it still calls into
kvmppc_hv_entry in order to do the MMU switch, and then naps until
either its vcpu is ready to run in the guest, or some other thread
needs to exit the guest.  In the latter case, thread 0 jumps to the
code that switches the MMU back to the host.  This control flow means
that now we switch the MMU before loading any guest vcpu state.
Similarly, on guest exit we now save all the guest vcpu state before
switching the MMU back to the host.  This has required substantial
code movement, making the diff rather large.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

e0b7ec05

KVM: PPC: Book3S HV: use xics_wake_cpu only when defined · 48eaef05

由 Andreas Schwab 提交于 12月 30, 2013

Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org>
CC: stable@vger.kernel.org
Signed-off-by: NAlexander Graf <agraf@suse.de>

48eaef05

09 1月, 2014 2 次提交

KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures · efff1912

由 Paul Mackerras 提交于 10月 15, 2013

This uses struct thread_fp_state and struct thread_vr_state to store
the floating-point, VMX/Altivec and VSX state, rather than flat arrays.
This makes transferring the state to/from the thread_struct simpler
and allows us to unify the get/set_one_reg implementations for the
VSX registers.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

efff1912

KVM: PPC: Add devname:kvm aliases for modules · 398a76c6

由 Alexander Graf 提交于 12月 09, 2013

Systems that support automatic loading of kernel modules through
device aliases should try and automatically load kvm when /dev/kvm
gets opened.

Add code to support that magic for all PPC kvm targets, even the
ones that don't support modules yet.
Signed-off-by: NAlexander Graf <agraf@suse.de>

398a76c6

13 12月, 2013 1 次提交

KVM: Use cond_resched() directly and remove useless kvm_resched() · c08ac06a

由 Takuya Yoshikawa 提交于 12月 13, 2013

Since the commit 15ad7146 ("KVM: Use the scheduler preemption notifiers
to make kvm preemptible"), the remaining stuff in this function is a
simple cond_resched() call with an extra need_resched() check which was
there to avoid dropping VCPUs unnecessarily. Now it is meaningless.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c08ac06a

21 11月, 2013 1 次提交

powerpc: kvm: optimize "sc 1" as fast return · 27025a60

由 Liu Ping Fan 提交于 11月 19, 2013

In some scene, e.g openstack CI, PR guest can trigger "sc 1" frequently,
this patch optimizes the path by directly delivering BOOK3S_INTERRUPT_SYSCALL
to HV guest, so powernv can return to HV guest without heavy exit, i.e,
no need to swap TLB, HTAB,.. etc
Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

27025a60

19 11月, 2013 2 次提交

KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call · c9438092

由 Paul Mackerras 提交于 11月 16, 2013

Running a kernel with CONFIG_PROVE_RCU=y yields the following diagnostic:

===============================
[ INFO: suspicious RCU usage. ]
3.12.0-rc5-kvm+ #9 Not tainted
-------------------------------

include/linux/kvm_host.h:473 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-ppc/4831:

stack backtrace:
CPU: 28 PID: 4831 Comm: qemu-system-ppc Not tainted 3.12.0-rc5-kvm+ #9
Call Trace:
[c000000be462b2a0] [c00000000001644c] .show_stack+0x7c/0x1f0 (unreliable)
[c000000be462b370] [c000000000ad57c0] .dump_stack+0x88/0xb4
[c000000be462b3f0] [c0000000001315e8] .lockdep_rcu_suspicious+0x138/0x180
[c000000be462b480] [c00000000007862c] .gfn_to_memslot+0x13c/0x170
[c000000be462b510] [c00000000007d384] .gfn_to_hva_prot+0x24/0x90
[c000000be462b5a0] [c00000000007d420] .kvm_read_guest_page+0x30/0xd0
[c000000be462b630] [c00000000007d528] .kvm_read_guest+0x68/0x110
[c000000be462b6e0] [c000000000084594] .kvmppc_rtas_hcall+0x34/0x180
[c000000be462b7d0] [c000000000097934] .kvmppc_pseries_do_hcall+0x74/0x830
[c000000be462b880] [c0000000000990e8] .kvmppc_vcpu_run_hv+0xff8/0x15a0
[c000000be462b9e0] [c0000000000839cc] .kvmppc_vcpu_run+0x2c/0x40
[c000000be462ba50] [c0000000000810b4] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
[c000000be462bae0] [c00000000007b508] .kvm_vcpu_ioctl+0x478/0x730
[c000000be462bca0] [c00000000025532c] .do_vfs_ioctl+0x4dc/0x7a0
[c000000be462bd80] [c0000000002556b4] .SyS_ioctl+0xc4/0xe0
[c000000be462be30] [c000000000009ee4] syscall_exit+0x0/0x98

To fix this, we take the SRCU read lock around the kvmppc_rtas_hcall()
call.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

c9438092

KVM: PPC: Book3S HV: Make tbacct_lock irq-safe · bf3d32e1

由 Paul Mackerras 提交于 11月 16, 2013

Lockdep reported that there is a potential for deadlock because
vcpu->arch.tbacct_lock is not irq-safe, and is sometimes taken inside
the rq_lock (run-queue lock) in the scheduler, which is taken within
interrupts.  The lockdep splat looks like:

======================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
3.12.0-rc5-kvm+ #8 Not tainted
------------------------------------------------------
qemu-system-ppc/4803 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(&(&vcpu->arch.tbacct_lock)->rlock){+.+...}, at: [<c0000000000947ac>] .kvmppc_core_vcpu_put_hv+0x2c/0xa0

and this task is already holding:
(&rq->lock){-.-.-.}, at: [<c000000000ac16c0>] .__schedule+0x180/0xaa0
which would create a new lock dependency:
(&rq->lock){-.-.-.} -> (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}

but this new dependency connects a HARDIRQ-irq-safe lock:
(&rq->lock){-.-.-.}
... which became HARDIRQ-irq-safe at:
 [<c00000000013797c>] .lock_acquire+0xbc/0x190
 [<c000000000ac3c74>] ._raw_spin_lock+0x34/0x60
 [<c0000000000f8564>] .scheduler_tick+0x54/0x180
 [<c0000000000c2610>] .update_process_times+0x70/0xa0
 [<c00000000012cdfc>] .tick_periodic+0x3c/0xe0
 [<c00000000012cec8>] .tick_handle_periodic+0x28/0xb0
 [<c00000000001ef40>] .timer_interrupt+0x120/0x2e0
 [<c000000000002868>] decrementer_common+0x168/0x180
 [<c0000000001c7ca4>] .get_page_from_freelist+0x924/0xc10
 [<c0000000001c8e00>] .__alloc_pages_nodemask+0x200/0xba0
 [<c0000000001c9eb8>] .alloc_pages_exact_nid+0x68/0x110
 [<c000000000f4c3ec>] .page_cgroup_init+0x1e0/0x270
 [<c000000000f24480>] .start_kernel+0x3e0/0x4e4
 [<c000000000009d30>] .start_here_common+0x20/0x70

to a HARDIRQ-irq-unsafe lock:
(&(&vcpu->arch.tbacct_lock)->rlock){+.+...}
... which became HARDIRQ-irq-unsafe at:
...  [<c00000000013797c>] .lock_acquire+0xbc/0x190
 [<c000000000ac3c74>] ._raw_spin_lock+0x34/0x60
 [<c0000000000946ac>] .kvmppc_core_vcpu_load_hv+0x2c/0x100
 [<c00000000008394c>] .kvmppc_core_vcpu_load+0x2c/0x40
 [<c000000000081000>] .kvm_arch_vcpu_load+0x10/0x30
 [<c00000000007afd4>] .vcpu_load+0x64/0xd0
 [<c00000000007b0f8>] .kvm_vcpu_ioctl+0x68/0x730
 [<c00000000025530c>] .do_vfs_ioctl+0x4dc/0x7a0
 [<c000000000255694>] .SyS_ioctl+0xc4/0xe0
 [<c000000000009ee4>] syscall_exit+0x0/0x98

Some users have reported this deadlock occurring in practice, though
the reports have been primarily on 3.10.x-based kernels.

This fixes the problem by making tbacct_lock be irq-safe.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

bf3d32e1

18 10月, 2013 2 次提交

kvm: powerpc: book3s: drop is_hv_enabled · a78b55d1

由 Aneesh Kumar K.V 提交于 10月 07, 2013

drop is_hv_enabled, because that should not be a callback property
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

a78b55d1

kvm: powerpc: book3s: Allow the HV and PR selection per virtual machine · cbbc58d4

由 Aneesh Kumar K.V 提交于 10月 07, 2013

This moves the kvmppc_ops callbacks to be a per VM entity. This
enables us to select HV and PR mode when creating a VM. We also
allow both kvm-hv and kvm-pr kernel module to be loaded. To
achieve this we move /dev/kvm ownership to kvm.ko module. Depending on
which KVM mode we select during VM creation we take a reference
count on respective module
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix coding style]
Signed-off-by: NAlexander Graf <agraf@suse.de>

cbbc58d4

17 10月, 2013 11 次提交

kvm: powerpc: book3s: Support building HV and PR KVM as module · 2ba9f0d8

由 Aneesh Kumar K.V 提交于 10月 07, 2013

Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: squash in compile fix]
Signed-off-by: NAlexander Graf <agraf@suse.de>

2ba9f0d8

kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops · 699cc876

由 Aneesh Kumar K.V 提交于 10月 07, 2013

This help us to identify whether we are running with hypervisor mode KVM
enabled. The change is needed so that we can have both HV and PR kvm
enabled in the same kernel.

If both HV and PR KVM are included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest.

Allowing both PR and HV in the same kernel required some changes to
kvm_dev_ioctl_check_extension(), since the values returned now can't
be selected with #ifdefs as much as previously. We look at is_hv_enabled
to return the right value when checking for capabilities.For capabilities that
are only provided by HV KVM, we return the HV value only if
is_hv_enabled is true. For capabilities provided by PR KVM but not HV,
we return the PR value only if is_hv_enabled is false.

NOTE: in later patch we replace is_hv_enabled with a static inline
function comparing kvm_ppc_ops
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

699cc876

kvm: powerpc: Add kvmppc_ops callback · 3a167bea

由 Aneesh Kumar K.V 提交于 10月 07, 2013

This patch add a new callback kvmppc_ops. This will help us in enabling
both HV and PR KVM together in the same kernel. The actual change to
enable them together is done in the later patch in the series.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: squash in booke changes]
Signed-off-by: NAlexander Graf <agraf@suse.de>

3a167bea

kvm: powerpc: book3s hv: Fix vcore leak · f1378b1c

由 Paul Mackerras 提交于 9月 27, 2013

add kvmppc_free_vcores() to free the kvmppc_vcore structures
that we allocate for a guest, which are currently being leaked.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

f1378b1c

KVM: PPC: Book3S HV: Don't crash host on unknown guest interrupt · f3271d4c

由 Paul Mackerras 提交于 9月 20, 2013

If we come out of a guest with an interrupt that we don't know about,
instead of crashing the host with a BUG(), we now return to userspace
with the exit reason set to KVM_EXIT_UNKNOWN and the trap vector in
the hw.hardware_exit_reason field of the kvm_run structure, as is done
on x86.  Note that run->exit_reason is already set to KVM_EXIT_UNKNOWN
at the beginning of kvmppc_handle_exit().
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

f3271d4c

KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7 · 388cc6e1

由 Paul Mackerras 提交于 9月 21, 2013

This enables us to use the Processor Compatibility Register (PCR) on
POWER7 to put the processor into architecture 2.05 compatibility mode
when running a guest.  In this mode the new instructions and registers
that were introduced on POWER7 are disabled in user mode.  This
includes all the VSX facilities plus several other instructions such
as ldbrx, stdbrx, popcntw, popcntd, etc.

To select this mode, we have a new register accessible through the
set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
this to zero gives the full set of capabilities of the processor.
Setting it to one of the "logical" PVR values defined in PAPR puts
the vcpu into the compatibility mode for the corresponding
architecture level.  The supported values are:

0x0f000002	Architecture 2.05 (POWER6)
0x0f000003	Architecture 2.06 (POWER7)
0x0f100003	Architecture 2.06+ (POWER7+)

Since the PCR is per-core, the architecture compatibility level and
the corresponding PCR value are stored in the struct kvmppc_vcore, and
are therefore shared between all vcpus in a virtual core.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
[agraf: squash in fix to add missing break statements and documentation]
Signed-off-by: NAlexander Graf <agraf@suse.de>

388cc6e1

KVM: PPC: Book3S HV: Add support for guest Program Priority Register · 4b8473c9

由 Paul Mackerras 提交于 9月 20, 2013

POWER7 and later IBM server processors have a register called the
Program Priority Register (PPR), which controls the priority of
each hardware CPU SMT thread, and affects how fast it runs compared
to other SMT threads.  This priority can be controlled by writing to
the PPR or by use of a set of instructions of the form or rN,rN,rN
which are otherwise no-ops but have been defined to set the priority
to particular levels.

This adds code to context switch the PPR when entering and exiting
guests and to make the PPR value accessible through the SET/GET_ONE_REG
interface.  When entering the guest, we set the PPR as late as
possible, because if we are setting a low thread priority it will
make the code run slowly from that point on.  Similarly, the
first-level interrupt handlers save the PPR value in the PACA very
early on, and set the thread priority to the medium level, so that
the interrupt handling code runs at a reasonable speed.
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

4b8473c9

KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a

由 Paul Mackerras 提交于 9月 20, 2013

This adds the ability to have a separate LPCR (Logical Partitioning
Control Register) value relating to a guest for each virtual core,
rather than only having a single value for the whole VM.  This
corresponds to what real POWER hardware does, where there is a LPCR
per CPU thread but most of the fields are required to have the same
value on all active threads in a core.

The per-virtual-core LPCR can be read and written using the
GET/SET_ONE_REG interface.  Userspace can can only modify the
following fields of the LPCR value:

DPFD	Default prefetch depth
ILE	Interrupt little-endian
TC	Translation control (secondary HPT hash group search disable)

We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
contains bits relating to memory management, i.e. the Virtualized
Partition Memory (VPM) bits and the bits relating to guest real mode.
When this default value is updated, the update needs to be propagated
to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
that.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
[agraf: fix whitespace]
Signed-off-by: NAlexander Graf <agraf@suse.de>

a0144e2a

KVM: PPC: Book3S HV: Implement H_CONFER · 42d7604d

由 Paul Mackerras 提交于 9月 06, 2013

The H_CONFER hypercall is used when a guest vcpu is spinning on a lock
held by another vcpu which has been preempted, and the spinning vcpu
wishes to give its timeslice to the lock holder.  We implement this
in the straightforward way using kvm_vcpu_yield_to().
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

42d7604d

KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc

由 Paul Mackerras 提交于 9月 06, 2013

This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg
interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
defaults to 0 and is not modified by KVM.  On entering the guest, this
value is added onto the timebase, and on exiting the guest, it is
subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register.  Writing to the TBU40 register only
alters the upper 40 bits of the timebase, leaving the lower 24 bits
unchanged.  This provides a way to modify the timebase for guest
migration without disturbing the synchronization of the timebase
registers across CPU cores.  The kernel rounds up the value given
to a multiple of 2^24.

Timebase values stored in KVM structures (struct kvm_vcpu, struct
kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
values in the dispatch trace log need to be guest timebase values,
however, since that is read directly by the guest.  This moves the
setting of vcpu->arch.dec_expires on guest exit to a point after we
have restored the host timebase so that vcpu->arch.dec_expires is a
host timebase value.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

93b0f4dc

KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers · 14941789

由 Paul Mackerras 提交于 9月 06, 2013

Currently we are not saving and restoring the SIAR and SDAR registers in
the PMU (performance monitor unit) on guest entry and exit.  The result
is that performance monitoring tools in the guest could get false
information about where a program was executing and what data it was
accessing at the time of a performance monitor interrupt.  This fixes
it by saving and restoring these registers along with the other PMU
registers on guest entry/exit.

This also provides a way for userspace to access these values for a
vcpu via the one_reg interface.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

14941789

28 8月, 2013 1 次提交

arch: powerpc: kvm: add signed type cast for comparation · 5d226ae5

由 Chen Gang 提交于 7月 22, 2013

'rmls' is 'unsigned long', lpcr_rmls() will return negative number when
failure occurs, so it need a type cast for comparing.

'lpid' is 'unsigned long', kvmppc_alloc_lpid() return negative number
when failure occurs, so it need a type cast for comparing.
Signed-off-by: NChen Gang <gang.chen@asianux.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

5d226ae5

26 8月, 2013 1 次提交

ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag · 2f84d5ea

由 Yann Droneaud 提交于 8月 24, 2013

KVM uses anon_inode_get() to allocate file descriptors as part
of some of its ioctls. But those ioctls are lacking a flag argument
allowing userspace to choose options for the newly opened file descriptor.

In such case it's advised to use O_CLOEXEC by default so that
userspace is allowed to choose, without race, if the file descriptor
is going to be inherited across exec().

This patch set O_CLOEXEC flag on all file descriptors created
with anon_inode_getfd() to not leak file descriptors across exec().
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.comReviewed-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

2f84d5ea

23 8月, 2013 1 次提交

powerpc/kvm: Copy the pvr value after memset · 87916442

由 Aneesh Kumar K.V 提交于 8月 22, 2013

Otherwise we would clear the pvr value
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

87916442