提交 · 2365f6b67c0d786b9d1cb1268575e42807fe47e2 · openeuler / Kernel

27 9月, 2016 4 次提交

KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL · 2365f6b6

由 Thomas Huth 提交于 9月 21, 2016

On POWER8E and POWER8NVL, KVM-PR does not announce support for
64kB page sizes and 1TB segments yet. Looks like this has just
been forgotton so far, since there is no reason why this should
be different to the normal POWER8 CPUs.
Signed-off-by: NThomas Huth <thuth@redhat.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

2365f6b6

KVM: PPC: BookE: Fix a sanity check · ac0e89bb

由 Dan Carpenter 提交于 7月 14, 2016

We use logical negate where bitwise negate was intended.  It means that
we never return -EINVAL here.

Fixes: ce11e48b ('KVM: PPC: E500: Add userspace debug stub support')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

ac0e89bb

KVM: PPC: Book3S HV: Take out virtual core piggybacking code · b009031f

由 Paul Mackerras 提交于 9月 15, 2016

This takes out the code that arranges to run two (or more) virtual
cores on a single subcore when possible, that is, when both vcores
are from the same VM, the VM is configured with one CPU thread per
virtual core, and all the per-subcore registers have the same value
in each vcore.  Since the VTB (virtual timebase) is a per-subcore
register, and will almost always differ between vcores, this code
is disabled on POWER8 machines, meaning that it is only usable on
POWER7 machines (which don't have VTB).  Given the tiny number of
POWER7 machines which have firmware that allows them to run HV KVM,
the benefit of simplifying the code outweighs the loss of this
feature on POWER7 machines.
Tested-by: NThomas Huth <thuth@redhat.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

b009031f

KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread · 88b02cf9

由 Paul Mackerras 提交于 9月 15, 2016

POWER8 has one virtual timebase (VTB) register per subcore, not one
per CPU thread.  The HV KVM code currently treats VTB as a per-thread
register, which can lead to spurious soft lockup messages from guests
which use the VTB as the time source for the soft lockup detector.
(CPUs before POWER8 did not have the VTB register.)

For HV KVM, this fixes the problem by making only the primary thread
in each virtual core save and restore the VTB value.  With this,
the VTB state becomes part of the kvmppc_vcore structure.  This
also means that "piggybacking" of multiple virtual cores onto one
subcore is not possible on POWER8, because then the virtual cores
would share a single VTB register.

PR KVM emulates a VTB register, which is per-vcpu because PR KVM
has no notion of CPU threads or SMT.  For PR KVM we move the VTB
state into the kvmppc_vcpu_book3s struct.

Cc: stable@vger.kernel.org # v3.14+
Reported-by: NThomas Huth <thuth@redhat.com>
Tested-by: NThomas Huth <thuth@redhat.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

88b02cf9

16 9月, 2016 1 次提交

kvm: add stubs for arch specific debugfs support · 235539b4

由 Luiz Capitulino 提交于 9月 07, 2016

Two stubs are added:

 o kvm_arch_has_vcpu_debugfs(): must return true if the arch
   supports creating debugfs entries in the vcpu debugfs dir
   (which will be implemented by the next commit)

 o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
   entries in the vcpu debugfs dir

For x86, this commit introduces a new file to avoid growing
arch/x86/kvm/x86.c even more.
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

235539b4

13 9月, 2016 1 次提交

KVM: PPC: e500: Rename jump labels in kvmppc_e500_tlb_init() · aad9e5ba

由 Markus Elfring 提交于 9月 12, 2016

Adjust jump labels according to the current Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

aad9e5ba

12 9月, 2016 12 次提交

KVM: PPC: e500: Use kmalloc_array() in kvmppc_e500_tlb_init() · 90235dc1

由 Markus Elfring 提交于 8月 28, 2016

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

* Replace the specification of a data structure by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

90235dc1

KVM: PPC: e500: Replace kzalloc() calls by kcalloc() in two functions · b0ac477b

由 Markus Elfring 提交于 8月 28, 2016

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kcalloc".
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>

  This issue was detected also by using the Coccinelle software.

* Replace the specification of data structures by pointer dereferences
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

b0ac477b

KVM: PPC: e500: Delete an unnecessary initialisation in kvm_vcpu_ioctl_config_tlb() · cfb60813

由 Markus Elfring 提交于 8月 28, 2016

The local variable "g2h_bitmap" will be set to an appropriate value
a bit later. Thus omit the explicit initialisation at the beginning.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

cfb60813

KVM: PPC: e500: Less function calls in kvm_vcpu_ioctl_config_tlb() after error detection · 46d4e747

由 Markus Elfring 提交于 8月 28, 2016

The kfree() function was called in two cases by the
kvm_vcpu_ioctl_config_tlb() function during error handling
even if the passed data structure element contained a null pointer.

* Split a condition check for memory allocation failures.

* Adjust jump targets according to the Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

46d4e747

KVM: PPC: e500: Use kmalloc_array() in kvm_vcpu_ioctl_config_tlb() · f3c0ce86

由 Markus Elfring 提交于 8月 28, 2016

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

f3c0ce86

KVM: PPC: Book3S HV: Counters for passthrough IRQ stats · 65e7026a

由 Suresh Warrier 提交于 8月 19, 2016

Add VCPU stat counters to track affinity for passthrough
interrupts.

pthru_all: Counts all passthrough interrupts whose IRQ mappings are
           in the kvmppc_passthru_irq_map structure.
pthru_host: Counts all cached passthrough interrupts that were injected
	    from the host through kvm_set_irq (i.e. not handled in
	    real mode).
pthru_bad_aff: Counts how many cached passthrough interrupts have
               bad affinity (receiving CPU is not running VCPU that is
	       the target of the virtual interrupt in the guest).
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

65e7026a

KVM: PPC: Book3S HV: Set server for passed-through interrupts · 5d375199

由 Paul Mackerras 提交于 8月 19, 2016

When a guest has a PCI pass-through device with an interrupt, it
will direct the interrupt to a particular guest VCPU.  In fact the
physical interrupt might arrive on any CPU, and then get
delivered to the target VCPU in the emulated XICS (guest interrupt
controller), and eventually delivered to the target VCPU.

Now that we have code to handle device interrupts in real mode
without exiting to the host kernel, there is an advantage to having
the device interrupt arrive on the same sub(core) as the target
VCPU is running on.  In this situation, the interrupt can be
delivered to the target VCPU without any exit to the host kernel
(using a hypervisor doorbell interrupt between threads if
necessary).

This patch aims to get passed-through device interrupts arriving
on the correct core by setting the interrupt server in the real
hardware XICS for the interrupt to the first thread in the (sub)core
where its target VCPU is running.  We do this in the real-mode H_EOI
code because the H_EOI handler already needs to look at the
emulated ICS state for the interrupt (whereas the H_XIRR handler
doesn't), and we know we are running in the target VCPU context
at that point.

We set the server CPU in hardware using an OPAL call, regardless of
what the IRQ affinity mask for the interrupt says, and without
updating the affinity mask.  This amounts to saying that when an
interrupt is passed through to a guest, as a matter of policy we
allow the guest's affinity for the interrupt to override the host's.

This is inspired by an earlier patch from Suresh Warrier, although
none of this code came from that earlier patch.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

5d375199

KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode · 366274f5

由 Suresh Warrier 提交于 8月 19, 2016

When a passthrough IRQ is handled completely within KVM real
mode code, it has to also update the IRQ stats since this
does not go through the generic IRQ handling code.

However, the per CPU kstat_irqs field is an allocated (not static)
field and so cannot be directly accessed in real mode safely.

The function this_cpu_inc_rm() is introduced to safely increment
per CPU fields (currently coded for unsigned integers only) that
are allocated and could thus be vmalloced also.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

366274f5

KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass · 644abbb2

由 Suresh Warrier 提交于 8月 19, 2016

Add a  module parameter kvm_irq_bypass for kvm_hv.ko to
disable IRQ bypass for passthrough interrupts. The default
value of this tunable is 1 - that is enable the feature.

Since the tunable is used by built-in kernel code, we use
the module_param_cb macro to achieve this.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

644abbb2

KVM: PPC: Book3S HV: Dump irqmap in debugfs · af893c7d

由 Suresh Warrier 提交于 8月 19, 2016

Dump the passthrough irqmap structure associated with a
guest as part of /sys/kernel/debug/powerpc/kvm-xics-*.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

af893c7d

KVM: PPC: Book3S HV: Complete passthrough interrupt in host · f7af5209

由 Suresh Warrier 提交于 8月 19, 2016

In existing real mode ICP code, when updating the virtual ICP
state, if there is a required action that cannot be completely
handled in real mode, as for instance, a VCPU needs to be woken
up, flags are set in the ICP to indicate the required action.
This is checked when returning from hypercalls to decide whether
the call needs switch back to the host where the action can be
performed in virtual mode. Note that if h_ipi_redirect is enabled,
real mode code will first try to message a free host CPU to
complete this job instead of returning the host to do it ourselves.

Currently, the real mode PCI passthrough interrupt handling code
checks if any of these flags are set and simply returns to the host.
This is not good enough as the trap value (0x500) is treated as an
external interrupt by the host code. It is only when the trap value
is a hypercall that the host code searches for and acts on unfinished
work by calling kvmppc_xics_rm_complete.

This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
which is returned by KVM if there is unfinished business to be
completed in host virtual mode after handling a PCI passthrough
interrupt. The host checks for this special interrupt condition
and calls into the kvmppc_xics_rm_complete, which is made an
exported function for this reason.

[paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

f7af5209

KVM: PPC: Book3S HV: Handle passthrough interrupts in guest · e3c13e56

由 Suresh Warrier 提交于 8月 19, 2016

Currently, KVM switches back to the host to handle any external
interrupt (when the interrupt is received while running in the
guest). This patch updates real-mode KVM to check if an interrupt
is generated by a passthrough adapter that is owned by this guest.
If so, the real mode KVM will directly inject the corresponding
virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
in hardware. In short, the interrupt is handled entirely in real
mode in the guest context without switching back to the host.

In some rare cases, the interrupt cannot be completely handled in
real mode, for instance, a VCPU that is sleeping needs to be woken
up. In this case, KVM simply switches back to the host with trap
reason set to 0x500. This works, but it is clearly not very efficient.
A following patch will distinguish this case and handle it
correctly in the host. Note that we can use the existing
check_too_hard() routine even though we are not in a hypercall to
determine if there is unfinished business that needs to be
completed in host virtual mode.

The patch assumes that the mapping between hardware interrupt IRQ
and virtual IRQ to be injected to the guest already exists for the
PCI passthrough interrupts that need to be handled in real mode.
If the mapping does not exist, KVM falls back to the default
existing behavior.

The KVM real mode code reads mappings from the mapped array in the
passthrough IRQ map without taking any lock.  We carefully order the
loads and stores of the fields in the kvmppc_irq_map data structure
using memory barriers to avoid an inconsistent mapping being seen by
the reader. Thus, although it is possible to miss a map entry, it is
not possible to read a stale value.

[paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
 pulled out powernv eoi change into a separate patch, made
 kvmppc_read_intr get the vcpu from the paca rather than being
 passed in, rewrote the logic at the end of kvmppc_read_intr to
 avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
 since we were always restoring SRR0/1 anyway, get rid of the cached
 array (just use the mapped array), removed the kick_all_cpus_sync()
 call, clear saved_xirr PACA field when we handle the interrupt in
 real mode, fix compilation with CONFIG_KVM_XICS=n.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

e3c13e56

09 9月, 2016 5 次提交

KVM: PPC: Book3S HV: Enable IRQ bypass · c57875f5

由 Suresh Warrier 提交于 8月 19, 2016

Add the irq_bypass_add_producer and irq_bypass_del_producer
functions. These functions get called whenever a GSI is being
defined for a guest. They create/remove the mapping between
host real IRQ numbers and the guest GSI.

Add the following helper functions to manage the
passthrough IRQ map.

kvmppc_set_passthru_irq()
  Creates a mapping in the passthrough IRQ map that maps a host
  IRQ to a guest GSI. It allocates the structure (one per guest VM)
  the first time it is called.

kvmppc_clr_passthru_irq()
  Removes the passthrough IRQ map entry given a guest GSI.
  The passthrough IRQ map structure is not freed even when the
  number of mapped entries goes to zero. It is only freed when
  the VM is destroyed.

[paulus@ozlabs.org - modified to use is_pnv_opal_msi() rather than
 requiring all passed-through interrupts to use the same irq_chip;
 changed deletion so it zeroes out the r_hwirq field rather than
 copying the last entry down and decrementing the number of entries.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

c57875f5

KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap · 8daaafc8

由 Suresh Warrier 提交于 8月 19, 2016

This patch introduces an IRQ mapping structure, the
kvmppc_passthru_irqmap structure that is to be used
to map the real hardware IRQ in the host with the virtual
hardware IRQ (gsi) that is injected into a guest by KVM for
passthrough adapters.

Currently, we assume a separate IRQ mapping structure for
each guest. Each kvmppc_passthru_irqmap has a mapping arrays,
containing all defined real<->virtual IRQs.

[paulus@ozlabs.org - removed irq_chip field from struct
 kvmppc_passthru_irqmap; changed parameter for
 kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct
 kvm *, removed small cached array.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

8daaafc8

KVM: PPC: select IRQ_BYPASS_MANAGER · 9576730d

由 Suresh Warrier 提交于 8月 19, 2016

Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set.
Add the PPC producer functions for add and del producer.

[paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c
 so booke compiles; added kvm_arch_has_irq_bypass implementation.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

9576730d

KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function · 37f55d30

由 Suresh Warrier 提交于 8月 19, 2016

Modify kvmppc_read_intr to make it a C function.  Because it is called
from kvmppc_check_wake_reason, any of the assembler code that calls
either kvmppc_read_intr or kvmppc_check_wake_reason now has to assume
that the volatile registers might have been modified.

This also adds in the optimization of clearing saved_xirr in the case
where we completely handle and EOI an IPI.  Without this, the next
device interrupt will require two trips through the host interrupt
handling code.

[paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame
 when it is calling kvmppc_read_intr, which means we can set r12 to
 the trap number (0x500) after the call to kvmppc_read_intr, instead
 of using r31.  Also moved the deliver_guest_interrupt label so as to
 restore XER and CTR, plus other minor tweaks.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

37f55d30

powerpc: move hmi.c to arch/powerpc/kvm/ · 3f257774

由 Paolo Bonzini 提交于 8月 11, 2016

hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use.  So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.

Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

3f257774

08 9月, 2016 3 次提交

KVM: PPC: Implement existing and add new halt polling vcpu stats · 2a27f514

由 Suraj Jitindar Singh 提交于 8月 02, 2016

vcpu stats are used to collect information about a vcpu which can be viewed
in the debugfs. For example halt_attempted_poll and halt_successful_poll
are used to keep track of the number of times the vcpu attempts to and
successfully polls. These stats are currently not used on powerpc.

Implement incrementation of the halt_attempted_poll and
halt_successful_poll vcpu stats for powerpc. Since these stats are summed
over all the vcpus for all running guests it doesn't matter which vcpu
they are attributed to, thus we choose the current runner vcpu of the
vcore.

Also add new vcpu stats: halt_poll_success_ns, halt_poll_fail_ns and
halt_wait_ns to be used to accumulate the total time spend polling
successfully, polling unsuccessfully and waiting respectively, and
halt_successful_wait to accumulate the number of times the vcpu waits.
Given that halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns are
expressed in nanoseconds it is necessary to represent these as 64-bit
quantities, otherwise they would overflow after only about 4 seconds.

Given that the total time spend either polling or waiting will be known and
the number of times that each was done, it will be possible to determine
the average poll and wait times. This will give the ability to tune the kvm
module parameters based on the calculated average wait and poll times.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

2a27f514

KVM: PPC: Book3S HV: Implement halt polling · 0cda69dd

由 Suraj Jitindar Singh 提交于 8月 02, 2016

This patch introduces new halt polling functionality into the kvm_hv kernel
module. When a vcore is idle it will poll for some period of time before
scheduling itself out.

When all of the runnable vcpus on a vcore have ceded (and thus the vcore is
idle) we schedule ourselves out to allow something else to run. In the
event that we need to wake up very quickly (for example an interrupt
arrives), we are required to wait until we get scheduled again.

Implement halt polling so that when a vcore is idle, and before scheduling
ourselves, we poll for vcpus in the runnable_threads list which have
pending exceptions or which leave the ceded state. If we poll successfully
then we can get back into the guest very quickly without ever scheduling
ourselves, otherwise we schedule ourselves out as before.

There exists generic halt_polling code in virt/kvm_main.c, however on
powerpc the polling conditions are different to the generic case. It would
be nice if we could just implement an arch specific kvm_check_block()
function, but there is still other arch specific things which need to be
done for kvm_hv (for example manipulating vcore states) which means that a
separate implementation is the best option.

Testing of this patch with a TCP round robin test between two guests with
virtio network interfaces has found a decrease in round trip time of ~15us
on average. A performance gain is only seen when going out of and
back into the guest often and quickly, otherwise there is no net benefit
from the polling. The polling interval is adjusted such that when we are
often scheduled out for long periods of time it is reduced, and when we
often poll successfully it is increased. The rate at which the polling
interval increases or decreases, and the maximum polling interval, can
be set through module parameters.

Based on the implementation in the generic kvm module by Wanpeng Li and
Paolo Bonzini, and on direction from Paul Mackerras.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

0cda69dd

KVM: PPC: Book3S HV: Change vcore element runnable_threads from linked-list to array · 7b5f8272

由 Suraj Jitindar Singh 提交于 8月 02, 2016

The struct kvmppc_vcore is a structure used to store various information
about a virtual core for a kvm guest. The runnable_threads element of the
struct provides a list of all of the currently runnable vcpus on the core
(those in the KVMPPC_VCPU_RUNNABLE state). The previous implementation of
this list was a linked_list. The next patch requires that the list be able
to be iterated over without holding the vcore lock.

Reimplement the runnable_threads list in the kvmppc_vcore struct as an
array. Implement function to iterate over valid entries in the array and
update access sites accordingly.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

7b5f8272

25 8月, 2016 1 次提交

KVM: PPC: Always select KVM_VFIO, plus Makefile cleanup · 4b3d173d

由 Paul Mackerras 提交于 8月 18, 2016

As discussed recently on the kvm mailing list, David Gibson's
intention in commit 178a7875 ("vfio: Enable VFIO device for
powerpc", 2016-02-01) was to have the KVM VFIO device built in
on all powerpc platforms.  This patch adds the "select KVM_VFIO"
statement that makes this happen.

Currently, arch/powerpc/kvm/Makefile doesn't include vfio.o for
the 64-bit kvm module, because the list of objects doesn't use
the $(common-objs-y) list.  The reason it doesn't is because we
don't necessarily want coalesced_mmio.o or emulate.o (for example
if HV KVM is the only target), and common-objs-y includes both.

Since this is confusing, this patch adjusts the definitions so that
we now use $(common-objs-y) in the list for the 64-bit kvm.ko
module, emulate.o is removed from common-objs-y and added in the
places that need it, and the inclusion of coalesced_mmio.o now
depends on CONFIG_KVM_MMIO.
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

4b3d173d

19 8月, 2016 2 次提交

KVM: PPC: Implement kvm_arch_intc_initialized() for PPC · 34a75b0f

由 Paul Mackerras 提交于 8月 10, 2016

It doesn't make sense to create irqfds for a VM that doesn't have
in-kernel interrupt controller emulation.  There is an existing
interface for architecture code to tell the irqfd code whether or
not any interrupt controller has been initialized, called
kvm_arch_intc_initialized(), so let's implement that for powerpc.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

34a75b0f

KVM: PPC: Book3S: Don't crash if irqfd used with no in-kernel XICS emulation · e48ba1cb

由 Paul Mackerras 提交于 8月 10, 2016

It turns out that if userspace creates a pseries-type VM without
in-kernel XICS (interrupt controller) emulation, and then connects
an eventfd to the VM as an irqfd, and the eventfd gets signalled,
that the code will try to deliver an interrupt via the non-existent
XICS object and crash the host kernel with a NULL pointer dereference.

To fix this, we check for the presence of the XICS object before
trying to deliver the interrupt, and return with an error if not.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

e48ba1cb

12 8月, 2016 2 次提交

KVM: Protect device ops->create and list_add with kvm->lock · a28ebea2

由 Christoffer Dall 提交于 8月 09, 2016

KVM devices were manipulating list data structures without any form of
synchronization, and some implementations of the create operations also
suffered from a lack of synchronization.

Now when we've split the xics create operation into create and init, we
can hold the kvm->lock mutex while calling the create operation and when
manipulating the devices list.

The error path in the generic code gets slightly ugly because we have to
take the mutex again and delete the device from the list, but holding
the mutex during anon_inode_getfd or releasing/locking the mutex in the
common non-error path seemed wrong.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

a28ebea2

KVM: PPC: Move xics_debugfs_init out of create · 023e9fdd

由 Christoffer Dall 提交于 8月 09, 2016

As we are about to hold the kvm->lock during the create operation on KVM
devices, we should move the call to xics_debugfs_init into its own
function, since holding a mutex over extended amounts of time might not
be a good idea.

Introduce an init operation on the kvm_device_ops struct which cannot
fail and call this, if configured, after the device has been created.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

023e9fdd

02 8月, 2016 1 次提交

KVM: PPC: Introduce KVM_CAP_PPC_HTM · 23528bb2

由 Sam Bobroff 提交于 7月 20, 2016

Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to
determine if a PowerPC KVM guest should use HTM (Hardware Transactional
Memory).

This will be used by QEMU to populate the pa-features bits in the
guest's device tree.
Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

23528bb2

28 7月, 2016 2 次提交

KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE · 93d17397

由 Paul Mackerras 提交于 6月 22, 2016

It turns out that if the guest does a H_CEDE while the CPU is in
a transactional state, and the H_CEDE does a nap, and the nap
loses the architected state of the CPU (which is is allowed to do),
then we lose the checkpointed state of the virtual CPU.  In addition,
the transactional-memory state recorded in the MSR gets reset back
to non-transactional, and when we try to return to the guest, we take
a TM bad thing type of program interrupt because we are trying to
transition from non-transactional to transactional with a hrfid
instruction, which is not permitted.

The result of the program interrupt occurring at that point is that
the host CPU will hang in an infinite loop with interrupts disabled.
Thus this is a denial of service vulnerability in the host which can
be triggered by any guest (and depending on the guest kernel, it can
potentially triggered by unprivileged userspace in the guest).

This vulnerability has been assigned the ID CVE-2016-5412.

To fix this, we save the TM state before napping and restore it
on exit from the nap, when handling a H_CEDE in real mode.  The
case where H_CEDE exits to host virtual mode is already OK (as are
other hcalls which exit to host virtual mode) because the exit
path saves the TM state.

Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

93d17397

KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures · f024ee09

由 Paul Mackerras 提交于 6月 22, 2016

This moves the transactional memory state save and restore sequences
out of the guest entry/exit paths into separate procedures.  This is
so that these sequences can be used in going into and out of nap
in a subsequent patch.

The only code changes here are (a) saving and restore LR on the
stack, since these new procedures get called with a bl instruction,
(b) explicitly saving r1 into the PACA instead of assuming that
HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary
and redundant setting of MSR[TM] that should have been removed by
commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory
support", 2013-09-24) but wasn't.

Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

f024ee09

21 7月, 2016 2 次提交

powerpc/mm: Move hash table ops to a separate structure · 7025776e

由 Benjamin Herrenschmidt 提交于 7月 05, 2016

Moving probe_machine() to after mmu init will cause the ppc_md
fields relative to the hash table management to be overwritten.

Since we have essentially disconnected the machine type from
the hash backend ops, finish the job by moving them to a different
structure.

The only callback that didn't quite fix is update_partition_table
since this is not specific to hash, so I moved it to a standalone
variable for now. We can revisit later if needed.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fix ppc64e build failure in kexec]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7025776e

powerpc: Put exception configuration in a common place · d3cbff1b

由 Benjamin Herrenschmidt 提交于 7月 05, 2016

The various calls to establish exception endianness and AIL are
now done from a single point using already established CPU and FW
feature bits to decide what to do.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d3cbff1b

19 7月, 2016 1 次提交

Kbuild: arch: look for generated headers in obtree · 58ab5e0c

由 Arnd Bergmann 提交于 6月 15, 2016

There are very few files that need add an -I$(obj) gcc for the preprocessor
or the assembler. For C files, we add always these for both the objtree and
srctree, but for the other ones we require the Makefile to add them, and
Kbuild then adds it for both trees.

As a preparation for changing the meaning of the -I$(obj) directive to
only refer to the srctree, this changes the two instances in arch/x86 to use
an explictit $(objtree) prefix where needed, otherwise we won't find the
headers any more, as reported by the kbuild 0day builder.

arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NMichal Marek <mmarek@suse.com>

58ab5e0c

15 7月, 2016 1 次提交

powerpc/powernv: Rename reusable idle functions to hardware agnostic names · 5fa6b6bd

由 Shreyas B. Prabhu 提交于 7月 08, 2016

Functions like power7_wakeup_loss, power7_wakeup_noloss,
power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can
also be used by POWER9. Hence rename these functions hardware agnostic
names.
Suggested-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5fa6b6bd

14 7月, 2016 2 次提交

powerpc/kvm: Clarify __user annotations · f8750513

由 Daniel Axtens 提交于 7月 12, 2016

kvmppc_h_put_tce_indirect labels a u64 pointer as __user. It also
labelled the u64 where get_user puts the result as __user. This isn't
a pointer and so doesn't need to be labelled __user.

Split the u64 value definition onto a new line to make it clear that
it doesn't get the annotation.
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f8750513

KVM: pass struct kvm to kvm_set_routing_entry · c63cf538

由 Radim Krčmář 提交于 7月 12, 2016

Arch-specific code will use it.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c63cf538

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功