提交 · c83b6d1594bfff8cdb95becb338b8fe9e3473c8d · openeuler / Kernel

23 9月, 2016 3 次提交

KVM: nVMX: Fix reload apic access page warning · c83b6d15

由 Wanpeng Li 提交于 9月 06, 2016

WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
 dump_stack+0x99/0xd0
 __warn+0xd1/0xf0
 warn_slowpath_fmt+0x4f/0x60
 ? prepare_to_swait+0x39/0xa0
 ? prepare_to_swait+0x39/0xa0
 __might_sleep+0x7e/0x80
 __gfn_to_pfn_memslot+0x156/0x480 [kvm]
 gfn_to_pfn+0x2a/0x30 [kvm]
 gfn_to_page+0xe/0x20 [kvm]
 kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
 nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
 ? _raw_spin_unlock_irqrestore+0x36/0x80
 vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
 kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
 kvm_vcpu_check_block+0x12/0x60 [kvm]
 kvm_vcpu_block+0x94/0x4c0 [kvm]
 kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
 kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]

===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc5+ #47 Not tainted
-------------------------------
./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/4230:
 #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm]

stack backtrace:
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
 dump_stack+0x99/0xd0
 lockdep_rcu_suspicious+0xe7/0x120
 gfn_to_memslot+0x12a/0x140 [kvm]
 gfn_to_pfn+0x12/0x30 [kvm]
 gfn_to_page+0xe/0x20 [kvm]
 kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
 nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
 ? _raw_spin_unlock_irqrestore+0x36/0x80
 vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
 kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
 kvm_vcpu_check_block+0x12/0x60 [kvm]
 kvm_vcpu_block+0x94/0x4c0 [kvm]
 kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
 kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
 ? __fget+0xfd/0x210
 ? __lock_is_held+0x54/0x70
 do_vfs_ioctl+0x96/0x6a0
 ? __fget+0x11c/0x210
 ? __fget+0x5/0x210
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x81/0x220
 entry_SYSCALL64_slow_path+0x25/0x25

These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat

The nested preemption timer is based on hrtimer which is started on L2
entry, stopped on L2 exit and evaluated via the new check_nested_events
hook. The current logic adds vCPU to a simple waitqueue (TASK_INTERRUPTIBLE)
if need to yield pCPU and w/o holding srcu read lock when accesses memslots,
both can be in nested preemption timer evaluation path which results in
the warning above.

This patch fix it by leveraging request bit to async reload APIC access
page before vmentry in order to avoid to reload directly during the nested
preemption timer evaluation, it is safe since the vmcs01 is loaded and
current is nested vmexit.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

c83b6d15

kvmconfig: add virtio-gpu to config fragment · ce836c29

由 Rob Herring 提交于 9月 08, 2016

virtio-gpu is used for VMs, so add it to the kvm config.
Signed-off-by: NRob Herring <robh@kernel.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvmarm@lists.cs.columbia.edu
Cc: kvm@vger.kernel.org
[expanded "frag" to "fragment" in summary]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

ce836c29

config: move x86 kvm_guest.config to a common location · bd6c9222

由 Rob Herring 提交于 9月 08, 2016

kvm_guest.config is useful for KVM guests on other arches, and nothing
in it appears to be x86 specific, so just move the whole file. Kbuild
will find it in either location.
Signed-off-by: NRob Herring <robh@kernel.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvmarm@lists.cs.columbia.edu
Cc: kvm@vger.kernel.org
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

bd6c9222

20 9月, 2016 5 次提交

kvm: svm: fix unsigned compare less than zero comparison · adad0d02

由 Colin Ian King 提交于 9月 19, 2016

vm_data->avic_vm_id is a u32, so the check for a error
return (less than zero) such as -EAGAIN from
avic_get_next_vm_id currently has no effect whatsoever.
Fix this by using a temporary int for the comparison
and assign vm_data->avic_vm_id to this. I used an explicit
u32 cast in the assignment to show why vm_data->avic_vm_id
cannot be used in the assign/compare steps.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Acked-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

adad0d02

KVM: x86: Hyper-V tsc page setup · 095cf55d

由 Paolo Bonzini 提交于 2月 08, 2016

Lately tsc page was implemented but filled with empty
values. This patch setup tsc page scale and offset based
on vcpu tsc, tsc_khz and  HV_X64_MSR_TIME_REF_COUNT value.

The valid tsc page drops HV_X64_MSR_TIME_REF_COUNT msr
reads count to zero which potentially improves performance.
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NPeter Hornyack <peterhornyack@google.com>
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
[Computation of TSC page parameters rewritten to use the Linux timekeeper
 parameters. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

095cf55d

KVM: x86: introduce get_kvmclock_ns · 108b249c

由 Paolo Bonzini 提交于 9月 01, 2016

Introduce a function that reads the exact nanoseconds value that is
provided to the guest in kvmclock.  This crystallizes the notion of
kvmclock as a thin veneer over a stable TSC, that the guest will
(hopefully) convert with NTP.  In other words, kvmclock is *not* a
paravirtualized host-to-guest NTP.

Drop the get_kernel_ns() function, that was used both to get the base
value of the master clock and to get the current value of kvmclock.
The former use is replaced by ktime_get_boot_ns(), the latter is
the purpose of get_kernel_ns().

This also allows KVM to provide a Hyper-V time reference counter that
is synchronized with the time that is computed from the TSC page.
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

108b249c

KVM: x86: initialize kvmclock_offset · 67198ac3

由 Paolo Bonzini 提交于 9月 01, 2016

Make the guest's kvmclock count up from zero, not from the host boot
time.  The guest cannot rely on that anyway because it changes on
migration, the numbers are easier on the eye and finally it matches the
desired semantics of the Hyper-V time reference counter.
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

67198ac3

KVM: x86: always fill in vcpu->arch.hv_clock · 0d6dd2ff

由 Paolo Bonzini 提交于 9月 01, 2016

We will use it in the next patches for KVM_GET_CLOCK and as a basis for the
contents of the Hyper-V TSC page.  Get the values from the Linux
timekeeper even if kvmclock is not enabled.
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0d6dd2ff

16 9月, 2016 6 次提交

kvm: x86: export TSC information to user-space · 4f5758fc

由 Luiz Capitulino 提交于 9月 16, 2016

This commit exports the following information to
user-space via the newly created per-vcpu debugfs
directory:

 - TSC offset (as a signed number)
 - TSC scaling ratio
 - TSC scaling ratio fractinal bits

The original intention of this commit was to
export only the TSC offset, but the TSC scaling
information is exported for completeness.

We need to retrieve the TSC offset from user-space
in order to support the merging of host and guest
traces in trace-cmd. Today, we use the kvm_write_tsc_offset
tracepoint, but it has a number of problems (mainly,
it requires a running VM to be rebooted, ftrace setup,
and also tracepoints are not supposed to be ABIs).

The merging of host and guest traces is explained
in more detail in this thread:

 [Qemu-devel] [RFC] host and guest kernel trace merging
 https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html

This commit creates the following files in debugfs:

/sys/kernel/debug/kvm/66828-10/vcpu0/tsc-offset
/sys/kernel/debug/kvm/66828-10/vcpu0/tsc-scaling-ratio
/sys/kernel/debug/kvm/66828-10/vcpu0/tsc-scaling-ratio-frac-bits

The last two are only created if TSC scaling is supported.
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4f5758fc

kvm: create per-vcpu dirs in debugfs · 45b5939e

由 Luiz Capitulino 提交于 9月 16, 2016

This commit adds the ability for archs to export
per-vcpu information via a new per-vcpu dir in
the VM's debugfs directory.

If kvm_arch_has_vcpu_debugfs() returns true, then KVM
will create a vcpu dir for each vCPU in the VM's
debugfs directory. Then kvm_arch_create_vcpu_debugfs()
is responsible for populating each vcpu directory
with arch specific entries.

The per-vcpu path in debugfs will look like:

/sys/kernel/debug/kvm/29162-10/vcpu0
/sys/kernel/debug/kvm/29162-10/vcpu1

This is all arch specific for now because the only
user of this interface (x86) wants to export x86-specific
per-vcpu information to user-space.
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

45b5939e

kvm: add stubs for arch specific debugfs support · 235539b4

由 Luiz Capitulino 提交于 9月 07, 2016

Two stubs are added:

 o kvm_arch_has_vcpu_debugfs(): must return true if the arch
   supports creating debugfs entries in the vcpu debugfs dir
   (which will be implemented by the next commit)

 o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
   entries in the vcpu debugfs dir

For x86, this commit introduces a new file to avoid growing
arch/x86/kvm/x86.c even more.
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

235539b4

kvm: kvm_destroy_vm_debugfs(): check debugfs_stat_data pointer · 9d5a1dce

由 Luiz Capitulino 提交于 9月 07, 2016

This make it possible to call kvm_destroy_vm_debugfs() from
kvm_create_vm_debugfs() in error conditions.
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9d5a1dce

kvm: x86: drop read_tsc_offset() · 3e3f5026

由 Luiz Capitulino 提交于 9月 07, 2016

The TSC offset can now be read directly from struct kvm_arch_vcpu.
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3e3f5026

kvm: x86: add tsc_offset field to struct kvm_vcpu_arch · a545ab6a

由 Luiz Capitulino 提交于 9月 07, 2016

A future commit will want to easily read a vCPU's TSC offset,
so we store it in struct kvm_arch_vcpu_arch for easy access.
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a545ab6a

13 9月, 2016 2 次提交

Merge branch 'kvm-ppc-next' of... · ad53e35a

由 Paolo Bonzini 提交于 9月 13, 2016

Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

Paul Mackerras writes:

    The highlights are:

    * Reduced latency for interrupts from PCI pass-through devices, from
      Suresh Warrier and me.
    * Halt-polling implementation from Suraj Jitindar Singh.
    * 64-bit VCPU statistics, also from Suraj.
    * Various other minor fixes and improvements.

ad53e35a

KVM: PPC: e500: Rename jump labels in kvmppc_e500_tlb_init() · aad9e5ba

由 Markus Elfring 提交于 9月 12, 2016

Adjust jump labels according to the current Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

aad9e5ba

12 9月, 2016 12 次提交

KVM: PPC: e500: Use kmalloc_array() in kvmppc_e500_tlb_init() · 90235dc1

由 Markus Elfring 提交于 8月 28, 2016

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

* Replace the specification of a data structure by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

90235dc1

KVM: PPC: e500: Replace kzalloc() calls by kcalloc() in two functions · b0ac477b

由 Markus Elfring 提交于 8月 28, 2016

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kcalloc".
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>

  This issue was detected also by using the Coccinelle software.

* Replace the specification of data structures by pointer dereferences
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

b0ac477b

KVM: PPC: e500: Delete an unnecessary initialisation in kvm_vcpu_ioctl_config_tlb() · cfb60813

由 Markus Elfring 提交于 8月 28, 2016

The local variable "g2h_bitmap" will be set to an appropriate value
a bit later. Thus omit the explicit initialisation at the beginning.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

cfb60813

KVM: PPC: e500: Less function calls in kvm_vcpu_ioctl_config_tlb() after error detection · 46d4e747

由 Markus Elfring 提交于 8月 28, 2016

The kfree() function was called in two cases by the
kvm_vcpu_ioctl_config_tlb() function during error handling
even if the passed data structure element contained a null pointer.

* Split a condition check for memory allocation failures.

* Adjust jump targets according to the Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

46d4e747

KVM: PPC: e500: Use kmalloc_array() in kvm_vcpu_ioctl_config_tlb() · f3c0ce86

由 Markus Elfring 提交于 8月 28, 2016

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

f3c0ce86

KVM: PPC: Book3S HV: Counters for passthrough IRQ stats · 65e7026a

由 Suresh Warrier 提交于 8月 19, 2016

Add VCPU stat counters to track affinity for passthrough
interrupts.

pthru_all: Counts all passthrough interrupts whose IRQ mappings are
           in the kvmppc_passthru_irq_map structure.
pthru_host: Counts all cached passthrough interrupts that were injected
	    from the host through kvm_set_irq (i.e. not handled in
	    real mode).
pthru_bad_aff: Counts how many cached passthrough interrupts have
               bad affinity (receiving CPU is not running VCPU that is
	       the target of the virtual interrupt in the guest).
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

65e7026a

KVM: PPC: Book3S HV: Set server for passed-through interrupts · 5d375199

由 Paul Mackerras 提交于 8月 19, 2016

When a guest has a PCI pass-through device with an interrupt, it
will direct the interrupt to a particular guest VCPU.  In fact the
physical interrupt might arrive on any CPU, and then get
delivered to the target VCPU in the emulated XICS (guest interrupt
controller), and eventually delivered to the target VCPU.

Now that we have code to handle device interrupts in real mode
without exiting to the host kernel, there is an advantage to having
the device interrupt arrive on the same sub(core) as the target
VCPU is running on.  In this situation, the interrupt can be
delivered to the target VCPU without any exit to the host kernel
(using a hypervisor doorbell interrupt between threads if
necessary).

This patch aims to get passed-through device interrupts arriving
on the correct core by setting the interrupt server in the real
hardware XICS for the interrupt to the first thread in the (sub)core
where its target VCPU is running.  We do this in the real-mode H_EOI
code because the H_EOI handler already needs to look at the
emulated ICS state for the interrupt (whereas the H_XIRR handler
doesn't), and we know we are running in the target VCPU context
at that point.

We set the server CPU in hardware using an OPAL call, regardless of
what the IRQ affinity mask for the interrupt says, and without
updating the affinity mask.  This amounts to saying that when an
interrupt is passed through to a guest, as a matter of policy we
allow the guest's affinity for the interrupt to override the host's.

This is inspired by an earlier patch from Suresh Warrier, although
none of this code came from that earlier patch.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

5d375199

KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode · 366274f5

由 Suresh Warrier 提交于 8月 19, 2016

When a passthrough IRQ is handled completely within KVM real
mode code, it has to also update the IRQ stats since this
does not go through the generic IRQ handling code.

However, the per CPU kstat_irqs field is an allocated (not static)
field and so cannot be directly accessed in real mode safely.

The function this_cpu_inc_rm() is introduced to safely increment
per CPU fields (currently coded for unsigned integers only) that
are allocated and could thus be vmalloced also.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

366274f5

KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass · 644abbb2

由 Suresh Warrier 提交于 8月 19, 2016

Add a  module parameter kvm_irq_bypass for kvm_hv.ko to
disable IRQ bypass for passthrough interrupts. The default
value of this tunable is 1 - that is enable the feature.

Since the tunable is used by built-in kernel code, we use
the module_param_cb macro to achieve this.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

644abbb2

KVM: PPC: Book3S HV: Dump irqmap in debugfs · af893c7d

由 Suresh Warrier 提交于 8月 19, 2016

Dump the passthrough irqmap structure associated with a
guest as part of /sys/kernel/debug/powerpc/kvm-xics-*.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

af893c7d

KVM: PPC: Book3S HV: Complete passthrough interrupt in host · f7af5209

由 Suresh Warrier 提交于 8月 19, 2016

In existing real mode ICP code, when updating the virtual ICP
state, if there is a required action that cannot be completely
handled in real mode, as for instance, a VCPU needs to be woken
up, flags are set in the ICP to indicate the required action.
This is checked when returning from hypercalls to decide whether
the call needs switch back to the host where the action can be
performed in virtual mode. Note that if h_ipi_redirect is enabled,
real mode code will first try to message a free host CPU to
complete this job instead of returning the host to do it ourselves.

Currently, the real mode PCI passthrough interrupt handling code
checks if any of these flags are set and simply returns to the host.
This is not good enough as the trap value (0x500) is treated as an
external interrupt by the host code. It is only when the trap value
is a hypercall that the host code searches for and acts on unfinished
work by calling kvmppc_xics_rm_complete.

This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
which is returned by KVM if there is unfinished business to be
completed in host virtual mode after handling a PCI passthrough
interrupt. The host checks for this special interrupt condition
and calls into the kvmppc_xics_rm_complete, which is made an
exported function for this reason.

[paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

f7af5209

KVM: PPC: Book3S HV: Handle passthrough interrupts in guest · e3c13e56

由 Suresh Warrier 提交于 8月 19, 2016

Currently, KVM switches back to the host to handle any external
interrupt (when the interrupt is received while running in the
guest). This patch updates real-mode KVM to check if an interrupt
is generated by a passthrough adapter that is owned by this guest.
If so, the real mode KVM will directly inject the corresponding
virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
in hardware. In short, the interrupt is handled entirely in real
mode in the guest context without switching back to the host.

In some rare cases, the interrupt cannot be completely handled in
real mode, for instance, a VCPU that is sleeping needs to be woken
up. In this case, KVM simply switches back to the host with trap
reason set to 0x500. This works, but it is clearly not very efficient.
A following patch will distinguish this case and handle it
correctly in the host. Note that we can use the existing
check_too_hard() routine even though we are not in a hypercall to
determine if there is unfinished business that needs to be
completed in host virtual mode.

The patch assumes that the mapping between hardware interrupt IRQ
and virtual IRQ to be injected to the guest already exists for the
PCI passthrough interrupts that need to be handled in real mode.
If the mapping does not exist, KVM falls back to the default
existing behavior.

The KVM real mode code reads mappings from the mapped array in the
passthrough IRQ map without taking any lock.  We carefully order the
loads and stores of the fields in the kvmppc_irq_map data structure
using memory barriers to avoid an inconsistent mapping being seen by
the reader. Thus, although it is possible to miss a map entry, it is
not possible to read a stale value.

[paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
 pulled out powernv eoi change into a separate patch, made
 kvmppc_read_intr get the vcpu from the paca rather than being
 passed in, rewrote the logic at the end of kvmppc_read_intr to
 avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
 since we were always restoring SRR0/1 anyway, get rid of the cached
 array (just use the mapped array), removed the kick_all_cpus_sync()
 call, clear saved_xirr PACA field when we handle the interrupt in
 real mode, fix compilation with CONFIG_KVM_XICS=n.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

e3c13e56

09 9月, 2016 9 次提交

KVM: PPC: Book3S HV: Enable IRQ bypass · c57875f5

由 Suresh Warrier 提交于 8月 19, 2016

Add the irq_bypass_add_producer and irq_bypass_del_producer
functions. These functions get called whenever a GSI is being
defined for a guest. They create/remove the mapping between
host real IRQ numbers and the guest GSI.

Add the following helper functions to manage the
passthrough IRQ map.

kvmppc_set_passthru_irq()
  Creates a mapping in the passthrough IRQ map that maps a host
  IRQ to a guest GSI. It allocates the structure (one per guest VM)
  the first time it is called.

kvmppc_clr_passthru_irq()
  Removes the passthrough IRQ map entry given a guest GSI.
  The passthrough IRQ map structure is not freed even when the
  number of mapped entries goes to zero. It is only freed when
  the VM is destroyed.

[paulus@ozlabs.org - modified to use is_pnv_opal_msi() rather than
 requiring all passed-through interrupts to use the same irq_chip;
 changed deletion so it zeroes out the r_hwirq field rather than
 copying the last entry down and decrementing the number of entries.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

c57875f5

KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap · 8daaafc8

由 Suresh Warrier 提交于 8月 19, 2016

This patch introduces an IRQ mapping structure, the
kvmppc_passthru_irqmap structure that is to be used
to map the real hardware IRQ in the host with the virtual
hardware IRQ (gsi) that is injected into a guest by KVM for
passthrough adapters.

Currently, we assume a separate IRQ mapping structure for
each guest. Each kvmppc_passthru_irqmap has a mapping arrays,
containing all defined real<->virtual IRQs.

[paulus@ozlabs.org - removed irq_chip field from struct
 kvmppc_passthru_irqmap; changed parameter for
 kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct
 kvm *, removed small cached array.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

8daaafc8

KVM: PPC: select IRQ_BYPASS_MANAGER · 9576730d

由 Suresh Warrier 提交于 8月 19, 2016

Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set.
Add the PPC producer functions for add and del producer.

[paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c
 so booke compiles; added kvm_arch_has_irq_bypass implementation.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

9576730d

KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function · 37f55d30

由 Suresh Warrier 提交于 8月 19, 2016

Modify kvmppc_read_intr to make it a C function.  Because it is called
from kvmppc_check_wake_reason, any of the assembler code that calls
either kvmppc_read_intr or kvmppc_check_wake_reason now has to assume
that the volatile registers might have been modified.

This also adds in the optimization of clearing saved_xirr in the case
where we completely handle and EOI an IPI.  Without this, the next
device interrupt will require two trips through the host interrupt
handling code.

[paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame
 when it is calling kvmppc_read_intr, which means we can set r12 to
 the trap number (0x500) after the call to kvmppc_read_intr, instead
 of using r31.  Also moved the deliver_guest_interrupt label so as to
 restore XER and CTR, plus other minor tweaks.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

37f55d30

Merge branch 'kvm-ppc-infrastructure' into kvm-ppc-next · 99212c86

由 Paul Mackerras 提交于 9月 09, 2016

This merges the topic branch 'kvm-ppc-infrastructure' into kvm-ppc-next
so that I can then apply further patches that need the changes in the
kvm-ppc-infrastructure branch.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

99212c86

powerpc: move hmi.c to arch/powerpc/kvm/ · 3f257774

由 Paolo Bonzini 提交于 8月 11, 2016

hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use.  So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.

Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

3f257774

powerpc/powernv: Provide facilities for EOI, usable from real mode · 4ee11c1a

由 Suresh Warrier 提交于 8月 19, 2016

This adds a new function pnv_opal_pci_msi_eoi() which does the part of
end-of-interrupt (EOI) handling of an MSI which involves doing an
OPAL call.  This function can be called in real mode.  This doesn't
just export pnv_ioda2_msi_eoi() because that does a call to
icp_native_eoi(), which does not work in real mode.

This also adds a function, is_pnv_opal_msi(), which KVM can call to
check whether an interrupt is one for which we should be calling
pnv_opal_pci_msi_eoi() when we need to do an EOI.

[paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
 from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
 interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

4ee11c1a

powerpc: Add simple cache inhibited MMIO accessors · 07b1fdf5

由 Suresh Warrier 提交于 8月 19, 2016

Add simple cache inhibited accessors for memory mapped I/O.
Unlike the accessors built from the DEF_MMIO_* macros, these
don't include any hardware memory barriers, callers need to
manage memory barriers on their own. These can only be called
in hypervisor real mode.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
[paulus@ozlabs.org - added line to comment]
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

07b1fdf5

powerpc/mm: Speed up computation of base and actual page size for a HPTE · 0eeede0c

由 Paul Mackerras 提交于 9月 02, 2016

This replaces a 2-D search through an array with a simple 8-bit table
lookup for determining the actual and/or base page size for a HPT entry.

The encoding in the second doubleword of the HPTE is designed to encode
the actual and base page sizes without using any more bits than would be
needed for a 4k page number, by using between 1 and 8 low-order bits of
the RPN (real page number) field to encode the page sizes. A single
"large page" bit in the first doubleword indicates that these low-order
bits are to be interpreted like this.

We can determine the page sizes by using the low-order 8 bits of the RPN
to look up a 256-entry table. For actual page sizes less than 1MB, some
of the upper bits of these 8 bits are going to be real address bits, but
we can cope with that by replicating the entries for those smaller page
sizes.

While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
functions from a KVM-specific header to a header for 64-bit HPT systems,
since this computation doesn't have anything specifically to do with KVM.
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

0eeede0c

08 9月, 2016 3 次提交

Merge tag 'kvm-s390-next-4.9-1' of... · 6f90f1d1

由 Paolo Bonzini 提交于 9月 08, 2016

Merge tag 'kvm-s390-next-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: features and fixes for 4.9

- lazy enablement of runtime instrumentation
- up to 255 CPUs for nested guests
- rework of machine check deliver
- cleanups/fixes

6f90f1d1

C

Merge remote-tracking branch 'kvms390/s390forkvm' into kvms390next · b0eb91ae
由 Christian Borntraeger 提交于 9月 08, 2016

b0eb91ae

KVM: s390: Use memdup_user() rather than duplicating code · 0624a8eb

由 Markus Elfring 提交于 8月 24, 2016

* Reuse existing functionality from memdup_user() instead of keeping
  duplicate source code.

  This issue was detected by using the Coccinelle software.

* Return directly if this copy operation failed.
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Message-Id: <c86f7520-885e-2829-ae9c-b81caa898e84@users.sourceforge.net>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

0624a8eb

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功