提交 · ec257165082616841a354dd915801ed43e3553be · openeuler / raspberrypi-kernel

22 8月, 2015 5 次提交

KVM: PPC: Book3S HV: Make use of unused threads when running guests · ec257165

由 Paul Mackerras 提交于 6月 24, 2015

When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical
threads are currently unused.  This makes it possible to use them to
run one or more other virtual cores from the same guest when certain
conditions are met.  This applies on POWER7, and on POWER8 to guests
with one thread per virtual core.  (It doesn't apply to POWER8 guests
with multiple threads per vcore because they require a 1-1 virtual to
physical thread mapping in order to be able to use msgsndp and the
TIR.)

The idea is that we maintain a list of preempted vcores for each
physical cpu (i.e. each core, since the host runs single-threaded).
Then, when a vcore is about to run, it checks to see if there are
any vcores on the list for its physical cpu that could be
piggybacked onto this vcore's execution.  If so, those additional
vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
threads are started as well as the original vcore, which is called
the master vcore.

After the vcores have exited the guest, the extra ones are put back
onto the preempted list if any of their VCPUs are still runnable and
not idle.

This means that vcpu->arch.ptid is no longer necessarily the same as
the physical thread that the vcpu runs on.  In order to make it easier
for code that wants to send an IPI to know which CPU to target, we
now store that in a new field in struct vcpu_arch, called thread_cpu.
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Tested-by: NLaurent Vivier <lvivier@redhat.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

ec257165

KVM: PPC: add missing pt_regs initialization · 845ac985

由 Tudor Laurentiu 提交于 5月 18, 2015

On this switch branch the regs initialization
doesn't happen so add it.
This was found with the help of a static
code analysis tool.
Signed-off-by: NLaurentiu Tudor <Laurentiu.Tudor@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

845ac985

KVM: PPC: Fix warnings from sparse · 5358a963

由 Thomas Huth 提交于 5月 22, 2015

When compiling the KVM code for POWER with "make C=1", sparse
complains about functions missing proper prototypes and a 64-bit
constant missing the ULL prefix. Let's fix this by making the
functions static or by including the proper header with the
prototypes, and by appending a ULL prefix to the constant
PPC_MPPE_ADDRESS_MASK.
Signed-off-by: NThomas Huth <thuth@redhat.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

5358a963

KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig · 129fd423

由 Thomas Huth 提交于 5月 22, 2015

Since the PPC970 support has been removed from the kvm-hv kernel
module recently, we should also reflect this change in the help
text of the corresponding Kconfig option.
Signed-off-by: NThomas Huth <thuth@redhat.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

129fd423

KVM: PPC: fix suspicious use of conditional operator · f5ffe330

由 Tudor Laurentiu 提交于 5月 25, 2015

This was signaled by a static code analysis tool.
Signed-off-by: NLaurentiu Tudor <Laurentiu.Tudor@freescale.com>
Reviewed-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

f5ffe330

07 6月, 2015 1 次提交

powerpc/kernel: Rename PACA_DSCR to PACA_DSCR_DEFAULT · 1db36525

由 Anshuman Khandual 提交于 5月 21, 2015

PACA_DSCR offset macro tracks dscr_default element in the paca
structure. Better change the name of this macro to match that of the
data element it tracks. Makes the code more readable.
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1db36525

28 5月, 2015 1 次提交

KVM: add "new" argument to kvm_arch_commit_memory_region · f36f3f28

由 Paolo Bonzini 提交于 5月 18, 2015

This lets the function access the new memory slot without going through
kvm_memslots and id_to_memslot.  It will simplify the code when more
than one address space will be supported.

Unfortunately, the "const"ness of the new argument must be casted
away in two places.  Fixing KVM to accept const struct kvm_memory_slot
pointers would require modifications in pretty much all architectures,
and is left for later.
Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f36f3f28

26 5月, 2015 2 次提交

KVM: const-ify uses of struct kvm_userspace_memory_region · 09170a49

由 Paolo Bonzini 提交于 5月 18, 2015

Architecture-specific helpers are not supposed to muck with
struct kvm_userspace_memory_region contents.  Add const to
enforce this.

In order to eliminate the only write in __kvm_set_memory_region,
the cleaning of deleted slots is pulled up from update_memslots
to __kvm_set_memory_region.
Reviewed-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

09170a49

KVM: use kvm_memslots whenever possible · 9f6b8029

由 Paolo Bonzini 提交于 5月 17, 2015

kvm_memslots provides lockdep checking.  Use it consistently instead of
explicit dereferencing of kvm->memslots.
Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9f6b8029

10 5月, 2015 1 次提交

KVM: PPC: Book3S HV: Fix list traversal in error case · 17d48901

由 Paul Mackerras 提交于 4月 29, 2015

This fixes a regression introduced in commit 25fedfca, "KVM: PPC:
Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which
leads to a user-triggerable oops.

In the case where we try to run a vcore on a physical core that is
not in single-threaded mode, or the vcore has too many threads for
the physical core, we iterate the list of runnable vcpus to make
each one return an EBUSY error to userspace.  Since this involves
taking each vcpu off the runnable_threads list for the vcore, we
need to use list_for_each_entry_safe rather than list_for_each_entry
to traverse the list.  Otherwise the kernel will crash with an oops
message like this:

Unable to handle kernel paging request for data at address 0x000fff88
Faulting instruction address: 0xd00000001e635dc8
Oops: Kernel access of bad area, sig: 11 [#2]
SMP NR_CPUS=1024 NUMA PowerNV
...
CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G      D        3.18.0 #1
task: c00000274e507500 ti: c0000027d1924000 task.ti: c0000027d1924000
NIP: d00000001e635dc8 LR: d00000001e635df8 CTR: c00000000011ba50
REGS: c0000027d19275b0 TRAP: 0300   Tainted: G      D         (3.18.0)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 22002824  XER: 00000000
CFAR: c000000000008468 DAR: 00000000000fff88 DSISR: 40000000 SOFTE: 1
GPR00: d00000001e635df8 c0000027d1927830 d00000001e64c850 0000000000000001
GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000
GPR08: 0000000000200200 0000000000000000 0000000000000000 d00000001e63e588
GPR12: 0000000000002200 c000000007dbc800 c000000fc7800000 000000000000000a
GPR16: fffffffffffffffc c000000fd5439690 c000000fc7801c98 0000000000000001
GPR20: 0000000000000003 c0000027d1927aa8 c000000fd543b348 c000000fd543b350
GPR24: 0000000000000000 c000000fa57f0000 0000000000000030 0000000000000000
GPR28: fffffffffffffff0 c000000fd543b328 00000000000fe468 c000000fd543b300
NIP [d00000001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv]
LR [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv]
Call Trace:
[c0000027d1927830] [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] (unreliable)
[c0000027d1927a30] [d00000001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv]
[c0000027d1927b70] [d00000001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm]
[c0000027d1927ba0] [d00000001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[c0000027d1927be0] [d00000001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm]
[c0000027d1927d40] [c0000000002d6720] do_vfs_ioctl+0x490/0x780
[c0000027d1927de0] [c0000000002d6ae4] SyS_ioctl+0xd4/0xf0
[c0000027d1927e30] [c000000000009358] syscall_exit+0x0/0x98
Instruction dump:
60000000 60420000 387e1b30 38800003 38a00001 38c00000 480087d9 e8410018
ebde1c98 7fbdf040 3bdee368 419e0048 <813e1b20> 939e1b18 2f890001 409effcc
---[ end trace 8cdf50251cca6680 ]---

Fixes: 25fedfcaSigned-off-by: NPaul Mackerras <paulus@samba.org>
Reviewed-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

17d48901

07 5月, 2015 2 次提交

P
KVM: booke: use __kvm_guest_exit · e233d54d
由 Paolo Bonzini 提交于 4月 30, 2015
```
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
e233d54d

KVM: arm/mips/x86/power use __kvm_guest_{enter|exit} · ccf73aaf

由 Christian Borntraeger 提交于 4月 30, 2015

Use __kvm_guest_{enter|exit} instead of kvm_guest_{enter|exit}
where interrupts are disabled.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ccf73aaf

29 4月, 2015 1 次提交

powerpc/kvm: Fix SMP=n build error in book3s_xics.c · 433c5c20

由 Michael Ellerman 提交于 4月 28, 2015

Commit 34cb7954 "Convert ICS mutex lock to spin lock" added an
include of asm/spinlock.h, which does not work in the SMP=n case.

It should instead include linux/spinlock.h

Fixes: 34cb7954 ("KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock")
Acked-by: NPaul Mackerras <paulus@samba.org>
Reviewed-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

433c5c20

21 4月, 2015 20 次提交

KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 · 66feed61

由 Paul Mackerras 提交于 3月 28, 2015

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.

Aggregated statistics from debugfs across vcpus for a guest with 32
vcpus, 8 threads/vcore, running on a POWER8, show this before the
change:

 rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
  rm_exit:     4561.5ns (12 - 3477452, 1009402 samples)
  rm_intr:     1660.0ns (12 - 553050, 3600051 samples)

and this after the change:

 rm_entry:     3060.1ns (212 - 65138, 953873 samples)
  rm_exit:     4244.1ns (12 - 9693408, 954331 samples)
  rm_intr:     1342.3ns (12 - 1104718, 3405326 samples)

for a test of booting Fedora 20 big-endian to the login prompt.

The time taken for a H_PROD hcall (which is handled in the host
kernel) went down from about 35 microseconds to about 16 microseconds
with this change.

The noinline added to kvmppc_run_core turned out to be necessary for
good performance, at least with gcc 4.9.2 as packaged with Fedora 21
and a little-endian POWER8 host.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

66feed61

KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C · eddb60fb

由 Paul Mackerras 提交于 3月 28, 2015

This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c. It also moves the IPI sending code that was
in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

eddb60fb

KVM: PPC: Book3S HV: Streamline guest entry and exit · 6af27c84

由 Paul Mackerras 提交于 3月 28, 2015

On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  That means that the other threads
exit sooner and means that the first thread doesn't spend so much
time waiting for the other threads at the point where the MMU gets
switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry:     avg 4537.3ns (222 - 48444, 1068878 samples)
  rm_exit:     avg 4787.6ns (152 - 165490, 1010717 samples)
  rm_intr:     avg 1673.6ns (12 - 341304, 3818691 samples)

and this after the change:

 rm_entry:     avg 3427.7ns (232 - 68150, 1118921 samples)
  rm_exit:     avg 4716.0ns (12 - 150720, 1119477 samples)
  rm_intr:     avg 1614.8ns (12 - 522436, 3850432 samples)

showing a substantial reduction in the time spent per guest entry in
the real-mode guest entry code, and smaller reductions in the real
mode guest exit and interrupt handling times.  (The test was to start
the guest and boot Fedora 20 big-endian to the login prompt.)
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

6af27c84

KVM: PPC: Book3S HV: Use bitmap of active threads rather than count · 7d6c40da

由 Paul Mackerras 提交于 3月 28, 2015

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each. The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

7d6c40da

KVM: PPC: Book3S HV: Use decrementer to wake napping threads · fd6d53b1

由 Paul Mackerras 提交于 3月 28, 2015

This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

fd6d53b1

KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI · ccc07772

由 Paul Mackerras 提交于 3月 28, 2015

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

ccc07772

KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken · 5d5b99cd

由 Paul Mackerras 提交于 3月 28, 2015

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

5d5b99cd

KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu · 25fedfca

由 Paul Mackerras 提交于 3月 28, 2015

Rather than calling cond_resched() in kvmppc_run_core() before doing
the post-processing for the vcpus that we have just run (that is,
calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
that post-processing before calling cond_resched(), and that post-
processing is moved out into its own function, post_guest_process().

The reschedule point is now in kvmppc_run_vcpu() and we define a new
vcore state, VCORE_PREEMPT, to indicate that that the vcore's runner
task is runnable but not running. (Doing the reschedule with the
vcore in VCORE_INACTIVE state would be bad because there are potentially
other vcpus waiting for the runner in kvmppc_wait_for_exec() which
then wouldn't get woken up.)

Also, we make use of the handy cond_resched_lock() function, which
unlocks and relocks vc->lock for us around the reschedule.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

25fedfca

KVM: PPC: Book3S HV: Minor cleanups · 1f09c3ed

由 Paul Mackerras 提交于 3月 28, 2015

* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

1f09c3ed

KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update · d911f0be

由 Paul Mackerras 提交于 3月 28, 2015

Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
update (i.e. one of its 3 virtual processor areas needed to be pinned
in memory so the host real mode code can update it on guest entry and
exit), we would drop the vcore lock and do the update there and then.
Future changes will make it inconvenient to drop the lock, so instead
we now remove it from the list of runnable VCPUs and wake up its
VCPU task.  This will have the effect that the VCPU task will exit
kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
to kvmppc_update_vpas() and then rejoin the vcore.

The one complication is that the runner VCPU (whose VCPU task is the
current task) might be one of the ones that gets removed from the
runnable list.  In that case we just return from kvmppc_run_core()
and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
the runner if necessary.

This all means that the VCORE_STARTING state is no longer used, so we
remove it.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

d911f0be

KVM: PPC: Book3S HV: Accumulate timing information for real-mode code · b6c295df

由 Paul Mackerras 提交于 3月 28, 2015

This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spend in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.

The overhead of the extra code amounts to about 30ns for an hcall that
is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
production environments may not wish to incur this overhead, the new
code is conditional on a new config symbol,
CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

b6c295df

KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT · e23a808b

由 Paul Mackerras 提交于 3月 28, 2015

This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vmnnnn, where nnnn is the PID of the
process that created the guest.  The file is named "htab".  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196

The first field is the index of the entry in the HPT, the second and
third are the HPT entry, so the third entry contains the real page
number that is mapped by the entry if the entry's valid bit is set.
The fourth field is the guest's view of the second doubleword of the
entry, so it contains the guest physical address.  (The format of the
second through fourth fields are described in the Power ISA and also
in arch/powerpc/include/asm/mmu-hash64.h.)
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

e23a808b

KVM: PPC: Book3S HV: Add ICP real mode counters · 6e0365b7

由 Suresh Warrier 提交于 3月 20, 2015

Add two counters to count how often we generate real-mode ICS resend
and reject events. The counters provide some performance statistics
that could be used in the future to consider if the real mode functions
need further optimizing. The counters are displayed as part of IPC and
ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM.

Also added two counters that count (approximately) how many times we
don't find an ICP or ICS we're looking for. These are not currently
exposed through sysfs, but can be useful when debugging crashes.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

6e0365b7

KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode · b0221556

由 Suresh Warrier 提交于 3月 20, 2015

Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
to switch to the host to complete the rest of hypercall function in
virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
functions to be runnable in hypervisor real mode, thus avoiding the need
to switch to the host to execute these functions in virtual mode. However,
the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
events - these events cannot be done in real mode and they will still need
a switch to host virtual mode.

There are sufficient differences between the real mode code and the
virtual mode code for the ICS/ICP resend and reject functions that
for now the code has been duplicated instead of sharing common code.
In the future, we can look at creating common functions.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

b0221556

KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock · 34cb7954

由 Suresh Warrier 提交于 3月 20, 2015

Replaces the ICS mutex lock with a spin lock since we will be porting
these routines to real mode. Note that we need to disable interrupts
before we take the lock in anticipation of the fact that on the guest
side, we are running in the context of a hard irq and interrupts are
disabled (EE bit off) when the lock is acquired. Again, because we
will be acquiring the lock in hypervisor real mode, we need to use
an arch_spinlock_t instead of a normal spinlock here as we want to
avoid running any lockdep code (which may not be safe to execute in
real mode).
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

34cb7954

KVM: PPC: Book3S HV: Add guest->host real mode completion counters · 878610fe

由 Suresh E. Warrier 提交于 3月 20, 2015

Add counters to track number of times we switch from guest real mode
to host virtual mode during an interrupt-related hyper call because the
hypercall requires actions that cannot be completed in real mode. This
will help when making optimizations that reduce guest-host transitions.

It is safe to use an ordinary increment rather than an atomic operation
because there is one ICP per virtual CPU and kvmppc_xics_rm_complete()
only works on the ICP for the current VCPU.

The counters are displayed as part of IPC and ICP state provided by
/sys/debug/kernel/powerpc/kvm* for each VM.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

878610fe

KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte · a4bd6eb0

由 Aneesh Kumar K.V 提交于 3月 20, 2015

This adds helper routines for locking and unlocking HPTEs, and uses
them in the rest of the code.  We don't change any locking rules in
this patch.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

a4bd6eb0

KVM: PPC: Book3S HV: Remove RMA-related variables from code · 31037eca

由 Aneesh Kumar K.V 提交于 3月 20, 2015

We don't support real-mode areas now that 970 support is removed.
Remove the remaining details of rma from the code.  Also rename
rma_setup_done to hpte_setup_done to better reflect the changes.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

31037eca

KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation. · e928e9cb

由 Michael Ellerman 提交于 3月 20, 2015

Some PowerNV systems include a hardware random-number generator.
This HWRNG is present on POWER7+ and POWER8 chips and is capable of
generating one 64-bit random number every microsecond. The random
numbers are produced by sampling a set of 64 unstable high-frequency
oscillators and are almost completely entropic.

PAPR defines an H_RANDOM hypercall which guests can use to obtain one
64-bit random sample from the HWRNG. This adds a real-mode
implementation of the H_RANDOM hypercall. This hypercall was
implemented in real mode because the latency of reading the HWRNG is
generally small compared to the latency of a guest exit and entry for
all the threads in the same virtual core.

Userspace can detect the presence of the HWRNG and the H_RANDOM
implementation by querying the KVM_CAP_PPC_HWRNG capability. The
H_RANDOM hypercall implementation will only be invoked when the guest
does an H_RANDOM hypercall if userspace first enables the in-kernel
H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

e928e9cb

kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM · 99342cf8

由 David Gibson 提交于 2月 05, 2015

On POWER, storage caching is usually configured via the MMU - attributes
such as cache-inhibited are stored in the TLB and the hashed page table.

This makes correctly performing cache inhibited IO accesses awkward when
the MMU is turned off (real mode). Some CPU models provide special
registers to control the cache attributes of real mode load and stores but
this is not at all consistent. This is a problem in particular for SLOF,
the firmware used on KVM guests, which runs entirely in real mode, but
which needs to do IO to load the kernel.

To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
a logical address (aka guest physical address). SLOF uses these for IO.

However, because these are implemented within qemu, not the host kernel,
these bypass any IO devices emulated within KVM itself. The simplest way
to see this problem is to attempt to boot a KVM guest from a virtio-blk
device with iothread / dataplane enabled. The iothread code relies on an
in kernel implementation of the virtio queue notification, which is not
triggered by the IO hcalls, and so the guest will stall in SLOF unable to
load the guest OS.

This patch addresses this by providing in-kernel implementations of the
2 hypercalls, which correctly scan the KVM IO bus. Any access to an
address not handled by the KVM IO bus will cause a VM exit, hitting the
qemu implementation as before.

Note that a userspace change is also required, in order to enable these
new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
[agraf: fix compilation]
Signed-off-by: NAlexander Graf <agraf@suse.de>

99342cf8

17 4月, 2015 5 次提交

powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error · a7e73e71

由 Shreyas B. Prabhu 提交于 4月 16, 2015

kvm_no_guest() calls power7_wakeup_loss() to put the thread into the
deepest supported idle state. power7_wakeup_loss() is defined in
arch/powerpc/kernel/idle_power7.S, which is compiled only when
PPC_P7_NAP=y.

And PPC_P7_NAP is selected when PPC_POWERNV=y.

Hence in cases where PPC_POWERNV=n and KVM_BOOK3S_64_HV=y we see the
following error:

  arch/powerpc/kvm/built-in.o: In function `kvm_no_guest':
  arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x42c): undefined reference to `power7_wakeup_loss'

Fix this by adding PPC_POWERNV as a dependency for KVM_BOOK3S_64_HV.
Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a7e73e71

powerpc/mm/thp: Return pte address if we find trans_splitting. · 7d6e7f7f

由 Aneesh Kumar K.V 提交于 3月 30, 2015

For THP that is marked trans splitting, we return the pte.
This require the callers to handle the pmd_trans_splitting scenario,
if they care. All the current callers are either looking at pfn or
write_ok, hence we don't need to update them.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7d6e7f7f

powerpc/mm/thp: Make page table walk safe against thp split/collapse · 691e95fd

由 Aneesh Kumar K.V 提交于 3月 30, 2015

We can disable a THP split or a hugepage collapse by disabling irq.
We do send IPI to all the cpus in the early part of split/collapse,
and disabling local irq ensure we don't make progress with
split/collapse. If the THP is getting split we return NULL from
find_linux_pte_or_hugepte(). For all the current callers it should be ok.
We need to be careful if we want to use returned pte_t pointer outside
the irq disabled region. W.r.t to THP split, the pfn remains the same,
but then a hugepage collapse will result in a pfn change. There are
few steps we can take to avoid a hugepage collapse.One way is to take page
reference inside the irq disable region. Other option is to take
mmap_sem so that a parallel collapse will not happen. We can also
disable collapse by taking pmd_lock. Another method used by kvm
subsystem is to check whether we had a mmu_notifer update in between
using mmu_notifier_retry().
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

691e95fd

KVM: PPC: Remove page table walk helpers · dac56570

由 Aneesh Kumar K.V 提交于 3月 30, 2015

This patch remove helpers which we had used only once in the code.
Limiting page table walk variants help in ensuring that we won't
end up with code walking page table with wrong assumptions.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

dac56570

KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer · 5e1d44ae

由 Aneesh Kumar K.V 提交于 3月 30, 2015

pte can get updated from other CPUs as part of multiple activities
like THP split, huge page collapse, unmap. We need to make sure we
don't reload the pte value again and again for different checks.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5e1d44ae

08 4月, 2015 1 次提交

kvm/ppc/mpic: drop unused IRQ_testbit · 19456060

由 Arseny Solokha 提交于 2月 24, 2015

Drop unused static procedure which doesn't have callers within its
translation unit. It had been already removed independently in QEMU[1]
from the OpenPIC implementation borrowed from the kernel.

[1] https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg01812.htmlSigned-off-by: NArseny Solokha <asolokha@kb.kras.ru>
Cc: Alexander Graf <agraf@suse.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1424768706-23150-3-git-send-email-asolokha@kb.kras.ru>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

19456060

27 3月, 2015 1 次提交

KVM: move iodev.h from virt/kvm/ to include/kvm · af669ac6

由 Andre Przywara 提交于 3月 26, 2015

iodev.h contains definitions for the kvm_io_bus framework. This is
needed both by the generic KVM code in virt/kvm as well as by
architecture specific code under arch/. Putting the header file in
virt/kvm and using local includes in the architecture part seems at
least dodgy to me, so let's move the file into include/kvm, so that a
more natural "#include <kvm/iodev.h>" can be used by all of the code.
This also solves a problem later when using struct kvm_io_device
in arm_vgic.h.
Fixing up the FSF address in the GPL header and a wrong include path
on the way.
Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

af669ac6