Commits · 93e602490c1da83162a8b6ba86b4b48a7a0f0c9e · openeuler / Kernel

05 Mar, 2012: 40 commits

KVM: PPC: Add an interface for pinning guest pages in Book3s HV guests · 93e60249

Committed by Paul Mackerras on Dec 12, 2011

This adds two new functions, kvmppc_pin_guest_page() and
kvmppc_unpin_guest_page(), and uses them to pin the guest pages where
the guest has registered areas of memory for the hypervisor to update
(i.e. the per-cpu virtual processor areas, SLB shadow buffers and
dispatch trace logs), and then to unpin them when they are no longer
required.

Although it is not strictly necessary to pin the pages at this point,
since all guest pages are already pinned, later commits in this series
will mean that guest pages aren't all pinned.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Keep page physical addresses in per-slot arrays · b2b2f165

Committed by Paul Mackerras on Dec 12, 2011

This allocates an array for each memory slot that is added to store
the physical addresses of the pages in the slot.  This array is
vmalloc'd and accessed in kvmppc_h_enter using real_vmalloc_addr().
This allows us to remove the ram_pginfo field from the kvm_arch
struct, and removes the 64GB guest RAM limit that we had.

We use the low-order bits of the array entries to store a flag
indicating that we have done get_page on the corresponding page,
and therefore need to call put_page when we are finished with the
page.  Currently this is set for all pages except those in our
special RMO regions.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
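
A minimal userspace sketch of the low-order-bit flag described above, assuming 4 KiB pages so that the low 12 bits of a page-aligned physical address are always zero and free to carry a flag. The names `slot_make_entry`, `SLOT_FLAG_GOT_PAGE_EX` and the other helpers are hypothetical, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB pages: page-aligned addresses have their low 12 bits clear. */
#define PAGE_SIZE_EX          ((uint64_t)1 << 12)
#define PAGE_ADDR_MASK_EX     (~(PAGE_SIZE_EX - 1))

/* Hypothetical flag: get_page() was done, so put_page() is needed later. */
#define SLOT_FLAG_GOT_PAGE_EX ((uint64_t)1)

/* Pack a page-aligned physical address and the flag into one array entry. */
static uint64_t slot_make_entry(uint64_t phys_addr, bool got_page)
{
    return (phys_addr & PAGE_ADDR_MASK_EX) |
           (got_page ? SLOT_FLAG_GOT_PAGE_EX : 0);
}

/* Recover the physical address by masking off the flag bits. */
static uint64_t slot_entry_addr(uint64_t entry)
{
    return entry & PAGE_ADDR_MASK_EX;
}

/* True if releasing the slot must drop the page reference. */
static bool slot_entry_needs_put(uint64_t entry)
{
    return (entry & SLOT_FLAG_GOT_PAGE_EX) != 0;
}
```

Under this scheme, a page from one of the special RMO regions would simply be stored without the flag, so teardown skips put_page for it.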

KVM: PPC: Keep a record of HV guest view of hashed page table entries · 8936dda4

Committed by Paul Mackerras on Dec 12, 2011

This adds an array that parallels the guest hashed page table (HPT),
that is, it has one entry per HPTE, used to store the guest's view
of the second doubleword of the corresponding HPTE.  The first
doubleword in the HPTE is the same as the guest's idea of it, so we
don't need to store a copy, but the second doubleword in the HPTE has
the real page number rather than the guest's logical page number.
This allows us to remove the back_translate() and reverse_xlate()
functions.

This "reverse mapping" array is vmalloc'd, meaning that to access it
in real mode we have to walk the kernel's page tables explicitly.
That is done by the new real_vmalloc_addr() function.  (In fact this
returns an address in the linear mapping, so the result is usable
both in real mode and in virtual mode.)

There are also some minor cleanups here: moving the definitions of
HPT_ORDER etc. to a header file and defining HPT_NPTE for HPT_NPTEG << 3.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
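
The table sizes involved can be checked with a little arithmetic. This sketch assumes a 16 MiB HPT (order 24), 16-byte HPTEs grouped eight to a 128-byte PTEG, and one 8-byte reverse-map entry per HPTE holding the guest's view of the second doubleword; the `_EX` macro names are illustrative rather than the kernel's own.

```c
#include <stdint.h>

/* Illustrative: a 2^24-byte (16 MiB) hashed page table. */
#define HPT_ORDER_EX  24
/* Each PTEG is 128 bytes (2^7), holding eight 16-byte HPTEs. */
#define HPT_NPTEG_EX  (1ul << (HPT_ORDER_EX - 7))
/* Eight HPTEs per PTEG, hence the << 3. */
#define HPT_NPTE_EX   (HPT_NPTEG_EX << 3)

/* Assumed: one 8-byte reverse-map entry per HPTE. */
static uint64_t revmap_bytes(void)
{
    return (uint64_t)HPT_NPTE_EX * sizeof(uint64_t);
}
```

Under these assumptions the parallel array costs half as much memory as the HPT itself: 8 MiB for a 16 MiB table.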

KVM: PPC: Make wakeups work again for Book3S HV guests · 4e72dbe1

Committed by Paul Mackerras on Dec 12, 2011

When commit f43fdc15fa ("KVM: PPC: booke: Improve timer register
emulation") factored out some code in arch/powerpc/kvm/powerpc.c
into a new helper function, kvm_vcpu_kick(), an error crept in
which causes Book3s HV guest vcpus to stall.  This fixes it.
On POWER7 machines, guest vcpus are grouped together into virtual
CPU cores that share a single waitqueue, so it's important to use
vcpu->arch.wqp rather than &vcpu->wq.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Avoid patching paravirt template code · befdc0a6

Committed by Liu Yu-B13201 on Dec 01, 2011

Currently we patch all of the code, including the paravirt template code.
This isn't safe for the scratch area and also hurts performance.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: use hardware hint when loading TLB0 entries · 57013524

Committed by Scott Wood on Nov 29, 2011

The hardware maintains a per-set next victim hint.  Using this
reduces conflicts, especially on e500v2 where a single guest
TLB entry is mapped to two shadow TLB entries (user and kernel).
We want those two entries to go to different TLB ways.

sesel is now only used for TLB1.
Reported-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
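
Why a per-set next-victim hint helps can be illustrated with a software model. Everything in this sketch is an assumption: the real hint is maintained by the e500 hardware, and the 4-way, 128-set geometry and round-robin policy below are illustrative only. The point is that two consecutive writes into the same set (the user and kernel shadow entries for one guest entry) land in different ways instead of evicting each other.

```c
/* Assumed TLB0 geometry for the model (not taken from any datasheet). */
#define TLB0_WAYS_EX 4
#define TLB0_SETS_EX 128

/* Per-set next-victim hint, modeled here as a round-robin counter. */
static unsigned int next_victim_ex[TLB0_SETS_EX];

/* Return the way to overwrite in `set`, then advance that set's hint. */
static unsigned int tlb0_pick_way(unsigned int set)
{
    unsigned int way = next_victim_ex[set % TLB0_SETS_EX];

    next_victim_ex[set % TLB0_SETS_EX] = (way + 1) % TLB0_WAYS_EX;
    return way;
}
```

By contrast, a fixed choice of way (or one derived only from the guest entry) would let the second shadow entry overwrite the first.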

KVM: PPC: e500: Fix TLBnCFG in KVM_CONFIG_TLB · 7b11dc99

Committed by Scott Wood on Nov 28, 2011

The associativity, not just total size, can differ from the host
hardware.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Book3S: PR: Fix signal check race · e371f713

Committed by Alexander Graf on Dec 19, 2011

As Scott put it:

> If we get a signal after the check, we want to be sure that we don't
> receive the reschedule IPI until after we're in the guest, so that it
> will cause another signal check.

So we need to have interrupts disabled from the point where we do
signal_check() all the way until we actually enter the guest.

This patch fixes potential signal loss races.
Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: align vcpu_kick with x86 · ae21216b

Committed by Alexander Graf on Dec 09, 2011

Our vcpu kick implementation differs a bit from x86's, which resulted in us not
disabling preemption during the kick. Bring it a bit closer to what x86 does.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Use get/set for to_svcpu to help preemption · 468a12c2

Committed by Alexander Graf on Dec 09, 2011

When running the 64-bit Book3s PR code without CONFIG_PREEMPT_NONE, we were
doing a few things wrong, most notably accessing PACA fields without making
sure that the pointers stay stable across the access (preempt_disable()).

This patch moves to_svcpu towards a get/put model which allows us to disable
preemption while accessing the shadow vcpu fields in the PACA. That way we
can run preemptible and everyone's happy!
Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Book3s: PR: No irq_disable in vcpu_run · d33ad328

Committed by Alexander Graf on Dec 09, 2011

Somewhere during merges we went from

  local_irq_enable();
  foo();
  local_irq_disable();

to always keeping irqs enabled during that part. However, we now
have the following code:

  foo();
  local_irq_disable();

which disables interrupts without the surrounding code enabling them
again! So let's remove that disable and be happy.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Book3s: PR: Disable preemption in vcpu_run · 7d82714d

Committed by Alexander Graf on Dec 09, 2011

When entering the guest, we want to make sure we're not getting preempted
away, so let's disable preemption on entry, but enable it again while handling
guest exits.
Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: Improve timer register emulation · dfd4d47e

Committed by Scott Wood on Nov 17, 2011

Decrementers are now properly driven by TCR/TSR, and the guest
has full read/write access to these registers.

The decrementer keeps ticking (and setting the TSR bit) regardless of
whether the interrupts are enabled with TCR.

The decrementer stops at zero, rather than going negative.

Decrementers (and FITs, once implemented) are delivered as
level-triggered interrupts -- dequeued when the TSR bit is cleared, not
on delivery.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[scottwood@freescale.com: significant changes]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
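
The decrementer semantics listed above (count down, stop at zero, latch the status bit regardless of the enable, and stay pending until the guest clears TSR) can be modeled as a small state machine. This is a hedged sketch, not KVM's code: the bit values for TSR[DIS] and TCR[DIE] are assumed from the BookE register layout, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed BookE bit values for TSR[DIS] and TCR[DIE]. */
#define TSR_DIS_EX 0x08000000u
#define TCR_DIE_EX 0x04000000u

struct dec_model {
    uint32_t dec; /* decrementer count */
    uint32_t tsr; /* timer status register */
    uint32_t tcr; /* timer control register */
};

/* One tick: count down, stop at zero, and latch TSR[DIS] on reaching zero,
 * whether or not TCR[DIE] enables the interrupt. */
static void dec_tick(struct dec_model *m)
{
    if (m->dec == 0)
        return; /* stopped: never goes negative */
    if (--m->dec == 0)
        m->tsr |= TSR_DIS_EX;
}

/* Level-triggered: pending for as long as status and enable are both set. */
static bool dec_irq_pending(const struct dec_model *m)
{
    return (m->tsr & TSR_DIS_EX) && (m->tcr & TCR_DIE_EX);
}

/* TSR is write-1-to-clear; clearing DIS is what dequeues the interrupt. */
static void tsr_write_clear(struct dec_model *m, uint32_t bits)
{
    m->tsr &= ~bits;
}
```

Note that delivering the interrupt does not change any state in this model; only the guest's TSR write lowers it, which is the level-triggered behavior the commit describes.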

KVM: PPC: Paravirtualize SPRG4-7, ESR, PIR, MASn · b5904972

Committed by Scott Wood on Nov 08, 2011

This allows additional registers to be accessed by the guest
in PR-mode KVM without trapping.

SPRG4-7 are readable from userspace.  On booke, KVM will sync
these registers when it enters the guest, so that accesses from
guest userspace will work.  The guest kernel, OTOH, must consistently
use either the real registers or the shared area between exits.  This
also applies to the already-paravirted SPRG3.

On non-booke, it's not clear to what extent SPRG4-7 are supported
(they're not architected for book3s, but exist on at least some classic
chips).  They are copied in the get/set regs ioctls, but I do not see any
non-booke emulation.  I also do not see any syncing with real registers
(in PR-mode) including the user-readable SPRG3.  This patch should not
make that situation any worse.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: Paravirtualize wrtee · 940b45ec

Committed by Scott Wood on Nov 08, 2011

Also fix wrteei 1 paravirt to check for a pending interrupt.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: Fix int_pending calculation for MSR[EE] paravirt · 29ac26ef

Committed by Scott Wood on Nov 08, 2011

int_pending was only being lowered if a bit in pending_exceptions
was cleared during exception delivery -- but for interrupts, we clear
it during IACK/TSR emulation.  This caused paravirt for enabling
MSR[EE] to be ineffective.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: Check for MSR[WE] in prepare_to_enter · c59a6a3e

Committed by Scott Wood on Nov 08, 2011

This prevents us from inappropriately blocking in a KVM_SET_REGS
ioctl -- the MSR[WE] will take effect when the guest is next entered.

It also causes SRR1[WE] to be set when we enter the guest's interrupt
handler, which is what e500 hardware is documented to do.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Move prepare_to_enter call site into subarch code · 25051b5a

Committed by Scott Wood on Nov 08, 2011

This function should be called with interrupts disabled, to avoid
a race where an exception is delivered after we check, but the
resched kick is received before we disable interrupts (and thus doesn't
actually trigger the exit code that would recheck exceptions).

booke already does this properly in the lightweight exit case, but
not on initial entry.

For now, move the call of prepare_to_enter into subarch-specific code so
that booke can do the right thing here.  Ideally book3s would do the same
thing, but I'm having a hard time seeing where it does any interrupt
disabling of this sort (plus it has several additional call sites), so
I'm deferring the book3s fix to someone more familiar with that code.
book3s behavior should be unchanged by this patch.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Rename deliver_interrupts to prepare_to_enter · 7e28e60e

Committed by Scott Wood on Nov 08, 2011

This function also updates paravirt int_pending, so rename it
to be more obvious that this is a collection of checks run prior
to (re)entering a guest.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: check for signals in kvmppc_vcpu_run · 1d1ef222

Committed by Scott Wood on Nov 08, 2011

Currently we check prior to returning from a lightweight exit,
but not prior to initial entry.

book3s already does a similar test.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: Do Not start decrementer when SPRN_DEC set 0 · 7401f626

Committed by Bharat Bhushan on Oct 31, 2011

According to the specification, no decrementer interrupt occurs when DEC is
written with 0, and the decrementer does not run while DEC is zero. So we
should not start the hrtimer for the decrementer when DEC = 0.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Fix DEC truncation for greater than 0xffff_ffff/1000 · dc2babfe

Committed by Bharat Bhushan on Oct 19, 2011

kvmppc_emulate_dec() uses dec_nsec, which is of type unsigned long, and does the following calculation:

        dec_nsec = vcpu->arch.dec;
        dec_nsec *= 1000;

This truncates if the DEC value vcpu->arch.dec is greater than 0xffff_ffff/1000
(unsigned long is 32 bits wide on 32-bit hosts). For example, with
tb_ticks_per_usec = 0x4a we cannot set the decrementer to more than ~58 ms.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
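
The truncation is easy to reproduce in userspace. The sketch below models the 32-bit `unsigned long` case with `uint32_t` and shows one possible remedy, widening to 64 bits before the multiply; it is an illustration, not necessarily what the patch does. The function names are hypothetical, and the final division by tb_ticks_per_usec (to turn ticks into nanoseconds) is an assumption about the rest of the calculation.

```c
#include <stdint.h>

/* Buggy model: dec_nsec is 32 bits wide, so dec * 1000 wraps modulo 2^32. */
static uint32_t dec_to_nsec_32(uint32_t dec, uint32_t tb_ticks_per_usec)
{
    uint32_t dec_nsec = dec;

    dec_nsec *= 1000;              /* may silently wrap */
    return dec_nsec / tb_ticks_per_usec;
}

/* Widened model: the product of a 32-bit DEC and 1000 always fits in 64 bits. */
static uint64_t dec_to_nsec_64(uint32_t dec, uint32_t tb_ticks_per_usec)
{
    uint64_t dec_nsec = dec;

    dec_nsec *= 1000;
    return dec_nsec / tb_ticks_per_usec;
}
```

With tb_ticks_per_usec = 0x4a (74), a 1 ms decrementer (74,000 ticks) gives the same result either way, while a 100 ms one (7,400,000 ticks) overflows the 32-bit product; the limit is 0xffff_ffff / 1000 / 74, roughly 58,040 µs.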

PPC: Fix race in mtmsr paravirt implementation · f9208427

Committed by Bharat Bhushan on Oct 13, 2011

The current implementation of mtmsr and mtmsrd is racy in that it does:

  * check (int_pending == 0)
  ---> host sets int_pending = 1 <---
  * write shared page
  * done

whereas we should check for int_pending after the shared page is written.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
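
The ordering problem can be shown with a single-threaded model in which a callback stands in for the host running between two guest steps. This is a sketch under simplifying assumptions (the real paravirt code is patched-in assembly and the host runs asynchronously); the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Simplified paravirt shared page; field names are illustrative. */
struct shared_page {
    bool msr_ee;      /* guest's MSR[EE] as stored in the shared page */
    bool int_pending; /* set by the host when an interrupt is waiting */
};

/* Stands in for "the host may run here"; in reality this is asynchronous. */
typedef void (*host_hook)(struct shared_page *);

/* Racy order: check, then write. Returns true if the guest must exit. */
static bool mtmsr_racy(struct shared_page *sp, bool ee, host_hook host)
{
    bool must_exit = ee && sp->int_pending; /* check */
    host(sp);                               /* host sets int_pending here */
    sp->msr_ee = ee;                        /* write shared page */
    return must_exit;                       /* decision is already stale */
}

/* Fixed order: write the shared page first, then check int_pending. */
static bool mtmsr_fixed(struct shared_page *sp, bool ee, host_hook host)
{
    sp->msr_ee = ee;
    host(sp);
    return ee && sp->int_pending;
}

static void host_raises_irq(struct shared_page *sp)
{
    sp->int_pending = true;
}
```

In the fixed order, an int_pending set before the check is seen by the check, and one set after it finds MSR[EE] already updated in the shared page.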

KVM: PPC: E500: Support hugetlbfs · 95325e6b

Committed by Alexander Graf on Sep 20, 2011

With hugetlbfs support emerging on e500, we should also support KVM
backing its guest memory by it.

This patch adds support for hugetlbfs into the e500 shadow mmu code.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: Don't hardcode PIR=0 · 841741f2

Committed by Scott Wood on Sep 02, 2011

The hardcoded behavior prevents proper SMP support.

Userspace shall specify the vcpu's PIR as the vcpu id.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: tlbsx: fix tlb0 esel · 303b7c97

Committed by Scott Wood on Aug 18, 2011

It should contain the way, not the absolute TLB0 index.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
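
For a set-associative TLB0, the ESEL value should identify the way within the indexed set. Assuming entries are stored set-major (absolute index = set * ways + way), which is an assumption about the layout rather than a statement of KVM's internals, the way is recovered with a modulo. The function names are hypothetical.

```c
/* Assumed set-major layout: absolute index = set * ways + way. */
static unsigned int tlb0_index(unsigned int set, unsigned int way,
                               unsigned int ways)
{
    return set * ways + way;
}

/* ESEL for TLB0 reports the way within the set, not the absolute index. */
static unsigned int tlb0_esel_from_index(unsigned int index, unsigned int ways)
{
    return index % ways;
}
```

With 4 ways, entry (set 5, way 2) has absolute index 22, but tlbsx should report ESEL = 2.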

KVM: PPC: e500: MMU API · dc83b8bc

Committed by Scott Wood on Aug 18, 2011

This implements a shared-memory API for giving host userspace access to
the guest's TLB.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: clear up confusion between host and guest entries · 0164c0f0

Committed by Scott Wood on Aug 18, 2011

Split out the portions of tlbe_priv that should be associated with host
entries into tlbe_ref.  Base victim selection on the number of hardware
entries, not guest entries.

For TLB1, where one guest entry can be mapped by multiple host entries,
we use the host tlbe_ref for tracking page references.  For the guest
TLB0 entries, we still track it with gtlb_priv, to avoid having to
retranslate if the entry is evicted from the host TLB but not the
guest TLB.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: Eliminate preempt_disable in local_sid_destroy_all · 90b92a6f

Committed by Scott Wood on Aug 18, 2011

The only place it makes sense to call this function already needs
to have preemption disabled.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: don't translate gfn to pfn with preemption disabled · 3bf3cdcc

Committed by Scott Wood on Aug 18, 2011

Delay allocation of the shadow pid until we're ready to disable
preemption and write the entry.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: s390: provide access guest registers via kvm_run · 59674c1a

Committed by Christian Borntraeger on Jan 11, 2012

This patch adds the access registers to the kvm_run structure.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: s390: provide general purpose guest registers via kvm_run · 5a32c1af

Committed by Christian Borntraeger on Jan 11, 2012

This patch adds the general purpose registers to the kvm_run structure.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: s390: provide the prefix register via kvm_run · 60b413c9

Committed by Christian Borntraeger on Jan 11, 2012

Add the prefix register to the synced register field in kvm_run.
While we mostly need the prefix register read-only, this
patch also adds handling for guest dirtying of the prefix register.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: provide synchronous registers in kvm_run · b9e5dc8d

Committed by Christian Borntraeger on Jan 11, 2012

On some cpus the overhead for virtualization instructions is in the same
range as a system call. Having to call multiple ioctls to get/set registers
makes certain userspace-handled exits more expensive than necessary.
Let's provide a section in kvm_run that works as a shared save area
for guest registers.
We also provide two 64-bit flags fields (architecture specific) that
specify:
1. which parts of these fields are valid.
2. which registers were modified by userspace.

Each bit in these flag fields defines a group of registers (like
general purpose) or a single register.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
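
To make the valid/dirty protocol concrete, here is a simplified userspace model. The struct below is a stand-in, not the real `struct kvm_run` (upstream, the corresponding fields are `kvm_valid_regs` and `kvm_dirty_regs`); the flag names and the 16-GPR layout are assumptions chosen for illustration. The kernel sets bits in the valid mask for register groups it has filled in; userspace sets bits in the dirty mask for groups it changed, so no extra ioctl is needed before re-entry.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits: one per register group. */
#define SYNC_PREFIX_EX (1ull << 0)
#define SYNC_GPRS_EX   (1ull << 1)

/* Simplified stand-in for the shared save area in kvm_run. */
struct run_area {
    uint64_t valid_regs; /* kernel: which groups are filled in */
    uint64_t dirty_regs; /* userspace: which groups it modified */
    uint64_t prefix;
    uint64_t gprs[16];
};

/* Userspace side: edit a GPR only if the group is valid, then mark it dirty. */
static int set_gpr(struct run_area *run, int n, uint64_t value)
{
    if (!(run->valid_regs & SYNC_GPRS_EX) || n < 0 || n >= 16)
        return -1;
    run->gprs[n] = value;
    run->dirty_regs |= SYNC_GPRS_EX;
    return 0;
}

/* Kernel side (model): reload only the groups userspace dirtied. */
static void consume_dirty(struct run_area *run, uint64_t guest_gprs[16])
{
    if (run->dirty_regs & SYNC_GPRS_EX)
        memcpy(guest_gprs, run->gprs, sizeof(run->gprs));
    run->dirty_regs = 0;
}
```

Keeping the masks per group, rather than per register, keeps the check on the entry path to a single bit test for each group.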

KVM: s390: rework code that sets the prefix · 8d26cf7b

Committed by Christian Borntraeger on Jan 11, 2012

There are several places in the kvm module that set the prefix register.
Since we need to flush the cpu, let's combine this operation into a helper
function. This helper will also explicitly mask out the unused bits.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
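
A sketch of the single-helper idea with explicit masking. The mask 0x7fffe000 (an 8 KiB-aligned address below 2 GiB) is an assumption about which prefix bits are used, `struct vcpu_ex` and `set_prefix_ex` are hypothetical, and the cpu flush is modeled as a request flag.

```c
#include <stdint.h>

/* Assumed prefix mask: 8 KiB-aligned address below 2 GiB. */
#define PREFIX_MASK_EX 0x7fffe000u

/* Hypothetical vcpu state for the sketch. */
struct vcpu_ex {
    uint32_t prefix;
    int flush_requested;
};

/* One helper for every call site: mask, store, and request the flush. */
static void set_prefix_ex(struct vcpu_ex *vcpu, uint32_t prefix)
{
    vcpu->prefix = prefix & PREFIX_MASK_EX; /* drop unused bits explicitly */
    vcpu->flush_requested = 1;              /* a prefix change needs a flush */
}
```

Routing every update through one function means no call site can forget either the mask or the flush.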

KVM: SVM: Add support for AMD's OSVW feature in guests · 2b036c6b

Committed by Boris Ostrovsky on Jan 09, 2012

In some cases guests should not provide workarounds for errata even when the
physical processor is affected. For example, because of erratum 400 on family
10h processors a Linux guest will read an MSR (resulting in VMEXIT) before
going to idle in order to avoid getting stuck in a non-C0 state. This is not
necessary: HLT and IO instructions are intercepted and therefore there is no
reason for erratum 400 workaround in the guest.

This patch allows us to present a guest with certain errata as fixed,
regardless of the state of actual hardware.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
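
The "present errata as fixed" idea can be sketched on the two OSVW values a guest reads: a length giving the number of valid status bits, and a status word in which a set bit means the erratum with that OSVW ID is present. Those semantics are assumed from AMD's documentation, and the struct and helper names are hypothetical. Clearing a status bit marks the erratum as fixed; the length must also cover that bit, because an ID beyond the length is unknown and a guest may then fall back to assuming it is affected.

```c
#include <stdint.h>

/* Guest-visible OSVW state (hypothetical struct for this sketch). */
struct osvw_ex {
    uint64_t length; /* number of valid status bits */
    uint64_t status; /* bit n set => erratum with OSVW ID n is present */
};

/* Present the errata in hide_mask as fixed, regardless of the host. */
static struct osvw_ex osvw_for_guest(uint64_t host_len, uint64_t host_status,
                                     uint64_t hide_mask)
{
    struct osvw_ex g;
    uint64_t need = 0;

    /* Smallest length that covers the highest hidden OSVW ID. */
    while (need < 64 && (hide_mask >> need) != 0)
        need++;
    g.length = host_len > need ? host_len : need;
    g.status = host_status & ~hide_mask;
    return g;
}
```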

KVM: MMU: unnecessary NX state assignment · 4a58ae61

Committed by Davidlohr Bueso on Jan 06, 2012

We can remove the first ->nx state assignment since it is assigned again afterwards anyway.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: s390: Fix return code for unknown ioctl numbers · 3e6afcf1

Committed by Carsten Otte on Jan 04, 2012

This patch fixes the return code of kvm_arch_vcpu_ioctl in case
of an unknown ioctl number.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: s390: ucontrol: announce capability for user controlled vms · 1efd0f59

Committed by Carsten Otte on Jan 04, 2012

This patch announces a new capability KVM_CAP_S390_UCONTROL that
indicates that kvm can now support virtual machines that are
controlled by userspace.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: s390: fix assumption for KVM_MAX_VCPUS · 3777594d

Committed by Carsten Otte on Jan 04, 2012

This patch fixes the definition of the idle_mask and the local_int array
in kvm_s390_float_interrupt. The previous definition hardcoded a maximum
of 64 cpus instead of using KVM_MAX_VCPUS.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
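
Sizing the per-VM arrays from the vcpu limit rather than a hardcoded 64 can be sketched as below. `KVM_MAX_VCPUS_EX` is an illustrative value and the struct is a simplified, hypothetical stand-in for `kvm_s390_float_interrupt`; the round-up-to-whole-longs calculation mirrors the kernel's `BITS_TO_LONGS` idiom but is reimplemented here so the example is self-contained.

```c
#include <limits.h>

/* Illustrative limit; the real KVM_MAX_VCPUS is set per architecture. */
#define KVM_MAX_VCPUS_EX 100

#define BITS_PER_LONG_EX (sizeof(unsigned long) * CHAR_BIT)
/* Round up so every vcpu gets a bit, whatever the word size. */
#define BITS_TO_LONGS_EX(n) (((n) + BITS_PER_LONG_EX - 1) / BITS_PER_LONG_EX)

/* Both arrays follow the vcpu limit instead of assuming 64 cpus. */
struct float_interrupt_ex {
    unsigned long idle_mask[BITS_TO_LONGS_EX(KVM_MAX_VCPUS_EX)];
    void *local_int[KVM_MAX_VCPUS_EX];
};

static void set_idle(struct float_interrupt_ex *fi, unsigned int cpu)
{
    fi->idle_mask[cpu / BITS_PER_LONG_EX] |= 1ul << (cpu % BITS_PER_LONG_EX);
}

static int is_idle(const struct float_interrupt_ex *fi, unsigned int cpu)
{
    return (fi->idle_mask[cpu / BITS_PER_LONG_EX] >>
            (cpu % BITS_PER_LONG_EX)) & 1ul;
}
```

With a single `unsigned long` idle mask on a 64-bit host, vcpu 99 in this example would have no bit at all, which is the class of bug the commit removes.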

openeuler / Kernel: synced successfully nearly 2 years ago
