Commits · 468a12c2b53776721ff83517d4a195b85c5fce54 · openanolis / cloud-kernel

05 Mar, 2012 40 commits

KVM: PPC: Use get/set for to_svcpu to help preemption · 468a12c2

Authored by Alexander Graf on Dec 09, 2011

When running the 64-bit Book3s PR code without CONFIG_PREEMPT_NONE, we were
doing a few things wrong, most notably access to PACA fields without making
sure that the pointers stay stable across the access (preempt_disable()).

This patch moves to_svcpu towards a get/put model which allows us to disable
preemption while accessing the shadow vcpu fields in the PACA. That way we
can run preemptible and everyone's happy!
Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
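
The get/put model described in this commit can be sketched in plain C. This is a minimal user-space illustration, not the kernel code: svcpu_get, svcpu_put, read_guest_pc, preempt_count and the shadow_vcpu layout are hypothetical stand-ins, and a counter models preempt_disable()/preempt_enable().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a counter that models the kernel's preemption-disable depth. */
static int preempt_count;

/* Hypothetical shadow vcpu with a single illustrative field. */
struct shadow_vcpu {
    unsigned long pc;
};

/* Hypothetical: stands in for the per-CPU slot that lives in the PACA. */
static struct shadow_vcpu paca_shadow_vcpu;

/* "get": pin the caller to the current CPU, then hand out the pointer. */
static struct shadow_vcpu *svcpu_get(void)
{
    preempt_count++;                 /* models preempt_disable() */
    return &paca_shadow_vcpu;
}

/* "put": the pointer must not be dereferenced after this call. */
static void svcpu_put(struct shadow_vcpu *svcpu)
{
    assert(svcpu == &paca_shadow_vcpu);
    assert(preempt_count > 0);
    preempt_count--;                 /* models preempt_enable() */
}

/* Typical caller: copy the value out while pinned, then release. */
static unsigned long read_guest_pc(void)
{
    struct shadow_vcpu *svcpu = svcpu_get();
    unsigned long pc = svcpu->pc;

    svcpu_put(svcpu);
    return pc;
}
```

Pairing every get with a put keeps the window with preemption disabled as short as the access itself, which is presumably why the commit prefers this shape over a bare accessor.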


KVM: PPC: Book3s: PR: No irq_disable in vcpu_run · d33ad328

Authored by Alexander Graf on Dec 09, 2011

Somewhere during merges we went from

  local_irq_enable()
  foo();
  local_irq_disable()

to always keeping irqs enabled during that part. However, we now
have the following code:

  foo();
  local_irq_disable()

which disables interrupts without the surrounding code enabling them
again! So let's remove that disable and be happy.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: Book3s: PR: Disable preemption in vcpu_run · 7d82714d

Authored by Alexander Graf on Dec 09, 2011

When entering the guest, we want to make sure we're not getting preempted
away, so let's disable preemption on entry, but enable it again while handling
guest exits.
Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: booke: Improve timer register emulation · dfd4d47e

Authored by Scott Wood on Nov 17, 2011

Decrementers are now properly driven by TCR/TSR, and the guest
has full read/write access to these registers.

The decrementer keeps ticking (and setting the TSR bit) regardless of
whether the interrupts are enabled with TCR.

The decrementer stops at zero, rather than going negative.

Decrementers (and FITs, once implemented) are delivered as
level-triggered interrupts -- dequeued when the TSR bit is cleared, not
on delivery.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[scottwood@freescale.com: significant changes]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: Paravirtualize SPRG4-7, ESR, PIR, MASn · b5904972

Authored by Scott Wood on Nov 08, 2011

This allows additional registers to be accessed by the guest
in PR-mode KVM without trapping.

SPRG4-7 are readable from userspace.  On booke, KVM will sync
these registers when it enters the guest, so that accesses from
guest userspace will work.  The guest kernel, OTOH, must consistently
use either the real registers or the shared area between exits.  This
also applies to the already-paravirted SPRG3.

On non-booke, it's not clear to what extent SPRG4-7 are supported
(they're not architected for book3s, but exist on at least some classic
chips).  They are copied in the get/set regs ioctls, but I do not see any
non-booke emulation.  I also do not see any syncing with real registers
(in PR-mode) including the user-readable SPRG3.  This patch should not
make that situation any worse.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: booke: Paravirtualize wrtee · 940b45ec

Authored by Scott Wood on Nov 08, 2011

Also fix wrteei 1 paravirt to check for a pending interrupt.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: booke: Fix int_pending calculation for MSR[EE] paravirt · 29ac26ef

Authored by Scott Wood on Nov 08, 2011

int_pending was only being lowered if a bit in pending_exceptions
was cleared during exception delivery -- but for interrupts, we clear
it during IACK/TSR emulation.  This caused paravirt for enabling
MSR[EE] to be ineffective.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: booke: Check for MSR[WE] in prepare_to_enter · c59a6a3e

Authored by Scott Wood on Nov 08, 2011

This prevents us from inappropriately blocking in a KVM_SET_REGS
ioctl -- the MSR[WE] will take effect when the guest is next entered.

It also causes SRR1[WE] to be set when we enter the guest's interrupt
handler, which is what e500 hardware is documented to do.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: Move prepare_to_enter call site into subarch code · 25051b5a

Authored by Scott Wood on Nov 08, 2011

This function should be called with interrupts disabled, to avoid
a race where an exception is delivered after we check, but the
resched kick is received before we disable interrupts (and thus doesn't
actually trigger the exit code that would recheck exceptions).

booke already does this properly in the lightweight exit case, but
not on initial entry.

For now, move the call of prepare_to_enter into subarch-specific code so
that booke can do the right thing here.  Ideally book3s would do the same
thing, but I'm having a hard time seeing where it does any interrupt
disabling of this sort (plus it has several additional call sites), so
I'm deferring the book3s fix to someone more familiar with that code.
book3s behavior should be unchanged by this patch.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: Rename deliver_interrupts to prepare_to_enter · 7e28e60e

Authored by Scott Wood on Nov 08, 2011

This function also updates paravirt int_pending, so rename it
to be more obvious that this is a collection of checks run prior
to (re)entering a guest.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: booke: check for signals in kvmppc_vcpu_run · 1d1ef222

Authored by Scott Wood on Nov 08, 2011

Currently we check prior to returning from a lightweight exit,
but not prior to initial entry.

book3s already does a similar test.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: booke: Do Not start decrementer when SPRN_DEC set 0 · 7401f626

Authored by Bharat Bhushan on Oct 31, 2011

As per the specification, the decrementer interrupt does not happen when DEC is
written with 0. Also, when DEC is zero, no decrementer is running. So we should
not start the hrtimer for the decrementer when DEC = 0.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: Fix DEC truncation for greater than 0xffff_ffff/1000 · dc2babfe

Authored by Bharat Bhushan on Oct 19, 2011

kvmppc_emulate_dec() uses dec_nsec of type unsigned long and does the calculation below:

        dec_nsec = vcpu->arch.dec;
        dec_nsec *= 1000;

This will truncate if the DEC value "vcpu->arch.dec" is greater than 0xffff_ffff/1000.
For example: for tb_ticks_per_usec = 0x4a (74), we cannot set the decrementer to more than ~58 ms.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
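
The arithmetic behind the ~58 ms limit can be checked with a small C sketch. It assumes a host where unsigned long is 32 bits wide (modelled with uint32_t) and reads the tick rate in the example as hexadecimal 4a, i.e. 74 ticks per microsecond. The function names are hypothetical, and widening to 64 bits is shown only as one way to avoid the wrap-around, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Models the old calculation with a 32-bit unsigned long: may wrap around. */
static uint32_t dec_scaled_32(uint32_t dec)
{
    uint32_t dec_nsec = dec;

    dec_nsec *= 1000;                /* wraps once dec > 0xffffffff / 1000 */
    return dec_nsec;
}

/* Same calculation with a 64-bit intermediate: cannot wrap for 32-bit dec. */
static uint64_t dec_scaled_64(uint32_t dec)
{
    uint64_t dec_nsec = dec;

    dec_nsec *= 1000;
    return dec_nsec;
}

/* Largest decrementer interval, in microseconds, that survives the 32-bit
 * multiply at the given timebase rate. */
static uint32_t max_safe_dec_usec(uint32_t tb_ticks_per_usec)
{
    assert(tb_ticks_per_usec != 0);
    return (UINT32_MAX / 1000) / tb_ticks_per_usec;
}
```

With 74 ticks per microsecond, the largest safe DEC value is 0xffffffff / 1000 = 4294967 ticks, and 4294967 / 74 is about 58040 microseconds, which matches the ~58 ms quoted in the commit message.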


PPC: Fix race in mtmsr paravirt implementation · f9208427

Authored by Bharat Bhushan on Oct 13, 2011

The current implementations of mtmsr and mtmsrd are racy in that they do:

  * check (int_pending == 0)
  ---> host sets int_pending = 1 <---
  * write shared page
  * done

while instead we should check for int_pending after the shared page is written.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: E500: Support hugetlbfs · 95325e6b

Authored by Alexander Graf on Sep 20, 2011

With hugetlbfs support emerging on e500, we should also support KVM
backing its guest memory by it.

This patch adds support for hugetlbfs into the e500 shadow mmu code.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: e500: Don't hardcode PIR=0 · 841741f2

Authored by Scott Wood on Sep 02, 2011

The hardcoded behavior prevents proper SMP support.

User space shall specify the vcpu's PIR as the vcpu id.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: e500: tlbsx: fix tlb0 esel · 303b7c97

Authored by Scott Wood on Aug 18, 2011

It should contain the way, not the absolute TLB0 index.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: e500: MMU API · dc83b8bc

Authored by Scott Wood on Aug 18, 2011

This implements a shared-memory API for giving host userspace access to
the guest's TLB.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: e500: clear up confusion between host and guest entries · 0164c0f0

Authored by Scott Wood on Aug 18, 2011

Split out the portions of tlbe_priv that should be associated with host
entries into tlbe_ref.  Base victim selection on the number of hardware
entries, not guest entries.

For TLB1, where one guest entry can be mapped by multiple host entries,
we use the host tlbe_ref for tracking page references.  For the guest
TLB0 entries, we still track it with gtlb_priv, to avoid having to
retranslate if the entry is evicted from the host TLB but not the
guest TLB.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: e500: Eliminate preempt_disable in local_sid_destroy_all · 90b92a6f

Authored by Scott Wood on Aug 18, 2011

The only place it makes sense to call this function already needs
to have preemption disabled.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: PPC: e500: don't translate gfn to pfn with preemption disabled · 3bf3cdcc

Authored by Scott Wood on Aug 18, 2011

Delay allocation of the shadow pid until we're ready to disable
preemption and write the entry.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: s390: provide access guest registers via kvm_run · 59674c1a

Authored by Christian Borntraeger on Jan 11, 2012

This patch adds the access registers to the kvm_run structure.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: s390: provide general purpose guest registers via kvm_run · 5a32c1af

Authored by Christian Borntraeger on Jan 11, 2012

This patch adds the general purpose registers to the kvm_run structure.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: s390: provide the prefix register via kvm_run · 60b413c9

Authored by Christian Borntraeger on Jan 11, 2012

Add the prefix register to the synced register field in kvm_run.
While we need the prefix register most of the time read-only, this
patch also adds handling for guest dirtying of the prefix register.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: provide synchronous registers in kvm_run · b9e5dc8d

Authored by Christian Borntraeger on Jan 11, 2012

On some CPUs the overhead for virtualization instructions is in the same
range as a system call. Having to call multiple ioctls to get or set registers
will make certain userspace-handled exits more expensive than necessary.
Let's provide a section in kvm_run that works as a shared save area
for guest registers.
We also provide two 64-bit flags fields (architecture specific) that will
specify
1. which parts of these fields are valid
2. which registers were modified by userspace

Each bit in these flag fields will define a group of registers (like
general purpose) or a single register.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
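
The shared save area with its two flag fields can be illustrated with a short C sketch. Everything here is hypothetical and chosen for illustration (struct demo_run, struct demo_vcpu, the DEMO_SYNC_* bits and both helper functions); the actual layout inside kvm_run is architecture specific and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits: each bit names one register or one register group. */
#define DEMO_SYNC_PREFIX (1ULL << 0)
#define DEMO_SYNC_GPRS   (1ULL << 1)

/* Hypothetical shared save area, standing in for the section in kvm_run. */
struct demo_run {
    uint64_t valid_regs;   /* written by the kernel: which fields are valid */
    uint64_t dirty_regs;   /* written by userspace: which fields it changed */
    uint32_t prefix;
    uint64_t gprs[16];
};

/* Hypothetical in-kernel copy of the guest registers. */
struct demo_vcpu {
    uint32_t prefix;
    uint64_t gprs[16];
};

/* On exit to userspace: publish the registers and mark them valid. */
static void demo_sync_to_run(const struct demo_vcpu *vcpu, struct demo_run *run)
{
    assert(vcpu != NULL && run != NULL);
    run->prefix = vcpu->prefix;
    memcpy(run->gprs, vcpu->gprs, sizeof(run->gprs));
    run->valid_regs = DEMO_SYNC_PREFIX | DEMO_SYNC_GPRS;
}

/* On re-entry: take back only what userspace flagged as modified. */
static void demo_sync_from_run(struct demo_vcpu *vcpu, struct demo_run *run)
{
    assert(vcpu != NULL && run != NULL);
    if (run->dirty_regs & DEMO_SYNC_PREFIX)
        vcpu->prefix = run->prefix;
    if (run->dirty_regs & DEMO_SYNC_GPRS)
        memcpy(vcpu->gprs, run->gprs, sizeof(vcpu->gprs));
    run->dirty_regs = 0;   /* the dirty flags have been consumed */
}
```

Userspace reads and writes the registers through plain memory accesses on the shared area, so a userspace-handled exit needs no additional get/set ioctl round trips.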


KVM: s390: rework code that sets the prefix · 8d26cf7b

Authored by Christian Borntraeger on Jan 11, 2012

There are several places in the kvm module that set the prefix register.
Since we need to flush the cpu, let's combine this operation into a helper
function. This helper will also explicitly mask out the unused bits.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: SVM: Add support for AMD's OSVW feature in guests · 2b036c6b

Authored by Boris Ostrovsky on Jan 09, 2012

In some cases guests should not provide workarounds for errata even when the
physical processor is affected. For example, because of erratum 400 on family
10h processors a Linux guest will read an MSR (resulting in VMEXIT) before
going to idle in order to avoid getting stuck in a non-C0 state. This is not
necessary: HLT and IO instructions are intercepted and therefore there is no
reason for erratum 400 workaround in the guest.

This patch allows us to present a guest with certain errata as fixed,
regardless of the state of actual hardware.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU: unnecessary NX state assignment · 4a58ae61

Authored by Davidlohr Bueso on Jan 06, 2012

We can remove the first ->nx state assignment since it is assigned again afterwards anyway.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: s390: Fix return code for unknown ioctl numbers · 3e6afcf1