Commits · 6c85f52b10fd60e45c6e30c5b85d116406bd3c9b · openanolis / cloud-kernel

27 Jan, 2014 · 4 commits

kvm/ppc: IRQ disabling cleanup · 6c85f52b

Committed by Scott Wood on Jan 09, 2014

Simplify the handling of lazy EE by going directly from fully-enabled
to hard-disabled.  This replaces the lazy_irq_pending() check
(including its misplaced kvm_guest_exit() call).

As suggested by Tiejun Chen, move the interrupt disabling into
kvmppc_prepare_to_enter() rather than have each caller do it.  Also
move the IRQ enabling on heavyweight exit into
kvmppc_prepare_to_enter().
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: e500: Fix bad address type in deliver_tlb_misss() · 70713fe3

Committed by Mihai Caraman on Jan 09, 2014

Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss().
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
CC: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
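
The effect of the narrower type can be seen in a small user-space sketch. It assumes a 64-bit effective address and models `unsigned int` as a 32-bit integer; `gva_model_t`, `eaddr_as_uint` and `eaddr_as_gva` are illustrative stand-ins, not the kernel's code.

```c
#include <stdint.h>

/* Illustrative stand-in for gva_t, assumed wide enough for a 64-bit address. */
typedef uint64_t gva_model_t;

/* Passing the address through a 32-bit parameter drops the upper bits. */
static uint64_t eaddr_as_uint(uint64_t eaddr)
{
    uint32_t narrowed = (uint32_t)eaddr;
    return narrowed;
}

/* Passing it through the wide type preserves the full address. */
static uint64_t eaddr_as_gva(uint64_t eaddr)
{
    gva_model_t wide = eaddr;
    return wide;
}
```

Any address above 4 GiB comes back different from `eaddr_as_uint`, which is the kind of silent truncation the type change avoids.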


KVM: PPC: Book3S HV: use xics_wake_cpu only when defined · 48eaef05

Committed by Andreas Schwab on Dec 30, 2013

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
CC: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: Book3S: MMIO emulation support for little endian guests · 73601775

Committed by Cédric Le Goater on Jan 09, 2014

MMIO emulation reads the last instruction executed by the guest
and then emulates it. If the guest is running in little-endian order,
or more generally in a different endian order than the host, the
instruction needs to be byte-swapped before being emulated.

This patch adds a helper routine which tests the endian order of
the host and the guest in order to decide whether a byteswap is
needed or not. It is then used to byteswap the last instruction
of the guest in the endian order of the host before MMIO emulation
is performed.

Finally, kvmppc_handle_load() and kvmppc_handle_store() are modified
to reverse the endianness of the MMIO if required.
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
[agraf: add booke handling]
Signed-off-by: Alexander Graf <agraf@suse.de>
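
The byteswap decision described above can be sketched in plain C. This is a minimal user-space model, not the patch itself: `needs_byteswap`, `swap32` and `fetch_last_inst_model` are hypothetical names, and the endianness of each side is passed in as a flag rather than read from CPU state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a swap is needed only when the byte orders differ. */
static bool needs_byteswap(bool host_is_le, bool guest_is_le)
{
    return host_is_le != guest_is_le;
}

/* Reverse the four bytes of a 32-bit instruction word. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

/* Return the guest's last instruction in host byte order. */
static uint32_t fetch_last_inst_model(uint32_t raw, bool host_is_le,
                                      bool guest_is_le)
{
    return needs_byteswap(host_is_le, guest_is_le) ? swap32(raw) : raw;
}
```

Comparing the two byte orders, rather than testing for a little-endian guest alone, covers the general case of a guest whose order differs from the host's.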


09 Jan, 2014 · 11 commits

KVM: PPC: NULL return of kvmppc_mmu_hpte_cache_next should be handled · 47d45d9f

Committed by Zhouyi Zhou on Dec 02, 2013

NULL return of kvmppc_mmu_hpte_cache_next should be handled
Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state · 9bd880a2

Committed by Tiejun Chen on Oct 23, 2013

Rather than calling hard_irq_disable() when we're back in C code
we can just call RECONCILE_IRQ_STATE to soft disable IRQs while
we're already in hard disabled state.

This should be functionally equivalent to the code before, but
cleaner and faster.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[agraf: fix comment, commit message]
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: use caching attributes as per linux pte · 08c9a188

Committed by Bharat Bhushan on Nov 18, 2013

KVM uses the same WIM TLB attributes as the corresponding QEMU pte.
For this we now search the Linux pte for the requested page and
get the caching/coherency attributes from that pte.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: book3s: rename lookup_linux_pte() to lookup_linux_pte_and_update() · 7c85e6b3

Committed by Bharat Bhushan on Nov 15, 2013

lookup_linux_pte() is doing more than lookup, updating the pte,
so for clarity it is renamed to lookup_linux_pte_and_update()
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: booke: clear host tlb reference flag on guest tlb invalidation · 30a91fe2

Committed by Bharat Bhushan on Nov 15, 2013

On booke, "struct tlbe_ref" contains host tlb mapping information
(pfn: for guest-pfn to pfn, flags: attribute associated with this mapping)
for a guest tlb entry. So when a guest creates a TLB entry then
"struct tlbe_ref" is set to point to valid "pfn" and set attributes in
"flags" field of the above said structure. When a guest TLB entry is
invalidated then the flags field of the corresponding "struct tlbe_ref" is
updated to indicate that it is no longer valid; we also selectively clear
some other attribute bits, for example: if E500_TLB_BITMAP was set then we clear
E500_TLB_BITMAP, and if E500_TLB_TLB0 is set then we clear that too.

Ideally we should clear "flags" completely, as this entry is invalid and has
nothing left to be reused. The other part of the problem is that when we use
the same entry again we do not clear "flags" either (we start or-ing bits in).

So far this has worked because the selective clearing mentioned above
happens to clear every "flags" bit that was set during TLB mapping. But the
problem appears when we add more attributes: we would then need to selectively
clear each of them as well, which should not be needed.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
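
The difference between selective and complete clearing can be shown with a few bit flags. This is only a sketch: the `MODEL_*` names and bit values are made up for illustration (they are not the kernel's E500_TLB_* definitions), and `MODEL_TLB_NEW_ATTR` stands for any attribute added after the selective clearing code was written.

```c
/* Model flag bits; names and values are illustrative, not the kernel's. */
#define MODEL_TLB_VALID    (1u << 0)
#define MODEL_TLB_BITMAP   (1u << 1)
#define MODEL_TLB_TLB0     (1u << 2)
#define MODEL_TLB_NEW_ATTR (1u << 3)  /* hypothetical attribute added later */

/* Selective clearing: only the bits known at the time are removed. */
static unsigned int invalidate_selective(unsigned int flags)
{
    return flags & ~(MODEL_TLB_VALID | MODEL_TLB_BITMAP | MODEL_TLB_TLB0);
}

/* Complete clearing: an invalid entry keeps no flags at all. */
static unsigned int invalidate_full(unsigned int flags)
{
    (void)flags;
    return 0;
}
```

With all four bits set, the selective variant leaves `MODEL_TLB_NEW_ATTR` behind on an entry that is supposed to be invalid, while the complete variant cannot go stale when new attributes appear.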


KVM: PPC: Book3S HV: Use load/store_fp_state functions in HV guest entry/exit · 595e4f7e

Committed by Paul Mackerras on Oct 15, 2013

This modifies kvmppc_load_fp and kvmppc_save_fp to use the generic
FP/VSX and VMX load/store functions instead of open-coding the
FP/VSX/VMX load/store instructions.  Since kvmppc_load/save_fp don't
follow C calling conventions, we make them private symbols within
book3s_hv_rmhandlers.S.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: Load/save FP/VMX/VSX state directly to/from vcpu struct · 99dae3ba

Committed by Paul Mackerras on Oct 15, 2013

Now that we have the vcpu floating-point and vector state stored in
the same type of struct as the main kernel uses, we can load that
state directly from the vcpu struct instead of having extra copies
to/from the thread_struct.  Similarly, when the guest state needs to
be saved, we can have it saved directly to the vcpu struct by
setting the current->thread.fp_save_area and current->thread.vr_save_area
pointers.  That also means that we don't need to back up and restore
userspace's FP/vector state.  This all makes the code simpler and
faster.

Note that it's not necessary to save or modify current->thread.fpexc_mode,
since nothing in KVM uses or is affected by its value.  Nor is it
necessary to touch used_vr or used_vsr.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures · efff1912

Committed by Paul Mackerras on Oct 15, 2013

This uses struct thread_fp_state and struct thread_vr_state to store
the floating-point, VMX/Altivec and VSX state, rather than flat arrays.
This makes transferring the state to/from the thread_struct simpler
and allows us to unify the get/set_one_reg implementations for the
VSX registers.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: Use load_fp/vr_state rather than load_up_fpu/altivec · 09548fda

Committed by Paul Mackerras on Oct 15, 2013

The load_up_fpu and load_up_altivec functions were never intended to
be called from C, and do things like modifying the MSR value in their
callers' stack frames, which are assumed to be interrupt frames. In
addition, on 32-bit Book S they require the MMU to be off.

This makes KVM use the new load_fp_state() and load_vr_state() functions
instead of load_up_fpu/altivec. This means we can remove the assembler
glue in book3s_rmhandlers.S, and potentially fixes a bug on Book E,
where load_up_fpu was called directly from C.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: fix couple of memory leaks in MPIC/XICS devices · 458ff3c0

Committed by Gleb Natapov on Sep 01, 2013

XICS failed to free xics structure on error path. MPIC destroy handler
forgot to delete kvm_device structure.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: Add devname:kvm aliases for modules · 398a76c6

Committed by Alexander Graf on Dec 09, 2013

Systems that support automatic loading of kernel modules through
device aliases should try and automatically load kvm when /dev/kvm
gets opened.

Add code to support that magic for all PPC kvm targets, even the
ones that don't support modules yet.
Signed-off-by: Alexander Graf <agraf@suse.de>


21 Nov, 2013 · 1 commit

powerpc: kvm: optimize "sc 1" as fast return · 27025a60

Committed by Liu Ping Fan on Nov 19, 2013

In some scenarios, e.g. OpenStack CI, a PR guest can trigger "sc 1" frequently.
This patch optimizes the path by directly delivering BOOK3S_INTERRUPT_SYSCALL
to the HV guest, so powernv can return to the HV guest without a heavy exit, i.e.
with no need to swap TLB, HTAB, etc.
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


18 Oct, 2013 · 2 commits

kvm: powerpc: book3s: drop is_hv_enabled · a78b55d1

Committed by Aneesh Kumar K.V on Oct 07, 2013

drop is_hv_enabled, because that should not be a callback property
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: book3s: Allow the HV and PR selection per virtual machine · cbbc58d4

Committed by Aneesh Kumar K.V on Oct 07, 2013

This moves the kvmppc_ops callbacks to be a per VM entity. This
enables us to select HV and PR mode when creating a VM. We also
allow both the kvm-hv and kvm-pr kernel modules to be loaded. To
achieve this we move /dev/kvm ownership to the kvm.ko module. Depending on
which KVM mode we select during VM creation, we take a reference
count on the respective module.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix coding style]
Signed-off-by: Alexander Graf <agraf@suse.de>
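
The idea of a per-VM callback table can be sketched as follows. This is an illustrative user-space model only: `ops_model`, `vm_model`, `create_vm` and the dummy `run_hv`/`run_pr` functions are invented for the example, the real kvmppc_ops carries many more hooks, and module reference counting is left out.

```c
/* Illustrative per-mode callback table. */
struct ops_model {
    const char *name;
    int (*vcpu_run)(int vcpu_id);
};

/* Dummy implementations that just tag their result with the mode. */
static int run_hv(int vcpu_id) { return 1000 + vcpu_id; }
static int run_pr(int vcpu_id) { return 2000 + vcpu_id; }

static const struct ops_model hv_ops = { "hv", run_hv };
static const struct ops_model pr_ops = { "pr", run_pr };

enum vm_mode { VM_MODE_HV, VM_MODE_PR };

struct vm_model {
    const struct ops_model *ops;  /* chosen once, at VM creation */
};

/* Select the callback table for this VM only; other VMs are unaffected. */
static struct vm_model create_vm(enum vm_mode mode)
{
    struct vm_model vm = { mode == VM_MODE_HV ? &hv_ops : &pr_ops };
    return vm;
}

/* Every later call dispatches through the VM's own table. */
static int vm_run(const struct vm_model *vm, int vcpu_id)
{
    return vm->ops->vcpu_run(vcpu_id);
}
```

Because the pointer lives in the VM rather than in a single global, an HV VM and a PR VM can coexist and each dispatches to its own implementation.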


17 Oct, 2013 · 22 commits

kvm: Add struct kvm arg to memslot APIs · 5587027c

Committed by Aneesh Kumar K.V on Oct 07, 2013

We will use that in a later patch to find the kvm ops handler
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: book3s: Support building HV and PR KVM as module · 2ba9f0d8

Committed by Aneesh Kumar K.V on Oct 07, 2013

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: squash in compile fix]
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: booke: Move booke related tracepoints to separate header · dba291f2

Committed by Aneesh Kumar K.V on Oct 07, 2013

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

kvm: powerpc: book3s: pr: move PR related tracepoints to a separate header · 72c12535

Committed by Aneesh Kumar K.V on Oct 07, 2013

This patch moves PR related tracepoints to a separate header. This
helps in converting PR to a kernel module, which will be done in
later patches.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops · 699cc876

Committed by Aneesh Kumar K.V on Oct 07, 2013

This helps us to identify whether we are running with hypervisor mode KVM
enabled. The change is needed so that we can have both HV and PR kvm
enabled in the same kernel.

If both HV and PR KVM are included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest.

Allowing both PR and HV in the same kernel required some changes to
kvm_dev_ioctl_check_extension(), since the values returned now can't
be selected with #ifdefs as much as previously. We look at is_hv_enabled
to return the right value when checking for capabilities. For capabilities that
are only provided by HV KVM, we return the HV value only if
is_hv_enabled is true. For capabilities provided by PR KVM but not HV,
we return the PR value only if is_hv_enabled is false.

NOTE: in a later patch we replace is_hv_enabled with a static inline
function comparing kvm_ppc_ops
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
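
The capability rule described above (HV-only capabilities report their value only when HV is enabled, PR-only capabilities only when it is not) can be sketched like this. The enum, the function name and the use of 0 for "not available" are assumptions made for the example; the real kvm_dev_ioctl_check_extension() switches on specific capability numbers.

```c
#include <stdbool.h>

/* Which flavour of KVM provides a given capability (illustrative). */
enum cap_owner { CAP_HV_ONLY, CAP_PR_ONLY };

/* Return the capability value, or 0 when the active mode cannot provide it. */
static int check_extension_model(enum cap_owner owner, bool is_hv_enabled,
                                 int value)
{
    switch (owner) {
    case CAP_HV_ONLY:
        return is_hv_enabled ? value : 0;
    case CAP_PR_ONLY:
        return is_hv_enabled ? 0 : value;
    }
    return 0;
}
```

The decision is made at run time from `is_hv_enabled` instead of at build time with #ifdefs, which is what lets one kernel carry both implementations.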


kvm: powerpc: book3s: Cleanup interrupt handling code · dd96b2c2

Committed by Aneesh Kumar K.V on Oct 07, 2013

With this patch if HV is included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest. This helps
in enabling both HV and PR, which we do in a later patch
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: Add kvmppc_ops callback · 3a167bea

Committed by Aneesh Kumar K.V on Oct 07, 2013

This patch adds a new callback, kvmppc_ops. This will help us in enabling
both HV and PR KVM together in the same kernel. The actual change to
enable them together is done in a later patch in the series.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: squash in booke changes]
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: book3s: Add a new config variable CONFIG_KVM_BOOK3S_HV_POSSIBLE · 9975f5e3

Committed by Aneesh Kumar K.V on Oct 07, 2013

This helps us to select the relevant code in the kernel
when we later move the HV and PR bits into separate modules. The patch
also makes the config options for PR KVM selectable
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLE · 7aa79938

Committed by Aneesh Kumar K.V on Oct 07, 2013

With later patches supporting PR KVM as a kernel module, the changes
that have to be built into the main kernel binary to enable the PR KVM
module are now selected via KVM_BOOK3S_PR_POSSIBLE
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: book3s: move book3s_64_vio_hv.c into the main kernel binary · 066212e0

Committed by Paul Mackerras on Oct 07, 2013

Since the code in book3s_64_vio_hv.c is called from real mode with HV
KVM, and therefore has to be built into the main kernel binary, this
makes it always built-in rather than part of the KVM module. It gets
called from the KVM module by PR KVM, so this adds an EXPORT_SYMBOL_GPL().
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: book3s: remove kvmppc_handler_highmem label · 178db620

Committed by Paul Mackerras on Oct 07, 2013

This label is not used now.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: E500: Add userspace debug stub support · ce11e48b

Committed by Bharat Bhushan on Jul 04, 2013

This patch adds the debug stub support on booke/bookehv.
Now the QEMU debug stub can use hardware breakpoints, watchpoints and
software breakpoints to debug the guest.

This is how we save/restore debug register context when switching
between guest, userspace and kernel user-process:

When QEMU is running
 -> thread->debug_reg == QEMU debug register context.
 -> Kernel will handle switching the debug register on context switch.
 -> no vcpu_load() called

QEMU makes ioctls (except RUN)
 -> This will call vcpu_load()
 -> should not change context.
 -> Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs

QEMU Makes RUN ioctl
 -> Save thread->debug_reg on STACK
 -> Store thread->debug_reg == vcpu->debug_reg
 -> load thread->debug_reg
 -> RUN VCPU ( So thread points to vcpu context )

Context switch happens when VCPU is running
 -> vcpu_load() is called but should not load any context
 -> kernel loads the vcpu context as thread->debug_regs points to vcpu context.

On heavyweight_exit
 -> Load the context saved on stack in thread->debug_reg

Currently we do not support debug resource emulation for the guest.
On a debug exception, always exit to user space irrespective of
whether user space is expecting the debug exception or not. If this is
an unexpected exception (breakpoint/watchpoint event not set by
userspace) then leave the action to user space. This
is similar to what it was before; the only difference is that now we
have proper exit state available to user space.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
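
The save, switch and restore sequence around the RUN ioctl can be modelled in a few lines. This is a deliberately simplified sketch: a single integer stands in for a whole debug register set, and `thread_model`, `vcpu_model`, `run_trace` and `run_ioctl_model` are hypothetical names rather than kernel structures.

```c
/* Illustrative model: one integer stands in for a debug register set. */
struct thread_model { int debug; };
struct vcpu_model   { int debug; };

struct run_trace {
    int during_run;  /* thread debug context while the vcpu runs */
    int after_exit;  /* thread debug context after heavyweight exit */
};

static struct run_trace run_ioctl_model(int qemu_debug, int vcpu_debug)
{
    struct thread_model thread = { qemu_debug };
    struct vcpu_model vcpu = { vcpu_debug };
    struct run_trace trace;

    int saved = thread.debug;     /* save thread->debug_reg on the stack */
    thread.debug = vcpu.debug;    /* thread now carries the vcpu context */
    trace.during_run = thread.debug;

    thread.debug = saved;         /* heavyweight exit: restore QEMU context */
    trace.after_exit = thread.debug;
    return trace;
}
```

Since the thread's own debug context is the vcpu's while the guest runs, an ordinary context switch restores the right registers without any extra work in vcpu_load().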


KVM: PPC: E500: Using "struct debug_reg" · 547465ef

Committed by Bharat Bhushan on Jul 04, 2013

For KVM also use the "struct debug_reg" defined in asm/processor.h
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: E500: exit to user space on "ehpriv 1" instruction · b12c7841

Committed by Bharat Bhushan on Jul 04, 2013

"ehpriv 1" instruction is used for setting software breakpoints
by user space. This patch adds support to exit to user space
with "run->debug" holding the relevant information.

As this is the first point where we use run->debug, this patch also defines
the run->debug structure.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: e500: mark page accessed when mapping a guest page · 84e4d632

Committed by Bharat Bhushan on Aug 07, 2013

Mark the guest page as accessed so that it is less
likely to be swapped out.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: allow guest control "G" attribute in mas2 · ca8ccbd4

Committed by Bharat Bhushan on Sep 19, 2013

"G" bit in MAS2 indicates whether the page is Guarded.
There is no reason to stop the guest setting "G", so allow it.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: allow guest control "E" attribute in mas2 · fd75cb51

Committed by Bharat Bhushan on Sep 19, 2013

"E" bit in MAS2 bit indicates whether the page is accessed
in Little-Endian or Big-Endian byte order.
There is no reason to stop the guest setting "E", so allow it.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode · 44a3add8

Committed by Paul Mackerras on Oct 04, 2013

When an interrupt or exception happens in the guest that comes to the
host, the CPU goes to hypervisor real mode (MMU off) to handle the
exception but doesn't change the MMU context. After saving a few
registers, we then clear the "in guest" flag. If, for any reason,
we get an exception in the real-mode code, that then gets handled
by the normal kernel exception handlers, which turn the MMU on. This
is disastrous if the MMU is still set to the guest context, since we
end up executing instructions from random places in the guest kernel
with hypervisor privilege.

In order to catch this situation, we define a new value for the "in guest"
flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
mode with guest MMU context. If the "in guest" flag is set to this value,
we branch off to an emergency handler. For the moment, this just does
a branch to self to stop the CPU from doing anything further.

While we're here, we define another new flag value to indicate that we
are in a HV guest, as distinct from a PR guest. This will be useful
when we have a kernel that can support both PR and HV guests concurrently.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


kvm: powerpc: book3s hv: Fix vcore leak · f1378b1c

Committed by Paul Mackerras on Sep 27, 2013

add kvmppc_free_vcores() to free the kvmppc_vcore structures
that we allocate for a guest, which are currently being leaked.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: Book3S PR: Reduce number of shadow PTEs invalidated by MMU notifiers · 491d6ecc

Committed by Paul Mackerras on Sep 20, 2013

Currently, whenever any of the MMU notifier callbacks get called, we
invalidate all the shadow PTEs.  This is inefficient because it means
that we typically then get a lot of DSIs and ISIs in the guest to fault
the shadow PTEs back in.  We do this even if the address range being
notified doesn't correspond to guest memory.

This commit adds code to scan the memslot array to find out what range(s)
of guest physical addresses correspond to the host virtual address range
being affected.  For each such range we flush only the shadow PTEs
for the range, on all cpus.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
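
The range translation for one memslot can be sketched as an interval intersection. This is a minimal user-space model under stated assumptions: a 4 KiB page size, and a `memslot_model` struct whose three fields are a simplification of a real memslot. `hva_to_gfn_range` and `example_slot` are names invented for the example.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (UINT64_C(1) << MODEL_PAGE_SHIFT)

struct memslot_model {
    uint64_t base_gfn;        /* first guest frame number in the slot */
    uint64_t npages;          /* slot length in pages */
    uint64_t userspace_addr;  /* host virtual address of the slot start */
};

/* Half-open guest frame range [start, end); start == end means no overlap. */
struct gfn_range { uint64_t start, end; };

static struct gfn_range hva_to_gfn_range(const struct memslot_model *slot,
                                         uint64_t hva_start, uint64_t hva_end)
{
    struct gfn_range r = { 0, 0 };
    uint64_t slot_start = slot->userspace_addr;
    uint64_t slot_end = slot_start + (slot->npages << MODEL_PAGE_SHIFT);
    uint64_t lo = hva_start > slot_start ? hva_start : slot_start;
    uint64_t hi = hva_end < slot_end ? hva_end : slot_end;

    if (lo >= hi)
        return r;  /* the notified range does not touch this slot */

    r.start = slot->base_gfn + ((lo - slot_start) >> MODEL_PAGE_SHIFT);
    r.end = slot->base_gfn +
            ((hi - slot_start + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT);
    return r;
}

/* Example slot: 16 pages of guest memory mapped at host address 0x10000. */
static const struct memslot_model example_slot = { 0x100, 16, 0x10000 };
```

Running this over every slot yields only the guest frames that actually back the notified host range, so a notification for unrelated host memory produces an empty range and no flush.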


KVM: PPC: Book3S PR: Mark pages accessed, and dirty if being written · adc0bafe

Committed by Paul Mackerras on Sep 20, 2013

The mark_page_dirty() function, despite what its name might suggest,
doesn't actually mark the page as dirty as far as the MM subsystem is
concerned.  It merely sets a bit in KVM's map of dirty pages, if
userspace has requested dirty tracking for the relevant memslot.
To tell the MM subsystem that the page is dirty, we have to call
kvm_set_pfn_dirty() (or an equivalent such as SetPageDirty()).

This adds a call to kvm_set_pfn_dirty(), and while we are here, also
adds a call to kvm_set_pfn_accessed() to tell the MM subsystem that
the page has been accessed.  Since we are now using the pfn in
several places, this adds a 'pfn' variable to store it and changes
the places that used hpaddr >> PAGE_SHIFT to use pfn instead, which
is the same thing.

This also changes a use of HPTE_R_PP to PP_RXRX.  Both are 3, but
PP_RXRX is more informative as being the read-only page permission
bit setting.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page() · d78bca72

Committed by Paul Mackerras on Sep 20, 2013

When the MM code is invalidating a range of pages, it calls the KVM
kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
kvm_unmap_hva_range(), which arranges to flush all the existing host
HPTEs for guest pages.  However, the Linux PTEs for the range being
flushed are still valid at that point.  We are not supposed to establish
any new references to pages in the range until the ...range_end()
notifier gets called.  The PPC-specific KVM code doesn't get any
explicit notification of that; instead, we are supposed to use
mmu_notifier_retry() to test whether we are or have been inside a
range flush notifier pair while we have been getting a page and
instantiating a host HPTE for the page.

This therefore adds a call to mmu_notifier_retry inside
kvmppc_mmu_map_page().  This call is inside a region locked with
kvm->mmu_lock, which is the same lock that is taken by the KVM
MMU notifier functions, thus ensuring that no new notification can
proceed while we are in the locked region.  Inside this region we
also create the host HPTE and link the corresponding hpte_cache
structure into the lists used to find it later.  We cannot allocate
the hpte_cache structure inside this locked region because that can
lead to deadlock, so we allocate it outside the region and free it
if we end up not using it.

This also moves the updates of vcpu3s->hpte_cache_count inside the
regions locked with vcpu3s->mmu_lock, and does the increment in
kvmppc_mmu_hpte_cache_map() when the pte is added to the cache
rather than when it is allocated, in order that the hpte_cache_count
is accurate.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
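
The retry check can be illustrated with a sequence counter. The sketch below is a single-threaded model and an assumption about the general shape of the check, not the kernel's implementation: `kvm_model`, `notifier_retry_model` and `map_page_model` are invented names, locking and memory barriers are omitted, and the race is injected by a parameter.

```c
/* Illustrative model of the state the retry check looks at. */
struct kvm_model {
    unsigned long notifier_seq;  /* bumped when an invalidation completes */
    long notifier_count;         /* non-zero while an invalidation is in flight */
};

/* Returns non-zero if a mapping built since seq_snapshot must be discarded. */
static int notifier_retry_model(const struct kvm_model *kvm,
                                unsigned long seq_snapshot)
{
    if (kvm->notifier_count)
        return 1;
    return kvm->notifier_seq != seq_snapshot;
}

#define MAP_OK     0
#define MAP_RETRY  1

/* Simulates one mapping attempt; the race is injected by the caller. */
static int map_page_model(int invalidation_raced)
{
    struct kvm_model kvm = { 0, 0 };
    unsigned long seq = kvm.notifier_seq;  /* snapshot before the page lookup */

    /* ... page lookup would happen here, outside the lock ... */
    if (invalidation_raced)
        kvm.notifier_seq++;                /* an invalidation ran meanwhile */

    /* ... the lock would be taken here in real code ... */
    if (notifier_retry_model(&kvm, seq))
        return MAP_RETRY;                  /* drop the page, try again later */
    return MAP_OK;                         /* safe to install the mapping */
}
```

Taking the snapshot before the lookup and checking it under the lock is what closes the window between the start and end notifiers.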


openanolis / cloud-kernel · synced successfully nearly 2 years ago
