Commits · 9a26af64d6bba72c9dfd62cc0cab0e79f8a66d7b · openeuler / Kernel

28 Jul, 2014 · 31 commits

KVM: PPC: Book3s: Remove kvmppc_read_inst() function · 9a26af64

Committed by Mihai Caraman on Jul 23, 2014

In the context of replacing kvmppc_ld() function calls with a version of
kvmppc_get_last_inst() that is allowed to fail, Alex Graf suggested this:

"If we get EMULATE_AGAIN, we just have to make sure we go back into the guest.
No need to inject an ISI into the guest - it'll do that all by itself.
With an error returning kvmppc_get_last_inst we can just use completely
get rid of kvmppc_read_inst() and only use kvmppc_get_last_inst() instead."

As an intermediate step, get rid of kvmppc_read_inst() and only use kvmppc_ld()
instead.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: e500mc: Revert "add load inst fixup" · b5741bb3

Committed by Mihai Caraman on Jul 23, 2014

The commit 1d628af7 "add load inst fixup" made an attempt to handle
failures generated by reading the guest current instruction. The fixup
code that was added works only by chance, hiding the real issue.

The Load External PID (lwepx) instruction, used by KVM to read guest
instructions, is executed in a substituted guest translation context
(EPLC[EGS] = 1). As a consequence, lwepx's TLB error and data storage
interrupts need to be handled by KVM, even though these interrupts
are generated from host context (MSR[GS] = 0) where lwepx is executed.

Currently, KVM hooks only interrupts generated from guest context
(MSR[GS] = 1), doing minimal checks on the fast path to avoid host
performance degradation. As a result, the host kernel handles lwepx
faults searching the faulting guest data address (loaded in DEAR) in
its own Logical Partition ID (LPID) 0 context. If a host translation
is found, execution returns to the lwepx instruction instead of the
fixup, and the host ends up in an infinite loop.

Revert the commit "add load inst fixup". The lwepx issue will be addressed
in a subsequent patch without needing fixup code.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

kvm: ppc: Add SPRN_EPR get helper function · 34f754b9

Committed by Bharat Bhushan on Jul 17, 2014

kvmppc_set_epr() is already defined in asm/kvm_ppc.h, so
rename the get_epr helper function and move it to the same file.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
[agraf: remove duplicate return]
Signed-off-by: Alexander Graf <agraf@suse.de>

kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7 · c1b8a01b

Committed by Bharat Bhushan on Jul 17, 2014

Use kvmppc_set_sprg[0-7]() and kvmppc_get_sprg[0-7]() helper
functions.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

kvm: ppc: booke: Add shared struct helpers of SPRN_ESR · dc168549

Committed by Bharat Bhushan on Jul 17, 2014

Add and use kvmppc_set_esr() and kvmppc_get_esr() helper functions.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR · a5414d4b

Committed by Bharat Bhushan on Jul 17, 2014

Use kvmppc_set_dar() and kvmppc_get_dar() helper functions.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1 · 31579eea

Committed by Bharat Bhushan on Jul 17, 2014

Use kvmppc_set_srr0/srr1() and kvmppc_get_srr0/srr1() helper functions.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: Make magic page properly 4k mappable · 89b68c96

Committed by Alexander Graf on Jul 13, 2014

The magic page is defined as a 4k page of per-vCPU data that is shared
between the guest and the host to accelerate accesses to privileged
registers.

However, when the host is using 64k page size granularity we weren't quite
as strict about that rule anymore. Instead, we partially treated all of the
upper 64k as magic page and mapped only the uppermost 4k with the actual
magic contents.

This works well enough for Linux, which doesn't use any memory in kernel
space in the upper 64k, but Mac OS X got upset. So this patch makes the magic
page actually stay in a 4k range even on 64k page size hosts.

This patch fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE
hosts for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
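
The 4k rule described in this commit can be illustrated with a small address check. This is only a sketch, not the kernel code: `in_magic_page()`, `MAGIC_PAGE_SIZE`, and the example addresses are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAGIC_PAGE_SIZE 0x1000ULL	/* the magic page is always 4k (per the commit) */

/* Hypothetical helper: true only when addr lies inside the 4k magic page,
 * regardless of how large the host's own pages are. */
static bool in_magic_page(uint64_t addr, uint64_t magic_base)
{
	/* The base of the magic page must itself be 4k aligned. */
	assert((magic_base & (MAGIC_PAGE_SIZE - 1)) == 0);
	return (addr & ~(MAGIC_PAGE_SIZE - 1)) == magic_base;
}
```

With a 64k host page, an address can share the host page with the magic page and still fall outside the 4k magic range, which is the case the commit says was previously handled too loosely.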

KVM: PPC: Book3S: Add hack for split real mode · c01e3f66

Committed by Alexander Graf on Jul 11, 2014

Today we handle split real mode by mapping both instruction and data faults
into a special virtual address space that only exists during the split mode
phase.

This is good enough to catch 32bit Linux guests that use split real mode for
copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our
instruction pointer and can map the user space process freely below there.

However, that approach fails when we're running KVM inside of KVM. Here the 1st
level last_inst reader may well be in the same virtual page as a 2nd level
interrupt handler.

It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a
kernel copy_from/to_user implementation can easily overlap with user space
addresses.

The architecturally correct way to fix this would be to implement an instruction
interpreter in KVM that kicks in whenever we go into split real mode. This
interpreter however would not receive a great amount of testing and be a lot of
bloat for a reasonably isolated corner case.

So I went back to the drawing board and tried to come up with a way to make
split real mode work with a single flat address space. And then I realized that
we could get away with the same trick that makes it work for Linux:

Whenever we see an instruction address during split real mode that may collide,
we just move it higher up the virtual address space to a place that hopefully
does not collide (keep your fingers crossed!).

That approach does work surprisingly well. I am able to successfully run
Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I
apply a tiny timing probe hack to QEMU. I'd say this is a win over even more
broken split real mode :).
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: Stop PTE lookup on write errors · 2e27ecc9

Committed by Alexander Graf on Jul 10, 2014

When a page lookup failed because we're not allowed to write to the page, we
should not overwrite that value with another lookup on the second PTEG which
will return "page not found". Instead, we should just tell the caller that we
had a permission problem.

This fixes Mac OS X guests looping endlessly in page lookup code for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
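
The control flow described in this commit might be sketched as follows. This is a minimal illustration, not the kernel implementation: `struct pte`, `lookup_pteg()`, `lookup_page()`, and the group size are hypothetical stand-ins; only the idea of reporting the permission error instead of a second "not found" result comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_GROUP 8	/* hypothetical group size for the sketch */

/* Hypothetical, simplified PTE with just the fields the sketch needs. */
struct pte {
	unsigned long vpn;	/* virtual page number this entry maps */
	bool valid;
	bool may_write;
};

/* Search one PTEG. Returns 0 on success, -EPERM if the page is mapped
 * but a write is not permitted, -ENOENT if it is not in this group. */
static int lookup_pteg(const struct pte *group, unsigned long vpn, bool is_write)
{
	assert(group != NULL);
	for (size_t i = 0; i < PTES_PER_GROUP; i++) {
		if (!group[i].valid || group[i].vpn != vpn)
			continue;
		if (is_write && !group[i].may_write)
			return -EPERM;
		return 0;
	}
	return -ENOENT;
}

/* Try the primary group first. Fall through to the secondary group only
 * when the page was really not found; a permission error is final. */
static int lookup_page(const struct pte *primary, const struct pte *secondary,
		       unsigned long vpn, bool is_write)
{
	int r = lookup_pteg(primary, vpn, is_write);

	if (r != -ENOENT)
		return r;	/* success or -EPERM: stop here */
	return lookup_pteg(secondary, vpn, is_write);
}
```

The point of the early return is that the caller sees the permission problem itself rather than a "page not found" result from the second group.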

KVM: PPC: Deflect page write faults properly in kvmppc_st · 17824b5a

Committed by Alexander Graf on Jul 10, 2014

When we have a page that we're not allowed to write to, xlate() will already
tell us -EPERM on lookup of that page. With the code as is we change it into
a "page missing" error which a guest may get confused about. Instead, just
tell the caller about the -EPERM directly.

This fixes Mac OS X guests when run with DCBZ32 emulation.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: e500: Emulate power management control SPR · debf27d6

Committed by Mihai Caraman on Jul 04, 2014

On the FSL e6500 core, the kernel uses the power management control SPR (PWRMGTCR0)
to enable idle power down for cores and devices by setting up the idle count
period at boot time. With the host already controlling the power management
configuration, the guest can simply benefit from it, so emulate the guest request
as a general store.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Enable for little endian hosts · 6947f948

Committed by Alexander Graf on Jun 11, 2014

Now that we've fixed all the issues that HV KVM code had on little endian
hosts, we can enable it in the kernel configuration for users to play with.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Fix ABIv2 on LE · 9bf163f8

Committed by Alexander Graf on Jun 16, 2014

For code that doesn't live in modules we can just branch to the real function
names, giving us compatibility with ABIv1 and ABIv2.

Do this for the compiled-in code of HV KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Access XICS in BE · 76d072fb

Committed by Alexander Graf on Jun 11, 2014

On the exit path from the guest we check what type of interrupt we received
if we received one. This means we're doing hardware access to the XICS interrupt
controller.

However, when running on a little endian system, this access is byte reversed.

So let's make sure to swizzle the bytes back again and virtually make XICS
accesses big endian.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE · 0865a583

Committed by Alexander Graf on Jun 11, 2014

Some data structures are always stored in big endian. Among those are the LPPACA
fields as well as the shadow slb. These structures might be shared with a
hypervisor.

So whenever we access those fields, make sure we do so in big endian byte order.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Access guest VPA in BE · 02407552

Committed by Alexander Graf on Jun 11, 2014

There are a few shared data structures between the host and the guest. Most
of them get registered through the VPA interface.

These data structures are defined to always be in big endian byte order, so
let's make sure we always access them in big endian.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Make HTAB code LE host aware · 6f22bd32

Committed by Alexander Graf on Jun 11, 2014

When running on an LE host all data structures are kept in little endian
byte order. However, the HTAB still needs to be maintained in big endian.

So every time we access any HTAB we need to make sure we do so in the right
byte order. Fix up all accesses to manually byte swap.
Signed-off-by: Alexander Graf <agraf@suse.de>
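
As an illustration of what "access in big endian byte order" means for a 64-bit HTAB word, here is a small portable sketch. In the kernel this job is done by helpers such as be64_to_cpu() and cpu_to_be64(); the functions `load_be64()` and `store_be64()` below are hypothetical userspace stand-ins written for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit big-endian value byte by byte, so the result is the
 * same on little endian and big endian hosts. */
static uint64_t load_be64(const void *p)
{
	const unsigned char *b = p;
	uint64_t v = 0;

	assert(p != NULL);
	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/* Store a 64-bit value in big-endian order: most significant byte first. */
static void store_be64(void *p, uint64_t v)
{
	unsigned char *b = p;

	assert(p != NULL);
	for (int i = 7; i >= 0; i--) {
		b[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}
```

Assembling the value from individual bytes avoids any dependence on the host's native byte order, which is the property the LE host fixes in this series need.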

KVM: PPC: e500: Fix default tlb for victim hint · d57cef91

Committed by Mihai Caraman on Jun 30, 2014

The TLB search operation used for the victim hint relies on the default TLB set by
the host. When hardware tablewalk support is enabled in the host, the default TLB is
TLB1, which leads KVM to evict the bolted entry. Set and restore the default TLB
when searching for the victim hint.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling · 9642382e

Committed by Michael Neuling on Jun 02, 2014

This adds support for the H_SET_MODE hcall.  This hcall is a
multiplexer that has several functions, some of which are called
rarely, and some which are potentially called very frequently.
Here we add support for the functions that set the debug registers
CIABR (Completed Instruction Address Breakpoint Register) and
DAWR/DAWRX (Data Address Watchpoint Register and eXtension),
since they could be updated by the guest as often as every context
switch.

This also adds a kvmppc_power8_compatible() function to test to see
if a guest is compatible with POWER8 or not.  The CIABR and DAWR/X
only exist on POWER8.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled · ae2113a4

Committed by Paul Mackerras on Jun 02, 2014

This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL
capability is used to enable or disable in-kernel handling of an
hcall, that the hcall is actually implemented by the kernel.
If not, an EINVAL error is returned.

This also checks the default-enabled list of hcalls and prints a
warning if any hcall there is not actually implemented.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling · 699a0ea0

Committed by Paul Mackerras on Jun 02, 2014

This provides a way for userspace to control which sPAPR hcalls get
handled in the kernel.  Each hcall can be individually enabled or
disabled for in-kernel handling, except for H_RTAS.  The exception
for H_RTAS is because userspace can already control whether
individual RTAS functions are handled in-kernel or not via the
KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
H_RTAS is out of the normal sequence of hcall numbers.

Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
The args field of the struct kvm_enable_cap specifies the hcall number
in args[0] and the enable/disable flag in args[1]; 0 means disable
in-kernel handling (so that the hcall will always cause an exit to
userspace) and 1 means enable.  Enabling or disabling in-kernel
handling of an hcall is effective across the whole VM.

The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
on PowerPC is new, added by this commit.  The KVM_CAP_ENABLE_CAP_VM
capability advertises that this ability exists.

When a VM is created, an initial set of hcalls are enabled for
in-kernel handling.  The set that is enabled is the set that have
an in-kernel implementation at this point.  Any new hcall
implementations from this point onwards should not be added to the
default set without a good reason.

No distinction is made between real-mode and virtual-mode hcall
implementations; the one setting controls them both.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
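
From userspace, the interface described in this commit might be driven roughly as follows. This is a sketch under stated assumptions: it relies on the `struct kvm_enable_cap`, `KVM_ENABLE_CAP`, and `KVM_CAP_PPC_ENABLE_HCALL` definitions from `<linux/kvm.h>`, while the helper names `fill_hcall_cap()` and `set_hcall_in_kernel()` are hypothetical and `vm_fd` is assumed to be a VM file descriptor obtained elsewhere.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fill in the request as the commit message describes it:
 * args[0] = hcall number, args[1] = 1 to enable in-kernel handling,
 * 0 to disable it (the hcall then always exits to userspace). */
static void fill_hcall_cap(struct kvm_enable_cap *cap,
			   unsigned long hcall, int enable)
{
	assert(cap != NULL);
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_PPC_ENABLE_HCALL;
	cap->args[0] = hcall;
	cap->args[1] = enable ? 1 : 0;
}

/* Issue the ioctl on the VM file descriptor (not a vcpu descriptor).
 * Returns 0 on success or -1 with errno set, as ioctl() does. */
static int set_hcall_in_kernel(int vm_fd, unsigned long hcall, int enable)
{
	struct kvm_enable_cap cap;

	fill_hcall_cap(&cap, hcall, enable);
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

A caller would presumably first check that the KVM_CAP_ENABLE_CAP_VM capability is advertised, since the commit notes that using KVM_ENABLE_CAP on a VM file descriptor is new on PowerPC.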

KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule · 1f0eeb7e

Committed by Mihai Caraman on Jun 18, 2014

On vcpu schedule, the condition checked for tlb pollution is too loose.
The tlb entries of a vcpu become polluted (vs stale) only when a different
vcpu within the same logical partition runs in-between. Optimize the tlb
invalidation condition by keeping last_vcpu per logical partition id.

With the new invalidation condition, a guest shows a 4% performance improvement
on P5020DS while running a memory stress application with the cpu oversubscribed,
the other guest running a cpu intensive workload.

Guest - old invalidation condition
  real 3.89
  user 3.87
  sys 0.01

Guest - enhanced invalidation condition
  real 3.75
  user 3.73
  sys 0.01

Host
  real 3.70
  user 1.85
  sys 0.00

The memory stress application accesses 4KB pages backed by 75% of available
TLB0 entries:

/* ENTRIES and ITERATIONS are build-time constants; the commit message
 * does not give their values. */
char foo[ENTRIES][4096] __attribute__ ((aligned (4096)));

int main(void)
{
	char bar;
	int i, j;

	/* Touch the first byte of every 4KB page on each pass. */
	for (i = 0; i < ITERATIONS; i++)
		for (j = 0; j < ENTRIES; j++)
			bar = foo[j][0];

	return 0;
}
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
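
The tightened invalidation condition from this commit can be sketched for a single physical CPU as below. This is an illustrative model only: `struct vcpu`, `MAX_LPIDS`, the `last_vcpu_of_lpid` table, and `tlb_needs_invalidation()` are names assumed for the example, and the real code also has to account for a vcpu migrating between physical CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LPIDS 64	/* hypothetical limit for the sketch */

/* Hypothetical, minimal vcpu description. */
struct vcpu {
	int lpid;	/* logical partition ID the vcpu belongs to */
	int id;
};

/* Last vcpu that ran on this physical CPU, tracked per LPID. */
static const struct vcpu *last_vcpu_of_lpid[MAX_LPIDS];

/* Called when a vcpu is scheduled in. Returns true when its TLB entries
 * may be polluted, that is, when a different vcpu of the same partition
 * ran here in between, and records this vcpu as the latest for its LPID. */
static bool tlb_needs_invalidation(const struct vcpu *vcpu)
{
	bool polluted;

	assert(vcpu != NULL && vcpu->lpid >= 0 && vcpu->lpid < MAX_LPIDS);
	polluted = last_vcpu_of_lpid[vcpu->lpid] != vcpu;
	last_vcpu_of_lpid[vcpu->lpid] = vcpu;
	return polluted;
}
```

In this model, a vcpu from another partition running in between does not force an invalidation, which is where the reported improvement would come from.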

KVM: PPC: Book3S PR: Fix sparse endian checks · f396df35

Committed by Alexander Graf on Jun 16, 2014

Running sparse with endian checks over the code base flagged some places
that were missing casts or had wrong types. Fix them up.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S PR: Fix ABIv2 on LE · da166fac

Committed by Alexander Graf on Jun 16, 2014

We have now switched to ABIv2 on little endian systems, which gets rid of the
dotted function names. Branch to the actual functions when we see such
a system.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC() · ad7d4584

Committed by Anton Blanchard on Jun 12, 2014

Both kvmppc_hv_entry_trampoline and kvmppc_entry_trampoline are
assembly functions that are exported to modules and also require
a valid r2.

As such we need to use _GLOBAL_TOC so we provide a global entry
point that establishes the TOC (r2).
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue · 05a308c7

Committed by Anton Blanchard on Jun 12, 2014

To establish addressability quickly, ABIv2 requires the target
address of the function being called to be in r12.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S PR: Handle hyp doorbell exits · 568fccc4

Committed by Alexander Graf on Jun 16, 2014

If we're running PR KVM in HV mode, we may get hypervisor doorbell interrupts.
Handle those the same way we treat normal doorbells.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3s PR: Disable AIL mode with OPAL · fb4188ba

Committed by Alexander Graf on Jun 09, 2014

When we're using PR KVM we must not allow the CPU to take interrupts
in virtual mode, as the SLB does not contain host kernel mappings
when running inside the guest context.

To make sure we get good performance for non-KVM tasks but still
properly functioning PR KVM, let's just disable AIL whenever a vcpu
is scheduled in.

This is fundamentally different from how we deal with AIL on pSeries
type machines where we disable AIL for the whole machine as soon as
a single KVM VM is up.

The reason for that is easy - on pSeries we do not have control over
per-cpu configuration of AIL. We also don't want to mess with CPU hotplug
races and AIL configuration, so setting it per CPU is easier and more
flexible.

This patch fixes running PR KVM on POWER8 bare metal for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paul Mackerras <paulus@samba.org>

KVM: PPC: BOOK3S: PR: Emulate instruction counter · 06da28e7

Committed by Aneesh Kumar K.V on Jun 05, 2014

Writing to IC is not allowed in the privileged mode.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BOOK3S: PR: Emulate virtual timebase register · 8f42ab27

Committed by Aneesh Kumar K.V on Jun 05, 2014

The virtual time base register (VTB) is a per-VM, per-CPU register that needs
to be saved and restored on VM exit and entry. Writing to VTB is not
allowed in privileged mode.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix compile error]
Signed-off-by: Alexander Graf <agraf@suse.de>

06 Jul, 2014 · 1 commit

KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation · 3cd60e31

Committed by Aneesh Kumar K.V on Jun 04, 2014

We use time base for PURR and SPURR emulation with PR KVM since we
are emulating a single threaded core. When using time base
we need to make sure that we don't accumulate time spent in the host
in PURR and SPURR value.

Also, we don't need to emulate mtspr because both registers are
hypervisor resources.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
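
The accounting idea in this commit (count only time spent in the guest) might look like the following sketch. It is an assumption-laden illustration: `struct vcpu_time`, `guest_enter()`, and `guest_exit()` are hypothetical names, and the time base values are passed in as plain numbers instead of being read from hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vcpu bookkeeping for an emulated PURR. */
struct vcpu_time {
	uint64_t purr;		/* emulated PURR value seen by the guest */
	uint64_t entry_tb;	/* time base sampled at the last guest entry */
};

/* Remember the time base when the vcpu starts running. */
static void guest_enter(struct vcpu_time *t, uint64_t tb_now)
{
	assert(t != NULL);
	t->entry_tb = tb_now;
}

/* On exit, add only the ticks that elapsed since entry, so time spent
 * in the host between exit and the next entry is never accumulated. */
static void guest_exit(struct vcpu_time *t, uint64_t tb_now)
{
	assert(t != NULL);
	t->purr += tb_now - t->entry_tb;
}
```

The same bookkeeping would apply to SPURR in this model, since the commit treats both registers alike.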

11 Jun, 2014 · 1 commit

powerpc/book3s: Fix guest MC delivery mechanism to avoid soft lockups in guest. · 74845bc2

Committed by Mahesh Salgaonkar on Jun 11, 2014

Currently we forward MCEs to the guest which have been recovered by the guest.
And for unhandled errors we do not deliver the MCE to the guest. It looks like
with no support for FWNMI in qemu, the guest just panics whenever we deliver the
recovered MCEs to it. Also, the existing code used to return to the host for
unhandled errors, which was causing the guest to hang with soft lockups inside
the guest and made it difficult to recover the guest instance.

This patch now forwards all fatal MCEs to the guest, causing the guest to crash/panic.
And, for recovered errors we just go back to normal functioning of the guest
instead of returning to the host. This fixes soft lockup issues in the guest.
This patch also fixes an issue where guest MCE events were not logged to
the host console.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

30 May, 2014 · 7 commits

KVM: PPC: Book3S PR: Rework SLB switching code · d8d164a9

Committed by Alexander Graf on May 15, 2014

On LPAR guest systems Linux enables the shadow SLB to indicate to the
hypervisor a number of SLB entries that always have to be available.

Today we go through this shadow SLB and disable all ESID's valid bits.
However, pHyp doesn't like this approach very much and honors us with
fancy machine checks.

Fortunately the shadow SLB descriptor also has an entry that indicates
the number of valid entries following. During the lifetime of a guest
we can just swap that value to 0 and don't have to worry about the
SLB restoration magic.

While we're touching the code, let's also make it more readable (get
rid of rldicl), allow it to deal with a dynamic number of bolted
SLB entries and only do shadow SLB swizzling on LPAR systems.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S PR: Use SLB entry 0 · 207438d4

Committed by Alexander Graf on May 15, 2014

We didn't make use of SLB entry 0 because ... of no good reason. SLB entry 0
will always be used by the Linux linear SLB entry, so the fact that slbia
does not invalidate it doesn't matter as we overwrite SLB 0 on exit anyway.

Just enable use of SLB entry 0 for our shadow SLB code.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Fix machine check delivery to guest · 000a25dd

Committed by Paul Mackerras on May 26, 2014

The code that delivered a machine check to the guest after handling
it in real mode failed to load up r11 before calling kvmppc_msr_interrupt,
which needs the old MSR value in r11 so it can see the transactional
state there.  This adds the missing load.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs · 9bc01a9b

Committed by Paul Mackerras on May 26, 2014

This adds workarounds for two hardware bugs in the POWER8 performance
monitor unit (PMU), both related to interrupt generation.  The effect
of these bugs is that PMU interrupts can get lost, leading to tools
such as perf reporting fewer counts and samples than they should.

The first bug relates to the PMAO (perf. mon. alert occurred) bit in
MMCR0; setting it should cause an interrupt, but doesn't.  The other
bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0.
Setting PMAE when a counter is negative and counter negative
conditions are enabled to cause alerts should cause an alert, but
doesn't.

The workaround for the first bug is to create conditions where a
counter will overflow, whenever we are about to restore a MMCR0
value that has PMAO set (and PMAO_SYNC clear).  The workaround for
the second bug is to freeze all counters using MMCR2 before reading
MMCR0.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Make sure we don't miss dirty pages · 6c576e74

Committed by Paul Mackerras on May 26, 2014

Currently, when testing whether a page is dirty (when constructing the
bitmap for the KVM_GET_DIRTY_LOG ioctl), we test the C (changed) bit
in the HPT entries mapping the page, and if it is 0, we consider the
page to be clean.  However, the Power ISA doesn't require processors
to set the C bit to 1 immediately when writing to a page, and in fact
allows them to delay the writeback of the C bit until they receive a
TLB invalidation for the page.  Thus it is possible that the page
could be dirty and we miss it.

Now, if there are vcpus running, this is not serious since the
collection of the dirty log is racy already - some vcpu could dirty
the page just after we check it.  But if there are no vcpus running we
should return definitive results, in case we are in the final phase of
migrating the guest.

Also, if the permission bits in the HPTE don't allow writing, then we
know that no CPU can set C.  If the HPTE was previously writable and
the page was modified, any C bit writeback would have been flushed out
by the tlbie that we did when changing the HPTE to read-only.

Otherwise we need to do a TLB invalidation even if the C bit is 0, and
then check the C bit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
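
The decision logic laid out in this commit can be modeled in a few lines. This is a toy model, not the kernel code: `struct hpte_model`, its `c_pending` field (standing in for a C bit update the processor has not yet written back), `model_tlbie()`, and `test_dirty()` are all hypothetical constructs for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one HPT entry. */
struct hpte_model {
	bool writable;	/* permission bits allow stores */
	bool c_bit;	/* C bit as currently visible in the entry */
	bool c_pending;	/* modeled: a delayed C update not yet written back */
};

/* Model of a TLB invalidation: any delayed C update becomes visible. */
static void model_tlbie(struct hpte_model *h)
{
	if (h->c_pending) {
		h->c_bit = true;
		h->c_pending = false;
	}
}

/* Returns true if the page must be reported as dirty. */
static bool test_dirty(struct hpte_model *h, bool vcpus_running)
{
	assert(h != NULL);
	/* C is clear and either no CPU could have set it (read-only), or
	 * vcpus are running and the log is racy anyway: report clean. */
	if (!h->c_bit && (!h->writable || vcpus_running))
		return false;
	/* Otherwise force out any delayed C update, then look at C. */
	model_tlbie(h);
	return h->c_bit;
}
```

The interesting case is a writable entry with C still clear while no vcpus are running: the model invalidates first and only then trusts the C bit, matching the last paragraph of the commit message.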

KVM: PPC: Book3S HV: Fix dirty map for hugepages · 687414be

Committed by Alexey Kardashevskiy on May 26, 2014

The dirty map that we construct for the KVM_GET_DIRTY_LOG ioctl has
one bit per system page (4K/64K).  Currently, we only set one bit in
the map for each HPT entry with the Change bit set, even if the HPT entry is
for a large page (e.g., 16MB).  Userspace then considers only the
first system page dirty, though in fact the guest may have modified
anywhere in the large page.

To fix this, we make kvm_test_clear_dirty() return the actual number
of pages that are dirty (and rename it to kvm_test_clear_dirty_npages()
to emphasize that that's what it returns).  In kvmppc_hv_get_dirty_log()
we then set that many bits in the dirty map.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
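
Setting one bit per system page for a dirty huge page might be sketched like this. The helper names `set_dirty_bits()` and `pages_per_huge_page()` and the bitmap layout are assumptions made for the example; only the "one bit per system page, set as many bits as there are dirty pages" idea comes from the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mark npages consecutive system pages dirty, starting at page index
 * first, in a bitmap that holds one bit per system page. */
static void set_dirty_bits(unsigned long *map, size_t map_bits,
			   size_t first, size_t npages)
{
	assert(map != NULL && first <= map_bits && npages <= map_bits - first);
	for (size_t i = first; i < first + npages; i++)
		map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}

/* Number of system pages covered by one huge page. */
static size_t pages_per_huge_page(size_t huge_size, size_t page_size)
{
	assert(page_size != 0 && huge_size >= page_size);
	return huge_size / page_size;
}
```

For a 16MB huge page on a host with 64kB system pages this yields 256 bits per dirty huge page, instead of the single bit that was set before the fix.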

KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address · 1066f772

Committed by Paul Mackerras on May 26, 2014

Currently, when a huge page is faulted in for a guest, we select the
rmap chain to insert the HPTE into based on the guest physical address
that the guest tried to access.  Since there is an rmap chain for each
system page, there are many rmap chains for the area covered by a huge
page (e.g. 256 for 16MB pages when PAGE_SIZE = 64kB), and the huge-page
HPTE could end up in any one of them.

For consistency, and to make the huge-page HPTEs easier to find, we now
put huge-page HPTEs in the rmap chain corresponding to the base address
of the huge page.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
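
Picking the rmap chain that corresponds to the base address of a huge page comes down to rounding the guest frame number down to a huge-page boundary. The sketch below assumes a power-of-two number of system pages per huge page; `huge_page_base_gfn()` is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Guest frame number of the first system page in the huge page that
 * contains gfn. npages is the number of system pages per huge page
 * and must be a power of two. */
static uint64_t huge_page_base_gfn(uint64_t gfn, uint64_t npages)
{
	assert(npages != 0 && (npages & (npages - 1)) == 0);
	return gfn & ~(npages - 1);
}
```

With 16MB huge pages and PAGE_SIZE = 64kB, npages is 256, so all 256 candidate rmap chains for a huge page collapse to the single chain at the rounded-down frame number.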

openeuler / Kernel · last synced successfully more than 1 year ago