Commits · cc568ead3ce8e0284e7e2cc77bd1dafb03ba4ca1 · openeuler / Kernel

02 Aug, 2014 2 commits

ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction · 6bf755db

Committed by Omar Sandoval on Aug 01, 2014

The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn)
should only be entered when a kgdb break instruction is executed
from the kernel. Otherwise, if kgdb is enabled, a userspace program
can cause the kernel to drop into the debugger by executing either
KGDB_BREAKINST or KGDB_COMPILED_BREAK.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

6bf755db
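
The guard described in this commit can be sketched in plain C. This is an illustrative user-space mock under stated assumptions, not the kernel patch: `struct mock_regs`, `mock_user_mode()` and `mock_kgdb_brk_fn()` are hypothetical stand-ins for the kernel's register frame, its user-mode check and the real hook. Only the CPSR mode values (0x10 for user, 0x13 for supervisor) come from the ARM architecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODE_MASK 0x1fu /* CPSR mode field */
#define USR_MODE  0x10u /* ARM user mode */
#define SVC_MODE  0x13u /* ARM supervisor (kernel) mode */

/* Hypothetical stand-in for the saved register frame. */
struct mock_regs {
    uint32_t cpsr;
};

/* True when the trapping instruction was executed in user mode. */
bool mock_user_mode(const struct mock_regs *regs)
{
    return (regs->cpsr & MODE_MASK) == USR_MODE;
}

/*
 * Hypothetical break hook. Returns 0 when the break was taken by the
 * debugger and 1 when it was declined, so that a caller could fall back
 * to its normal handling of the instruction for userspace.
 */
int mock_kgdb_brk_fn(const struct mock_regs *regs, int *entered_debugger)
{
    if (mock_user_mode(regs))
        return 1;          /* break came from userspace: stay out of kgdb */
    *entered_debugger = 1; /* break came from the kernel: enter the debugger */
    return 0;
}
```

With a frame whose mode field is `USR_MODE` the hook declines and leaves `entered_debugger` untouched; with `SVC_MODE` it sets the flag.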

ARM: idmap: add identity mapping usage note · c5cc87fa

Committed by Russell King on Jul 29, 2014

Add a note about the usage of the identity mapping; we do not support
accesses outside of the identity map region and kernel image while a
CPU is using the identity map. This is because the identity mapping
may overwrite vmalloc space, IO mappings, the vectors pages, etc.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

c5cc87fa

01 Aug, 2014 2 commits

arm64: add newline to I-cache policy string · ea171967

Committed by Mark Rutland on Aug 01, 2014

Due to a missing newline in the I-cache policy detection log output,
it's possible to get some rather unfortunate output at boot time:

CPU1: Booted secondary processor
Detected VIPT I-cache on CPU1CPU2: Booted secondary processor
Detected VIPT I-cache on CPU2CPU3: Booted secondary processor
Detected VIPT I-cache on CPU3CPU4: Booted secondary processor
Detected PIPT I-cache on CPU4CPU5: Booted secondary processor
Detected PIPT I-cache on CPU5Brought up 6 CPUs
SMP: Total of 6 processors activated.

This patch adds the missing newline to the format string, cleaning up
the output.

Fixes: 59ccc0d4 ("arm64: cachetype: report weakest cache policy")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

ea171967
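
The effect of the missing newline can be reproduced in a few lines of user-space C. This is only a sketch: `log_append()` is a hypothetical helper that appends to a buffer, and the format string is modelled on the boot output quoted in the commit message rather than copied from the kernel source.

```c
#include <stdio.h>
#include <string.h>

/* Append one formatted message to a NUL-terminated log buffer. */
void log_append(char *buf, size_t size, const char *fmt,
                const char *policy, int cpu)
{
    size_t used = strlen(buf);

    /* Without a trailing '\n' in fmt, the next message continues the line. */
    snprintf(buf + used, size - used, fmt, policy, cpu);
}
```

Calling it twice with `"Detected %s I-cache on CPU%d"` fuses both messages onto one line, while `"Detected %s I-cache on CPU%d\n"` keeps each message on its own line.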

arm64: KVM: fix 64bit CP15 VM access for 32bit guests · dedf97e8

Committed by Marc Zyngier on Aug 01, 2014

Commit f0a3eaff (ARM64: KVM: fix big endian issue in
access_vm_reg for 32bit guest) changed the way we handle CP15
VM accesses, so that all 64bit accesses are done via vcpu_sys_reg.

This looks like a good idea as it solves endianness issues in an
elegant way, except for one small detail: the register index
doesn't refer to the same array! We end up corrupting some random
data structure instead.

Fix this by reverting to the original code, except for the introduction
of a vcpu_cp15_64_high macro that deals with the endianness thing.

Tested on Juno with 32bit SMP guests.

Cc: Victor Kamensky <victor.kamensky@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

dedf97e8
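
The endianness detail behind a "high half" accessor can be illustrated in user-space C. This is a sketch under stated assumptions, not the kernel's vcpu_cp15_64_high macro: `high_word_index()`, `read_high()` and `write_high()` are hypothetical helpers that treat one 64-bit register as two 32-bit words and pick the word holding the upper half for the host byte order.

```c
#include <stdint.h>
#include <string.h>

/*
 * Index (0 or 1) of the 32-bit word that holds the upper half of a
 * 64-bit value when that value is viewed as two 32-bit words.
 */
int high_word_index(void)
{
    const uint64_t probe = 1; /* low half is 1, high half is 0 */
    uint32_t words[2];

    memcpy(words, &probe, sizeof(words));
    return words[0] == 1 ? 1 : 0; /* little endian stores the low word first */
}

/* Read the upper 32 bits of a 64-bit register through its word view. */
uint32_t read_high(const uint64_t *reg)
{
    uint32_t words[2];

    memcpy(words, reg, sizeof(words));
    return words[high_word_index()];
}

/* Write the upper 32 bits of a 64-bit register through its word view. */
void write_high(uint64_t *reg, uint32_t val)
{
    uint32_t words[2];

    memcpy(words, reg, sizeof(words));
    words[high_word_index()] = val;
    memcpy(reg, words, sizeof(words));
}
```

The point of the sketch is that the word index depends on byte order, while the array being indexed stays the same; indexing a different array with the same number is what the commit describes as corrupting unrelated data.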

31 Jul, 2014 8 commits

x86/kvm: Resolve shadow warnings in macro expansion · 42cbc04f

Committed by Mark D Rustad on Jul 30, 2014

Resolve shadow warnings that appear in W=2 builds. Instead of
using ret to hold the return pointer, save the length in a new
variable saved_len and compute the pointer on exit. This also
resolves a very technical error, in that ret was declared as
a const char *, when it really was a char * const.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

42cbc04f
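
The distinction between the two declarations mentioned in this commit is easy to mix up, so here is a small standalone C illustration. The function names are hypothetical and unrelated to the KVM macro being patched.

```c
#include <stddef.h>

/* const char *: the bytes are read-only, the pointer itself may move. */
size_t length_via_const_data(const char *s)
{
    const char *p = s;

    while (*p)
        p++; /* allowed: p is not const, only *p is */
    return (size_t)(p - s);
}

/* char * const: the pointer is fixed, the bytes it points at are writable. */
void capitalise_first(char *buf)
{
    char *const p = buf;

    if (*p >= 'a' && *p <= 'z')
        *p = (char)(*p - 'a' + 'A'); /* allowed: *p is not const, only p is */
}
```

Trying `*p = 'x'` in the first function or `p++` in the second would be rejected by the compiler.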

arm64: KVM: GICv3: move system register access to msr_s/mrs_s · f4c321eb

Committed by Marc Zyngier on Jul 31, 2014

Commit 72c58395 (arm64: gicv3: Allow GICv3 compilation with
older binutils) changed the way we express the GICv3 system registers,
but couldn't change the occurrences used by KVM as the code wasn't
merged yet.

Just fix the accessors.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

f4c321eb

Revert "arm64: dmi: Add SMBIOS/DMI support" · 94156675

Committed by Will Deacon on Jul 31, 2014

This reverts commit a28e3f4b.

Ard and Yi Li report that this patch is broken by design, so revert it
and let them sort it out for 3.18 instead.

Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

94156675

arm64: fpsimd: fix a typo in fpsimd_save_partial_state ENDPROC · e4aa297a

Committed by byungchul.park on Jul 31, 2014

Commit 190f1ca8 ("arm64: add support for kernel mode NEON in interrupt
context") introduced a typing error in fpsimd_save_partial_state ENDPROC.

This patch fixes the typing error.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: byungchul.park <byungchul.park@lge.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

e4aa297a

arm64: don't call break hooks for BRK exceptions from EL0 · c878e0cf

Committed by Will Deacon on Jul 31, 2014

Our break hooks are used to handle brk exceptions from kgdb (and potentially
kprobes if that code ever resurfaces), so don't bother calling them if
the BRK exception comes from userspace.

This prevents userspace from trapping to a kdb shell on systems where
kgdb is enabled and active.

Cc: <stable@vger.kernel.org>
Reported-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

c878e0cf

KVM: PPC: PR: Handle FSCR feature deselects · 8e6afa36

Committed by Alexander Graf on Jul 31, 2014

We handle FSCR feature bits (well, TAR only really today) lazily when the guest
starts using them. So when a guest activates the bit and later uses that feature
we enable it for real in hardware.

However, when the guest stops using that bit we don't stop setting it in
hardware. That means we can potentially lose a trap that the guest expects to
happen because it thinks a feature is not active.

This patch adds support to drop TAR when the guest turns it off in FSCR. While
at it, it also restricts FSCR access to 64bit systems - 32bit ones don't have it.

Signed-off-by: Alexander Graf <agraf@suse.de>

8e6afa36

KVM: s390: rework broken SIGP STOP interrupt handling · db373861

Committed by David Hildenbrand on Jul 28, 2014

A VCPU might never stop if it intercepts (for whatever reason) between
"fake interrupt delivery" and execution of the stop function.

The heart of the problem is that SIGP STOP is an interrupt that has to be
processed on every SIE entry until the VCPU finally executes the stop
function.

This problem was made apparent by commit 7dfc63cf
(KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time).
With the old code, the guest could (incorrectly) inject SIGP STOPs
multiple times. The bug of losing a sigp stop exists in KVM before
7dfc63cf, but it was hidden by Linux guests doing a sigp stop loop.
The new code (rightfully) returns CC=2 and does not queue a new
interrupt.

This patch is a simple fix of the problem. Longterm we are going to
rework that code - e.g. get rid of the action bits and so on.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[some additional patch description]

db373861

ARM: nomadik: fix up double inversion in DT · 3181788c

Committed by Linus Walleij on Jul 25, 2014

The GPIO pin connected to card detect was inverted twice: once by
the argument to the GPIO line itself where it was magically marked
as active low by the flag GPIO_ACTIVE_LOW (0x01) in the third cell,
and also marked active low AGAIN by explicitly stating
"cd-inverted" (a deprecated method).

After commit 78f87df2
"mmc: mmci: Use the common mmc DT parser" this results in the
line being inverted twice so it was effectively uninverted, while
the old code would not have this effect, instead disregarding the
flag on the GPIO line altogether, which is a bug. I admit the
semantics may be unclear but inverting twice is as good a
definition as any on how this should work.

So fix up the buggy device tree. Use proper #includes so the DTS
is clear and readable.

Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>

3181788c
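
How the two inversions cancel out can be shown with a small truth-table style model. This is a simplified sketch, not the mmci driver or the common MMC DT parser: `card_present()` and its parameters are hypothetical names for the physical line level, the GPIO_ACTIVE_LOW flag and the "cd-inverted" property.

```c
#include <stdbool.h>

/*
 * Logical card-detect state derived from the physical line level.
 * Each enabled inversion flips the value once.
 */
bool card_present(bool line_high, bool gpio_active_low, bool cd_inverted)
{
    bool level = line_high;

    if (gpio_active_low)
        level = !level; /* first inversion: flag in the GPIO specifier */
    if (cd_inverted)
        level = !level; /* second inversion: "cd-inverted" property */
    return level;
}
```

With both inversions enabled the result equals the raw line level, which is the "effectively uninverted" behaviour described in the commit message; with only the GPIO flag set, a low line reads as "card present".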

30 Jul, 2014 8 commits

arm64: defconfig: enable devtmpfs mount option · 3666f880

Committed by Robert Richter on Jul 30, 2014

This matches x86 and makes it more convenient to run the arm64 default
kernel, as distros like Ubuntu need this option.

Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

3666f880

KVM: vmx: remove duplicate vmx_mpx_supported() prototype · 296f0475

Committed by Chris J Arges on Jul 29, 2014

Remove a prototype which was added by both 93c4adc7 and 36be0b9d.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

296f0475

arm64: vdso: fix build error when switching from LE to BE · 1915e2ad

Committed by Arun Chandran on Jun 26, 2014

Building a kernel with CPU_BIG_ENDIAN fails if there are stale objects
from a !CPU_BIG_ENDIAN build. Due to a missing FORCE prerequisite on an
if_changed rule in the VDSO Makefile, we attempt to link a stale LE
object into the new BE kernel.

According to Documentation/kbuild/makefiles.txt, FORCE is required for
if_changed rules and forgetting it is a common mistake, so fix it by
'Forcing' the build of vdso. This patch fixes build errors like these:

arch/arm64/kernel/vdso/note.o: compiled for a little endian system and target is big endian
failed to merge target specific data of file arch/arm64/kernel/vdso/note.o

arch/arm64/kernel/vdso/sigreturn.o: compiled for a little endian system and target is big endian
failed to merge target specific data of file arch/arm64/kernel/vdso/sigreturn.o

Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Arun Chandran <achandran@mvista.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

1915e2ad

KVM: s390: Fix memory leak on busy SIGP stop · d514f426

Committed by Christian Borntraeger on Jul 28, 2014

commit 7dfc63cf
(KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time)
introduced a memory leak if a sigp stop is already pending. Free
the allocated inti structure.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

d514f426

KVM: PPC: HV: Remove generic instruction emulation · 29577fc0

Committed by Alexander Graf on Jul 30, 2014

Now that we have properly split load/store instruction emulation and generic
instruction emulation, we can move the generic one from kvm.ko to kvm-pr.ko
on book3s_64.

This reduces the attack surface and amount of code loaded on HV KVM kernels.

Signed-off-by: Alexander Graf <agraf@suse.de>

29577fc0

x86/xen: safely map and unmap grant frames when in atomic context · b7dd0e35

Committed by David Vrabel on Jul 11, 2014

arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in
atomic context but were calling alloc_vm_area() which might sleep.

Also, if a driver attempts to allocate a grant ref from an interrupt
and the table needs expanding, then the CPU may already be in lazy MMU
mode and apply_to_page_range() will BUG when it tries to re-enable
lazy MMU mode.

These two functions are only used in PV guests.

Introduce arch_gnttab_init() to allocate the virtual address space in
advance.

Avoid the use of apply_to_page_range() by saving and using the
array of PTE addresses from the alloc_vm_area() call (which ensures
that the required page tables are pre-allocated).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b7dd0e35

KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr · 5a484c7c

Committed by Bharat Bhushan on Jul 30, 2014

These are not specific to e500hv but are applicable to bookehv
(as per the comment from Scott Wood on my patch
"kvm: ppc: bookehv: Added wrapper macros for shadow registers").

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

5a484c7c

arm: Add devicetree fixup machine function · 5a12a597

Committed by Laura Abbott on Jul 15, 2014

Commit 1c2f87c2
(ARM: 8025/1: Get rid of meminfo) dropped the upper bound on
the number of memory banks that can be added as there was no
technical need in the kernel. It turns out though, some bootloaders
(specifically the arndale-octa exynos boards) may pass invalid memory
information and rely on the kernel to not parse this data. This is a
bug in the bootloader but we still need to work around this.
Work around this by introducing a dt_fixup function. This function
gets called before the flattened devicetree is scanned for memory
and the like. In this fixup function for exynos, limit the maximum
number of memory regions in the devicetree.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: Added a comment and fixed up function name]
Signed-off-by: Grant Likely <grant.likely@linaro.org>

5a12a597

29 Jul, 2014 8 commits

arm64: defconfig: add virtio support for running as a kvm guest · af9b9964

Committed by Will Deacon on Jul 29, 2014

When running as a kvm guest on a para-virtualised platform, it is useful
to have virtio implementations of console, 9pfs and network.

This adds these options to the arm64 defconfig, so we can easily run a
defconfig kernel build as both host and as a kvm guest.

Signed-off-by: Will Deacon <will.deacon@arm.com>

af9b9964

ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout · 811a2407

Committed by Konstantin Khlebnikov on Jul 25, 2014

On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2
(pmd) entries map 2MiB.

When the identity mapping is created on LPAE, the pgd pointers are copied
from the swapper_pg_dir.  If we find that we need to modify the contents
of a pmd, we allocate a new empty pmd table and insert it into the
appropriate 1GB slot, before then filling it with the identity mapping.

However, if the 1GB slot covers the kernel lowmem mappings, we obliterate
those mappings.

When replacing a PMD, first copy the old PMD contents to the new PMD, so
that we preserve the existing mappings, particularly the mappings of the
kernel itself.

[rewrote commit message and added code comment -- rmk]

Fixes: ae2de101 ("ARM: LPAE: Add identity mapping support for the 3-level page table format")
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

811a2407
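
The copy-before-replace step described in this commit can be modelled with plain arrays. This is a sketch under stated assumptions, not the kernel's idmap code: `pmd_entry_t` and `replace_pmd()` are hypothetical, and the only facts taken from the commit message are that a pgd entry covers 1GiB and a pmd entry covers 2MiB, giving 512 pmd entries per table.

```c
#include <stdint.h>
#include <string.h>

#define PMD_SIZE     (2u * 1024u * 1024u) /* each pmd entry maps 2MiB */
#define PTRS_PER_PMD 512u                 /* 1GiB slot / 2MiB per entry */

typedef uint64_t pmd_entry_t; /* hypothetical entry type for the model */

/*
 * Fill a freshly allocated pmd table. Copying the old table first keeps
 * every existing mapping in the 1GiB slot; only the slot that carries the
 * identity mapping is then overwritten.
 */
void replace_pmd(pmd_entry_t *new_pmd, const pmd_entry_t *old_pmd,
                 unsigned int idmap_idx, pmd_entry_t idmap_entry)
{
    memcpy(new_pmd, old_pmd, PTRS_PER_PMD * sizeof(pmd_entry_t));
    new_pmd[idmap_idx] = idmap_entry;
}
```

Starting from an empty table instead of a copy is the behaviour the commit fixes: every entry other than `idmap_idx` would be lost.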

ARM: fix alignment of keystone page table fixup · 823a19cd

Committed by Russell King on Jul 29, 2014

If init_mm.brk is not section aligned, the LPAE fixup code will miss
updating the final PMD.  Fix this by aligning map_end.

Fixes: a77e0c7b ("ARM: mm: Recreate kernel mappings in early_paging_init()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

823a19cd
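
Why an unaligned end address can skip the final entry is a matter of integer arithmetic. The sketch below is an assumption-laden model, not the code in early_paging_init(): `align_up()` and `sections_covered()` are hypothetical helpers, and the 2MiB section size is the LPAE pmd size.

```c
#include <stdint.h>

#define SECTION_SIZE (2u * 1024u * 1024u) /* LPAE pmd/section size: 2MiB */

/* Round addr up to the next multiple of align (align is a power of two). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Number of whole sections a loop would visit between start and end. */
unsigned int sections_covered(uint32_t start, uint32_t end)
{
    return (end - start) / SECTION_SIZE; /* truncates a partial last section */
}
```

For a region ending 5MiB past its start, `sections_covered()` reports 2 and the partially used third section is missed; aligning the end to 6MiB first yields 3.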

ARM: dts: Revert enabling of twl configuration for n900 · d937678a

Committed by Tony Lindgren on Jul 25, 2014

Commit 9188883f (ARM: dts: Enable twl4030 off-idle configuration
for selected omaps) allowed n900 to cut off core voltages during
off-idle. This however caused a regression where twl regulator
vaux1 was not getting enabled for the LCD panel as we are not
requesting it for the panel.

Turns out quite a few devices on n900 are using vaux1, and we need
to either stop idling it, or add proper regulator_get calls for all
users. But until we have a proper solution implemented and tested,
let's just disable the twl off-idle configuration for now for n900.

Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Fixes: 9188883f (ARM: dts: Enable twl4030 off-idle configuration for selected omaps)
Signed-off-by: Tony Lindgren <tony@atomide.com>

d937678a

x86_64/entry/xen: Do not invoke espfix64 on Xen · 7209a75d

Committed by Andy Lutomirski on Jul 23, 2014

This moves the espfix64 logic into native_iret.  To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.

This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).

[ hpa: this is a nonzero cost on native, but probably not enough to
  measure. Xen needs to fix this in their own code, probably doing
  something equivalent to espfix64. ]

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>

7209a75d

KVM: PPC: Remove DCR handling · ce91ddc4

Committed by Alexander Graf on Jul 28, 2014

DCR handling was only needed for 440 KVM. Since we removed it, we can also
remove handling of DCR accesses.

Signed-off-by: Alexander Graf <agraf@suse.de>

ce91ddc4

KVM: PPC: Expose helper functions for data/inst faults · 8de12015

Committed by Alexander Graf on Jun 18, 2014

We're going to implement guest code interpretation in KVM for some rare
corner cases. This code needs to be able to inject data and instruction
faults into the guest when it encounters them.

Expose generic APIs to do this in a reasonably subarch agnostic fashion.

Signed-off-by: Alexander Graf <agraf@suse.de>

8de12015

KVM: PPC: Separate loadstore emulation from priv emulation · d69614a2

Committed by Alexander Graf on Jun 18, 2014

Today the instruction emulator can get called via 2 separate code paths. It
can either be called by MMIO emulation detection code or by privileged
instruction traps.

This is bad, as both code paths prepare the environment differently. For MMIO
emulation we already know the virtual address we faulted on, so instructions
there don't have to actually fetch that information.

Split out the two separate use cases into separate files.

Signed-off-by: Alexander Graf <agraf@suse.de>

d69614a2

28 Jul, 2014 12 commits

KVM: PPC: Handle magic page in kvmppc_ld/st · c12fb43c

Committed by Alexander Graf on Jun 20, 2014

We use kvmppc_ld and kvmppc_st to emulate load/store instructions that may as
well access the magic page. Special case it out so that we can properly access
it.

Signed-off-by: Alexander Graf <agraf@suse.de>

c12fb43c

KVM: PPC: Use kvm_read_guest in kvmppc_ld · c45c5514

Committed by Alexander Graf on Jun 20, 2014

We have a nice and handy helper to read from guest physical address space,
so we should make use of it in kvmppc_ld as we already do for its counterpart
in kvmppc_st.

Signed-off-by: Alexander Graf <agraf@suse.de>

c45c5514

KVM: PPC: Remove kvmppc_bad_hva() · 9897e88a

Committed by Alexander Graf on Jun 20, 2014

We have a proper define for invalid HVA numbers. Use those instead of the
ppc specific kvmppc_bad_hva().

Signed-off-by: Alexander Graf <agraf@suse.de>

9897e88a

KVM: PPC: Move kvmppc_ld/st to common code · 35c4a733

Committed by Alexander Graf on Jun 20, 2014

We have enough common infrastructure now to resolve GVA->GPA mappings at
runtime. With this we can move our book3s specific helpers to load / store
in guest virtual address space to common code as well.

Signed-off-by: Alexander Graf <agraf@suse.de>

35c4a733

KVM: PPC: Implement kvmppc_xlate for all targets · 7d15c06f

Committed by Alexander Graf on Jun 20, 2014

We have a nice API to find the translated GPAs of a GVA including protection
flags. So far we only use it on Book3S, but there's no reason the same shouldn't
be used on BookE as well.

Implement a kvmppc_xlate() version for BookE and clean it up to make it more
readable in general.

Signed-off-by: Alexander Graf <agraf@suse.de>

7d15c06f

KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page · 63fff5c1

Committed by Aneesh Kumar K.V on Jun 29, 2014

When calculating the lower bits of AVA field, use the shift
count based on the base page size. Also add the missing segment
size and remove stale comment.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

63fff5c1

crypto: arm-aes - fix encryption of unaligned data · f3c400ef

Committed by Mikulas Patocka on Jul 25, 2014

Fix the same alignment bug as in arm64 - we need to pass residue
unprocessed bytes as the last argument to blkcipher_walk_done.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# 3.13+
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f3c400ef

crypto: arm64-aes - fix encryption of unaligned data · f960d209

Committed by Mikulas Patocka on Jul 25, 2014

cryptsetup fails on arm64 when using kernel encryption via AF_ALG socket.
See https://bugzilla.redhat.com/show_bug.cgi?id=1122937

The bug is caused by incorrect handling of unaligned data in
arch/arm64/crypto/aes-glue.c. Cryptsetup creates a buffer that is aligned
on 8 bytes, but not on 16 bytes. It opens AF_ALG socket and uses the
socket to encrypt data in the buffer. The arm64 crypto accelerator causes
data corruption or crashes in the scatterwalk_pagedone.

This patch fixes the bug by passing the residue bytes that were not
processed as the last parameter to blkcipher_walk_done.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f960d209
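
The residue bookkeeping this fix relies on can be modelled without any crypto. The sketch below is hypothetical throughout: `struct mock_walk`, `mock_walk_done()` and `process_chunk()` only imitate the contract described in the commit message, where the caller tells the walker how many bytes of the current chunk were left unprocessed.

```c
#include <stddef.h>

#define AES_BLOCK_SIZE 16u

/* Hypothetical walker state: bytes of the request still to be processed. */
struct mock_walk {
    size_t remaining;
};

/* The caller reports how many bytes of the chunk it did not process. */
void mock_walk_done(struct mock_walk *w, size_t chunk, size_t residue)
{
    w->remaining -= chunk - residue;
}

/* Handle the whole blocks in one chunk and report the leftover bytes. */
size_t process_chunk(struct mock_walk *w, size_t chunk)
{
    size_t blocks = chunk / AES_BLOCK_SIZE;

    /* ... a real cipher would transform blocks * AES_BLOCK_SIZE bytes ... */
    mock_walk_done(w, chunk, chunk % AES_BLOCK_SIZE);
    return blocks;
}
```

For a 24-byte chunk only one 16-byte block is consumed and 8 bytes are reported as residue, so they are offered again on the next step. Reporting a residue of 0 instead would make the walker treat those 8 bytes as done.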

KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode · 7a58777a

Committed by Alexander Graf on Jul 14, 2014

With Book3S KVM we can create both PR and HV VMs in parallel on the same
machine. That gives us new challenges on the CAPs we return - both have
different capabilities.

When we get asked about CAPs on the kvm fd, there's nothing we can do. We
can try to be smart and assume we're running HV if HV is available, PR
otherwise. However with the newly added VM CHECK_EXTENSION we can now ask
for capabilities directly on a VM which knows whether it's PR or HV.

With this patch I can successfully expose KVM PVINFO data to user space
in the PR case, fixing magic page mapping for PAPR guests.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

7a58777a
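
The mode decision described in this commit message can be sketched as a small function. All names here are hypothetical (`struct mock_kvm`, `mock_effective_mode()`); the model only captures the stated rule: a VM knows its own mode, and a query without a VM falls back to guessing HV when HV is available and PR otherwise.

```c
#include <stddef.h>

enum mock_mode { MOCK_PR, MOCK_HV };

/* Hypothetical stand-in for a VM that knows which mode it runs in. */
struct mock_kvm {
    enum mock_mode mode;
};

/*
 * Pick the mode whose capabilities should be reported. A NULL kvm models
 * a query made without a VM, where only a guess is possible.
 */
enum mock_mode mock_effective_mode(const struct mock_kvm *kvm, int hv_available)
{
    if (kvm != NULL)
        return kvm->mode; /* per-VM query: the answer is exact */
    return hv_available ? MOCK_HV : MOCK_PR; /* no VM: best guess */
}
```

A PR VM on an HV-capable host is the case the guess gets wrong and the per-VM query gets right.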

KVM: Rename and add argument to check_extension · 784aa3d7

Committed by Alexander Graf on Jul 14, 2014

In preparation to make the check_extension function available to VM scope
we add a struct kvm * argument to the function header and rename the function
accordingly. It will still be called from the /dev/kvm fd, but with a NULL
argument for struct kvm *.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

784aa3d7

Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 · 9678cdaa

Committed by Stewart Smith on Jul 18, 2014

The POWER8 processor has a Micro Partition Prefetch Engine, which is
a fancy way of saying "has way to store and load contents of L2 or
L2+MRU way of L3 cache". We initiate the storing of the log (list of
addresses) using the logmpp instruction and start restore by writing
to a SPR.

The logmpp instruction takes parameters in a single 64bit register:
- starting address of the table to store log of L2/L2+L3 cache contents
  - 32kb for L2
  - 128kb for L2+L3
  - Aligned relative to maximum size of the table (32kb or 128kb)
- Log control (no-op, L2 only, L2 and L3, abort logout)

We should abort any ongoing logging before initiating one.

To initiate restore, we write to the MPPR SPR. The format of what to write
to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower. about every 8, 16, 24 or
  32 cycles)

The idea behind loading and storing the contents of L2/L3 cache is to
reduce memory latency in a system that is frequently swapping vcores on
a physical CPU.

The best case scenario for doing this is when some vcores are doing very
cache heavy workloads. The worst case is when they have about 0 cache hits,
so we just generate needless memory operations.

This implementation just does L2 store/load. In my benchmarks this proves
to be useful.

Benchmark 1:
 - 16 core POWER8
 - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
 - No split core/SMT
 - two guests running sysbench memory test.
   sysbench --test=memory --num-threads=8 run
 - one guest running apache bench (of default HTML page)
   ab -n 490000 -c 400 http://localhost/

This benchmark aims to measure performance of real world application (apache)
where other guests are cache hot with their own workloads. The sysbench memory
benchmark does pointer sized writes to a (small) memory buffer in a loop.

In this benchmark with this patch I can see an improvement both in requests
per second (~5%) and in mean and median response times (again, about 5%).
The spread of minimum and maximum response times were largely unchanged.

benchmark 2:
 - Same VM config as benchmark 1
 - all three guests running sysbench memory benchmark

This benchmark aims to see if there is a positive or negative effect on this
cache heavy benchmark. Although due to the nature of the benchmark (stores) we
may not see a difference in performance, but rather hopefully an improvement
in consistency of performance (when vcore switched in, don't have to wait
many times for cachelines to be pulled in)

The results of this benchmark are improvements in consistency of performance
rather than performance itself. With this patch, the few outliers in duration
go away and we get more consistent performance in each guest.

benchmark 3:
 - same 3 guests and CPU configuration as benchmark 1 and 2.
 - two idle guests
 - 1 guest running STREAM benchmark

This scenario also saw performance improvement with this patch. On Copy and
Scale workloads from STREAM, I got 5-6% improvement with this patch. For
Add and triad, it was around 10% (or more).

benchmark 4:
 - same 3 guests as previous benchmarks
 - two guests running sysbench --memory, distinctly different cache heavy
   workload
 - one guest running STREAM benchmark.

Similar improvements to benchmark 3.

benchmark 5:
 - 1 guest, 8 VCPUs, Ubuntu 14.04
 - Host configured with split core (SMT8, subcores-per-core=4)
 - STREAM benchmark

In this benchmark, we see a 10-20% performance improvement across the board
of STREAM benchmark results with this patch.

Based on preliminary investigation and microbenchmarks
by Prerna Saxena <prerna@linux.vnet.ibm.com>

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

9678cdaa

Split out struct kvmppc_vcore creation to separate function · de9bdd1a

Committed by Stewart Smith on Jul 18, 2014

No code changes, just split it out to a function so that the addition
of micro partition prefetch buffer allocation (in a subsequent patch) looks
neater and doesn't require excessive indentation.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

de9bdd1a

openeuler / Kernel · synced successfully more than 1 year ago
