- 11 April 2017, 1 commit
-
-
By Gautham R. Shenoy
POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up from stop 0,1,2 with ESL=1 can end up being misplaced in the core. Thus the HSPRG0 of a thread waking up from stop can contain the paca pointer of its sibling.

This patch implements a context recovery framework within the threads of a core, by provisioning space in paca_struct for saving every sibling thread's paca pointer. Basically, we should be able to arrive at the right paca pointer from any of the thread's existing paca pointers.

At bootup, during powernv idle-init, we save the paca address of every CPU in each of its siblings' paca_struct, in the slot corresponding to this CPU's index in the core. On wakeup from a stop, the thread will determine its index in the core from the TIR register and recover its PACA pointer by indexing into the correct slot in the provisioned space in the current PACA.

Furthermore, ensure that the NVGPRs are restored from the stack on the way out by setting NAPSTATELOST in the paca.

[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Call it a bug]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
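As a rough illustration of the recovery scheme, here is a minimal sketch in C; the field name, slot count and helper below are assumptions, not the kernel's exact code:

    /*
     * Minimal sketch of the idea only. The field name, slot count and helper
     * are illustrative assumptions.
     */
    #define SIBLING_SLOTS 8    /* max SMT threads per core */

    struct paca_sketch {
        struct paca_sketch *sibling_pacas[SIBLING_SLOTS]; /* filled at idle-init */
        /* ... the rest of the per-CPU paca ... */
    };

    /*
     * On wakeup the paca pointer in HSPRG0 may belong to a sibling, but every
     * sibling carries an identical slot table, so indexing by this thread's
     * own index in the core (derived from the TIR SPR) yields the right paca.
     */
    static struct paca_sketch *recover_paca(struct paca_sketch *maybe_wrong,
                                            unsigned int tir_thread_index)
    {
        return maybe_wrong->sibling_pacas[tir_thread_index];
    }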
-
- 01 April 2017, 1 commit
-
-
By Aneesh Kumar K.V
We optimize the slice page size array copy to the paca by copying only the range based on addr_limit. This requires us to not look at the page size array beyond addr_limit in the PACA on an SLB fault. To enable that, copy the task size to the paca, which will be used during the SLB fault.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rename from task_size to addr_limit, consolidate #ifdefs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 15 February 2017, 2 commits
-
-
By Rashmica Gupta
There are quite a few entries in asm-offsets.c which look like:

    DEFINE(REG, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, reg));

So define a macro to do it once.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
[mpe: Rename to STACK_PT_REGS_OFFSET for excruciating explicitness]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
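A sketch of what the helper looks like (the exact in-tree definition may differ slightly; DEFINE() is the usual kbuild helper that emits an assembler-visible constant):

    /* Sketch of the helper described above. */
    #define STACK_PT_REGS_OFFSET(sym, val) \
        DEFINE(sym, STACK_FRAME_OVERHEAD + offsetof(struct pt_regs, val))

    /* Usage, replacing the open-coded form shown in the changelog: */
    STACK_PT_REGS_OFFSET(GPR0, gpr[0]);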
-
By Rashmica Gupta
A lot of entries in asm-offsets.c look like this:

    DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));

But there is a common macro, OFFSET, which makes this cleaner:

    OFFSET(TI_flags, thread_info, flags)

So use it.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
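For reference, the OFFSET helper is roughly the following (a sketch; it lives alongside DEFINE in the kbuild asm-offsets machinery and the in-tree definition may differ slightly):

    /* Sketch of the common helper. */
    #define OFFSET(sym, str, mem) \
        DEFINE(sym, offsetof(struct str, mem))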
-
- 06 February 2017, 2 commits
-
-
By Benjamin Herrenschmidt
We have two sets of identical struct members for the I and D sides, and mostly identical bunches of code to parse the device tree to populate them. Instead, make a ppc_cache_info structure with one copy for I and one for D.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
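The rough shape of such a structure, as a sketch only (field names and the exact set of members are assumptions):

    /* Sketch: one description per cache, instantiated once for I and once for D. */
    struct ppc_cache_info {
        u32 size;            /* total cache size in bytes */
        u32 line_size;       /* true line size, from the device tree */
        u32 block_size;      /* size used by cache management instructions */
        u32 log_block_size;
        u32 sets;
        u32 assoc;
    };

    struct ppc64_caches_sketch {
        struct ppc_cache_info l1d;   /* D-side */
        struct ppc_cache_info l1i;   /* I-side */
    };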
-
By Benjamin Herrenschmidt
In a number of places we called "cache line size" what is actually the cache block size, which, in the powerpc architecture, means the effective size to use with cache management instructions (it can be different from the actual cache line size). We fix the naming across the board and properly retrieve both pieces of information when available in the device tree.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 31 January 2017, 1 commit
-
-
By Paul Mackerras
This adds code to branch around the parts that radix guests don't need - clearing and loading the SLB with the guest SLB contents, saving the guest SLB contents on exit, and restoring the host SLB contents. Since the host is now using radix, we need to save and restore the host value for the PID register.

On hypervisor data/instruction storage interrupts, we don't do the guest HPT lookup on radix, but just save the guest physical address for the fault (from the ASDR register) in the vcpu struct.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 24 January 2017, 1 commit
-
-
By Michael Ellerman
Unfortunately the stack protector support we merged recently only works on some toolchains. If the toolchain is built without glibc support everything works fine, but if glibc is built then it leads to a panic at boot.

The solution is not rc5 material, so revert the support for now.

This reverts commits:

    6533b7c1 ("powerpc: Initial stack protector (-fstack-protector) support")
    902e06eb ("powerpc/32: Change the stack protector canary value per task")

Fixes: 6533b7c1 ("powerpc: Initial stack protector (-fstack-protector) support")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 14 January 2017, 1 commit
-
-
By Frederic Weisbecker
In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay cputime accounting to the tick, provide fine-grained accumulators to powerpc in order to store the cputime until flushing. While at it, normalize the names of several fields according to common cputime naming.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 24 November 2016, 2 commits
-
-
By Paul Mackerras
POWER9 adds new capabilities to the tlbie (TLB invalidate entry) and tlbiel (local tlbie) instructions. Both instructions get a set of new parameters (RIC, PRS and R) which appear as bits in the instruction word. The tlbiel instruction now has a second register operand, which contains a PID and/or LPID value if needed, and should otherwise contain 0.

This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9 as well as older processors. Since we only handle HPT guests so far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction word as on previous processors, so we don't need to conditionally execute different instructions depending on the processor.

The local flush on first entry to a guest in book3s_hv_rmhandlers.S is a loop which depends on the number of TLB sets. Rather than using feature sections to set the number of iterations based on which CPU we're on, we now work out this number at VM creation time and store it in the kvm_arch struct. That will make it possible to get the number from the device tree in future, which will help with compatibility with future processors.

Since mmu_partition_table_set_entry() does a global flush of the whole LPID, we don't need to do the TLB flush on first entry to the guest on each processor. Therefore we don't set all bits in the tlb_need_flush bitmap on VM startup on POWER9.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
By Paul Mackerras
This adds code to handle two new guest-accessible special-purpose registers on POWER9: TIDR (thread ID register) and PSSCR (processor stop status and control register). They are context-switched between host and guest, and the guest values can be read and set via the one_reg interface.

The PSSCR contains some fields which are guest-accessible and some which are only accessible in hypervisor mode. We only allow the guest-accessible fields to be read or set by userspace.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
- 23 November 2016, 1 commit
-
-
By Christophe Leroy
Partially copied from commit df0698be ("ARM: stack protector: change the canary value per task").

A new random value for the canary is stored in the task struct whenever a new task is forked. This is meant to allow for different canary values per task. On powerpc, GCC expects the canary value to be found in a global variable called __stack_chk_guard, so this variable has to be updated with the value stored in the task struct whenever a task switch occurs.

Because the variable GCC expects is global, this cannot work on SMP, unfortunately. So, on SMP, the same initial canary value is kept throughout, making this feature a bit less effective, although it is still useful.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
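A minimal sketch of the switch-time update described above, assuming the stack_canary field in task_struct; this is illustrative, not the exact kernel code:

    /* Sketch only: the canary lives in a global that GCC hard-codes, so it is
     * refreshed from the incoming task on every context switch (UP only). */
    unsigned long __stack_chk_guard __read_mostly;

    static inline void switch_stack_canary(struct task_struct *next)
    {
    #ifndef CONFIG_SMP
        /* On SMP a single global would race between CPUs, so the boot-time
         * value is kept there instead, as noted in the changelog. */
        __stack_chk_guard = next->stack_canary;
    #endif
    }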
-
- 21 November 2016, 1 commit
-
-
By Paul Mackerras
When switching from/to a guest that has a transaction in progress, we need to save/restore the checkpointed register state. Although XER is part of the CPU state that gets checkpointed, the code that does this saving and restoring doesn't save/restore XER. This fixes it by saving and restoring the XER. To allow userspace to read/write the checkpointed XER value, we also add a new ONE_REG specifier.

The visible effect of this bug is that the guest may see its XER value being corrupted when it uses transactions.

Fixes: e4e38121 ("KVM: PPC: Book3S HV: Add transactional memory support")
Fixes: 0a8eccef ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
- 04 October 2016, 1 commit
-
-
By Cyril Bur
Name the structures used for checkpointed state consistently with pt_regs/ckpt_regs.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 27 September 2016, 1 commit
-
-
By Paul Mackerras
POWER8 has one virtual timebase (VTB) register per subcore, not one per CPU thread. The HV KVM code currently treats VTB as a per-thread register, which can lead to spurious soft lockup messages from guests which use the VTB as the time source for the soft lockup detector. (CPUs before POWER8 did not have the VTB register.)

For HV KVM, this fixes the problem by making only the primary thread in each virtual core save and restore the VTB value. With this, the VTB state becomes part of the kvmppc_vcore structure. This also means that "piggybacking" of multiple virtual cores onto one subcore is not possible on POWER8, because then the virtual cores would share a single VTB register.

PR KVM emulates a VTB register, which is per-vcpu because PR KVM has no notion of CPU threads or SMT. For PR KVM we move the VTB state into the kvmppc_vcpu_book3s struct.

Cc: stable@vger.kernel.org # v3.14+
Reported-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
- 09 July 2016, 3 commits
-
-
By Scott Wood
The asm-offsets mechanism generates signed numbers, even if the input value is explicitly unsigned. This causes a problem with older binutils (e.g. 2.23), which sign-extend a negative number when @h is applied. Thus, this instruction:

    cmpli cr0, r11, VIRT_IMMR_BASE@h

resulted in this:

    Error: operand out of range (0xfffffff0 is not between 0x00000000 and 0x0000ffff)

By casting to a larger type, we can force the output to be expressed as a positive number.

Signed-off-by: Scott Wood <oss@buserror.net>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
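A sketch of the workaround in asm-offsets.c; the symbol and expression are examples, not necessarily the exact lines changed:

    /* Before (sketch): a 32-bit value with the top bit set is emitted as a
     * negative signed constant, which old binutils sign-extend under @h. */
    DEFINE(VIRT_IMMR_BASE, __fix_to_virt(FIX_IMMR_BASE));

    /* After (sketch): widening the expression forces the constant to be
     * printed as a positive number, so @h works on old binutils too. */
    DEFINE(VIRT_IMMR_BASE, (u64)__fix_to_virt(FIX_IMMR_BASE));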
-
By Christophe Leroy
    Memory: 124428K/131072K available (3748K kernel code, 188K rwdata, 648K rodata, 508K init, 290K bss, 6644K reserved)
    Kernel virtual memory layout:
      * 0xfffdf000..0xfffff000 : fixmap
      * 0xfde00000..0xfe000000 : consistent mem
      * 0xfddf6000..0xfde00000 : early ioremap
      * 0xc9000000..0xfddf6000 : vmalloc & ioremap
    SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1

Today, IMMR is mapped 1:1 at startup. Mapping IMMR 1:1 is just wrong because it may overlap with another area. On most mpc8xx boards it is OK, as IMMR is set to 0xff000000, but for instance on the EP88xC board IMMR is at 0xfa200000, which overlaps with the VM ioremap area.

This patch fixes the virtual address for remapping IMMR with the fixmap, regardless of the value of IMMR. The size of the IMMR area is 256 kbytes (CPM at offset 0, security engine at offset 128k), so a 512k page is enough.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
-
By Christophe Leroy
This patch provides VIRT_CPU_ACCOUNTING to the PPC32 architecture. PPC32 doesn't have the PACA structure, so we use the task_info structure to store the accounting data.

In order to reuse the PPC64 functions on PPC32, all u64 data has been replaced by 'unsigned long', so that it is u32 on PPC32 and u64 on PPC64.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
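Roughly, the shared accounting data then looks like the sketch below; the struct and field names are illustrative assumptions:

    /* Sketch: 'unsigned long' members compile to u32 on PPC32 and u64 on
     * PPC64, so the same accounting helpers work on both. */
    struct cpu_accounting_sketch {
        unsigned long user_time;        /* accumulated user time */
        unsigned long system_time;      /* accumulated system time */
        unsigned long starttime;        /* timebase at kernel entry */
        unsigned long starttime_user;   /* timebase at last exit to user */
    };
    /* On PPC64 this state lives in the PACA; on PPC32, which has no PACA,
     * it hangs off the per-task info structure instead. */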
-
- 16 June 2016, 1 commit
-
-
By Rashmica Gupta
Remove asm-offsets entries that are no longer used:

THREAD_DSCR:
  Added in efcac658 "powerpc: Per process DSCR + some fixes (try#4)"
  Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"

THREAD_DSCR_INHERIT:
  Added in 71433285 "powerpc: Restore correct DSCR in context switch"
  Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"

THREAD_TAR:
  Added in 2468dcf6 "powerpc: Add support for context switching the TAR register"
  Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"

THREAD_BESCR, THREAD_EBBHR and THREAD_EBBRR:
  Added in 9353374b "powerpc: Context switch the new EBB SPRs"
  Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"

THREAD_SIAR, THREAD_SDAR, THREAD_SIER, THREAD_MMCR0, and THREAD_MMCR2:
  Added in 59affcd3 "powerpc: Context switch more PMU related SPRs"
  Last usage removed in b11ae951 "powerpc: Partial revert of "Context switch more PMU related SPRs""

PACA_LOCK_TOKEN:
  Added in 9e368f29 "KVM: PPC: book3s_hv: Add support for PPC970-family processors"
  Last usage removed in c17b98cf "KVM: PPC: Book3S HV: Remove code for PPC970 processors"

HCALL_STAT_SIZE, HCALL_STAT_CALLS, HCALL_STAT_TB and HCALL_STAT_PURR:
  Added in 57852a85 "[POWERPC] powerpc: Instrument Hypervisor Calls"
  Last usage removed in c8cd093a "powerpc: tracing: Add hypervisor call tracepoints"

VCPU_EPLC:
  Added in d30f6e48 "KVM: PPC: booke: category E.HV (GS-mode) support"
  Never used.

CPU_DOWN_FLUSH:
  Added in e7affb1d "powerpc/cache: add cache flush operation for various e500"
  Never used.

CFG_STAMP_XSEC:
  Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
  Last usage removed in 0e469db8 "powerpc: Rework VDSO gettimeofday to prevent time going backwards"

KVM_LPCR:
  Added in aa04b4cc "KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests"
  Last usage removed in a0144e2a "KVM: PPC: Book3S HV: Store LPCR value for each virtual core"

GPR15, GPR16, GPR17, GPR18, GPR19, GPR20, GPR21, GPR22, GPR23, GPR24, GPR25, GPR26, GPR27, GPR28, GPR29, GPR30 and GPR31:
  Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
  Never used.

VCPU_SHADOW_FSCR:
  Added in 616dff86 "KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR"
  Never used.

VCPU_SHADOW_SRR1:
  Added in a2d56020 "KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu"
  Never used.

KVM_SPLIT_SIZE:
  Added in b4deba5c "KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8"
  Never used.

VCPU_VCPUID:
  Added in de56a948 "KVM: PPC: Add support for Book3S processors in hypervisor mode"
  Last usage removed in 1b400ba0 "KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations"

_MQ:
  Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
  Never used.

AUDITCONTEXT:
  Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
  Last usage removed in 401d1f02 "[PATCH] syscall entry/exit revamp"

CLONE_VM:
  Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
  Currently unused.

CLONE_UNTRACED:
  Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
  Currently unused.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
[mpe: Munge change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 01 May 2016, 1 commit
-
-
By Aneesh Kumar K.V
Radix and hash MMU models support different page table sizes. Make the #defines variables so that existing code can work with variable sizes. Slice-related code is only used by hash, so use hash constants there. We will replicate some of the boundary conditions with respect to TASK_SIZE using radix values too. Right now we do the boundary condition check using hash constants.

Swapper pgdir size is initialized in asm code. We select the max pgd size to keep it simple. For now we select the hash pgdir. When adding radix we will switch that to the radix pgdir, which is 64K.

The BUILD_BUG_ON check which is removed is already done in hugepage_init() using MAYBE_BUILD_BUG_ON().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 14 April 2016, 1 commit
-
-
By Michael Ellerman
Add the kconfig logic & assembly support for handling live patched functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn depends on the new -mprofile-kernel ftrace ABI, which is only supported currently on ppc64le.

Live patching is handled by a special ftrace handler. This means it runs from ftrace_caller(). The live patch handler modifies the NIP so as to redirect the return from ftrace_caller() to the new patched function.

However there is one particularly tricky case we need to handle. If a function A calls another function B, and it is known at link time that they share the same TOC, then A will not save or restore its TOC, and will call the local entry point of B. When we live patch B, we replace it with a new function C, which may not have the same TOC as A. At live patch time it's too late to modify A to do the TOC save/restore, so the live patching code must interpose itself between A and C, and do the TOC save/restore that A omitted.

An additional complication is that the livepatch code cannot create a stack frame in order to save the TOC. That is because if C takes > 8 arguments, or is varargs, A will have written the arguments for C in A's stack frame.

To solve this, we introduce a "livepatch stack" which grows upward from the base of the regular stack, and is used to store the TOC & LR when calling a live patched function. When the patched function returns, we retrieve the real LR & TOC from the livepatch stack, restore them, and pop the livepatch "stack frame".

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
-
- 05 March 2016, 1 commit
-
-
By chenhui zhao
Various e500 cores have different cache architectures, so they need different cache flush operations. Therefore, add a callback function cpu_flush_caches to struct cpu_spec. The cache flush operation for the specific kind of e500 is selected at init time. The callback function will flush all caches inside the current cpu.

Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>
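Conceptually the hook looks something like the sketch below; apart from the cpu_flush_caches name the identifiers are illustrative, and the real struct cpu_spec has many more members:

    /* Sketch of a per-CPU-type flush hook selected at init time. */
    struct cpu_spec_sketch {
        const char *cpu_name;
        /* ... feature bits, setup callbacks ... */
        void (*cpu_flush_caches)(void);   /* flush all caches on this CPU */
    };

    static void e500mc_flush_caches(void)
    {
        /* core-specific flush sequence would go here */
    }

    static void flush_caches_for_idle(struct cpu_spec_sketch *spec)
    {
        if (spec->cpu_flush_caches)
            spec->cpu_flush_caches();
    }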
-
- 02 March 2016, 1 commit
-
-
By Cyril Bur
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a problem unless a process is using these facilities.

Modern versions of GCC are very good at automatically vectorising code, new and modernised workloads make use of floating point and vector facilities, and even the kernel makes use of vectorised memcpy. All this combined greatly increases the cost of a syscall, since the kernel uses the facilities sometimes even in the syscall fast-path, making it increasingly common for a thread to take an *_unavailable exception soon after a syscall, not to mention potentially taking all three.

The obvious overcompensation to this problem is to simply always load all the facilities on every exit to userspace. Loading up all FPU, VEC and VSX registers every time can be expensive, and if a workload does avoid using them, it should not be forced to incur this penalty. An 8-bit counter is used to detect if the registers have been used in the past, and the registers are always loaded until the value wraps back to zero.

Several versions of the assembly in entry_64.S were tested:

1. Always calling C.
2. Performing a common case check and then calling C.
3. A complex check in asm.

After some benchmarking it was determined that avoiding C in the common case is a performance benefit (option 2). The full check in asm (option 3) greatly complicated that codepath for a negligible performance gain and the trade-off was deemed not worth it.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Move load_vec in the struct to fill an existing hole, reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
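A sketch of the counter idea; the field and function names are illustrative assumptions:

    /* Sketch: the facility is restored eagerly on exit to userspace while the
     * counter is non-zero; each use bumps it, and once it wraps back to 0 the
     * code falls back to the lazy *_unavailable path. */
    struct thread_usage_sketch {
        u8 load_fp;    /* recent-use counter for the FPU */
        u8 load_vec;   /* recent-use counter for VMX/VSX */
    };

    static bool should_restore_fp(struct thread_usage_sketch *t)
    {
        if (t->load_fp) {
            t->load_fp++;       /* u8 arithmetic: wraps to 0 after 255 bumps */
            return true;        /* keep eagerly loading FP state */
        }
        return false;           /* stop paying the restore cost */
    }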
-
- 27 December 2015, 1 commit
-
-
By Michael Neuling
Currently we copy the whole mm_context_t to the paca but only access a few bits of it. This is wasteful of space in the paca and also takes quite some time in the hot path of context switching.

This patch pulls in only the required bits from the mm_context_t to the paca and, on context switch, copies only those.

Benchmarking this (on top of Anton's recent MSR context switching changes [1]) using processes and yield shows an improvement of almost 3% on POWER8:

    http://ozlabs.org/~anton/junkcode/context_switch2.c
    ./context_switch2 --test=yield --process 0 0

1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.html

Signed-off-by: Michael Neuling <mikey@neuling.org>
[mpe: Rename paca fields to be mm_ctx_foo rather than context_foo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
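A minimal sketch of the idea, copying only the fields the low-level code actually reads; the names follow the mm_ctx_foo convention mentioned above but are otherwise assumptions:

    /* Sketch: instead of a full mm_context_t in the paca, keep only the bits
     * the fault/SLB paths read, and copy just those at context switch. */
    struct paca_mm_ctx_sketch {
        unsigned int  mm_ctx_id;
        unsigned char mm_ctx_user_psize;        /* base user page size */
        /* slice page-size arrays would follow on book3s hash */
    };

    static void copy_mm_to_paca_sketch(struct paca_mm_ctx_sketch *dst,
                                       const mm_context_t *ctx)
    {
        dst->mm_ctx_id = ctx->id;
        dst->mm_ctx_user_psize = ctx->user_psize;
    }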
-
- 19 December 2015, 1 commit
-
-
By Michael Neuling
This adds a function to copy the mm->context to the paca. This is only a basic conversion for now but will be used more extensively in the next patch.

This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's not used elsewhere.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 22 August 2015, 2 commits
-
-
By Paul Mackerras
This builds on the ability to run more than one vcore on a physical core by using the micro-threading (split-core) modes of the POWER8 chip. Previously, only vcores from the same VM could be run together, and (on POWER8) only if they had just one thread per core. With the ability to split the core on guest entry and unsplit it on guest exit, we can run up to 8 vcpu threads from up to 4 different VMs, and we can run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is shared across all of the subcores in the core, with a pointer in the paca on each thread. In addition we extend the core_info struct to have information on each subcore. When deciding whether to add a vcore to the set already on the core, we now have two possibilities: (a) piggyback the vcore onto an existing subcore, or (b) start a new subcore.

Currently, when any vcpu needs to exit the guest and switch to host virtual mode, we interrupt all the threads in all subcores and switch the core back to whole-core mode. It may be possible in future to allow some of the subcores to keep executing in the guest while subcore 0 switches to the host, but that is not implemented in this patch.

This adds a module parameter called dynamic_mt_modes which controls which micro-threading (split-core) modes the code will consider, as a bitmap. In other words, if it is 0, no micro-threading mode is considered; if it is 2, only 2-way micro-threading is considered; if it is 4, only 4-way; and if it is 6, both 2-way and 4-way micro-threading modes will be considered. The default is 6.

With this, we now have secondary threads which are the primary thread for their subcore and therefore need to do the MMU switch. These threads will need to be started even if they have no vcpu to run, so we use the vcore pointer in the PACA rather than the vcpu pointer to trigger them.

It is now possible for thread 0 to find that an exit has been requested before it gets to switch the subcore state to the guest. In that case we haven't added the guest's timebase offset to the timebase, so we need to be careful not to subtract the offset in the guest exit path. In fact we just skip the whole path that switches back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Paul Mackerras
When running a virtual core of a guest that is configured with fewer threads per core than the physical cores have, the extra physical threads are currently unused. This makes it possible to use them to run one or more other virtual cores from the same guest when certain conditions are met. This applies on POWER7, and on POWER8 to guests with one thread per virtual core. (It doesn't apply to POWER8 guests with multiple threads per vcore because they require a 1-1 virtual to physical thread mapping in order to be able to use msgsndp and the TIR.)

The idea is that we maintain a list of preempted vcores for each physical cpu (i.e. each core, since the host runs single-threaded). Then, when a vcore is about to run, it checks to see if there are any vcores on the list for its physical cpu that could be piggybacked onto this vcore's execution. If so, those additional vcores are put into state VCORE_PIGGYBACK and their runnable VCPU threads are started as well as the original vcore, which is called the master vcore.

After the vcores have exited the guest, the extra ones are put back onto the preempted list if any of their VCPUs are still runnable and not idle.

This means that vcpu->arch.ptid is no longer necessarily the same as the physical thread that the vcpu runs on. In order to make it easier for code that wants to send an IPI to know which CPU to target, we now store that in a new field in struct vcpu_arch, called thread_cpu.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 18 August 2015, 1 commit
-
-
By Kevin Hao
Since we moved the "lock" to be the first element of struct tlb_core_data in commit 82d86de2 ("powerpc/e6500: Make TLB lock recursive"), this macro is not used by any code. Just delete it.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
- 07 June 2015, 1 commit
-
-
By Anshuman Khandual
The PACA_DSCR offset macro tracks the dscr_default element in the paca structure. Rename the macro to match the name of the data element it tracks; this makes the code more readable.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 21 April 2015, 5 commits
-
-
By Paul Mackerras
This uses msgsnd where possible for signalling other threads within the same core on POWER8 systems, rather than IPIs through the XICS interrupt controller. This includes waking secondary threads to run the guest, the interrupts generated by the virtual XICS, and the interrupts to bring the other threads out of the guest when exiting.

Aggregated statistics from debugfs across vcpus for a guest with 32 vcpus, 8 threads/vcore, running on a POWER8, show this before the change:

    rm_entry: 3387.6ns (228 - 86600, 1008969 samples)
    rm_exit:  4561.5ns (12 - 3477452, 1009402 samples)
    rm_intr:  1660.0ns (12 - 553050, 3600051 samples)

and this after the change:

    rm_entry: 3060.1ns (212 - 65138, 953873 samples)
    rm_exit:  4244.1ns (12 - 9693408, 954331 samples)
    rm_intr:  1342.3ns (12 - 1104718, 3405326 samples)

for a test of booting Fedora 20 big-endian to the login prompt.

The time taken for an H_PROD hcall (which is handled in the host kernel) went down from about 35 microseconds to about 16 microseconds with this change.

The noinline added to kvmppc_run_core turned out to be necessary for good performance, at least with gcc 4.9.2 as packaged with Fedora 21 and a little-endian POWER8 host.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Paul Mackerras
Currently, the entry_exit_count field in the kvmppc_vcore struct contains two 8-bit counts, one of the threads that have started entering the guest, and one of the threads that have started exiting the guest. This changes it to an entry_exit_map field which contains two bitmaps of 8 bits each. The advantage of doing this is that it gives us a bitmap of which threads need to be signalled when exiting the guest. That means that we no longer need to use the trick of setting the HDEC to 0 to pull the other threads out of the guest, which led in some cases to a spurious HDEC interrupt on the next guest entry.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
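A sketch of the before/after layout and how the exit path can use it; the packing shown is illustrative, not necessarily the kernel's exact layout:

    /* Before (sketch): two packed 8-bit counters.
     *   u32 entry_exit_count;   low byte = entered count, next byte = exited count */

    /* After (sketch): two packed 8-bit bitmaps, one bit per thread. */
    struct vcore_sketch {
        u32 entry_exit_map;      /* low byte = entered map, next byte = exited map */
    };

    static u8 threads_to_signal(const struct vcore_sketch *vc)
    {
        u8 entered = vc->entry_exit_map & 0xff;
        u8 exited  = (vc->entry_exit_map >> 8) & 0xff;
        return entered & ~exited;   /* threads still inside the guest */
    }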
-
By Paul Mackerras
We can tell when a secondary thread has finished running a guest by the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there is no real need for the nap_count field in the kvmppc_vcore struct. This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu pointers of the secondary threads rather than polling vc->nap_count. Besides reducing the size of the kvmppc_vcore struct by 8 bytes, this also means that we can tell which secondary threads have got stuck and thus print a more informative error message.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Paul Mackerras
* Remove the unused kvmppc_vcore::n_busy field.
* Remove the setting of RMOR, since it was only used on PPC970 and the PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch, since they are conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be called from C code anyway.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Paul Mackerras
This reads the timebase at various points in the real-mode guest entry/exit code and uses that to accumulate total, minimum and maximum time spent in those parts of the code. Currently these times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the guest until we either re-enter the guest or decide to exit to the host. This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the return from kvmppc_hv_entry().
* guest - time spent in the guest.
* cede - time spent napping in real mode due to an H_CEDE hcall while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that contains a file called "timings". This file contains one line for each of the 5 timings above, with the name followed by a colon and 4 numbers, which are the count (number of times the code has been executed), the total time, the minimum time, and the maximum time, all in nanoseconds.

The overhead of the extra code amounts to about 30ns for an hcall that is handled in real mode (e.g. H_SET_DABR), which is about 25%. Since production environments may not wish to incur this overhead, the new code is conditional on a new config symbol, CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 29 December 2014, 1 commit
-
-
By Michael Ellerman
We have two arrays in kvm_host_state that contain register values for the PMU. Currently we only create an asm-offsets symbol for the base of the arrays, and do the array offset in the assembly code.

Creating an asm-offsets symbol for each field individually makes the code much nicer to read, particularly for the MMCRx/SIxR/SDAR fields, and might have helped us notice the recent double restore bug we had in this code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Alexander Graf <agraf@suse.de>
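Roughly, the per-field symbols are generated like the sketch below; the symbol names, struct field names and array indices are examples, not necessarily the exact set added:

    /* Sketch: one asm-offsets symbol per PMU register slot, instead of a
     * single base symbol plus hand-computed offsets in the assembly. */
    DEFINE(HSTATE_MMCR0, offsetof(struct kvmppc_host_state, host_mmcr[0]));
    DEFINE(HSTATE_MMCR1, offsetof(struct kvmppc_host_state, host_mmcr[1]));
    DEFINE(HSTATE_PMC1,  offsetof(struct kvmppc_host_state, host_pmc[0]));
    DEFINE(HSTATE_PMC2,  offsetof(struct kvmppc_host_state, host_pmc[1]));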
-
- 17 December 2014, 2 commits
-
-
By Paul Mackerras
There are two ways in which a guest instruction can be obtained from the guest in the guest exit code in book3s_hv_rmhandlers.S. If the exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal instruction), the offending instruction is in the HEIR register (Hypervisor Emulation Instruction Register). If the exit was caused by a load or store to an emulated MMIO device, we load the instruction from the guest by turning data relocation on and loading the instruction with an lwz instruction.

Unfortunately, in the case where the guest has opposite endianness to the host, these two methods give results of different endianness, but both get put into vcpu->arch.last_inst. The HEIR value has been loaded using guest endianness, whereas the lwz will load the instruction using host endianness. The rest of the code that uses vcpu->arch.last_inst assumes it was loaded using host endianness.

To fix this, we define a new vcpu field to store the HEIR value. Then, in kvmppc_handle_exit_hv(), we transfer the value from this new field to vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness differ.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Paul Mackerras
This removes the code that was added to enable HV KVM to work on PPC970 processors. The PPC970 is an old CPU that doesn't support virtualizing guest memory. Removing PPC970 support also lets us remove the code for allocating and managing contiguous real-mode areas, the code for the !kvm->arch.using_mmu_notifiers case, the code for pinning pages of guest memory when first accessed and keeping track of which pages have been pinned, and the code for handling H_ENTER hypercalls in virtual mode.

Book3S HV KVM is now supported only on POWER7 and POWER8 processors. The KVM_CAP_PPC_RMA capability now always returns 0.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 15 December 2014, 2 commits
-
-
By Shreyas B. Prabhu
Winkle is a deep idle state supported in power8 chips. A core enters winkle when all the threads of the core enter winkle. In this state, power supply to the entire chiplet, i.e. core, private L2 and private L3, is turned off. As a result it gives higher power savings compared to sleep.

But entering winkle results in a total hypervisor state loss. Hence the hypervisor context has to be preserved before entering winkle and restored upon wake up.

The Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake up. It can be programmed to restore the register contents of a few specific registers. This patch uses PORE to restore register state wherever possible and uses the stack to save and restore the rest of the necessary registers.

With hypervisor state restore, things fall under three categories: per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly we add a paca variable subcore_sibling_mask. Using this and the core_idle_state we can distinguish the first thread in the core and subcore.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Shreyas B. Prabhu
Deep idle states like sleep and winkle are per-core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks like the fastsleep hardware bug workaround and hypervisor core state save which have to be done only by the last thread of the core entering a deep idle state, and similarly tasks like timebase resync and hypervisor core register restore that have to be done only by the first thread waking up from these states.

The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only suboptimal, but can cause functionality issues when subcores and kvm are involved.

This patch adds the necessary infrastructure to track idle states of threads in a per-core structure. It uses this info to perform tasks like the fastsleep workaround and timebase resync only once per core.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 02 December 2014, 1 commit
-
-
By Mahesh Salgaonkar
Clean up the OpalMCE_* definitions/declarations and other related code which is not used anymore.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-