- 28 4月, 2017 3 次提交
-
-
由 Nicholas Piggin 提交于
The system reset interrupt is used for crash/debug situations, so it is desirable to have as little impact on the normal state of the system as possible. Currently it uses the current kernel stack to process the exception. This stores into the stack which may be involved with the crash. The stack pointer may be corrupted, or it may have overflowed. Avoid or minimise these problems by creating a dedicated NMI stack for the system reset interrupt to use. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
In preparation for using a dedicated stack for system reset interrupts, prevent a nested system reset from recovering, in order to simplify code that is called in crash/debug path. This allows a system reset interrupt to just use the base stack pointer. Keep an in_nmi nesting counter similarly to the in_mce counter. Consider the interrrupt non-recoverable if it is taken inside another system reset. Interrupt nesting could be allowed similarly to MCE, but system reset is a special case that's not for normal operation, so simplicity wins until there is requirement for nested system reset interrupts. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
The system reset interrupt can occur when MSR_EE=0, and it currently uses the PACA_EXGEN save area. Some PACA_EXGEN interrupts have a window where MSR_RI=1 and MSR_EE=0 when the save area is still in use. A system reset interrupt in this window can lead to undetected corruption when the save area gets overwritten. This patch introduces PACA_EXNMI save area for system reset exceptions, which closes this corruption window. It's also helpful to retain the EXGEN state for debugging situations, even if not considering the recoverability aspect. This patch also moves the PACA_EXMC area down to a less frequently used part of the paca with the new save area. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 11 4月, 2017 1 次提交
-
-
由 Gautham R. Shenoy 提交于
POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus the HSPRG0 of a thread waking up from can contain the paca pointer of its sibling. This patch implements a context recovery framework within threads of a core, by provisioning space in paca_struct for saving every sibling threads's paca pointers. Basically, we should be able to arrive at the right paca pointer from any of the thread's existing paca pointer. At bootup, during powernv idle-init, we save the paca address of every CPU in each one its siblings paca_struct in the slot corresponding to this CPU's index in the core. On wakeup from a stop, the thread will determine its index in the core from the TIR register and recover its PACA pointer by indexing into the correct slot in the provisioned space in the current PACA. Furthermore, ensure that the NVGPRs are restored from the stack on the way out by setting the NAPSTATELOST in paca. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: NNicholas Piggin <npiggin@gmail.com> [mpe: Call it a bug] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 01 4月, 2017 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
We optmize the slice page size array copy to paca by copying only the range based on addr_limit. This will require us to not look at page size array beyond addr_limit in PACA on slb fault. To enable that copy task size to paca which will be used during slb fault. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Rename from task_size to addr_limit, consolidate #ifdefs] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 31 3月, 2017 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
We also update the function arg to struct mm_struct. Move this so that function finds the definition of struct mm_struct. No functional change in this patch. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 14 1月, 2017 1 次提交
-
-
由 Frederic Weisbecker 提交于
That in order to gather all cputime accumulation to the same place. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-7-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 9月, 2016 1 次提交
-
-
由 Paolo Bonzini 提交于
hmi.c functions are unused unless sibling_subcore_state is nonzero, and that in turn happens only if KVM is in use. So move the code to arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also included in struct paca_struct only if KVM is supported by the kernel. Cc: Daniel Axtens <dja@axtens.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 22 8月, 2016 1 次提交
-
-
由 Paolo Bonzini 提交于
hmi.c functions are unused unless sibling_subcore_state is nonzero, and that in turn happens only if KVM is in use. So move the code to arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also included in struct paca_struct only if KVM is supported by the kernel. Cc: Daniel Axtens <dja@axtens.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 09 7月, 2016 1 次提交
-
-
由 Christophe Leroy 提交于
This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture. PPC32 doesn't have the PACA structure, so we use the task_info structure to store the accounting data. In order to reuse on PPC32 the PPC64 functions, all u64 data has been replaced by 'unsigned long' so that it is u32 on PPC32 and u64 on PPC64 Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
- 20 6月, 2016 1 次提交
-
-
由 Mahesh Salgaonkar 提交于
When a guest is assigned to a core it converts the host Timebase (TB) into guest TB by adding guest timebase offset before entering into guest. During guest exit it restores the guest TB to host TB. This means under certain conditions (Guest migration) host TB and guest TB can differ. When we get an HMI for TB related issues the opal HMI handler would try fixing errors and restore the correct host TB value. With no guest running, we don't have any issues. But with guest running on the core we run into TB corruption issues. If we get an HMI while in the guest, the current HMI handler invokes opal hmi handler before forcing guest to exit. The guest exit path subtracts the guest TB offset from the current TB value which may have already been restored with host value by opal hmi handler. This leads to incorrect host and guest TB values. With split-core, things become more complex. With split-core, TB also gets split and each subcore gets its own TB register. When a hmi handler fixes a TB error and restores the TB value, it affects all the TB values of sibling subcores on the same core. On TB errors all the thread in the core gets HMI. With existing code, the individual threads call opal hmi handle independently which can easily throw TB out of sync if we have guest running on subcores. Hence we will need to co-ordinate with all the threads before making opal hmi handler call followed by TB resync. This patch introduces a sibling subcore state structure (shared by all threads in the core) in paca which holds information about whether sibling subcores are in Guest mode or host mode. An array in_guest[] of size MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore. The subcore id is used as index into in_guest[] array. Only primary thread entering/exiting the guest is responsible to set/unset its designated array element. On TB error, we get HMI interrupt on every thread on the core. Upon HMI, this patch will now force guest to vacate the core/subcore. Primary thread from each subcore will then turn off its respective bit from the above bitmap during the guest exit path just after the guest->host partition switch is complete. All other threads that have just exited the guest OR were already in host will wait until all other subcores clears their respective bit. Once all the subcores turn off their respective bit, all threads will will make call to opal hmi handler. It is not necessary that opal hmi handler would resync the TB value for every HMI interrupts. It would do so only for the HMI caused due to TB errors. For rest, it would not touch TB value. Hence to make things simpler, primary thread would call TB resync explicitly once for each core immediately after opal hmi handler instead of subtracting guest offset from TB. TB resync call will restore the TB with host value. Thus we can be sure about the TB state. One of the primary threads exiting the guest will take up the responsibility of calling TB resync. It will use one of the top bits (bit 63) from subcore state flags bitmap to make the decision. The first primary thread (among the subcores) that is able to set the bit will have to call the TB resync. Rest all other threads will wait until TB resync is complete. Once TB resync is complete all threads will then proceed. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 09 1月, 2016 1 次提交
-
-
由 Michael Ellerman 提交于
Commit 2fc251a8 ("powerpc: Copy only required pieces of the mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y and CONFIG_PPC_MM_SLICES=n. That only happens for a kernel built with 4K pages and HUGETLB disabled, which is why we missed it. Fix it by adding a mm_ctx_user_psize member to the paca and populating it in the appropriate places. Fixes: 2fc251a8 ("powerpc: Copy only required pieces of the mm_context_t to the paca") Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 27 12月, 2015 1 次提交
-
-
由 Michael Neuling 提交于
Currently we copy the whole mm_context_t to the paca but only access a few bits of it. This is wasteful of space paca and also takes quite some time in the hot path of context switching. This patch pulls in only the required bits from the mm_context_t to the paca and on context switch, copies only those. Benchmarking this (On top of Anton's recent MSR context switching changes [1]) using processes and yield shows an improvement of almost 3% on POWER8: http://ozlabs.org/~anton/junkcode/context_switch2.c ./context_switch2 --test=yield --process 0 0 1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.htmlSigned-off-by: NMichael Neuling <mikey@neuling.org> [mpe: Rename paca fields to be mm_ctx_foo rather than context_foo] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 19 12月, 2015 1 次提交
-
-
由 Michael Neuling 提交于
This adds a function to copy the mm->context to the paca. This is only a basic conversion for now but will be used more extensively in the next patch. This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's not used elsewhere. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 01 4月, 2015 1 次提交
-
-
由 Kevin Hao 提交于
All the cache line size of the current book3e 64bit SoCs are 64 bytes. So we should use this size to align the member of paca_struct. This only change the paca_struct's members which are private to book3e CPUs, and should not have any effect to book3s ones. With this, we save 192 bytes. Also change it to __aligned(size) since it is preferred over __attribute__((aligned(size))). Before: /* size: 1920, cachelines: 30, members: 46 */ /* sum members: 1667, holes: 6, sum holes: 141 */ /* padding: 112 */ After: /* size: 1728, cachelines: 27, members: 46 */ /* sum members: 1667, holes: 4, sum holes: 13 */ /* padding: 48 */ Signed-off-by: NKevin Hao <haokexin@gmail.com> Signed-off-by: NScott Wood <scottwood@freescale.com>
-
- 15 12月, 2014 2 次提交
-
-
由 Shreyas B. Prabhu 提交于
Winkle is a deep idle state supported in power8 chips. A core enters winkle when all the threads of the core enter winkle. In this state power supply to the entire chiplet i.e core, private L2 and private L3 is turned off. As a result it gives higher powersavings compared to sleep. But entering winkle results in a total hypervisor state loss. Hence the hypervisor context has to be preserved before entering winkle and restored upon wake up. Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake up. It can be programmed to restore the register contests of a few specific registers. This patch uses PORE to restore register state wherever possible and uses stack to save and restore rest of the necessary registers. With hypervisor state restore things fall under three categories- per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly we add a paca variable subcore_sibling_mask. Using this and the core_idle_state we can distingush first thread in core and subcore. Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Shreyas B. Prabhu 提交于
Deep idle states like sleep and winkle are per core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks like fastsleep hardware bug workaround and hypervisor core state save which have to be done only by the last thread of the core entering deep idle state and similarly tasks like timebase resync, hypervisor core register restore that have to be done only by the first thread waking up from these state. The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only is suboptimal, but can cause functionality issues when subcores and kvm is involved. This patch adds the necessary infrastructure to track idle states of threads in a per-core structure. It uses this info to perform tasks like fastsleep workaround and timebase resync only once per core. Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Originally-by: NPreeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 02 12月, 2014 1 次提交
-
-
由 Mahesh Salgaonkar 提交于
Cleanup OpalMCE_* definitions/declarations and other related code which is not used anymore. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: NBenjamin Herrrenschmidt <benh@kernel.crashing.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 05 8月, 2014 1 次提交
-
-
由 Mahesh Salgaonkar 提交于
When we hit the HMI in Linux, invoke opal call to handle/recover from HMI errors in real mode and then in virtual mode during check_irq_replay() invoke opal_poll_events()/opal_do_notifier() to retrieve HMI event from OPAL and act accordingly. Now that we are ready to handle HMI interrupt directly in linux, remove the HMI interrupt registration with firmware. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 7月, 2014 1 次提交
-
-
由 Michael Ellerman 提交于
Old cpus didn't have a Segment Lookaside Buffer (SLB), instead they had a Segment Table (STAB). Now that we've dropped support for those cpus, we can remove the STAB support entirely. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 5月, 2014 1 次提交
-
-
由 Sam bobroff 提交于
Since commit "efcac658 powerpc: Per process DSCR + some fixes (try#4)" it is no longer possible to set the DSCR on a per-CPU basis. The old behaviour was to minipulate the DSCR SPR directly but this is no longer sufficient: the value is quickly overwritten by context switching. This patch stores the per-CPU DSCR value in a kernel variable rather than directly in the SPR and it is used whenever a process has not set the DSCR itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged. Writes to the old global default (/sys/devices/system/cpu/dscr_default) now set all of the per-CPU values and reads return the last written value. The new per-CPU default is added to the paca_struct and is used everywhere outside of sysfs.c instead of the old global default. Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 20 3月, 2014 2 次提交
-
-
由 Scott Wood 提交于
Add special state saving for critical and machine check exceptions. Most of this code could be used to handle debug exceptions taken from kernel space, but actually doing so is outside the scope of this patch. The various critical and machine check exceptions now point to their real handlers, rather than hanging the kernel. Signed-off-by: NScott Wood <scottwood@freescale.com>
-
由 Scott Wood 提交于
Previously SPRG3 was marked for use by both VDSO and critical interrupts (though critical interrupts were not fully implemented). In commit 8b64a9df ("powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman made an attempt to resolve this conflict by restoring the VDSO value early in the critical interrupt, but this has some issues: - It's incompatible with EXCEPTION_COMMON which restores r13 from the by-then-overwritten scratch (this cost me some debugging time). - It forces critical exceptions to be a special case handled differently from even machine check and debug level exceptions. - It didn't occur to me that it was possible to make this work at all (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after I made (most of) this patch. :-) It might be worth investigating using a load rather than SPRG on return from all exceptions (except TLB misses where the scratch never leaves the SPRG) -- it could save a few cycles. Until then, let's stick with SPRG for all exceptions. Since we cannot use SPRG4-7 for scratch without corrupting the state of a KVM guest, move VDSO to SPRG7 on book3e. Since neither SPRG4-7 nor critical interrupts exist on book3s, SPRG3 is still used for VDSO there. Signed-off-by: NScott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: kvm-ppc@vger.kernel.org
-
- 15 1月, 2014 1 次提交
-
-
由 Paul Gortmaker 提交于
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. The one instance where we add an include for init.h covers off a case where that file was implicitly getting it from another header which itself didn't need it. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 1月, 2014 1 次提交
-
-
由 Scott Wood 提交于
There are a few things that make the existing hw tablewalk handlers unsuitable for e6500: - Indirect entries go in TLB1 (though the resulting direct entries go in TLB0). - It has threads, but no "tlbsrx." -- so we need a spinlock and a normal "tlbsx". Because we need this lock, hardware tablewalk is mandatory on e6500 unless we want to add spinlock+tlbsx to the normal bolted TLB miss handler. - TLB1 has no HES (nor next-victim hint) so we need software round robin (TODO: integrate this round robin data with hugetlb/KVM) - The existing tablewalk handlers map half of a page table at a time, because IBM hardware has a fixed 1MiB indirect page size. e6500 has variable size indirect entries, with a minimum of 2MiB. So we can't do the half-page indirect mapping, and even if we could it would be less efficient than mapping the full page. - Like on e5500, the linear mapping is bolted, so we don't need the overhead of supporting nested tlb misses. Note that hardware tablewalk does not work in rev1 of e6500. We do not expect to support e6500 rev1 in mainline Linux. Signed-off-by: NScott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com>
-
- 05 12月, 2013 1 次提交
-
-
由 Mahesh Salgaonkar 提交于
This patch introduces exclusive emergency stack for machine check exception. We use emergency stack to handle machine check exception so that we can save MCE information (srr1, srr0, dar and dsisr) before turning on ME bit and be ready for re-entrancy. This helps us to prevent clobbering of MCE information in case of nested machine checks. The reason for using emergency stack over normal kernel stack is that the machine check might occur in the middle of setting up a stack frame which may result into improper use of kernel stack. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 17 10月, 2013 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
With later patches supporting PR kvm as a kernel module, the changes that has to be built into the main kernel binary to enable PR KVM module is now selected via KVM_BOOK3S_PR_POSSIBLE Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 14 8月, 2013 2 次提交
-
-
由 Anton Blanchard 提交于
Our ppc64 spinlocks and rwlocks use a trick where a lock token and the paca index are placed in the lock with a single store. Since we are using two u16s they need adjusting for little endian. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
When we have MMU on exceptions (POWER8) and a relocatable kernel, we need to branch from the initial exception vectors at 0x0 to up high where the kernel might be located. Currently we do this using the link register. Unfortunately this corrupts the link stack and instead we should use the count register. We did this for the syscall entry path in: 6a404806 powerpc: Avoid link stack corruption in MMU on syscall entry path but I stupidly forgot to do the same for other exceptions. This patch changes the initial exception vectors to use the count register instead of the link register when we need to branch up to the relocated kernel. I have a dodgy userspace test which loops calling a function that reads the PVR (mfpvr in userspace will be emulated by the kernel via the program check exception). On POWER8 and with CONFIG_RELOCATABLE=y, I get a ~10% performance improvement with my userspace test with this patch. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 15 2月, 2013 2 次提交
-
-
由 Michael Neuling 提交于
Add transactional memory paca scratch register to show_regs. This is useful for debugging. Signed-off-by: NMatt Evans <matt@ozlabs.org> Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Geoff Levand 提交于
The powerpc boot_paca symbol is now only used within the early_setup() routine, so move it from its global definition into early_setup(). Signed-off-by: NGeoff Levand <geoff@infradead.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 1月, 2013 1 次提交
-
-
由 Haren Myneni 提交于
[PATCH 3/6] powerpc: Increase exceptions arrays in paca struct to save PPR Using paca to save user defined PPR value in the first level exception vector. Signed-off-by: NHaren Myneni <haren@us.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 17 9月, 2012 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
With larger vsid we need to track more bits of ESID in slb cache for slb invalidate. Reviewed-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 07 9月, 2012 1 次提交
-
-
由 Mihai Caraman 提交于
Critical exception on 64-bit booke uses user-visible SPRG3 as scratch. Restore VDSO information in SPRG3 on exception prolog. Use a common sprg3 field in PACA for all powerpc64 architectures. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 09 3月, 2012 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
The current implementation of lazy interrupts handling has some issues that this tries to address. We don't do the various workarounds we need to do when re-enabling interrupts in some cases such as when returning from an interrupt and thus we may still lose or get delayed decrementer or doorbell interrupts. The current scheme also makes it much harder to handle the external "edge" interrupts provided by some BookE processors when using the EPR facility (External Proxy) and the Freescale Hypervisor. Additionally, we tend to keep interrupts hard disabled in a number of cases, such as decrementer interrupts, external interrupts, or when a masked decrementer interrupt is pending. This is sub-optimal. This is an attempt at fixing it all in one go by reworking the way we do the lazy interrupt disabling from the ground up. The base idea is to replace the "hard_enabled" field with a "irq_happened" field in which we store a bit mask of what interrupt occurred while soft-disabled. When re-enabling, either via arch_local_irq_restore() or when returning from an interrupt, we can now decide what to do by testing bits in that field. We then implement replaying of the missed interrupts either by re-using the existing exception frame (in exception exit case) or via the creation of a new one from an assembly trampoline (in the arch_local_irq_enable case). This removes the need to play with the decrementer to try to create fake interrupts, among others. In addition, this adds a few refinements: - We no longer hard disable decrementer interrupts that occur while soft-disabled. We now simply bump the decrementer back to max (on BookS) or leave it stopped (on BookE) and continue with hard interrupts enabled, which means that we'll potentially get better sample quality from performance monitor interrupts. - Timer, decrementer and doorbell interrupts now hard-enable shortly after removing the source of the interrupt, which means they no longer run entirely hard disabled. Again, this will improve perf sample quality. - On Book3E 64-bit, we now make the performance monitor interrupt act as an NMI like Book3S (the necessary C code for that to work appear to already be present in the FSL perf code, notably calling nmi_enter instead of irq_enter). (This also fixes a bug where BookE perfmon interrupts could clobber r14 ... oops) - We could make "masked" decrementer interrupts act as NMIs when doing timer-based perf sampling to improve the sample quality. Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: - Add hard-enable to decrementer, timer and doorbells - Fix CR clobber in masked irq handling on BookE - Make embedded perf interrupt act as an NMI - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want to retrigger an interrupt without preventing hard-enable v3: - Fix or vs. ori bug on Book3E - Fix enabling of interrupts for some exceptions on Book3E v4: - Fix resend of doorbells on return from interrupt on Book3E v5: - Rebased on top of my latest series, which involves some significant rework of some aspects of the patch. v6: - 32-bit compile fix - more compile fixes with various .config combos - factor out the asm code to soft-disable interrupts - remove the C wrapper around preempt_schedule_irq v7: - Fix a bug with hard irq state tracking on native power7
-
- 08 12月, 2011 1 次提交
-
-
由 Paul Mackerras 提交于
This fixes a problem where a CPU thread coming out of nap mode can think it has valid values in the nonvolatile GPRs (r14 - r31) as saved away in power7_idle, but in fact the values have been trashed because the thread was used for KVM in the mean time. The result is that the thread crashes because code that called power7_idle (e.g., pnv_smp_cpu_kill_self()) goes to use values in registers that have been trashed. The bit field in SRR1 that tells whether state was lost only reflects the most recent nap, which may not have been the nap instruction in power7_idle. So we need an extra PACA field to indicate that state has been lost even if SRR1 indicates that the most recent nap didn't lose state. We clear this field when saving the state in power7_idle, we set it to a non-zero value when we use the thread for KVM, and we test it in power7_wakeup_noloss. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 20 9月, 2011 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
OPAL can handle various interrupt for us such as Machine Checks (it performs all sorts of recovery tasks and passes back control to us with informations about the error), Hardware Management Interrupts and Softpatch interrupts. This wires up the mechanisms and prints out specific informations returned by HAL when a machine check occurs. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 12 7月, 2011 2 次提交
-
-
由 Paul Mackerras 提交于
This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
There are several fields in struct kvmppc_book3s_shadow_vcpu that temporarily store bits of host state while a guest is running, rather than anything relating to the particular guest or vcpu. This splits them out into a new kvmppc_host_state structure and modifies the definitions in asm-offsets.c to suit. On 32-bit, we have a kvmppc_host_state structure inside the kvmppc_book3s_shadow_vcpu since the assembly code needs to be able to get to them both with one pointer. On 64-bit they are separate fields in the PACA. This means that on 64-bit we don't need to copy the kvmppc_host_state in and out on vcpu load/unload, and in future will mean that the book3s_hv code doesn't need a shadow_vcpu struct in the PACA at all. That does mean that we have to be careful not to rely on any values persisting in the hstate field of the paca across any point where we could block or get preempted. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 29 6月, 2011 1 次提交
-
-
由 Scott Wood 提交于
On MMUs such as FSL where we can guarantee the entire linear mapping is bolted, we don't need to worry about linear TLB misses. If on top of that we do a full table walk, we get rid of all recursive TLB faults, and can dispense with some state saving. This gains a few percent on TLB-miss-heavy workloads, and around 50% on a benchmark that had a high rate of virtual page table faults under the normal handler. While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and EX_TLB_SRR1 as they're not used. [BenH: Fixed build with 64K pages (wsp config)] Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-