- 11 5月, 2016 4 次提交
-
-
由 Aneesh Kumar K.V 提交于
We use the existing "ibm,pa-features" device-tree property to enable Radix MMU mode. This means we default to hash mode unless firmware tells us it's OK to start using Radix mode. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
With 4K page size radix config our level 1 page table size is 64K and it should be naturally aligned. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
The vmalloc range differs between hash and radix config. Hence make VMALLOC_START and related constants a variable which will be runtime initialized depending on whether hash or radix mode is active. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fix missing init of ioremap_bot in pgtable_64.c for ppc64e] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
We also use MMU_FTR_RADIX to branch out from code path specific to hash. No functionality change. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 01 5月, 2016 4 次提交
-
-
由 Aneesh Kumar K.V 提交于
Core kernel doesn't track the page size of the VA range that we are invalidating. Hence we end up flushing TLB for the entire mm here. Later patches will improve this. We also don't flush page walk cache separetly instead use RIC=2 when flushing TLB, because we do a MMU gather flush after freeing page table. MMU_NO_CONTEXT is updated for hash. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
How we switch MMU context differs between hash and radix. For hash we need to switch the SLB details and for radix we need to switch the PID SPR. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Radix and hash MMU models support different page table sizes. Make the #defines a variable so that existing code can work with variable sizes. Slice related code is only used by hash, so use hash constants there. We will replicate some of the boundary conditions with resepct to TASK_SIZE using radix values too. Right now we do boundary condition check using hash constants. Swapper pgdir size is initialized in asm code. We select the max pgd size to keep it simple. For now we select hash pgdir. When adding radix we will switch that to radix pgdir which is 64K. BUILD_BUG_ON check which is removed is already done in hugepage_init() using MAYBE_BUILD_BUG_ON(). Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Use a helper instead of open coding with constants. A later patch will drop the WIMG bits and use PowerISA 3.0 defines. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 27 4月, 2016 3 次提交
-
-
由 Thiago Jung Bauermann 提交于
In the ppc64 big endian ABI, function symbols point to function descriptors. The symbols which point to the function entry points have a dot in front of the function name. Consequently, when the ftrace filter mechanism searches for the symbol corresponding to an entry point address, it gets the dot symbol. As a result, ftrace filter users have to be aware of this ABI detail on ppc64 and prepend a dot to the function name when setting the filter. The perf probe command insulates the user from this by ignoring the dot in front of the symbol name when matching function names to symbols, but the sysfs interface does not. This patch makes the ftrace filter mechanism do the same when searching symbols. Fixes the following failure in ftracetest's kprobe_ftrace.tc: .../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument That failure is on this line of kprobe_ftrace.tc: echo _do_fork > set_ftrace_filter This is because there's no _do_fork entry in the functions list: # cat available_filter_functions | grep _do_fork ._do_fork This change introduces no regressions on the perf and ftracetest testsuite results. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Chris Smart 提交于
The copy paste facility introduced in POWER9 provides an optimised mechanism for a userspace application to copy a cacheline. This is provided by a pair of instructions, copy and paste, while a third, cp_abort (copy paste abort), provides a clean up of the state in case of a failure. The copy instruction will read a 128 byte cacheline and store it in an internal buffer. The subsequent paste instruction will store this internal buffer to memory and set a CR field if the paste succeeds. Since the state of the copy paste buffer is internal (and not architecturally visible), in the unlikely event of a context switch, the state cannot be stored and the paste should therefore fail. The cp_abort instruction exists to fail and clean up any such interrupted copy paste sequence and is to be called by the kernel as part of the context switch. Doing so prevents data from a preceding copy in one process leaking into the paste of another. This code enables use of the cp_abort instruction if a supported processor is detected. NOTE: this is for userspace only, not in kernel, and does not deal with KVM guests. Patch created with much assistance from Michael Neuling <mikey@neuling.org> Signed-off-by: NChris Smart <chris@distroguy.com> Reviewed-by: NCyril Bur <cyrilbur@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Andrew Donnellan 提交于
Found by smatch. Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 21 4月, 2016 2 次提交
-
-
由 Hari Bathini 提交于
The __end_handlers marker was intended to mark down upto code that gets called from exception prologs. But that hasn't kept pace with code changes. Case in point, slb_miss_realmode being called from exception prolog code but isn't below __end_handlers marker. So, __end_handlers marker is as good as a comment but could be misleading at times if it isn't in sync with the code, as is the case now. So, let us avoid this confusion by having a better comment and removing __end_handlers marker altogether. Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Hari Bathini 提交于
Some of the interrupt vectors on 64-bit POWER server processors are only 32 bytes long (8 instructions), which is not enough for the full first-level interrupt handler. For these we need to branch to an out-of-line (OOL) handler. But when we are running a relocatable kernel, interrupt vectors till __end_interrupts marker are copied down to real address 0x100. So, branching to labels (ie. OOL handlers) outside this section must be handled differently (see LOAD_HANDLER()), considering relocatable kernel, which would need at least 4 instructions. However, branching from interrupt vector means that we corrupt the CFAR (come-from address register) on POWER7 and later processors as mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions) that contains the part up to the point where the CFAR is saved in the PACA should be part of the short interrupt vectors before we branch out to OOL handlers. But as mentioned already, there are interrupt vectors on 64-bit POWER server processors that are only 32 bytes long (like vectors 0x4f00, 0x4f20, etc.), which cannot accomodate the above two cases at the same time owing to space constraint. Currently, in these interrupt vectors, we simply branch out to OOL handlers, without using LOAD_HANDLER(), which leaves us vulnerable when running a relocatable kernel (eg. kdump case). While this has been the case for sometime now and kdump is used widely, we were fortunate not to see any problems so far, for three reasons: 1. In almost all cases, production kernel (relocatable) is used for kdump as well, which would mean that crashed kernel's OOL handler would be at the same place where we end up branching to, from short interrupt vector of kdump kernel. 2. Also, OOL handler was unlikely the reason for crash in almost all the kdump scenarios, which meant we had a sane OOL handler from crashed kernel that we branched to. 3. On most 64-bit POWER server processors, page size is large enough that marking interrupt vector code as executable (see commit 429d2e83) leads to marking OOL handler code from crashed kernel, that sits right below interrupt vector code from kdump kernel, as executable as well. Let us fix this by moving the __end_interrupts marker down past OOL handlers to make sure that we also copy OOL handlers to real address 0x100 when running a relocatable kernel. This fix has been tested successfully in kdump scenario, on an LPAR with 4K page size by using different default/production kernel and kdump kernel. Also tested by manually corrupting the OOL handlers in the first kernel and then kdump'ing, and then causing the OOL handlers to fire - mpe. Fixes: c1fb6816 ("powerpc: Add relocation on exception vector handlers") Cc: stable@vger.kernel.org Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 14 4月, 2016 2 次提交
-
-
由 Michael Ellerman 提交于
Add the kconfig logic & assembly support for handling live patched functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn depends on the new -mprofile-kernel ftrace ABI, which is only supported currently on ppc64le. Live patching is handled by a special ftrace handler. This means it runs from ftrace_caller(). The live patch handler modifies the NIP so as to redirect the return from ftrace_caller() to the new patched function. However there is one particularly tricky case we need to handle. If a function A calls another function B, and it is known at link time that they share the same TOC, then A will not save or restore its TOC, and will call the local entry point of B. When we live patch B, we replace it with a new function C, which may not have the same TOC as A. At live patch time it's too late to modify A to do the TOC save/restore, so the live patching code must interpose itself between A and C, and do the TOC save/restore that A omitted. An additionaly complication is that the livepatch code can not create a stack frame in order to save the TOC. That is because if C takes > 8 arguments, or is varargs, A will have written the arguments for C in A's stack frame. To solve this, we introduce a "livepatch stack" which grows upward from the base of the regular stack, and is used to store the TOC & LR when calling a live patched function. When the patched function returns, we retrieve the real LR & TOC from the livepatch stack, restore them, and pop the livepatch "stack frame". Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NTorsten Duwe <duwe@suse.de> Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
-
由 Michael Ellerman 提交于
In order to support live patching we need to maintain an alternate stack of TOC & LR values. We use the base of the stack for this, and store the "live patch stack pointer" in struct thread_info. Unlike the other fields of thread_info, we can not statically initialise that value, so it must be done at run time. This patch just adds the code to support that, it is not enabled until the next patch which actually adds live patch support. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Acked-by: NBalbir Singh <bsingharora@gmail.com>
-
- 12 4月, 2016 2 次提交
-
-
由 Daniel Axtens 提交于
Sometimes when sparse warns about undefined symbols, it isn't because they should have 'static' added, it's because they're overriding __weak symbols defined elsewhere, and the header has been missed. Fix a few of them by adding appropriate headers. Signed-off-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Daniel Axtens 提交于
As sparse suggests, these should be made static. Signed-off-by: NDaniel Axtens <dja@axtens.net> Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: NStewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 11 4月, 2016 4 次提交
-
-
由 Paul Gortmaker 提交于
The Makefile/Kconfig currently controlling compilation of this code is: obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ signal_64.o ptrace32.o \ paca.o nvram_64.o firmware.o arch/powerpc/platforms/Kconfig.cputype:config PPC64 arch/powerpc/platforms/Kconfig.cputype: bool "64-bit kernel" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We don't replace module.h with init.h since the file already has that. We delete the MODULE_LICENSE tag since that information is already contained at the top of the file in the comments. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Andrzej Hajda <a.hajda@samsung.com> Cc: Anton Blanchard <anton@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Russell Currey 提交于
IBM online documentation for EEH uses "extended error handling" and "enhanced error handling" to refer to the same thing, in different places. The only place mentioning it as "enhanced error handling" in the kernel is the MAINTAINERS file, and it's "extended" in some documentation. IBM originally defined EEH as "enhanced error handling", so standardise all mentions of EEH to use that term. Signed-off-by: NRussell Currey <ruscur@russell.cc> Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
This has been unused since ~2004, remove it. Reported-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
We have a bunch of SLB related code in the tree which is there to handle dynamic VSIDs - but currently it's all disabled at compile time. The comments say "Keep that around for when we re-implement dynamic VSIDs". But that was over 10 years ago (commit 3c726f8d ("[PATCH] ppc64: support 64k pages")). The chance that it would still work unchanged is minimal, and in the meantime it's confusing to folks browsing/grepping the code. If we ever want to re-instate it, it's in the git history. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Acked-by: NBalbir Singh <bsingharora@gmail.com>
-
- 29 3月, 2016 1 次提交
-
-
由 Oliver O'Halloran 提交于
In save_sprs() in process.c contains the following test: if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC))) t->vrsave = mfspr(SPRN_VRSAVE); CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE so the test is equivilent to: if (cpu_has_feature(CPU_FTR_ALTIVEC) && cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) On CPUs without support for both (i.e G5) this results in vrsave not being saved between context switches. The vector register save/restore code doesn't use VRSAVE to determine which registers to save/restore, but the value of VRSAVE is used to determine if altivec is being used in several code paths. Fixes: 152d523e ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") Cc: stable@vger.kernel.org Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 26 3月, 2016 1 次提交
-
-
由 Alexander Potapenko 提交于
KASAN needs to know whether the allocation happens in an IRQ handler. This lets us strip everything below the IRQ entry point to reduce the number of unique stack traces needed to be stored. Move the definition of __irq_entry to <linux/interrupt.h> so that the users don't need to pull in <linux/ftrace.h>. Also introduce the __softirq_entry macro which is similar to __irq_entry, but puts the corresponding functions to the .softirqentry.text section. Signed-off-by: NAlexander Potapenko <glider@google.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 3月, 2016 2 次提交
-
-
由 Kees Cook 提交于
This changes several users of manual "on"/"off" parsing to use strtobool. Some side-effects: - these uses will now parse y/n/1/0 meaningfully too - the early_param uses will now bubble up parse errors Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Amitkumar Karwar <akarwar@marvell.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Joe Perches <joe@perches.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nishant Sarmukadam <nishants@marvell.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Steve French <sfrench@samba.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonsoo Kim 提交于
We can disable debug_pagealloc processing even if the code is compiled with CONFIG_DEBUG_PAGEALLOC. This patch changes the code to query whether it is enabled or not in runtime. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 3月, 2016 2 次提交
-
-
由 Cyril Bur 提交于
Commit 70fe3d98 "powerpc: Restore FPU/VEC/VSX if previously used" introduces a call to restore_math() late in the syscall return path, after MSR_RI has been cleared. The MSR_RI flag is used to indicate whether the kernel can take another exception or not. A cleared MSR_RI flag indicates that the kernel cannot. Unfortunately when a machine is under SLB pressure an SLB miss can occur in restore_math() which (with MSR_RI cleared) leads to an unrecoverable exception. Unrecoverable exception 4100 at c0000000000088d8 cpu 0x0: Vector: 4100 at [c0000003fa473b20] pc: c0000000000088d8: .load_vr_state+0x70/0x110 lr: c00000000000f710: .restore_math+0x130/0x188 sp: c0000003fa473da0 msr: 9000000002003030 current = 0xc0000007f876f180 paca = 0xc00000000fff0000 softe: 0 irq_happened: 0x01 pid = 1944, comm = K08umountfs [link register ] c00000000000f710 .restore_math+0x130/0x188 [c0000003fa473da0] c0000003fa473e30 (unreliable) [c0000003fa473e30] c000000000007b6c system_call+0x84/0xfc The clearing of MSR_RI is actually an optimisation to avoid multiple MSR writes, what must be disabled are interrupts. See comment in entry_64.S: /* * For performance reasons we clear RI the same time that we * clear EE. We only need to clear RI just before we restore r13 * below, but batching it with EE saves us one expensive mtmsrd call. * We have to be careful to restore RI if we branch anywhere from * here (eg syscall_exit_work). */ At the point of calling restore_math() r13 has not been restored, as such, the quick fix of turning MSR_RI back on for the call to restore_math() will eliminate the occurrence of an unrecoverable exception. We'd like to do a better fix in future. Fixes: 70fe3d98 ("powerpc: Restore FPU/VEC/VSX if previously used") Signed-off-by: NCyril Bur <cyrilbur@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Scott Wood 提交于
This preserves the ability to build using older binutils (reportedly <= 2.22). Fixes: 6becef7e ("powerpc/mpc85xx: Add CPU hotplug support for E6500") Signed-off-by: NScott Wood <oss@buserror.net> Cc: chenhui.zhao@freescale.com Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 12 3月, 2016 9 次提交
-
-
由 Christophe Leroy 提交于
Remove one instruction in mulhdu Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
由 Christophe Leroy 提交于
Inlining of _dcache_range() functions has shown that the compiler does the same thing a bit better with one insn less Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
由 Christophe Leroy 提交于
flush/clean/invalidate _dcache_range() functions are all very similar and are quite short. They are mainly used in __dma_sync() perf_event locate them in the top 3 consumming functions during heavy ethernet activity They are good candidate for inlining, as __dma_sync() does almost nothing but calling them Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
由 Christophe Leroy 提交于
clear_pages() is never used expect by clear_page, and PPC32 is the only architecture (still) having this function. Neither PPC64 nor any other architecture has it. This patch removes clear_pages() and moves clear_page() function inline (same as PPC64) as it only is a few isns Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
由 Christophe Leroy 提交于
On PPC8xx, flushing instruction cache is performed by writing in register SPRN_IC_CST. This registers suffers CPU6 ERRATA. The patch rewrites the fonction in C so that CPU6 ERRATA will be handled transparently Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
由 Christophe Leroy 提交于
There is no real need to have set_context() in assembly. Now that we have mtspr() handling CPU6 ERRATA directly, we can rewrite set_context() in C language for easier maintenance. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
由 Christophe Leroy 提交于
CPU6 ERRATA is now handled directly in mtspr(), so we can use the standard set_dec() fonction in all cases. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
由 Christophe Leroy 提交于
On a live running system (VoIP gateway for Air Trafic Control), over a 10 minutes period (with 277s idle), we get 87 millions DTLB misses and approximatly 35 secondes are spent in DTLB handler. This represents 5.8% of the overall time and even 10.8% of the non-idle time. Among those 87 millions DTLB misses, 15% are on user addresses and 85% are on kernel addresses. And within the kernel addresses, 93% are on addresses from the linear address space and only 7% are on addresses from the virtual address space. MPC8xx has no BATs but it has 8Mb page size. This patch implements mapping of kernel RAM using 8Mb pages, on the same model as what is done on the 40x. In 4k pages mode, each PGD entry maps a 4Mb area: we map every two entries to the same 8Mb physical page. In each second entry, we add 4Mb to the page physical address to ease life of the FixupDAR routine. This is just ignored by HW. In 16k pages mode, each PGD entry maps a 64Mb area: each PGD entry will point to the first page of the area. The DTLB handler adds the 3 bits from EPN to map the correct page. With this patch applied, we now get only 13 millions TLB misses during the 10 minutes period. The idle time has increased to 313s and the overall time spent in DTLB miss handler is 6.3s, which represents 1% of the overall time and 2.2% of non-idle time. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
由 Christophe Leroy 提交于
We are spending between 40 and 160 cycles with a mean of 65 cycles in the DTLB handling routine (measured with mftbl) so make it more simple althought it adds one instruction. With this modification, we get three registers available at all time, which will help with following patch. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
- 10 3月, 2016 1 次提交
-
-
由 Christophe Leroy 提交于
When CONFIG_DEBUG_PAGEALLOC is activated, the initial TLB mapping gets flushed to track accesses to wrong areas. Therefore, kernel addresses will also generate ITLB misses. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
-
- 09 3月, 2016 3 次提交
-
-
由 Andrew Donnellan 提交于
In eeh_pci_enable(), after making the request to set the new options, we call eeh_ops->wait_state() to check that the request finished successfully. At the moment, if eeh_ops->wait_state() returns 0, we return 0 without checking that it reflects the expected outcome. This can lead to callers further up the chain incorrectly assuming the slot has been successfully unfrozen and continuing to attempt recovery. On powernv, this will occur if pnv_eeh_get_pe_state() or pnv_eeh_get_phb_state() return 0, which in turn occurs if the relevant OPAL call returns OPAL_EEH_STOPPED_MMIO_DMA_FREEZE or OPAL_EEH_PHB_ERROR respectively. On pseries, this will occur if pseries_eeh_get_state() returns 0, which in turn occurs if RTAS reports that the PE is in the MMIO Stopped and DMA Stopped states. Obviously, none of these cases represent a successful completion of a request to thaw MMIO or DMA. Fix the check so that a wait_state() return value of 0 won't be considered successful for the EEH_OPT_THAW_MMIO or EEH_OPT_THAW_DMA cases. Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gavin Shan 提交于
When eeh_dump_pe_log() is only called by eeh_slot_error_detail(), we already have the check that the PE isn't in PCI config blocked state in eeh_slot_error_detail(). So we needn't the duplicated check in eeh_dump_pe_log(). This removes the duplicated check in eeh_dump_pe_log(). No logical changes introduced. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gavin Shan 提交于
When passing through SRIOV VFs to guest, we possibly encounter EEH error on PF. In this case, the VF PEs are put into frozen state. The error could be reported to guest before it's captured by the host. That means the guest could attempt to recover errors on VFs before host gets chance to recover errors on PFs. The VFs won't be recovered successfully. This enforces the recovery order for above case: the recovery on child PE in guest is hold until the recovery on parent PE in host is completed. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-