- 27 6月, 2017 3 次提交
-
-
由 Michael Neuling 提交于
Update to real function name. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Neuling 提交于
The asm code assumes the FP regs are at the start of fp_state. While this is true now, it may not always be the case and there is nothing enforcing it. This fixes the asm-offsets to point to the actual FP registers inside the fp_state. Similarly for VMX. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Neuling 提交于
The P9 PVR bits 12-15 don't indicate a revision but instead different chip configurations. From BookIV we have: Bits Configuration 0 : Scale out 12 cores 1 : Scale out 24 cores 2 : Scale up 12 cores 3 : Scale up 24 cores DD1 doesn't use this but DD2 does. Linux will mostly use the "Scale out 24 core" configuration (ie. SMT4 not SMT8) which results in a PVR of 0x004e1200. The reported revision in /proc/cpuinfo is hence reported incorrectly as "18.0". This patch fixes this to mask off only the relevant bits for the major revision (ie. bits 8-11) for POWER9. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 22 6月, 2017 1 次提交
-
-
由 Paul Mackerras 提交于
This converts the powerpc VDSO time update function to use the new interface introduced in commit 576094b7 ("time: Introduce new GENERIC_TIME_VSYSCALL", 2012-09-11). Where the old interface gave us the time as of the last update in seconds and whole nanoseconds, with the new interface we get the nanoseconds part effectively in a binary fixed-point format with tk->tkr_mono.shift bits to the right of the binary point. With the old interface, the fractional nanoseconds got truncated, meaning that the value returned by the VDSO clock_gettime function would have about 1ns of jitter in it compared to the value computed by the generic timekeeping code in the kernel. The powerpc VDSO time functions (clock_gettime and gettimeofday) already work in units of 2^-32 seconds, or 0.23283 ns, because that makes it simple to split the result into seconds and fractional seconds, and represent the fractional seconds in either microseconds or nanoseconds. This is good enough accuracy for now, so this patch avoids changing how the VDSO works or the interface in the VDSO data page. This patch converts the powerpc update_vsyscall_old to be called update_vsyscall and use the new interface. We convert the fractional second to units of 2^-32 seconds without truncating to whole nanoseconds. (There is still a conversion to whole nanoseconds for any legacy users of the vdso_data/systemcfg stamp_xtime field.) In addition, this improves the accuracy of the computation of tb_to_xs for those systems with high-frequency timebase clocks (>= 268.5 MHz) by doing the right shift in two parts, one before the multiplication and one after, rather than doing the right shift before the multiplication. (We can't do all of the right shift after the multiplication unless we use 128-bit arithmetic.) Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Acked-by: NJohn Stultz <john.stultz@linaro.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 21 6月, 2017 4 次提交
-
-
由 Santosh Sivaraj 提交于
Since trace_clock is in a different file and already marked with notrace, enable tracing in time.c by removing it from the disabled list in Makefile. Also annotate clocksource read functions and sched_clock with notrace. Testing: Timer and ftrace selftests run with different trace clocks. Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NSantosh Sivaraj <santosh@fossix.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
As for slb_miss_realmode(), rename slb_allocate_realmode() to avoid confusion over whether it runs in real or virtual mode - it runs in both. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
-
由 Michael Ellerman 提交于
slb_miss_realmode() doesn't always runs in real mode, which is what the name implies. So rename it to avoid confusing people. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
-
由 Michael Ellerman 提交于
All the callers of slb_miss_realmode currently open code the #ifndef CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case. We have a macro to do this, BRANCH_TO_COMMON(), so use it. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
-
- 20 6月, 2017 3 次提交
-
-
由 Nicholas Piggin 提交于
The SLB miss handler uses r3 for the faulting address but r12 is mostly able to be freed up to save r3 in. It just requires SRR1 be reloaded again on error. It would be more conventional to use r12 for SRR1 (and use r11 to save r3), but slb_allocate_realmode clobbers r11 and not r12. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
The EXCEPTION_PROLOG_1 used by SLB miss already saves CTR when the kernel is built with CONFIG_RELOCATABLE. So it does not have to be saved and reloaded when branching to slb_miss_realmode. It can be restored from the PACA as usual. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
The EX_DAR save area is only used in exceptional cases. With r3 no longer clobbered by slb_allocate_realmode, saving faulting address to EX_DAR can be deferred to those cases. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 19 6月, 2017 7 次提交
-
-
由 Nicholas Piggin 提交于
In a busy system, idle wakeups can be expected from IPIs and device interrupts. Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
Idle code now always runs at the 0xc... effective address whether in real or virtual mode. This means rfid can be ditched, along with a lot of SRR manipulations. In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change MSR states as required. This also balances the return prediction for the idle call, by doing blr rather than rfid to return to the idle caller. On POWER9, 2-process context switch on different cores, with snooze disabled, increases performance by 2%. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> [mpe: Incorporate v2 fixes from Nick] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
Have the system reset idle wakeup handlers branched to in real mode with the 0xc... kernel address applied. This allows simplifications of avoiding rfid when switching to virtual mode in the wakeup handler. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
The __replay_interrupt() code is branched to with bl, but the caller is returned to directly with rfid from the interrupt. Instead, rfid to a stub that returns to the caller with blr, which should keep the return branch predictor balanced. Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
msgsnd doorbell exceptions are cleared when the doorbell interrupt is taken. However if a doorbell exception causes a system reset interrupt wake from power saving state, the message is not cleared. Processing the doorbell from the system reset interrupt requires msgclr to avoid taking the exception again. Testing this plus the previous wakup direct patch gives: original wakeup direct msgclr Different threads, same core: 315k/s 264k/s 345k/s Different cores: 235k/s 242k/s 242k/s Net speedup is +10% for same core, and +3% for different core. Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
When the CPU wakes from low power state, it begins at the system reset interrupt with the exception that caused the wakeup encoded in SRR1. Today, powernv idle wakeup ignores the wakeup reason (except a special case for HMI), and the regular interrupt corresponding to the exception will fire after the idle wakeup exits. Change this to replay the interrupt from the idle wakeup before interrupts are hard-enabled. Test on POWER8 of context_switch selftests benchmark with polling idle disabled (e.g., always nap, giving cross-CPU IPIs) gives the following results: original wakeup direct Different threads, same core: 315k/s 264k/s Different cores: 235k/s 242k/s There is a slowdown for doorbell IPI (same core) case because system reset wakeup does not clear the message and the doorbell interrupt fires again needlessly. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
This simplifies the asm and fixes irq-off tracing over sleep instructions. Also move powersave_nap check for POWER8 into C code, and move PSSCR register value calculation for POWER9 into C. Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 15 6月, 2017 7 次提交
-
-
由 Nicholas Piggin 提交于
The ISA v3.0B copy-paste facility only requires cpabort when switching to a process that has foreign real addresses mapped (direct access to accelerators), to clear a potential copy buffer filled by a previous thread. There is no accelerator driver implemented yet, so cpabort can be removed. It can be be re-added when a driver is implemented. POWER9 DD1 requires the copy buffer to always be cleared on context switch, but if accelerators are not in use, then an unpaired copy from a dummy region is sufficient to clear data out of the copy buffer. This increases context switch performance by about 5% on POWER9. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
The sync (aka. hwsync, aka. heavyweight sync) in the context switch code to prevent MMIO access being reordered from the point of view of a single process if it gets migrated to a different CPU is not required because there is an hwsync performed earlier in the context switch path. Comment this so it's clear enough if anything changes on the scheduler or the powerpc sides. Remove the hwsync from _switch. This improves context switch performance by 2-3% on POWER8. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
There is no need to explicitly break the reservation in _switch, because we are guaranteed that the context switch path will include a larx/stcx. Comment the guarantee and remove the reservation clear from _switch. This is worth 1-2% in context switch performance. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug") hard disabled interrupts over the low level context switch, because the SLB management can't cope with a PMU interrupt accesing the stack in that window. Radix based kernel mapping does not use the SLB so it does not require interrupts hard disabled here. This is worth 1-2% in context switch performance on POWER9. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
The syscall exit code that branches to restore_math is quite heavy on Book3S, consisting of 2 mtmsr instructions. Threads that don't use both FP and vector can get caught here if the kernel ever uses FP or vector. Lazy-FP/vec context switching also trips this case. So check for lazy FP and vector before switching RI for restore_math. Move most of this case out of line. For threads that do want to restore math registers, the MSR switches are still suboptimal. Future direction may be to use a soft-RI bit to avoid MSR switches in kernel (similar to soft-EE), but for now at least the no-restore POWER9 context switch rate increases by about 5% due to sched_yield(2) return performance. I haven't constructed a test to measure the syscall cost. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
After bc355125 ("powerpc/64: Allow for relocation-on interrupts from guest to host"), a getppid() system call goes from 307 cycles to 358 cycles (+17%) on POWER8. This is due significantly to the scratch SPR used by the hypercall check. It turns out there are a some volatile registers common to both system call and hypercall (in particular, r12, cr0, ctr), which can be used to avoid the SPR and some other overheads. This brings getppid to 320 cycles (+4%). Testing hcall entry performance by running "sc 1" in guest userspace before this patch is 854 cycles, afterwards is 826. Also a small win there. POWER9 syscall is improved by about the same amount, hcall not tested. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
This reverts commit 45cb08f4. For some reason this is causing IRQ problems on Freescale Book3E machines, eg on my p5020ds: irq 25: nobody cared (try booting with the "irqpoll" option) CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc3-gcc-6.3.1-00037-g45cb08f4 #624 Call Trace: [c0000000fffdbb10] [c00000000049962c] .dump_stack+0xa8/0xe8 (unreliable) [c0000000fffdbba0] [c0000000000babf4] .__report_bad_irq+0x54/0x140 [c0000000fffdbc40] [c0000000000bb11c] .note_interrupt+0x324/0x380 [c0000000fffdbd00] [c0000000000b7110] .handle_irq_event_percpu+0x68/0x88 [c0000000fffdbd90] [c0000000000b718c] .handle_irq_event+0x5c/0xa8 [c0000000fffdbe10] [c0000000000bc01c] .handle_fasteoi_irq+0xe4/0x298 [c0000000fffdbe90] [c0000000000b59c4] .generic_handle_irq+0x50/0x74 [c0000000fffdbf10] [c0000000000075d8] .__do_irq+0x74/0x1f0 [c0000000fffdbf90] [c0000000000189f8] .call_do_irq+0x14/0x24 [c0000000f7173060] [c0000000000077e4] .do_IRQ+0x90/0x120 [c0000000f7173100] [c00000000001d93c] exc_0x500_common+0xfc/0x100 --- interrupt: 501 at .prepare_to_wait_event+0xc/0x14c LR = .fsl_elbc_run_command+0xc8/0x23c [c0000000f71734d0] [c00000000065f418] .nand_reset+0xb8/0x168 [c0000000f7173560] [c00000000065fec4] .nand_scan_ident+0x2b0/0x1638 [c0000000f7173650] [c000000000666cd8] .fsl_elbc_nand_probe+0x34c/0x5f0 ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [c0000000f7173750] [c0000000005a3c60] .platform_drv_probe+0x64/0xb0 [c0000000f71737d0] [c0000000005a12e0] .really_probe+0x290/0x334 [c0000000f7173870] [c0000000005a14a0] .__driver_attach+0x11c/0x120 [c0000000f7173900] [c00000000059e6a0] .bus_for_each_dev+0x98/0xfc [c0000000f71739a0] [c0000000005a0b3c] .driver_attach+0x34/0x4c [c0000000f7173a20] [c0000000005a04b0] .bus_add_driver+0x1ac/0x2e0 [c0000000f7173ac0] [c0000000005a2170] .driver_register+0x94/0x160 [c0000000f7173b40] [c0000000005a3be0] .__platform_driver_register+0x60/0x7c [c0000000f7173bc0] [c000000000d6aab4] .fsl_elbc_nand_driver_init+0x24/0x38 [c0000000f7173c30] [c000000000001934] .do_one_initcall+0x68/0x1b8 [c0000000f7173d00] [c000000000d210f8] .kernel_init_freeable+0x260/0x338 [c0000000f7173db0] [c0000000000021b0] .kernel_init+0x20/0xe70 [c0000000f7173e30] [c0000000000009bc] .ret_from_kernel_thread+0x58/0x9c handlers: [<c000000000ed85c8>] .fsl_lbc_ctrl_irq Disabling IRQ #25 Ben also had concerns with the implementation being potentially slow on some PICs, so revert it for now. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 06 6月, 2017 1 次提交
-
-
由 Nicholas Piggin 提交于
The i-side 0111b machine check, which is "Instruction Fetch to foreign address space", was missed by 7b9f71f9 ("powerpc/64s: POWER9 machine check handler"). The POWER9 processor core considers host real addresses with a nonzero value in RA(8:12) as foreign address space, accessible only by the copy and paste instructions. The copy and paste instruction pair can be used to invoke the Nest accelerators via the Virtual Accelerator Switchboard (VAS). It is an error for any regular load/store or ifetch to go to a foreign addresses. When relocation is on, this causes an MMU exception. When relocation is off, a machine check exception. It is possible to trigger this machine check by branching to a foreign address with MSR[IR]=0. Fixes: 7b9f71f9 ("powerpc/64s: POWER9 machine check handler") Reported-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 02 6月, 2017 6 次提交
-
-
由 Hari Bathini 提交于
By default, 5% of system RAM is reserved for preserving boot memory. Alternatively, a user can specify the amount of memory to reserve. See Documentation/powerpc/firmware-assisted-dump.txt for details. In addition to the memory reserved for preserving boot memory, some more memory is reserved, to save HPTE region, CPU state data and ELF core headers. Memory Reservation during first kernel looks like below: Low memory Top of memory 0 boot memory size | | | |<--Reserved dump area -->| V V | Permanent Reservation V +-----------+----------/ /----------+---+----+-----------+----+ | | |CPU|HPTE| DUMP |ELF | +-----------+----------/ /----------+---+----+-----------+----+ | ^ | | \ / ------------------------------------------- Boot memory content gets transferred to reserved area by firmware at the time of crash This implicitly means that the sum of the sizes of boot memory, CPU state data, HPTE region, DUMP preserving area and ELF core headers can't be greater than the total memory size. But currently, a user is allowed to specify any value as boot memory size. So, the above rule is violated when a boot memory size around 50% of the total available memory is specified. As the kernel is not handling this currently, it may lead to undefined behavior. Fix it by setting an upper limit for boot memory size to 25% of the total available memory. Also, instead of using memblock_end_of_DRAM(), which doesn't take the holes, if any, in the memory layout into account, use memblock_phys_mem_size() to calculate the percentage of total available memory. Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Hari Bathini 提交于
With commit f6e6bedb ("powerpc/fadump: Reserve memory at an offset closer to bottom of RAM"), memory for fadump is no longer reserved at the top of RAM. But there are still a few places which say so. Change them appropriately. Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Hari Bathini 提交于
With commit 11550dc0 ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation"), 'fadump_reserve_mem=' parameter is deprecated in favor of 'crashkernel=' parameter. Add a warning if 'fadump_reserve_mem=' is still used. Fixes: 11550dc0 ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation") Suggested-by: NPrarit Bhargava <prarit@redhat.com> Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com> [mpe: Unsplit long printk strings] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michal Suchanek 提交于
- log an error message when registration fails and no error code listed in the switch is returned - translate the hv error code to posix error code and return it from fw_register - return the posix error code from fw_register to the process writing to sysfs - return EEXIST on re-registration - return success on deregistration when fadump is not registered - return ENODEV when no memory is reserved for fadump Signed-off-by: NMichal Suchanek <msuchanek@suse.de> Tested-by: NHari Bathini <hbathini@linux.vnet.ibm.com> [mpe: Use pr_err() to shrink the error print] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
It often happens to have simultaneous interrupts, for instance when having double Ethernet attachment. With the current implementation, we suffer the cost of kernel entry/exit for each interrupt. This patch introduces a loop in __do_irq() to handle all interrupts at once before returning. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 30 5月, 2017 8 次提交
-
-
由 Nicholas Piggin 提交于
Add --orphan-handling=warn to final link flags. This ensures we can handle all sections explicitly. This would have caught subtle breakage such as 7de3b27b at build-time. Also bring existing orphan sections into the fold: - .text.hot and .text.unlikely are compiler generated sections. - .sdata2, .dynsbss, .plt are used by PPC32 - We previously did not specify DWARF_DEBUG or STABS_DEBUG - DWARF_DEBUG did not include all DWARF sections that can be emitted - A number of sections are unused and can be discarded. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
Use a tool to check that the location of "fixed sections" are where we expected them to be, which catches cases the linker script can't (stubs being added to start of .text section), and which ends up being neater. Sample output: ERROR: start_text address is c000000000008100, should be c000000000008000 ERROR: see comments in arch/powerpc/tools/head_check.sh Signed-off-by: NNicholas Piggin <npiggin@gmail.com> [mpe: Fold in fix from Nick for 4.6 era toolchains] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
Very large kernels may require linker stubs for branches from HEAD text code. The linker may place these stubs before the HEAD text sections, which breaks the assumption that HEAD text is located at 0 (or the .text section being located at 0x7000/0x8000 on Book3S kernels). Provide an option to create a small section just before the .text section with an empty 256 - 4 bytes, and adjust the start of the .text section to match. The linker will tend to put stubs in that section and not break our relative-to-absolute offset assumptions. This causes a small waste of space on common kernels, but allows large kernels to build and boot. For now, it is an EXPERT config option, defaulting to =n, but a reference is provided for it in the build-time check for such breakage. This is good enough for allyesconfig and custom users / hackers. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Ivan Mikhaylov 提交于
Prevent a kernel panic caused by unintentionally clearing TCR watchdog bits. At this point in the kernel boot, the watchdog may have already been enabled by u-boot. The original code's attempt to write to the TCR register results in an inadvertent clearing of the watchdog configuration bits, causing the 476 to reset. Signed-off-by: NIvan Mikhaylov <ivan@de.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gautham R. Shenoy 提交于
On Power9 DD1 due to a hardware bug the Power-Saving Level Status field (PLS) of the PSSCR for a thread waking up from a deep state can under-report if some other thread in the core is in a shallow stop state. The scenario in which this can manifest is as follows: 1) All the threads of the core are in deep stop. 2) One of the threads is woken up. The PLS for this thread will correctly reflect that it is waking up from deep stop. 3) The thread that has woken up now executes a shallow stop. 4) When some other thread in the core is woken, its PLS will reflect the shallow stop state. Thus, the subsequent thread for which the PLS is under-reporting the wakeup state will not restore the hypervisor resources. Hence, on DD1 systems, use the Requested Level (RL) field as a workaround to restore the contents of the hypervisor resources on the wakeup from the stop state. Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gautham R. Shenoy 提交于
On wakeup from a deep stop state which is supposed to lose the hypervisor state, we don't restore the LPCR to the old value but set it to a "sane" value via cur_cpu_spec->cpu_restore(). The problem is that the "sane" value doesn't include UPRT and the HR bits which are required to run correctly in Radix mode. Fix this on POWER9 onwards by restoring the LPCR value whatever it was before executing the stop instruction. Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gautham R. Shenoy 提交于
On POWER8, in case of - nap: both timebase and hypervisor state is retained. - fast-sleep: timebase is lost. But the hypervisor state is retained. - winkle: timebase and hypervisor state is lost. Hence, the current code for handling exit from a idle state assumes that if the timebase value is retained, then so is the hypervisor state. Thus, the current code doesn't restore per-core hypervisor state in such cases. But that is no longer the case on POWER9 where we do have stop states in which timebase value is retained, but the hypervisor state is lost. So we have to ensure that the per-core hypervisor state gets restored in such cases. Fix this by ensuring that even in the case when timebase is retained, we explicitly check if we are waking up from a deep stop that loses per-core hypervisor state (indicated by cr4 being eq or gt), and if this is the case, we restore the per-core hypervisor state. Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-