- 20 March 2015, 1 commit
-
-
By Paul Mackerras

Since we can now use hypervisor doorbells for host IPIs, this makes sure we clear the host IPI flag when taking a doorbell interrupt, and clears any pending doorbell IPI in pnv_smp_cpu_kill_self() (as we already do for IPIs sent via the XICS interrupt controller). Otherwise, if there happened to be a leftover pending doorbell interrupt for an offline CPU thread for any reason, it would prevent that thread from going into a power-saving mode; it would instead keep waking up because of the interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
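A minimal sketch of the offline-CPU side of this change, assuming the doorbell helpers from asm/dbell.h and the kvmppc_set_host_ipi() accessor of that era; treat it as illustrative rather than the exact patch:

    /* Sketch: before an offline thread enters a power-saving mode,
     * drop the host IPI flag and cancel any pending server doorbell.
     */
    static void clear_stale_doorbell(void)
    {
            unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);

            kvmppc_set_host_ipi(smp_processor_id(), 0);     /* clear flag */
            asm volatile(PPC_MSGCLR(%0) : : "r" (msg));     /* msgclr */
    }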
-
- 15 December 2014, 2 commits
-
-
By Shreyas B. Prabhu

Winkle is a deep idle state supported on POWER8 chips. A core enters winkle when all the threads of the core enter winkle. In this state the power supply to the entire chiplet, i.e. core, private L2 and private L3, is turned off. As a result it gives higher power savings compared to sleep. But entering winkle results in a total hypervisor state loss, hence the hypervisor context has to be preserved before entering winkle and restored upon wake-up.

The Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake-up. It can be programmed to restore the register contents of a few specific registers. This patch uses the PORE to restore register state wherever possible and uses the stack to save and restore the rest of the necessary registers.

With hypervisor state restore, things fall into three categories: per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly we add a paca variable subcore_sibling_mask; using this and core_idle_state we can distinguish the first thread in a core and subcore.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Paul Mackerras

Currently, when going idle, we set the flag indicating that we are in nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap (or sleep or rvwinkle) instruction, all with the MMU on. This is bad for two reasons: (a) the architecture specifies that those instructions must be executed with the MMU off, and in fact with only the SF, HV, ME and possibly RI bits set, and (b) this introduces a race, because as soon as we set the flag, another thread can switch the MMU to a guest context. If the race is lost, this thread will typically start looping on relocation-on ISIs at 0xc...4400.

This fixes it by setting the MSR as required by the architecture before setting the flag or executing the nap/sleep/rvwinkle instruction.

Cc: stable@vger.kernel.org
[shreyas@linux.vnet.ibm.com: edited to handle LE]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
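The architectural requirement boils down to one constant: drop the MSR to this minimal value (MMU off, hypervisor mode, machine check enabled) before touching the flag, so another thread can no longer switch the MMU to a guest context underneath us. A sketch, with a name matching what this commit adds to reg.h but best read as illustrative:

    /* Minimal MSR for executing nap/sleep/rvwinkle: 64-bit mode (SF),
     * hypervisor (HV), machine check enabled (ME); IR/DR and everything
     * else off, per the architecture's requirement quoted above.
     */
    #define MSR_IDLE        (MSR_ME | MSR_SF | MSR_HV)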
-
- 15 October 2014, 2 commits
-
-
By Anton Blanchard

Michael points out that __get_SP() is a pretty horrible function name. Let's give it a better name.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Anton Blanchard

Li Zhong points out an issue with our current __get_SP() implementation. If ftrace function tracing is enabled (i.e. -pg profiling using _mcount) we spill a stack frame on 64bit all the time. If a function calls __get_SP() and later calls a function that is tail call optimised, we will pop the stack frame and the value returned by __get_SP() is no longer valid. An example from Li can be found in save_stack_trace -> save_context_stack:

    c0000000000432c0 <.save_stack_trace>:
    c0000000000432c0:   mflr    r0
    c0000000000432c4:   std     r0,16(r1)
    c0000000000432c8:   stdu    r1,-128(r1)   <-- stack frame for _mcount
    c0000000000432cc:   std     r3,112(r1)
    c0000000000432d0:   bl      <._mcount>
    c0000000000432d4:   nop
    c0000000000432d8:   mr      r4,r1         <-- __get_SP()
    c0000000000432dc:   ld      r5,632(r13)
    c0000000000432e0:   ld      r3,112(r1)
    c0000000000432e4:   li      r6,1
    c0000000000432e8:   addi    r1,r1,128     <-- pop stack frame
    c0000000000432ec:   ld      r0,16(r1)
    c0000000000432f0:   mtlr    r0
    c0000000000432f4:   b       <.save_context_stack>  <-- tail call optimized

save_context_stack ends up with a stack pointer below the current one, and it is likely to be scribbled over.

Fix this by making __get_SP() a function which returns the caller's stack frame. Also replace the inline assembly which grabs the stack pointer in save_stack_trace and show_stack with __get_SP().

This also fixes an issue with perf_arch_fetch_caller_regs(), which currently unwinds the stack once and will therefore skip a valid stack frame on a leaf function. With the __get_SP() fixes in this patch, we never need to unwind the stack frame to get to the first interesting frame.

We have to export __get_SP() because perf_arch_fetch_caller_regs() (which is used in modules) calls it from a header file.

Reported-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
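A rough sketch of the resulting idea: implemented out of line, the back-chain word at 0(r1) is the caller's stack pointer, so no inline-asm-in-a-header games are needed. The kernel does this in a couple of lines of assembly precisely so the compiler cannot add a frame of its own; the C rendering below is only illustrative and assumes 64-bit:

    /* Illustrative only: the real implementation is a tiny asm routine. */
    unsigned long __get_SP(void)
    {
            unsigned long sp;

            /* 0(r1) holds the back chain, i.e. the caller's stack pointer */
            asm volatile("ld %0,0(1)" : "=r" (sp));
            return sp;
    }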
-
- 05 September 2014, 1 commit
-
-
By LEROY Christophe

Since commit 469d62be, SPRG2 is used as a scratch register just like SPRG0 and SPRG1, so declare it as such and fix the comment, which has not been valid since that commit.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
- 13 August 2014, 1 commit
-
-
By Nishanth Aravamudan

It appears that commits 7f06f21d ("powerpc/tm: Add checking to treclaim/trechkpt") and e4e38121 ("KVM: PPC: Book3S HV: Add transactional memory support") both added definitions of TEXASR_FS. Remove one of them. At the same time, fix the alignment of the remaining definition (it should be tab-separated like the rest of the #defines).

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 July 2014, 4 commits
-
-
By Stewart Smith

The POWER8 processor has a Micro Partition Prefetch Engine, which is a fancy way of saying "it has a way to store and load the contents of the L2, or of the L2 plus the MRU way of the L3, cache". We initiate storing of the log (a list of addresses) using the logmpp instruction and start the restore by writing to an SPR.

The logmpp instruction takes its parameters in a single 64bit register:
- starting address of the table to store the log of L2/L2+L3 cache contents
  - 32kb for L2
  - 128kb for L2+L3
  - aligned relative to the maximum size of the table (32kb or 128kb)
- log control (no-op, L2 only, L2 and L3, abort logout)

We should abort any ongoing logging before initiating a new one.

To initiate a restore, we write to the MPPR SPR. The format of what to write to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower, about every 8, 16, 24 or 32 cycles)

The idea behind storing and loading the contents of the L2/L3 cache is to reduce memory latency in a system that is frequently swapping vcores on a physical CPU. The best-case scenario is when some vcores are running very cache-heavy workloads; the worst case is when they get close to zero cache hits, so we just generate needless memory operations.

This implementation just does the L2 store/load. In my benchmarks this proves to be useful.

Benchmark 1:
- 16 core POWER8
- 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
- no split core/SMT
- two guests running the sysbench memory test: sysbench --test=memory --num-threads=8 run
- one guest running apache bench (of the default HTML page): ab -n 490000 -c 400 http://localhost/

This benchmark aims to measure the performance of a real-world application (apache) while the other guests are cache-hot with their own workloads. The sysbench memory benchmark does pointer-sized writes to a (small) memory buffer in a loop. In this benchmark, with this patch, I see an improvement both in requests per second (~5%) and in mean and median response times (again, about 5%). The spread of minimum and maximum response times was largely unchanged.

Benchmark 2:
- same VM config as benchmark 1
- all three guests running the sysbench memory benchmark

This benchmark aims to see whether this patch has a positive or negative effect on a cache-heavy workload. Due to the nature of the benchmark (stores), we may not see a difference in raw performance, but hopefully an improvement in consistency of performance (when a vcore is switched in, it doesn't have to wait many times for cache lines to be pulled in). The results are indeed improvements in consistency rather than in performance itself: with this patch, the few outliers in duration go away and we get more consistent performance in each guest.

Benchmark 3:
- same 3 guests and CPU configuration as benchmarks 1 and 2
- two idle guests
- 1 guest running the STREAM benchmark

This scenario also saw a performance improvement with this patch. On the Copy and Scale workloads from STREAM, I got a 5-6% improvement with this patch; for Add and Triad it was around 10% (or more).

Benchmark 4:
- same 3 guests as the previous benchmarks
- two guests running sysbench --memory, a distinctly different cache-heavy workload
- one guest running the STREAM benchmark

Similar improvements to benchmark 3.

Benchmark 5:
- 1 guest, 8 VCPUs, Ubuntu 14.04
- host configured with split core (SMT8, subcores-per-core=4)
- STREAM benchmark

In this benchmark, we see a 10-20% performance improvement across the board of the STREAM benchmark results with this patch.

Based on preliminary investigation and microbenchmarks by Prerna Saxena <prerna@linux.vnet.ibm.com>

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
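As a sketch of how the logout side fits together: the operand is an aligned physical address with a control field folded in, and an in-flight logout is aborted before a new one starts. The mask and control encodings below are placeholders, not the real POWER8 values, and the bare "logmpp" mnemonic stands in for the kernel's opcode macro:

    #define MPP_ADDRESS_MASK    0xffffffffc000ULL   /* placeholder mask */
    #define MPP_LOG_ABORT       (3ULL << 54)        /* placeholder encoding */
    #define MPP_LOG_L2          (2ULL << 54)        /* placeholder encoding */

    static inline void logmpp(u64 x)
    {
            asm volatile("logmpp %0" : : "r" (x));  /* hypothetical mnemonic */
    }

    static void start_l2_logout(void *buf)
    {
            u64 addr = __pa(buf) & MPP_ADDRESS_MASK;

            logmpp(addr | MPP_LOG_ABORT);   /* abort any logout in flight */
            logmpp(addr | MPP_LOG_L2);      /* then log L2 contents to buf */
    }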
-
By Bharat Bhushan

Scott Wood pointed out that we are no longer using SPRG1 for the vcpu pointer, but SPRN_SPRG_THREAD <=> SPRG3 (thread->vcpu), so this comment is no longer valid. Note: SPRN_SPRG3R is not supported (no need for it is seen as of now), and if we want to support it in the future then we will have to shift to using SPRG1 for the VCPU pointer.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Aneesh Kumar K.V

The virtual time base (VTB) register is a per-VM, per-CPU register that needs to be saved and restored on VM exit and entry. Writing to the VTB is not allowed in privileged mode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix compile error]
Signed-off-by: Alexander Graf <agraf@suse.de>
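A minimal sketch of what that context switch amounts to, assuming the usual mfspr()/mtspr() accessors, an SPRN_VTB define and a per-vcpu field; the restore must happen in hypervisor mode, since privileged mode cannot write the VTB:

    /* On guest exit: capture the guest's virtual time base. */
    vcpu->arch.vtb = mfspr(SPRN_VTB);

    /* On guest entry (hypervisor mode only): put it back. */
    mtspr(SPRN_VTB, vcpu->arch.vtb);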
-
By Michael Ellerman

Old cpus didn't have a Segment Lookaside Buffer (SLB); instead they had a Segment Table (STAB). Now that we've dropped support for those cpus, we can remove the STAB support entirely.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 30 May 2014, 1 commit
-
-
By Paul Mackerras

This adds workarounds for two hardware bugs in the POWER8 performance monitor unit (PMU), both related to interrupt generation. The effect of these bugs is that PMU interrupts can get lost, leading to tools such as perf reporting fewer counts and samples than they should.

The first bug relates to the PMAO (perf. mon. alert occurred) bit in MMCR0: setting it should cause an interrupt, but doesn't. The other bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0: setting PMAE when a counter is negative and counter-negative conditions are enabled to cause alerts should cause an alert, but doesn't.

The workaround for the first bug is to create conditions where a counter will overflow whenever we are about to restore an MMCR0 value that has PMAO set (and PMAO_SYNC clear). The workaround for the second bug is to freeze all counters using MMCR2 before reading MMCR0.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 28 May 2014, 1 commit
-
-
By Michael Ellerman

Upcoming POWER8 chips support a concept called split core. This is where the core can be split into subcores which, although not full cores, are able to appear as full cores to a guest. The splitting and unsplitting procedure is mildly complicated, and explained at length in the comments within the patch. One notable detail is that when splitting or unsplitting we need to pull offline cpus out of their offline state to do work as part of the procedure.

The interface for changing the split mode is via a sysfs file, e.g.:

    $ echo 2 > /sys/devices/system/cpu/subcores_per_core

Currently supported values are '1', '2' and '4', indicating respectively that the core should be unsplit, split in half, and split in quarters. These modes correspond to threads_per_subcore of 8, 4 and 2.

We do not allow changing the split mode while KVM VMs are active. This is to prevent the value changing while userspace is configuring the VM, and also to prevent the mode being changed in such a way that existing guests are unable to run.

CPU hotplug fixes by Srivatsa. max_cpus fixes by Mahesh. cpuset fixes by benh. Fix for irq race by paulus. The rest by mikey and mpe.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
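A sketch of the sysfs store side, assuming the handler simply validates the three allowed values before attempting the (much more involved) split/unsplit machinery; set_subcores_per_core() here is a hypothetical name for that machinery:

    /* Illustrative store handler for subcores_per_core: only 1, 2 and 4
     * are meaningful, and the mode must not change under a running VM.
     */
    static ssize_t store_subcores_per_core(struct device *dev,
                    struct device_attribute *attr,
                    const char *buf, size_t count)
    {
            unsigned long val;

            if (kstrtoul(buf, 0, &val))
                    return -EINVAL;
            if (val != 1 && val != 2 && val != 4)
                    return -EINVAL;
            if (set_subcores_per_core(val))     /* assumed helper */
                    return -EBUSY;              /* e.g. KVM VMs active */
            return count;
    }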
-
- 28 April 2014, 1 commit
-
-
By Michael Neuling

If we do a treclaim and we are not in TM suspend mode, it results in a TM bad thing (i.e. a 0x700 program check). Similarly, if we do a trechkpt while we have an active transaction or while the TEXASR Failure Summary (FS) bit is not set, we also take a TM bad thing. This should never happen, but if it does (i.e. a kernel bug), the cause is almost impossible to debug, as the GPR state is mostly userspace's and hence we don't get a call chain.

This adds checks for these cases that trigger a BUG_ON() (in asm) should we ever hit them. It also moves the register saving around so as to preserve r1 until later.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 07 April 2014, 1 commit
-
-
By Vaidyanathan Srinivasan

Backend driver to dynamically set voltage and frequency on IBM POWER non-virtualized platforms. Power management SPRs are used to set the required PState. This driver works in conjunction with cpufreq governors like 'ondemand' to provide demand-based frequency and voltage settings on IBM POWER non-virtualized platforms.

The PState table is obtained from OPAL v3 firmware through the device tree. The powernv_cpufreq backend driver parses the relevant device-tree nodes and initialises the cpufreq subsystem on the powernv platform.

The code was originally written by svaidy@linux.vnet.ibm.com. Over time it was modified to accommodate bug fixes as well as updates to the cpufreq core. Relevant portions of the change logs corresponding to those modifications are noted below:

* policy->cpus needs to be populated in a hotplug-invariant manner, instead of using cpu_sibling_mask() which varies with cpu-hotplug. This is because the cpufreq core copies this content into the policy->related_cpus mask, which should not vary on cpu-hotplug. [Authored by srivatsa.bhat@linux.vnet.ibm.com]

* Create a helper routine that can return the cpu frequency for the corresponding pstate_id. Also, cache the values of pstate_max, pstate_min, pstate_nominal and nr_pstates in a static structure so that they can be reused in the future to perform any validations. [Authored by ego@linux.vnet.ibm.com]

* Create a driver attribute named cpuinfo_nominal_freq, which creates a sysfs read-only file named cpuinfo_nominal_freq. Export the frequency corresponding to the nominal pstate through this interface. The nominal frequency is the highest non-turbo frequency for the platform; it is generally used when setting governor policies from user space for optimal energy efficiency. [Authored by ego@linux.vnet.ibm.com]

* Implement a powernv_cpufreq_get(unsigned int cpu) method which returns the current operating frequency. Export this via the sysfs interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to powernv_cpufreq_get(). [Authored by ego@linux.vnet.ibm.com]

[Change log updated by ego@linux.vnet.ibm.com]
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
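A sketch of the cached-limits structure and the pstate-id-to-frequency helper described in the second bullet above; the indexing convention (pstate ids counting down from pstate_max at table index 0) is an assumption for illustration:

    /* Cached pstate limits, filled in once from the OPAL device tree. */
    static struct powernv_pstate_info {
            int min;
            int max;
            int nominal;
            int nr_pstates;
    } powernv_pstate_info;

    /* Illustrative helper: map a pstate id to its cpufreq table entry. */
    static unsigned int pstate_id_to_freq(int pstate_id)
    {
            int i = powernv_pstate_info.max - pstate_id;    /* assumed */

            BUG_ON(i >= powernv_pstate_info.nr_pstates || i < 0);
            return powernv_freqs[i].frequency;  /* kHz, cpufreq convention */
    }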
-
- 29 March 2014, 1 commit
-
-
By Michael Neuling

This adds saving of the transactional memory (TM) checkpointed state on guest entry and exit. We only do this if we see that the guest has an active transaction.

It also adds emulation of the TM state changes when delivering IRQs into the guest. According to the architecture, if we are transactional when an IRQ occurs, the TM state is changed to suspended; otherwise it's left unchanged.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Scott Wood <scottwood@freescale.com>
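The interrupt-delivery rule in the last paragraph reduces to a small MSR transformation; a sketch, assuming the MSR_TS_* masks from reg.h and a guest MSR reachable through the vcpu:

    /* Sketch: compute the TS (transaction state) bits of the MSR the
     * guest sees after IRQ delivery.  Transactional becomes suspended;
     * anything else (suspended or non-transactional) carries over.
     */
    ulong old_msr = kvmppc_get_msr(vcpu);

    if (MSR_TM_TRANSACTIONAL(old_msr))
            new_msr |= MSR_TS_S;                    /* force suspended */
    else
            new_msr |= old_msr & MSR_TS_MASK;       /* leave unchanged */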
-
- 24 March 2014, 2 commits
-
-
By Michael Ellerman

The previous commit added constraint and register handling to allow processes using EBB (Event Based Branches) to request access to the BHRB (Branch History Rolling Buffer). With that in place we can allow processes using EBB to access the BHRB. This is achieved by setting BHRBA in MMCR0 when we enable EBB access; we must also clear BHRBA when we are disabling it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Michael Ellerman

Some POWER8 revisions have a hardware bug where we can lose a PMU exception. This commit adds a workaround to detect the bad condition and rectify the situation; see the comment in the commit for a full description.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 20 March 2014, 2 commits
-
-
By Scott Wood

Previously SPRG3 was marked for use by both the VDSO and critical interrupts (though critical interrupts were not fully implemented). In commit 8b64a9df ("powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman made an attempt to resolve this conflict by restoring the VDSO value early in the critical interrupt, but this has some issues:

- It's incompatible with EXCEPTION_COMMON, which restores r13 from the by-then-overwritten scratch (this cost me some debugging time).
- It forces critical exceptions to be a special case, handled differently from even machine check and debug level exceptions.
- It didn't occur to me that it was possible to make this work at all (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after I had made (most of) this patch. :-)

It might be worth investigating using a load rather than an SPRG on return from all exceptions (except TLB misses, where the scratch never leaves the SPRG) -- it could save a few cycles. Until then, let's stick with the SPRG for all exceptions.

Since we cannot use SPRG4-7 for scratch without corrupting the state of a KVM guest, move the VDSO to SPRG7 on book3e. Since neither SPRG4-7 nor critical interrupts exist on book3s, SPRG3 is still used for the VDSO there.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: kvm-ppc@vger.kernel.org
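The end state can be summarised with a pair of per-subarch aliases; a sketch in the style of asm/reg.h, with the exact SPRN_* spellings assumed rather than quoted from the patch:

    /* Sketch: where the VDSO scratch SPR ends up after this change. */
    #ifdef CONFIG_PPC_BOOK3E_64
    #define SPRN_SPRG_VDSO_READ     SPRN_USPRG7   /* user-readable alias */
    #define SPRN_SPRG_VDSO_WRITE    SPRN_SPRG7
    #endif

    #ifdef CONFIG_PPC_BOOK3S_64
    #define SPRN_SPRG_VDSO_READ     SPRN_USPRG3   /* SPRG3 remains fine */
    #define SPRN_SPRG_VDSO_WRITE    SPRN_SPRG3
    #endif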
-
By Wang Dongsheng

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
- 27 January 2014, 5 commits
-
-
By Paul Mackerras

The DABRX (DABR extension) register on POWER7 processors provides finer control over which accesses cause a data breakpoint interrupt. It contains 3 bits which indicate whether to enable accesses in user, kernel and hypervisor modes respectively to cause data breakpoint interrupts, plus one bit that enables both real mode and virtual mode accesses to cause interrupts. Currently, KVM sets DABRX to allow both kernel and user accesses to cause interrupts while in the guest.

This adds support for the guest to specify other values for DABRX. PAPR defines an H_SET_XDABR hcall to allow the guest to set both DABR and DABRX with one call. This adds a real-mode implementation of H_SET_XDABR which shares most of its code with the existing H_SET_DABR implementation. To support this, we add a per-vcpu field to store the DABRX value, plus code to get and set it via the ONE_REG interface.

For Linux guests to use this new hcall, userspace needs to add "hcall-xdabr" to the set of strings in the /chosen/hypertas-functions property in the device tree. If userspace does this and then migrates the guest to a host where the kernel doesn't include this patch, then userspace will need to implement H_SET_XDABR by writing the specified DABR value to the DABR using the ONE_REG interface. In that case, the old kernel will set DABRX to DABRX_USER | DABRX_KERNEL. That should still work correctly, at least for Linux guests, since Linux guests cope with getting data breakpoint interrupts in modes that weren't requested by just ignoring the interrupt, and Linux guests never set DABRX_BTI.

The other thing this does is to make H_SET_DABR and H_SET_XDABR work on POWER8, which has the DAWR and DAWRX instead of DABR/X. Guests that know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but guests running in POWER7 compatibility mode will still use H_SET_[X]DABR. For them, this adds the logic to convert DABR/X values into DAWR/X values on POWER8.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Paul Mackerras

POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR registers counting when in the guest; set this bit. POWER8 also has a field in the LPCR called AIL (Alternate Interrupt Location) which is used to enable relocation-on interrupts; allow userspace to set this field.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Paul Mackerras

* The SRR1 wake reason field for a system reset interrupt on wakeup from nap is now a 4-bit field on P8, compared to 3 bits on P7.
* Set PECEDP in the LPCR when napping because of H_CEDE, so guest doorbells will wake us up.
* Waking up from nap because of a guest doorbell interrupt is not a reason to exit the guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Paul Mackerras

This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7) compatibility modes on a POWER8 processor. (Note that transactional memory is disabled for usermode if either or both of the PCR_TM_DIS and PCR_ARCH_206 bits are set.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
By Michael Neuling

This adds fields to struct kvm_vcpu_arch to store the new guest-accessible SPRs on POWER8, adds code to the get/set_one_reg functions to allow userspace to access this state, and adds code to the guest entry and exit paths to context-switch these SPRs between host and guest.

Note that DPDES (Directed Privileged Doorbell Exception State) is shared between threads on a core; hence we store it in struct kvmppc_vcore and have the master thread save and restore it.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- 08 January 2014, 1 commit
-
-
By Wang Dongsheng

The E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec idle patches.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
- 23 November 2013, 1 commit
-
-
By LEROY Christophe

Commit beb2dc0a breaks the MPC8xx, which seems not to support using mfspr SPRN_TBRx instead of mftb/mftbu, despite what is written in the reference manual. This patch reverts to the use of mftb/mftbu when CONFIG_8xx is selected.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
- 17 October 2013, 3 commits
-
-
By Paul Mackerras

This enables us to use the Processor Compatibility Register (PCR) on POWER7 to put the processor into architecture 2.05 compatibility mode when running a guest. In this mode the new instructions and registers that were introduced on POWER7 are disabled in user mode. This includes all the VSX facilities plus several other instructions such as ldbrx, stdbrx, popcntw, popcntd, etc.

To select this mode, we have a new register accessible through the set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT. Setting this to zero gives the full set of capabilities of the processor. Setting it to one of the "logical" PVR values defined in PAPR puts the vcpu into the compatibility mode for the corresponding architecture level. The supported values are:

    0x0f000002   Architecture 2.05 (POWER6)
    0x0f000003   Architecture 2.06 (POWER7)
    0x0f100003   Architecture 2.06+ (POWER7+)

Since the PCR is per-core, the architecture compatibility level and the corresponding PCR value are stored in struct kvmppc_vcore, and are therefore shared between all vcpus in a virtual core.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: squash in fix to add missing break statements and documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>
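A sketch of the set_one_reg side, mapping the logical PVR values above to PCR compat bits; the PVR_ARCH_* / PCR_ARCH_* names follow the kernel's conventions but should be read as illustrative:

    /* Sketch: handle a write to KVM_REG_PPC_ARCH_COMPAT. */
    static int set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
    {
            unsigned long pcr = 0;

            switch (arch_compat) {
            case 0:                 /* full native capabilities */
                    break;
            case PVR_ARCH_205:      /* 0x0f000002: architecture 2.05 */
                    pcr = PCR_ARCH_205;
                    break;
            case PVR_ARCH_206:      /* 0x0f000003: architecture 2.06 */
            case PVR_ARCH_206p:     /* 0x0f100003: architecture 2.06+ */
                    pcr = PCR_ARCH_206;
                    break;
            default:
                    return -EINVAL;
            }

            /* Per-core, so it lives in the vcore shared by its vcpus. */
            vcpu->arch.vcore->arch_compat = arch_compat;
            vcpu->arch.vcore->pcr = pcr;
            return 0;
    }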
-
By Paul Mackerras

This adds the ability to have a separate LPCR (Logical Partitioning Control Register) value relating to a guest for each virtual core, rather than only a single value for the whole VM. This corresponds to what real POWER hardware does, where there is an LPCR per CPU thread but most of the fields are required to have the same value on all active threads in a core.

The per-virtual-core LPCR can be read and written using the GET/SET_ONE_REG interface. Userspace can only modify the following fields of the LPCR value:

    DPFD   Default prefetch depth
    ILE    Interrupt little-endian
    TC     Translation control (secondary HPT hash group search disable)

We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which contains the bits relating to memory management, i.e. the Virtualized Partition Memory (VPM) bits and the bits relating to guest real mode. When this default value is updated, the update needs to be propagated to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do that.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
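A sketch of what kvmppc_update_lpcr() plausibly looks like given the description: update the VM-wide default under a mask, then fold the same bits into every existing vcore (locking and bookkeeping details assumed):

    /* Sketch: propagate masked LPCR bits from the VM default to vcores. */
    static void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
                                   unsigned long mask)
    {
            long i;

            kvm->arch.lpcr = (kvm->arch.lpcr & ~mask) | lpcr;

            for (i = 0; i < KVM_MAX_VCORES; ++i) {
                    struct kvmppc_vcore *vc = kvm->arch.vcores[i];

                    if (!vc)
                            continue;
                    spin_lock(&vc->lock);
                    vc->lpcr = (vc->lpcr & ~mask) | lpcr;
                    spin_unlock(&vc->lock);
            }
    }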
-
By Paul Mackerras

This allows guests to have a different timebase origin from the host. This is needed for migration, where a guest can migrate from one host to another and the two hosts might have a different timebase origin. However, the timebase seen by the guest must not go backwards, and should go forwards only by a small amount corresponding to the time taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg interface using the new KVM_REG_PPC_TB_OFFSET identifier. This value defaults to 0 and is not modified by KVM. On entering the guest, this value is added to the timebase, and on exiting the guest, it is subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40 (timebase upper 40 bits) register. Writing to the TBU40 register alters only the upper 40 bits of the timebase, leaving the lower 24 bits unchanged; this provides a way to modify the timebase for guest migration without disturbing the synchronization of the timebase registers across CPU cores. The kernel rounds the given value up to a multiple of 2^24.

Timebase values stored in KVM structures (struct kvm_vcpu, struct kvmppc_vcore, etc.) are stored as host timebase values. The timebase values in the dispatch trace log need to be guest timebase values, however, since they are read directly by the guest. This moves the setting of vcpu->arch.dec_expires on guest exit to a point after we have restored the host timebase, so that vcpu->arch.dec_expires is a host timebase value.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
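A sketch of applying the offset on guest entry via TBU40, with mftb()/mtspr() accessors assumed. The subtle part is the low-24-bit carry check: if the low bits wrapped between our read and our write, the upper 40 bits are one unit short and get nudged once more:

    /* Sketch: switch the timebase to the guest's origin on entry. */
    u64 tb_offset = vc->tb_offset;      /* multiple of 2^24 by construction */

    if (tb_offset) {
            u64 new_tb = mftb() + tb_offset;

            mtspr(SPRN_TBU40, new_tb);  /* writes only the upper 40 bits */

            /* Correct for a carry out of the low 24 bits, if any. */
            if ((mftb() & 0xffffff) < (new_tb & 0xffffff))
                    mtspr(SPRN_TBU40, new_tb + 0x1000000);
    }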
-
- 11 October 2013, 1 commit
-
-
By Anton Blanchard

We need to set MSR_LE in kernel and userspace for little endian builds.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 05 September 2013, 1 commit
-
-
By Paul Mackerras

Commit 74e400ce ("powerpc: Rework setting up H/FSCR bit definitions") ended up with incorrect bit numbers for FSCR_PM_LG and FSCR_BHRB_LG; this fixes them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 21 August 2013, 2 commits
-
-
By Scott Wood

Some CPUs (such as e500v1/v2) don't implement mftb and will take a trap. mfspr should work on everything that has a timebase, and is the preferred instruction according to ISA v2.06.

Currently we get away with mftb on 85xx because the assembler converts it to mfspr due to -Wa,-me500. However, that flag has other effects that are undesirable for certain targets (e.g. lwsync is converted to sync), and is hostile to multiplatform kernels. Thus we would like to stop setting it for all e500-family builds.

mftb/mftbu instances which are in 85xx code or common code are converted. Instances which will never run on 85xx are left alone.

Signed-off-by: Scott Wood <scottwood@freescale.com>
-
By Scott Wood

Erratum A-006598 says that 64-bit mftb is not atomic -- it's subject to a similar race condition as doing mftbu/mftbl on 32-bit. The lower half of the timebase is updated before the upper half; thus, we can share the workaround for a similar bug on Cell. This workaround involves looping if the lower half of the timebase is zero, thereby avoiding the need for a scratch register (other than CR0). The workaround must be avoided when the timebase is frozen, such as during the timebase sync code.

This deals with kernel and vdso accesses, but other userspace accesses will of course need to be fixed elsewhere.

Signed-off-by: Scott Wood <scottwood@freescale.com>
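The retry trick reads naturally in C; a sketch, assuming a 64-bit mfspr-based timebase read (the real workaround lives in the asm timebase path):

    /* Sketch of the A-006598 workaround: if the low 32 bits read as
     * zero, the high half may not have been updated yet, so read again.
     * The timebase keeps ticking, so the retry terminates immediately.
     */
    static inline u64 get_tb_erratum_a006598(void)
    {
            u64 tb;

            do {
                    tb = mfspr(SPRN_TBRL);  /* full 64-bit read here */
            } while (unlikely(!(tb & 0xffffffffULL)));

            return tb;
    }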
-
- 14 August 2013, 1 commit
-
-
By Anton Blanchard

Not having parentheses around a macro is asking for trouble.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
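A generic illustration of the hazard (not the kernel's actual macro): without parentheses, operator precedence rewrites the expression around the expansion:

    #define MASK_BAD    1 << 4      /* unparenthesised */
    #define MASK_GOOD   (1 << 4)

    int a = MASK_BAD * 2;   /* expands to 1 << 4 * 2 == 1 << 8 == 256 */
    int b = MASK_GOOD * 2;  /* (1 << 4) * 2 == 32, as intended */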
-
- 09 August 2013, 1 commit
-
-
By Michael Neuling

This reworks the Facility Status and Control Register (FSCR) config bit definitions so that we can access the bit numbers. This is needed for a subsequent patch to fix the userspace DSCR handling. HFSCR and FSCR bit definitions are the same, so reuse them.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
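A sketch of the reworked style: define the bit number (_LG suffix) once and derive the mask from it, with the HFSCR reusing the same numbers. The DSCR bit shown is just one example of the pattern:

    /* Bit numbers first, so callers can use them directly ... */
    #define FSCR_DSCR_LG    2       /* example bit; others follow suit */

    /* ... masks derived from them, for both FSCR and HFSCR. */
    #define FSCR_DSCR       (1UL << FSCR_DSCR_LG)
    #define HFSCR_DSCR      (1UL << FSCR_DSCR_LG)   /* same layout */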
-
- 24 July 2013, 1 commit
-
-
By Michael Neuling

POWER8 comes with two different PVRs. This patch enables the additional PVR in the cputable: the existing entry (PVR=0x4b) is renamed POWER8E and the new entry (PVR=0x4d) is named POWER8.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
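In asm/reg.h terms this is just two PVR version constants; the values come straight from the commit text, while the macro names merely follow the kernel's PVR_* convention:

    #define PVR_POWER8E     0x004B  /* original entry, now "POWER8E" */
    #define PVR_POWER8      0x004D  /* the additional PVR this adds */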
-
- 01 July 2013, 2 commits
-
-
By Michael Ellerman

Add support for EBB (Event Based Branches) on 64-bit book3s; see the included documentation for more details. EBBs are a feature which allows the hardware to branch directly to a specified user space address when a PMU event overflows. This can be used by programs for self-monitoring with no kernel involvement in the inner loop. Most of the logic is in the generic book3s code, primarily to avoid a proliferation of PMU callbacks.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Michael Ellerman

On POWER8 we can freeze PMC5 and 6 if we're not using them; normally they run all the time. As noticed by Anshuman, we should unfreeze them when we disable the PMU, as there are legacy tools which expect them to run all the time.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 01 June 2013, 1 commit
-
-
By Michael Neuling

These cause codes are usable by userspace, so let's export them to uapi.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-