- 28 7月, 2014 30 次提交
-
-
由 Bharat Bhushan 提交于
Uses kvmppc_set_dar() and kvmppc_get_dar() helper functions Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Use kvmppc_set_srr0/srr1() and kvmppc_get_srr0/srr1() helper functions Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
There are shadow registers like, GSPRG[0-3], GSRR0, GSRR1 etc on BOOKE-HV and these shadow registers are guest accessible. So these shadow registers needs to be updated on BOOKE-HV. This patch adds new macro for get/set helper of shadow register . Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
The magic page is defined as a 4k page of per-vCPU data that is shared between the guest and the host to accelerate accesses to privileged registers. However, when the host is using 64k page size granularity we weren't quite as strict about that rule anymore. Instead, we partially treated all of the upper 64k as magic page and mapped only the uppermost 4k with the actual magic contents. This works well enough for Linux which doesn't use any memory in kernel space in the upper 64k, but Mac OS X got upset. So this patch makes magic page actually stay in a 4k range even on 64k page size hosts. This patch fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE hosts for me. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
Today we handle split real mode by mapping both instruction and data faults into a special virtual address space that only exists during the split mode phase. This is good enough to catch 32bit Linux guests that use split real mode for copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our instruction pointer and can map the user space process freely below there. However, that approach fails when we're running KVM inside of KVM. Here the 1st level last_inst reader may well be in the same virtual page as a 2nd level interrupt handler. It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a kernel copy_from/to_user implementation can easily overlap with user space addresses. The architecturally correct way to fix this would be to implement an instruction interpreter in KVM that kicks in whenever we go into split real mode. This interpreter however would not receive a great amount of testing and be a lot of bloat for a reasonably isolated corner case. So I went back to the drawing board and tried to come up with a way to make split real mode work with a single flat address space. And then I realized that we could get away with the same trick that makes it work for Linux: Whenever we see an instruction address during split real mode that may collide, we just move it higher up the virtual address space to a place that hopefully does not collide (keep your fingers crossed!). That approach does work surprisingly well. I am able to successfully run Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I apply a tiny timing probe hack to QEMU. I'd say this is a win over even more broken split real mode :). Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
When a page lookup failed because we're not allowed to write to the page, we should not overwrite that value with another lookup on the second PTEG which will return "page not found". Instead, we should just tell the caller that we had a permission problem. This fixes Mac OS X guests looping endlessly in page lookup code for me. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
When we have a page that we're not allowed to write to, xlate() will already tell us -EPERM on lookup of that page. With the code as is we change it into a "page missing" error which a guest may get confused about. Instead, just tell the caller about the -EPERM directly. This fixes Mac OS X guests when run with DCBZ32 emulation. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
When building KVM with a lot of vcores (NR_CPUS is big), we can potentially get out of the ld immediate range for dereferences inside that struct. Move the array to the end of our kvm_arch struct. This fixes compilation issues with NR_CPUS=2048 for me. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0) to enable idle power down for cores and devices by setting up the idle count period at boot time. With the host already controlling the power management configuration the guest could simply benefit from it, so emulate guest request as a general store. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
Now that we've fixed all the issues that HV KVM code had on little endian hosts, we can enable it in the kernel configuration for users to play with. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
For code that doesn't live in modules we can just branch to the real function names, giving us compatibility with ABIv1 and ABIv2. Do this for the compiled-in code of HV KVM. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
On the exit path from the guest we check what type of interrupt we received if we received one. This means we're doing hardware access to the XICS interrupt controller. However, when running on a little endian system, this access is byte reversed. So let's make sure to swizzle the bytes back again and virtually make XICS accesses big endian. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
Some data structures are always stored in big endian. Among those are the LPPACA fields as well as the shadow slb. These structures might be shared with a hypervisor. So whenever we access those fields, make sure we do so in big endian byte order. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
There are a few shared data structures between the host and the guest. Most of them get registered through the VPA interface. These data structures are defined to always be in big endian byte order, so let's make sure we always access them in big endian. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
When running on an LE host all data structures are kept in little endian byte order. However, the HTAB still needs to be maintained in big endian. So every time we access any HTAB we need to make sure we do so in the right byte order. Fix up all accesses to manually byte swap. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
From assembly code we might not only have to explicitly BE access 64bit values, but sometimes also 32bit ones. Add helpers that allow for easy use of lwzx/stwx in their respective byte-reverse or native form. Signed-off-by: NAlexander Graf <agraf@suse.de> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Mihai Caraman 提交于
Tlb search operation used for victim hint relies on the default tlb set by the host. When hardware tablewalk support is enabled in the host, the default tlb is TLB1 which leads KVM to evict the bolted entry. Set and restore the default tlb when searching for victim hint. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Reviewed-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Michael Neuling 提交于
This adds support for the H_SET_MODE hcall. This hcall is a multiplexer that has several functions, some of which are called rarely, and some which are potentially called very frequently. Here we add support for the functions that set the debug registers CIABR (Completed Instruction Address Breakpoint Register) and DAWR/DAWRX (Data Address Watchpoint Register and eXtension), since they could be updated by the guest as often as every context switch. This also adds a kvmppc_power8_compatible() function to test to see if a guest is compatible with POWER8 or not. The CIABR and DAWR/X only exist on POWER8. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL capability is used to enable or disable in-kernel handling of an hcall, that the hcall is actually implemented by the kernel. If not an EINVAL error is returned. This also checks the default-enabled list of hcalls and prints a warning if any hcall there is not actually implemented. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This provides a way for userspace controls which sPAPR hcalls get handled in the kernel. Each hcall can be individually enabled or disabled for in-kernel handling, except for H_RTAS. The exception for H_RTAS is because userspace can already control whether individual RTAS functions are handled in-kernel or not via the KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for H_RTAS is out of the normal sequence of hcall numbers. Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM. The args field of the struct kvm_enable_cap specifies the hcall number in args[0] and the enable/disable flag in args[1]; 0 means disable in-kernel handling (so that the hcall will always cause an exit to userspace) and 1 means enable. Enabling or disabling in-kernel handling of an hcall is effective across the whole VM. The ability for KVM_ENABLE_CAP to be used on a VM file descriptor on PowerPC is new, added by this commit. The KVM_CAP_ENABLE_CAP_VM capability advertises that this ability exists. When a VM is created, an initial set of hcalls are enabled for in-kernel handling. The set that is enabled is the set that have an in-kernel implementation at this point. Any new hcall implementations from this point onwards should not be added to the default set without a good reason. No distinction is made between real-mode and virtual-mode hcall implementations; the one setting controls them both. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu become polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition keeping last_vcpu per logical partition id. With the new invalidation condition, a guest shows 4% performance improvement on P5020DS while running a memory stress application with the cpu oversubscribed, the other guest running a cpu intensive workload. Guest - old invalidation condition real 3.89 user 3.87 sys 0.01 Guest - enhanced invalidation condition real 3.75 user 3.73 sys 0.01 Host real 3.70 user 1.85 sys 0.00 The memory stress application accesses 4KB pages backed by 75% of available TLB0 entries: char foo[ENTRIES][4096] __attribute__ ((aligned (4096))); int main() { char bar; int i, j; for (i = 0; i < ITERATIONS; i++) for (j = 0; j < ENTRIES; j++) bar = foo[j][0]; return 0; } Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Reviewed-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
While sending sparse with endian checks over the code base, it triggered at some places that were missing casts or had wrong types. Fix them up. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
We switched to ABIv2 on Little Endian systems now which gets rid of the dotted function names. Branch to the actual functions when we see such a system. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Anton Blanchard 提交于
Both kvmppc_hv_entry_trampoline and kvmppc_entry_trampoline are assembly functions that are exported to modules and also require a valid r2. As such we need to use _GLOBAL_TOC so we provide a global entry point that establishes the TOC (r2). Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Anton Blanchard 提交于
To establish addressability quickly, ABIv2 requires the target address of the function being called to be in r12. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
If we're running PR KVM in HV mode, we may get hypervisor doorbell interrupts. Handle those the same way we treat normal doorbells. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
Some compilers complain about uninitialized variables in the compute_tlbie_rb function. When you follow the code path you'll realize that we'll never get to that point, but the compiler isn't all that smart. So just default to 4k page sizes for everything, making the compiler happy and the code slightly easier to read. Signed-off-by: NAlexander Graf <agraf@suse.de> Acked-by: NPaul Mackerras <paulus@samba.org>
-
由 Alexander Graf 提交于
When we're using PR KVM we must not allow the CPU to take interrupts in virtual mode, as the SLB does not contain host kernel mappings when running inside the guest context. To make sure we get good performance for non-KVM tasks but still properly functioning PR KVM, let's just disable AIL whenever a vcpu is scheduled in. This is fundamentally different from how we deal with AIL on pSeries type machines where we disable AIL for the whole machine as soon as a single KVM VM is up. The reason for that is easy - on pSeries we do not have control over per-cpu configuration of AIL. We also don't want to mess with CPU hotplug races and AIL configuration, so setting it per CPU is easier and more flexible. This patch fixes running PR KVM on POWER8 bare metal for me. Signed-off-by: NAlexander Graf <agraf@suse.de> Acked-by: NPaul Mackerras <paulus@samba.org>
-
由 Aneesh Kumar K.V 提交于
Writing to IC is not allowed in the privileged mode. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aneesh Kumar K.V 提交于
virtual time base register is a per VM, per cpu register that needs to be saved and restored on vm exit and entry. Writing to VTB is not allowed in the privileged mode. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: fix compile error] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 06 7月, 2014 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
We use time base for PURR and SPURR emulation with PR KVM since we are emulating a single threaded core. When using time base we need to make sure that we don't accumulate time spent in the host in PURR and SPURR value. Also we don't need to emulate mtspr because both the registers are hypervisor resource. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 12 6月, 2014 2 次提交
-
-
由 Michal Marek 提交于
The rule to create the final images uses a zImage.% pattern. Unfortunately, this also matches the names of the zImage.*.lds linker scripts, which appear as a dependency of the final images. This somehow worked when $(srctree) used to be an absolute path, but now the pattern matches too much. List only the images from $(image-y) as the target of the rule, to avoid the circular dependency. Reported-and-tested-by: NMike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: NMichal Marek <mmarek@suse.cz>
-
由 Anton Blanchard 提交于
Commit 2749a2f2 (powerpc/book3s: Fix machine check handling for unhandled errors) introduced a few ABIv2 issues. We can maintain ABIv1 and ABIv2 compatibility by branching to the function rather than the dot symbol. Fixes: 2749a2f2 ("powerpc/book3s: Fix machine check handling for unhandled errors") Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 11 6月, 2014 7 次提交
-
-
由 Mahesh Salgaonkar 提交于
Currently we forward MCEs to guest which have been recovered by guest. And for unhandled errors we do not deliver the MCE to guest. It looks like with no support of FWNMI in qemu, guest just panics whenever we deliver the recovered MCEs to guest. Also, the existig code used to return to host for unhandled errors which was casuing guest to hang with soft lockups inside guest and makes it difficult to recover guest instance. This patch now forwards all fatal MCEs to guest causing guest to crash/panic. And, for recovered errors we just go back to normal functioning of guest instead of returning to host. This fixes soft lockup issues in guest. This patch also fixes an issue where guest MCE events were not logged to host console. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Mahesh Salgaonkar 提交于
We don't see MCE counter getting increased in /proc/interrupts which gives false impression of no MCE occurred even when there were MCE events. The machine check early handling was added for PowerKVM and we missed to increment the MCE count in the early handler. We also increment mce counters in the machine_check_exception call, but in most cases where we handle the error hypervisor never reaches there unless its fatal and we want to crash. Only during fatal situation we may see double increment of mce count. We need to fix that. But for now it always good to have some count increased instead of zero. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Mahesh Salgaonkar 提交于
Currently machine check handler does not check for stack overflow for nested machine check. If we hit another MCE while inside the machine check handler repeatedly from same address then we get into risk of stack overflow which can cause huge memory corruption. This patch limits the nested MCE level to 4 and panic when we cross level 4. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Mahesh Salgaonkar 提交于
Current code does not check for unhandled/unrecovered errors and return from interrupt if it is recoverable exception which in-turn triggers same machine check exception in a loop causing hypervisor to be unresponsive. This patch fixes this situation and forces hypervisor to panic for unhandled/unrecovered errors. This patch also fixes another issue where unrecoverable_exception routine was called in real mode in case of unrecoverable exception (MSR_RI = 0). This causes another exception vector 0x300 (data access) during system crash leading to confusion while debugging cause of the system crash. Also turn ME bit off while going down, so that when another MCE is hit during panic path, system will checkstop and hypervisor will get restarted cleanly by SP. With the above fixes we now throw correct console messages (see below) while crashing the system in case of unhandled/unrecoverable machine checks. -------------- Severe Machine check interrupt [[Not recovered] Initiator: CPU Error type: UE [Instruction fetch] Effective address: 0000000030002864 Oops: Machine check, sig: 7 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: bork(O) bridge stp llc kvm [last unloaded: bork] CPU: 36 PID: 55162 Comm: bash Tainted: G O 3.14.0mce #1 task: c000002d72d022d0 ti: c000000007ec0000 task.ti: c000002d72de4000 NIP: 0000000030002864 LR: 00000000300151a4 CTR: 000000003001518c REGS: c000000007ec3d80 TRAP: 0200 Tainted: G O (3.14.0mce) MSR: 9000000000041002 <SF,HV,ME,RI> CR: 28222848 XER: 20000000 CFAR: 0000000030002838 DAR: d0000000004d0000 DSISR: 00000000 SOFTE: 1 GPR00: 000000003001512c 0000000031f92cb0 0000000030078af0 0000000030002864 GPR04: d0000000004d0000 0000000000000000 0000000030002864 ffffffffffffffc9 GPR08: 0000000000000024 0000000030008af0 000000000000002c c00000000150e728 GPR12: 9000000000041002 0000000031f90000 0000000010142550 0000000040000000 GPR16: 0000000010143cdc 0000000000000000 00000000101306fc 00000000101424dc GPR20: 00000000101424e0 000000001013c6f0 0000000000000000 0000000000000000 GPR24: 0000000010143ce0 00000000100f6440 c000002d72de7e00 c000002d72860250 GPR28: c000002d72860240 c000002d72ac0038 0000000000000008 0000000000040000 NIP [0000000030002864] 0x30002864 LR [00000000300151a4] 0x300151a4 Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace 7285f0beac1e29d3 ]--- Sending IPI to other CPUs IPI complete OPAL V3 detected ! -------------- Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
As Ben suggested, it's meaningful to dump PE's location code for site engineers when hitting EEH errors. The patch introduces function eeh_pe_loc_get() to retireve the location code from dev-tree so that we can output it when hitting EEH errors. If primary PE bus is root bus, the PHB's dev-node would be tried prior to root port's dev-node. Otherwise, the upstream bridge's dev-node of the primary PE bus will be check for the location code directly. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
This patch enables POWER8 doorbell IPIs on powernv. Since doorbells can only IPI within a core, we test to see when we can use doorbells and if not we fall back to XICS. This also enables hypervisor doorbells to wakeup us up from nap/sleep via the LPCR PECEDH bit. Based on tests by Anton, the best case IPI latency between two threads dropped from 894ns to 512ns. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
On PowerNV platform, EEH errors are reported by IO accessors or poller driven by interrupt. After the PE is isolated, we won't produce EEH event for the PE. The current implementation has possibility of EEH event lost in this way: The interrupt handler queues one "special" event, which drives the poller. EEH thread doesn't pick the special event yet. IO accessors kicks in, the frozen PE is marked as "isolated" and EEH event is queued to the list. EEH thread runs because of special event and purge all existing EEH events. However, we never produce an other EEH event for the frozen PE. Eventually, the PE is marked as "isolated" and we don't have EEH event to recover it. The patch fixes the issue to keep EEH events for PEs that have been marked as "isolated" with the help of additional "force" help to eeh_remove_event(). Reported-by: NRolf Brudeseth <rolfb@us.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-