- 10 1月, 2013 5 次提交
-
-
由 Alexander Graf 提交于
The EPR register is potentially valid for PR KVM as well, so we need to emulate accesses to it. It's only defined for reading, so only handle the mfspr case. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
When injecting an interrupt into guest context, we usually don't need to check for requests anymore. At least not until today. With the introduction of EPR, we will have to create a request when the guest has successfully accepted an external interrupt though. So we need to prepare the interrupt delivery to abort guest entry gracefully. Otherwise we'd delay the EPR request. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
On mfspr/mtspr emulation path Book3E's MMUCFG SPR with value 1015 clashes with G4's MSSSR0 SPR. Move MSSSR0 emulation from generic part to Books3S. MSSSR0 also clashes with Book3S's DABRX SPR. DABRX was not explicitly handled so Book3S execution flow will behave as before. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
When running on top of pHyp, the hypercall instruction "sc 1" goes straight into pHyp without trapping in supervisor mode. So if we want to support PAPR guest in this configuration we need to add a second way of accessing PAPR hypercalls, preferably with the exact same semantics except for the instruction. So let's overlay an officially reserved instruction and emulate PAPR hypercalls whenever we hit that one. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
When we hit an emulation result that we didn't expect, that is an error, but it's nothing that warrants a BUG(), because it can be guest triggered. So instead, let's only WARN() the user that this happened. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 14 12月, 2012 3 次提交
-
-
由 Alex Williamson 提交于
There's no need for this to be an int, it holds a boolean. Move to the end of the struct for alignment. Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Alex Williamson 提交于
Seems like everyone copied x86 and defined 4 private memory slots that never actually get used. Even x86 only uses 3 of the 4. These aren't exposed so there's no need to add padding. Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Alex Williamson 提交于
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is the user accessible slots and the other is user + private. Make this more obvious. Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 13 12月, 2012 1 次提交
-
-
由 David Rientjes 提交于
out_of_memory() is a globally defined function to call the oom killer. x86, sh, and powerpc all use a function of the same name within file scope in their respective fault.c unnecessarily. Inline the functions into the pagefault handlers to clean the code up. Signed-off-by: NDavid Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 12月, 2012 2 次提交
-
-
由 Joonsoo Kim 提交于
It is strange that alloc_bootmem() returns a virtual address and free_bootmem() requires a physical address. Anyway, free_bootmem()'s first parameter should be physical address. There are some call sites for free_bootmem() with virtual address. So fix them. [akpm@linux-foundation.org: improve free_bootmem() and free_bootmem_pate() documentation] Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wen Congyang 提交于
We use a static array to store struct node. In many cases, we don't have too many nodes, and some memory will be unused. Convert it to per-device dynamically allocated memory. Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 12月, 2012 26 次提交
-
-
由 Mihai Caraman 提交于
Implement ONE_REG interface for EPCR register adding KVM_REG_PPC_EPCR to the list of ONE_REG PPC supported registers. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> [agraf: remove HV dependency, use get/put_user] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Add EPCR support in booke mtspr/mfspr emulation. EPCR register is defined only for 64-bit and HV categories, we will expose it at this point only to 64-bit virtual processors running on 64-bit HV hosts. Define a reusable setter function for vcpu's EPCR. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> [agraf: move HV dependency in the code] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
When delivering guest IRQs, update MSR computation mode according to guest interrupt computation mode found in EPCR. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> [agraf: remove HV dependency in the code] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
In BookE, EPCR is defined and valid when either the HV or the 64bit category are implemented. Reflect this in the field definition. Today the only KVM target on 64bit is HV enabled, so there is no change in actual source code, but this keeps the code closer to the spec and doesn't build up artificial road blocks for a PR KVM on 64bit. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Extend MAS2 EPN mask to retain most significant bits on 64-bit hosts. Use this mask in tlb effective address accessor. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Mask high 32 bits of MAS2's effective page number in tlbwe emulation for guests running in 32-bit mode. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Mask high 32 bits of effective address in emulation layer for guests running in 32-bit mode. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> [agraf: fix indent] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Add emulation helper for getting instruction ea and refactor tlb instruction emulation to use it. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> [agraf: keep rt variable around] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Add interrupt handling support for 64-bit bookehv hosts. Unify 32 and 64 bit implementations using a common stack layout and a common execution flow starting from kvm_handler_common macro. Update documentation for 64-bit input register values. This patch only address the bolted TLB miss exception handlers version. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
GET_VCPU define will not be implemented for 64-bit for performance reasons so get rid of it also on 32-bit. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Include header file for get_tb() declaration. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
64-bit GCC 4.5.1 warns about an uninitialized variable which was guarded by a flag. Initialize the variable to make it happy. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> [agraf: reword comment] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Currently, if a machine check interrupt happens while we are in the guest, we exit the guest and call the host's machine check handler, which tends to cause the host to panic. Some machine checks can be triggered by the guest; for example, if the guest creates two entries in the SLB that map the same effective address, and then accesses that effective address, the CPU will take a machine check interrupt. To handle this better, when a machine check happens inside the guest, we call a new function, kvmppc_realmode_machine_check(), while still in real mode before exiting the guest. On POWER7, it handles the cases that the guest can trigger, either by flushing and reloading the SLB, or by flushing the TLB, and then it delivers the machine check interrupt directly to the guest without going back to the host. On POWER7, the OPAL firmware patches the machine check interrupt vector so that it gets control first, and it leaves behind its analysis of the situation in a structure pointed to by the opal_mc_evt field of the paca. The kvmppc_realmode_machine_check() function looks at this, and if OPAL reports that there was no error, or that it has handled the error, we also go straight back to the guest with a machine check. We have to deliver a machine check to the guest since the machine check interrupt might have trashed valid values in SRR0/1. If the machine check is one we can't handle in real mode, and one that OPAL hasn't already handled, or on PPC970, we exit the guest and call the host's machine check handler. We do this by jumping to the machine_check_fwnmi label, rather than absolute address 0x200, because we don't want to re-execute OPAL's handler on POWER7. On PPC970, the two are equivalent because address 0x200 just contains a branch. Then, if the host machine check handler decides that the system can continue executing, kvmppc_handle_exit() delivers a machine check interrupt to the guest -- once again to let the guest know that SRR0/1 have been modified. Signed-off-by: NPaul Mackerras <paulus@samba.org> [agraf: fix checkpatch warnings] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
When we change or remove a HPT (hashed page table) entry, we can do either a global TLB invalidation (tlbie) that works across the whole machine, or a local invalidation (tlbiel) that only affects this core. Currently we do local invalidations if the VM has only one vcpu or if the guest requests it with the H_LOCAL flag, though the guest Linux kernel currently doesn't ever use H_LOCAL. Then, to cope with the possibility that vcpus moving around to different physical cores might expose stale TLB entries, there is some code in kvmppc_hv_entry to flush the whole TLB of entries for this VM if either this vcpu is now running on a different physical core from where it last ran, or if this physical core last ran a different vcpu. There are a number of problems on POWER7 with this as it stands: - The TLB invalidation is done per thread, whereas it only needs to be done per core, since the TLB is shared between the threads. - With the possibility of the host paging out guest pages, the use of H_LOCAL by an SMP guest is dangerous since the guest could possibly retain and use a stale TLB entry pointing to a page that had been removed from the guest. - The TLB invalidations that we do when a vcpu moves from one physical core to another are unnecessary in the case of an SMP guest that isn't using H_LOCAL. - The optimization of using local invalidations rather than global should apply to guests with one virtual core, not just one vcpu. (None of this applies on PPC970, since there we always have to invalidate the whole TLB when entering and leaving the guest, and we can't support paging out guest memory.) To fix these problems and simplify the code, we now maintain a simple cpumask of which cpus need to flush the TLB on entry to the guest. (This is indexed by cpu, though we only ever use the bits for thread 0 of each core.) Whenever we do a local TLB invalidation, we set the bits for every cpu except the bit for thread 0 of the core that we're currently running on. Whenever we enter a guest, we test and clear the bit for our core, and flush the TLB if it was set. On initial startup of the VM, and when resetting the HPT, we set all the bits in the need_tlb_flush cpumask, since any core could potentially have stale TLB entries from the previous VM to use the same LPID, or the previous contents of the HPT. Then, we maintain a count of the number of online virtual cores, and use that when deciding whether to use a local invalidation rather than the number of online vcpus. The code to make that decision is extracted out into a new function, global_invalidates(). For multi-core guests on POWER7 (i.e. when we are using mmu notifiers), we now never do local invalidations regardless of the H_LOCAL flag. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
The mask of MSR bits that get transferred from the guest MSR to the shadow MSR included MSR_DE. In fact that bit only exists on Book 3E processors, and it is assigned the same bit used for MSR_BE on Book 3S processors. Since we already had MSR_BE in the mask, this just removes MSR_DE. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This fixes various issues in how we were handling the VSX registers that exist on POWER7 machines. First, we were running off the end of the current->thread.fpr[] array. Ultimately this was because the vcpu->arch.vsr[] array is sized to be able to store both the FP registers and the extra VSX registers (i.e. 64 entries), but PR KVM only uses it for the extra VSX registers (i.e. 32 entries). Secondly, calling load_up_vsx() from C code is a really bad idea, because it jumps to fast_exception_return at the end, rather than returning with a blr instruction. This was causing it to jump off to a random location with random register contents, since it was using the largely uninitialized stack frame created by kvmppc_load_up_vsx. In fact, it isn't necessary to call either __giveup_vsx or load_up_vsx, since giveup_fpu and load_up_fpu handle the extra VSX registers as well as the standard FP registers on machines with VSX. Also, since VSX instructions can access the VMX registers and the FP registers as well as the extra VSX registers, we have to load up the FP and VMX registers before we can turn on the MSR_VSX bit for the guest. Conversely, if we save away any of the VSX or FP registers, we have to turn off MSR_VSX for the guest. To handle all this, it is more convenient for a single call to kvmppc_giveup_ext() to handle all the state saving that needs to be done, so we make it take a set of MSR bits rather than just one, and the switch statement becomes a series of if statements. Similarly kvmppc_handle_ext needs to be able to load up more than one set of registers. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This adds basic emulation of the PURR and SPURR registers. We assume we are emulating a single-threaded core, so these advance at the same rate as the timebase. A Linux kernel running on a POWER7 expects to be able to access these registers and is not prepared to handle a program interrupt on accessing them. This also adds a very minimal emulation of the DSCR (data stream control register). Writes are ignored and reads return zero. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Currently, if the guest does an H_PROTECT hcall requesting that the permissions on a HPT entry be changed to allow writing, we make the requested change even if the page is marked read-only in the host Linux page tables. This is a problem since it would for instance allow a guest to modify a page that KSM has decided can be shared between multiple guests. To fix this, if the new permissions for the page allow writing, we need to look up the memslot for the page, work out the host virtual address, and look up the Linux page tables to get the PTE for the page. If that PTE is read-only, we reduce the HPTE permissions to read-only. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This fixes a bug in the code which allows userspace to read out the contents of the guest's hashed page table (HPT). On the second and subsequent passes through the HPT, when we are reporting only those entries that have changed, we were incorrectly initializing the index field of the header with the index of the first entry we skipped rather than the first changed entry. This fixes it. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
With HV-style KVM, we maintain reverse-mapping lists that enable us to find all the HPT (hashed page table) entries that reference each guest physical page, with the heads of the lists in the memslot->arch.rmap arrays. When we reset the HPT (i.e. when we reboot the VM), we clear out all the HPT entries but we were not clearing out the reverse mapping lists. The result is that as we create new HPT entries, the lists get corrupted, which can easily lead to loops, resulting in the host kernel hanging when it tries to traverse those lists. This fixes the problem by zeroing out all the reverse mapping lists when we zero out the HPT. This incidentally means that we are also zeroing our record of the referenced and changed bits (not the bits in the Linux PTEs, used by the Linux MM subsystem, but the bits used by the KVM_GET_DIRTY_LOG ioctl, and those used by kvm_age_hva() and kvm_test_age_hva()). Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on this fd return the contents of the HPT (hashed page table), writes create and/or remove entries in the HPT. There is a new capability, KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl takes an argument structure with the index of the first HPT entry to read out and a set of flags. The flags indicate whether the user is intending to read or write the HPT, and whether to return all entries or only the "bolted" entries (those with the bolted bit, 0x10, set in the first doubleword). This is intended for use in implementing qemu's savevm/loadvm and for live migration. Therefore, on reads, the first pass returns information about all HPTEs (or all bolted HPTEs). When the first pass reaches the end of the HPT, it returns from the read. Subsequent reads only return information about HPTEs that have changed since they were last read. A read that finds no changed HPTEs in the HPT following where the last read finished will return 0 bytes. The format of the data provides a simple run-length compression of the invalid entries. Each block of data starts with a header that indicates the index (position in the HPT, which is just an array), the number of valid entries starting at that index (may be zero), and the number of invalid entries following those valid entries. The valid entries, 16 bytes each, follow the header. The invalid entries are not explicitly represented. Signed-off-by: NPaul Mackerras <paulus@samba.org> [agraf: fix documentation] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This makes a HPTE removal function, kvmppc_do_h_remove(), available outside book3s_hv_rm_mmu.c. This will be used by the HPT writing code. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This uses a bit in our record of the guest view of the HPTE to record when the HPTE gets modified. We use a reserved bit for this, and ensure that this bit is always cleared in HPTE values returned to the guest. The recording of modified HPTEs is only done if other code indicates its interest by setting kvm->arch.hpte_mod_interest to a non-zero value. The reason for this is that when later commits add facilities for userspace to read the HPT, the first pass of reading the HPT will be quicker if there are no (or very few) HPTEs marked as modified, rather than having most HPTEs marked as modified. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This fixes a bug where adding a new guest HPT entry via the H_ENTER hcall would lose the "changed" bit in the reverse map information for the guest physical page being mapped. The result was that the KVM_GET_DIRTY_LOG could return a zero bit for the page even though the page had been modified by the guest. This fixes it by only modifying the index and present bits in the reverse map entry, thus preserving the reference and change bits. We were also unnecessarily setting the reference bit, and this fixes that too. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This restructures the code that creates HPT (hashed page table) entries so that it can be called in situations where we don't have a struct vcpu pointer, only a struct kvm pointer. It also fixes a bug where kvmppc_map_vrma() would corrupt the guest R4 value. Most of the work of kvmppc_virtmode_h_enter is now done by a new function, kvmppc_virtmode_do_h_enter, which itself calls another new function, kvmppc_do_h_enter, which contains most of the old kvmppc_h_enter. The new kvmppc_do_h_enter takes explicit arguments for the place to return the HPTE index, the Linux page tables to use, and whether it is being called in real mode, thus removing the need for it to have the vcpu as an argument. Currently kvmppc_map_vrma creates the VRMA (virtual real mode area) HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily to handle H_ENTER hcalls from the guest that need to pin a page of memory. Since H_ENTER returns the index of the created HPTE in R4, kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4 in the case when it gets called from kvmppc_map_vrma on the first VCPU_RUN ioctl. With this, kvmppc_map_vrma instead calls kvmppc_virtmode_do_h_enter with the address of a dummy word as the place to store the HPTE index, thus avoiding corrupting the guest R4. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
In order to support the generic eventfd infrastructure on PPC, we need to call into the generic KVM in-kernel device mmio code. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 30 11月, 2012 1 次提交
-
-
由 Grant Likely 提交于
Commit c22618a1, "drivers/of: Constify device_node->name and ->path_component_name" changes device_node name to a const value, but the PowerPC scom code still assigns it to a non-void field in debugfs_blob_wrapper. The /right/ solution might be to change the debugfs_blob_wrapper->data to also be const, but that is a bit risky. Instead, cast the value to (void*). It is a bit ugly, but it is the safest change until it can be investigated where debugfs_blob_wrapper can be modified. Reported-by: NMichael Neuling <mikey@neuling.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
- 29 11月, 2012 2 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-