- 29 8月, 2014 1 次提交
-
-
由 Paolo Bonzini 提交于
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid "'struct foo' declared inside parameter list" warnings (and consequent breakage due to conflicting types). Move them from individual files to a generic place in linux/kvm_types.h. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 8月, 2014 7 次提交
-
-
由 Aneesh Kumar K.V 提交于
On ppc64 we support 4K hash pte with 64K page size. That requires us to track the hash pte slot information on a per 4k basis. We do that by storing the slot details in the second half of pte page. The pte bit _PAGE_COMBO is used to indicate whether the second half need to be looked while building real_pte. We need to use read memory barrier while doing that so that load of hidx is not reordered w.r.t _PAGE_COMBO check. On the store side we already do a lwsync in __hash_page_4K CC: <stable@vger.kernel.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Aneesh Kumar K.V 提交于
If we changed base page size of the segment, either via sub_page_protect or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash table entries. We do a lazy hash page table flush for all mapped pages in the demoted segment. This happens when we handle hash page fault for these pages. We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte, that implies that we could possibly have older 64K hash pte entries in the hash page table and we need to invalidate those entries. Use _PAGE_COMBO to determine the page size with which we should invalidate the hash table entries on unmap. CC: <stable@vger.kernel.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Aneesh Kumar K.V 提交于
The segment identifier and segment size will remain the same in the loop, So we can compute it outside. We also change the hugepage_invalidate interface so that we can use it the later patch CC: <stable@vger.kernel.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nishanth Aravamudan 提交于
It appears that commits 7f06f21d ("powerpc/tm: Add checking to treclaim/trechkpt") and e4e38121 ("KVM: PPC: Book3S HV: Add transactional memory support") both added definitions of TEXASR_FS. Remove one of them. At the same time, fix the alignment of the remaining definition (should be tab-separated like the rest of the #defines). Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Vasant Hegde 提交于
PowerNV platform is capable of capturing host memory region when system crashes (because of host/firmware). We have new OPAL API to register/ unregister memory region to be captured when system crashes. This patch adds support for new API. Also during boot time we register kernel log buffer and unregister before doing kexec. Signed-off-by: NVasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Ellerman 提交于
We have been a bit slack about updating the CPU_FTRS_POSSIBLE and CPU_FTRS_ALWAYS masks. When we added POWER8, and also POWER8E we forgot to update the ALWAYS mask. And when we added POWER8_DD1 we forgot to update both the POSSIBLE and ALWAYS masks. Luckily this hasn't caused any actual bugs AFAICS. Failing to update the ALWAYS mask just forgoes a potential optimisation opportunity. Failing to update the POSSIBLE mask for POWER8_DD1 is also OK because it only removes a bit rather than adding any. Regardless they should all be in both masks so as to avoid any future bugs when the set of ALWAYS/POSSIBLE bits changes, or the masks themselves change. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Acked-by: NMichael Neuling <mikey@neuling.org> Acked-by: NJoel Stanley <joel@jms.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Ellerman 提交于
The kernel defines the function spin_is_locked(), which can be used to check if a spinlock is currently locked. Using spin_is_locked() on a lock you don't hold is obviously racy. That is, even though you may observe that the lock is unlocked, it may become locked at any time. There is (at least) one exception to that, which is if two locks are used as a pair, and the holder of each checks the status of the other before doing any update. Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic value: The first CPU does: spin_lock(*A) if spin_is_locked(*B) # nothing else smp_mb() LOAD r = *COUNTER r++ STORE *COUNTER = r spin_unlock(*A) And the second CPU does: spin_lock(*B) if spin_is_locked(*A) # nothing else smp_mb() LOAD r = *COUNTER r++ STORE *COUNTER = r spin_unlock(*B) Although this is a strange locking construct, it should work. It seems to be understood, but not documented, that spin_is_locked() is not a memory barrier, so in the examples above and below the caller inserts its own memory barrier before acting on the result of spin_is_locked(). For now we assume spin_is_locked() is implemented as below, and we break it out in our examples: bool spin_is_locked(*LOCK) { LOAD l = *LOCK return l.locked } Our intuition is that there should be no problem even if the two code sequences run simultaneously such as: CPU 0 CPU 1 ================================================== spin_lock(*A) spin_lock(*B) LOAD b = *B LOAD a = *A if b.locked # true if a.locked # true # nothing # nothing spin_unlock(*A) spin_unlock(*B) If one CPU gets the lock before the other then it will do the update and the other CPU will back off: CPU 0 CPU 1 ================================================== spin_lock(*A) LOAD b = *B spin_lock(*B) if b.locked # false LOAD a = *A else if a.locked # true smp_mb() # nothing LOAD r1 = *COUNTER spin_unlock(*B) r1++ STORE *COUNTER = r1 spin_unlock(*A) However in reality spin_lock() itself is not indivisible. On powerpc we implement it as a load-and-reserve and store-conditional. Ignoring the retry logic for the lost reservation case, it boils down to: spin_lock(*LOCK) { LOAD l = *LOCK l.locked = true STORE *LOCK = l ACQUIRE_BARRIER } The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as defined in memory-barriers.txt: This acts as a one-way permeable barrier. It guarantees that all memory operations after the ACQUIRE operation will appear to happen after the ACQUIRE operation with respect to the other components of the system. On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is also know as "lightweight sync", or "sync 1". As described in Power ISA v2.07 section B.2.1.1, in this scenario the lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to act as the barrier, preventing any loads or stores in the locked region from occurring prior to the load of *LOCK. Whether this behaviour is in accordance with the definition of ACQUIRE semantics in memory-barriers.txt is open to discussion, we may switch to a different barrier in future. What this means in practice is that the following can occur: CPU 0 CPU 1 ================================================== LOAD a = *A LOAD b = *B a.locked = true b.locked = true LOAD b = *B LOAD a = *A STORE *A = a STORE *B = b if b.locked # false if a.locked # false else else smp_mb() smp_mb() LOAD r1 = *COUNTER LOAD r2 = *COUNTER r1++ r2++ STORE *COUNTER = r1 STORE *COUNTER = r2 # Lost update spin_unlock(*A) spin_unlock(*B) That is, the load of *B can occur prior to the store that makes *A visibly locked. And similarly for CPU 1. The result is both CPUs hold their lock and believe the other lock is unlocked. The easiest fix for this is to add a full memory barrier to the start of spin_is_locked(), so adding to our previous definition would give us: bool spin_is_locked(*LOCK) { smp_mb() LOAD l = *LOCK return l.locked } The new barrier orders the store to the lock we are locking vs the load of the other lock: CPU 0 CPU 1 ================================================== LOAD a = *A LOAD b = *B a.locked = true b.locked = true STORE *A = a STORE *B = b smp_mb() smp_mb() LOAD b = *B LOAD a = *A if b.locked # true if a.locked # true # nothing # nothing spin_unlock(*A) spin_unlock(*B) Although the above example is theoretical, there is code similar to this example in sem_lock() in ipc/sem.c. This commit in addition to the next commit appears to be a fix for crashes we are seeing in that code where we believe this race happens in practice. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 09 8月, 2014 2 次提交
-
-
由 Andy Lutomirski 提交于
The core mm code will provide a default gate area based on FIXADDR_USER_START and FIXADDR_USER_END if !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR). This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit UML, and x86_32 have their own code just to disable it. arm, 32-bit UML, and x86_64 have gate areas, but they have their own implementations. This gets rid of the default and moves the code into ia64. This should save some code on architectures without a gate area: it's now possible to inline the gate_area functions in the default case. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Acked-by: NNathan Lynch <nathan_lynch@mentor.com> Acked-by: NH. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle] Acked-by: Richard Weinberger <richard@nod.at> [for um] Acked-by: Will Deacon <will.deacon@arm.com> [for arm64] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Nathan Lynch <Nathan_Lynch@mentor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Laura Abbott 提交于
Rather than have architectures #define ARCH_HAS_SG_CHAIN in an architecture specific scatterlist.h, make it a proper Kconfig option and use that instead. At same time, remove the header files are are now mostly useless and just include asm-generic/scatterlist.h. [sfr@canb.auug.org.au: powerpc files now need asm/dma.h] Signed-off-by: NLaura Abbott <lauraa@codeaurora.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc] Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 8月, 2014 11 次提交
-
-
由 Madhusudanan Kandasamy 提交于
remap_4k_pfn() silently truncates upper bits of input 4K PFN if it cannot be contained in PTE. This leads invalid memory mapping and could result in a system crash when the memory is accessed. This patch fails remap_4k_pfn() and returns -EINVAL if the input 4K PFN cannot be contained in PTE. V3 : Added parentheses to protect 'pfn' and entire macro as suggested by Brian. V2 : Rewritten to avoid helper function as suggested by Stephen Rothwell. Signed-off-by: NMadhusudanan Kandasamy <kmadhu@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Mahesh Salgaonkar 提交于
When we hit the HMI in Linux, invoke opal call to handle/recover from HMI errors in real mode and then in virtual mode during check_irq_replay() invoke opal_poll_events()/opal_do_notifier() to retrieve HMI event from OPAL and act accordingly. Now that we are ready to handle HMI interrupt directly in linux, remove the HMI interrupt registration with firmware. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Mahesh Salgaonkar 提交于
Handle Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements basic infrastructure to handle HMI in Linux host. The design is to invoke opal handle hmi in real mode for recovery and set irq_pending when we hit HMI. During check_irq_replay pull opal hmi event and print hmi info on console. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The patch synchronizes header file with firmware to have new OPAL API opal_pci_eeh_freeze_set(), which is used to freeze the specified PE in order to support "compound" PE. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Guo Chao 提交于
This patch enables M64 aperatus for PHB3. We already had platform hook (ppc_md.pcibios_window_alignment) to affect the PCI resource assignment done in PCI core so that each PE's M32 resource was built on basis of M32 segment size. Similarly, we're using that for M64 assignment on basis of M64 segment size. * We're using last M64 BAR to cover M64 aperatus, and it's shared by all 256 PEs. * We don't support P7IOC yet. However, some function callbacks are added to (struct pnv_phb) so that we can reuse them on P7IOC in future. * PE, corresponding to PCI bus with large M64 BAR device attached, might span multiple M64 segments. We introduce "compound" PE to cover the case. The compound PE is a list of PEs and the master PE is used as before. The slave PEs are just for MMIO isolation. Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The patch allows PE (struct eeh_pe) instance to have auxillary data, whose size is configurable on basis of platform. For PowerNV, the auxillary data will be used to cache PHB diag-data for that PE (frozen PE or fenced PHB). In turn, we can retrieve the diag-data at any later points. It's useful for the case of VFIO PCI devices where the error log should be cached, and then be retrieved by the guest at later point. Also, it can avoid PHB diag-data overwritting if another frozen PE reported and the previous diag-data isn't fetched by guest. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
It's followup of commit ddf0322a ("powerpc/powernv: Fix endianness problems in EEH"). The patch helps to get non-endian-dependent diag-data. Cc: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
According to the experiment I did, PCI config access is blocked on P7IOC frozen PE by hardware, but PHB3 doesn't do that. That means we always get 0xFF's while dumping PCI config space of the frozen PE on P7IOC. We don't have the problem on PHB3. So we have to enable I/O prioir to collecting error log. Otherwise, meaningless 0xFF's are always returned. The patch fixes it by EEH flag (EEH_ENABLE_IO_FOR_LOG), which is selectively set to indicate the case for: P7IOC on PowerNV platform, pSeries platform. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
There are multiple global EEH flags. Almost each flag has its own accessor, which doesn't make sense. The patch refactors EEH flag accessors so that they look unified: eeh_add_flag(): Add EEH flag eeh_clear_flag(): Clear EEH flag eeh_has_flag(): Check if one specific flag has been set eeh_enabled(): Check if EEH functionality has been enabled Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The patch exports functions to be used by new VFIO ioctl command, which will be introduced in subsequent patch, to support EEH functinality for VFIO PCI devices. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
We must not handle EEH error on devices which are passed to somebody else. Instead, we expect that the frozen device owner detects an EEH error and recovers from it. This avoids EEH error handling on passed through devices so the device owner gets a chance to handle them. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 31 7月, 2014 1 次提交
-
-
由 Alexander Graf 提交于
We handle FSCR feature bits (well, TAR only really today) lazily when the guest starts using them. So when a guest activates the bit and later uses that feature we enable it for real in hardware. However, when the guest stops using that bit we don't stop setting it in hardware. That means we can potentially lose a trap that the guest expects to happen because it thinks a feature is not active. This patch adds support to drop TAR when then guest turns it off in FSCR. While at it it also restricts FSCR access to 64bit systems - 32bit ones don't have it. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 30 7月, 2014 3 次提交
-
-
由 Bharat Bhushan 提交于
This are not specific to e500hv but applicable for bookehv (As per comment from Scott Wood on my patch "kvm: ppc: bookehv: Added wrapper macros for shadow registers") Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Andy Fleming 提交于
The general idea is that each core will release all of its threads into the secondary thread startup code, which will eventually wait in the secondary core holding area, for the appropriate bit in the PACA to be set. The kick_cpu function pointer will set that bit in the PACA, and thus "release" the core/thread to boot. We also need to do a few things that U-Boot normally does for CPUs (like enable branch prediction). Signed-off-by: NAndy Fleming <afleming@freescale.com> [scottwood@freescale.com: various changes, including only enabling threads if Linux wants to kick them] Signed-off-by: NScott Wood <scottwood@freescale.com>
-
由 Scott Wood 提交于
This ensures that all MSR definitions are consistently unsigned long, and that MSR_CM does not become 0xffffffff80000000 (this is usually harmless because MSR is 32-bit on booke and is mainly noticeable when debugging, but still I'd rather avoid it). Signed-off-by: NScott Wood <scottwood@freescale.com>
-
- 29 7月, 2014 3 次提交
-
-
由 Alexander Graf 提交于
DCR handling was only needed for 440 KVM. Since we removed it, we can also remove handling of DCR accesses. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
We're going to implement guest code interpretation in KVM for some rare corner cases. This code needs to be able to inject data and instruction faults into the guest when it encounters them. Expose generic APIs to do this in a reasonably subarch agnostic fashion. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
Today the instruction emulator can get called via 2 separate code paths. It can either be called by MMIO emulation detection code or by privileged instruction traps. This is bad, as both code paths prepare the environment differently. For MMIO emulation we already know the virtual address we faulted on, so instructions there don't have to actually fetch that information. Split out the two separate use cases into separate files. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 28 7月, 2014 12 次提交
-
-
由 Alexander Graf 提交于
We use kvmppc_ld and kvmppc_st to emulate load/store instructions that may as well access the magic page. Special case it out so that we can properly access it. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
We have enough common infrastructure now to resolve GVA->GPA mappings at runtime. With this we can move our book3s specific helpers to load / store in guest virtual address space to common code as well. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
We have a nice API to find the translated GPAs of a GVA including protection flags. So far we only use it on Book3S, but there's no reason the same shouldn't be used on BookE as well. Implement a kvmppc_xlate() version for BookE and clean it up to make it more readable in general. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aneesh Kumar K.V 提交于
When calculating the lower bits of AVA field, use the shift count based on the base page size. Also add the missing segment size and remove stale comment. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Stewart Smith 提交于
The POWER8 processor has a Micro Partition Prefetch Engine, which is a fancy way of saying "has way to store and load contents of L2 or L2+MRU way of L3 cache". We initiate the storing of the log (list of addresses) using the logmpp instruction and start restore by writing to a SPR. The logmpp instruction takes parameters in a single 64bit register: - starting address of the table to store log of L2/L2+L3 cache contents - 32kb for L2 - 128kb for L2+L3 - Aligned relative to maximum size of the table (32kb or 128kb) - Log control (no-op, L2 only, L2 and L3, abort logout) We should abort any ongoing logging before initiating one. To initiate restore, we write to the MPPR SPR. The format of what to write to the SPR is similar to the logmpp instruction parameter: - starting address of the table to read from (same alignment requirements) - table size (no data, until end of table) - prefetch rate (from fastest possible to slower. about every 8, 16, 24 or 32 cycles) The idea behind loading and storing the contents of L2/L3 cache is to reduce memory latency in a system that is frequently swapping vcores on a physical CPU. The best case scenario for doing this is when some vcores are doing very cache heavy workloads. The worst case is when they have about 0 cache hits, so we just generate needless memory operations. This implementation just does L2 store/load. In my benchmarks this proves to be useful. Benchmark 1: - 16 core POWER8 - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each - No split core/SMT - two guests running sysbench memory test. sysbench --test=memory --num-threads=8 run - one guest running apache bench (of default HTML page) ab -n 490000 -c 400 http://localhost/ This benchmark aims to measure performance of real world application (apache) where other guests are cache hot with their own workloads. The sysbench memory benchmark does pointer sized writes to a (small) memory buffer in a loop. In this benchmark with this patch I can see an improvement both in requests per second (~5%) and in mean and median response times (again, about 5%). The spread of minimum and maximum response times were largely unchanged. benchmark 2: - Same VM config as benchmark 1 - all three guests running sysbench memory benchmark This benchmark aims to see if there is a positive or negative affect to this cache heavy benchmark. Although due to the nature of the benchmark (stores) we may not see a difference in performance, but rather hopefully an improvement in consistency of performance (when vcore switched in, don't have to wait many times for cachelines to be pulled in) The results of this benchmark are improvements in consistency of performance rather than performance itself. With this patch, the few outliers in duration go away and we get more consistent performance in each guest. benchmark 3: - same 3 guests and CPU configuration as benchmark 1 and 2. - two idle guests - 1 guest running STREAM benchmark This scenario also saw performance improvement with this patch. On Copy and Scale workloads from STREAM, I got 5-6% improvement with this patch. For Add and triad, it was around 10% (or more). benchmark 4: - same 3 guests as previous benchmarks - two guests running sysbench --memory, distinctly different cache heavy workload - one guest running STREAM benchmark. Similar improvements to benchmark 3. benchmark 5: - 1 guest, 8 VCPUs, Ubuntu 14.04 - Host configured with split core (SMT8, subcores-per-core=4) - STREAM benchmark In this benchmark, we see a 10-20% performance improvement across the board of STREAM benchmark results with this patch. Based on preliminary investigation and microbenchmarks by Prerna Saxena <prerna@linux.vnet.ibm.com> Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
The 440 target hasn't been properly functioning for a few releases and before I was the only one who fixes a very serious bug that indicates to me that nobody used it before either. Furthermore KVM on 440 is slow to the extent of unusable. We don't have to carry along completely unused code. Remove 440 and give us one less thing to worry about. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Scott Wood pointed out that We are no longer using SPRG1 for vcpu pointer, but using SPRN_SPRG_THREAD <=> SPRG3 (thread->vcpu). So this comment is not valid now. Note: SPRN_SPRG3R is not supported (do not see any need as of now), and if we want to support this in future then we have to shift to using SPRG1 for VCPU pointer. Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
SPRN_SPRG is used by debug interrupt handler, so this is required for debug support. Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
On book3e, guest last instruction is read on the exit path using load external pid (lwepx) dedicated instruction. This load operation may fail due to TLB eviction and execute-but-not-read entries. This patch lay down the path for an alternative solution to read the guest last instruction, by allowing kvmppc_get_lat_inst() function to fail. Architecture specific implmentations of kvmppc_load_last_inst() may read last guest instruction and instruct the emulation layer to re-execute the guest in case of failure. Make kvmppc_get_last_inst() definition common between architectures. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Add mising defines MAS0_GET_TLBSEL() and MAS1_GET_TSIZE() for Book3E. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
kvmppc_set_epr() is already defined in asm/kvm_ppc.h, So rename and move get_epr helper function to same file. Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> [agraf: remove duplicate return] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Add and use kvmppc_set_esr() and kvmppc_get_esr() helper functions Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-