- 21 10月, 2015 1 次提交
-
-
由 Paul Mackerras 提交于
This reverts commit 9678cdaa ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8") because the original commit had multiple, partly self-cancelling bugs, that could cause occasional memory corruption. In fact the logmpp instruction was incorrectly using register r0 as the source of the buffer address and operation code, and depending on what was in r0, it would either do nothing or corrupt the 64k page pointed to by r0. The logmpp instruction encoding and the operation code definitions could be corrected, but then there is the problem that there is no clearly defined way to know when the hardware has finished writing to the buffer. The original commit attempted to work around this by aborting the write-out before starting the prefetch, but this is ineffective in the case where the virtual core is now executing on a different physical core from the one where the write-out was initiated. These problems plus advice from the hardware designers not to use the function (since the measured performance improvement from using the feature was actually mostly negative), mean that reverting the code is the best option. Fixes: 9678cdaa ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8") Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 09 10月, 2015 1 次提交
-
-
由 Cyril Bur 提交于
native_hpte_clear() is called in real mode from two places: - Early in boot during htab initialisation if firmware assisted dump is active. - Late in the kexec path. In both contexts there is no need to disable interrupts are they are already disabled. Furthermore, locking around the tlbie() is only required for pre POWER5 hardware. On POWER5 or newer hardware concurrent tlbie()s work as expected and on pre POWER5 hardware concurrent tlbie()s could result in deadlock. This code would only be executed at crashdump time, during which all bets are off, concurrent tlbie()s are unlikely and taking locks is unsafe therefore the best course of action is to simply do nothing. Concurrent tlbie()s are not possible in the first case as secondary CPUs have not come up yet. Signed-off-by: NCyril Bur <cyrilbur@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 25 9月, 2015 1 次提交
-
-
由 David Hildenbrand 提交于
We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 9月, 2015 1 次提交
-
-
由 Michael Ellerman 提交于
The selftest passes on 64-bit LE & BE, and 32-bit. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 16 9月, 2015 2 次提交
-
-
由 Thomas Gleixner 提交于
Most interrupt flow handlers do not use the irq argument. Those few which use it can retrieve the irq number from the irq descriptor. Remove the argument. Search and replace was done with coccinelle and some extra helper scripts around it. Thanks to Julia for her help! Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Jiang Liu <jiang.liu@linux.intel.com>
-
由 Paolo Bonzini 提交于
This new statistic can help diagnosing VCPUs that, for any reason, trigger bad behavior of halt_poll_ns autotuning. For example, say halt_poll_ns = 480000, and wakeups are spaced exactly like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes 10+20+40+80+160+320+480 = 1110 microseconds out of every 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then is consuming about 30% more CPU than it would use without polling. This would show as an abnormally high number of attempted polling compared to the successful polls. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com< Reviewed-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 9月, 2015 5 次提交
-
-
由 Christoph Hellwig 提交于
Almost everyone implements dma_set_mask the same way, although some time that's hidden in ->set_dma_mask methods. This patch consolidates those into a common implementation that either calls ->set_dma_mask if present or otherwise uses the default implementation. Some architectures used to only call ->set_dma_mask after the initial checks, and those instance have been fixed to do the full work. h8300 implemented dma_set_mask bogusly as a no-ops and has been fixed. Unfortunately some architectures overload unrelated semantics like changing the dma_ops into it so we still need to allow for an architecture override for now. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Most architectures just call into ->dma_supported, but some also return 1 if the method is not present, or 0 if no dma ops are present (although that should never happeb). Consolidate this more broad version into common code. Also fix h8300 which inorrectly always returned 0, which would have been a problem if it's dma_set_mask implementation wasn't a similarly buggy noop. As a few architectures have much more elaborate implementations, we still allow for arch overrides. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Currently there are three valid implementations of dma_mapping_error: (1) call ->mapping_error (2) check for a hardcoded error code (3) always return 0 This patch provides a common implementation that calls ->mapping_error if present, then checks for DMA_ERROR_CODE if defined or otherwise returns 0. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Most architectures do not support non-coherent allocations and either define dma_{alloc,free}_noncoherent to their coherent versions or stub them out. Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips implements them directly. This patch moves the Openrisc version to common code, and handles the DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance. Note that actual non-coherent allocations require a dma_cache_sync implementation, so if non-coherent allocations didn't work on an architecture before this patch they still won't work after it. [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Since 2009 we have a nice asm-generic header implementing lots of DMA API functions for architectures using struct dma_map_ops, but unfortunately it's still missing a lot of APIs that all architectures still have to duplicate. This series consolidates the remaining functions, although we still need arch opt outs for two of them as a few architectures have very non-standard implementations. This patch (of 5): The coherent DMA allocator works the same over all architectures supporting dma_map operations. This patch consolidates them and converges the minor differences: - the debug_dma helpers are now called from all architectures, including those that were previously missing them - dma_alloc_from_coherent and dma_release_from_coherent are now always called from the generic alloc/free routines instead of the ops dma-mapping-common.h always includes dma-coherent.h to get the defintions for them, or the stubs if the architecture doesn't support this feature - checks for ->alloc / ->free presence are removed. There is only one magic instead of dma_map_ops without them (mic_dma_ops) and that one is x86 only anyway. Besides that only x86 needs special treatment to replace a default devices if none is passed and tweak the gfp_flags. An optional arch hook is provided for that. [linux@roeck-us.net: fix build] [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 9月, 2015 1 次提交
-
-
由 Michael Ellerman 提交于
The selftest passes on 64-bit LE and BE. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 03 9月, 2015 1 次提交
-
-
由 Thomas Huth 提交于
The size of the Problem State Priority Boost Register is only 32 bits, but the kvm_vcpu_arch->pspb variable is declared as "ulong", ie. 64-bit. However, the assembler code accesses this variable with 32-bit accesses, and the KVM_REG_PPC_PSPB macro is defined with SIZE_U32, too, so that the current code is broken on big endian hosts: kvmppc_get_one_reg_hv() will only return zero for this register since it is using the wrong half of the pspb variable. Let's fix this problem by adjusting the size of the pspb field in the kvm_vcpu_arch structure. Signed-off-by: NThomas Huth <thuth@redhat.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 22 8月, 2015 6 次提交
-
-
由 Michael Ellerman 提交于
When I merged the OPAL support for the powernv LEDS driver I missed a hunk. This is slightly modified from the original patch, as the original added code to opal-api.h which is not in the skiboot version, which is discouraged. Instead those values are moved into the driver, which is the only place they are used. Fixes: 8a8d9181 ("powerpc/powernv: Add OPAL interfaces for accessing and modifying system LED states") Reviewed-by: NVasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Sam bobroff 提交于
In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: NLaurent Vivier <lvivier@redhat.com> Reviewed-by: NThomas Huth <thuth@redhat.com> Tested-by: NThomas Huth <thuth@redhat.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This fixes a bug in the tracking of pages that get modified by the guest. If the guest creates a large-page HPTE, writes to memory somewhere within the large page, and then removes the HPTE, we only record the modified state for the first normal page within the large page, when in fact the guest might have modified some other normal page within the large page. To fix this we use some unused bits in the rmap entry to record the order (log base 2) of the size of the page that was modified, when removing an HPTE. Then in kvm_test_clear_dirty_npages() we use that order to return the correct number of modified pages. The same thing could in principle happen when removing a HPTE at the host's request, i.e. when paging out a page, except that we never page out large pages, and the guest can only create large-page HPTEs if the guest RAM is backed by large pages. However, we also fix this case for the sake of future-proofing. The reference bit is also subject to the same loss of information. We don't make the same fix here for the reference bit because there isn't an interface for userspace to find out which pages the guest has referenced, whereas there is one for userspace to find out which pages the guest has modified. Because of this loss of information, the kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly say that a page has not been referenced when it has, but that doesn't matter greatly because we never page or swap out large pages. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This builds on the ability to run more than one vcore on a physical core by using the micro-threading (split-core) modes of the POWER8 chip. Previously, only vcores from the same VM could be run together, and (on POWER8) only if they had just one thread per core. With the ability to split the core on guest entry and unsplit it on guest exit, we can run up to 8 vcpu threads from up to 4 different VMs, and we can run multiple vcores with 2 or 4 vcpus per vcore. Dynamic micro-threading is only available if the static configuration of the cores is whole-core mode (unsplit), and only on POWER8. To manage this, we introduce a new kvm_split_mode struct which is shared across all of the subcores in the core, with a pointer in the paca on each thread. In addition we extend the core_info struct to have information on each subcore. When deciding whether to add a vcore to the set already on the core, we now have two possibilities: (a) piggyback the vcore onto an existing subcore, or (b) start a new subcore. Currently, when any vcpu needs to exit the guest and switch to host virtual mode, we interrupt all the threads in all subcores and switch the core back to whole-core mode. It may be possible in future to allow some of the subcores to keep executing in the guest while subcore 0 switches to the host, but that is not implemented in this patch. This adds a module parameter called dynamic_mt_modes which controls which micro-threading (split-core) modes the code will consider, as a bitmap. In other words, if it is 0, no micro-threading mode is considered; if it is 2, only 2-way micro-threading is considered; if it is 4, only 4-way, and if it is 6, both 2-way and 4-way micro-threading mode will be considered. The default is 6. With this, we now have secondary threads which are the primary thread for their subcore and therefore need to do the MMU switch. These threads will need to be started even if they have no vcpu to run, so we use the vcore pointer in the PACA rather than the vcpu pointer to trigger them. It is now possible for thread 0 to find that an exit has been requested before it gets to switch the subcore state to the guest. In that case we haven't added the guest's timebase offset to the timebase, so we need to be careful not to subtract the offset in the guest exit path. In fact we just skip the whole path that switches back to host context, since we haven't switched to the guest context. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
When running a virtual core of a guest that is configured with fewer threads per core than the physical cores have, the extra physical threads are currently unused. This makes it possible to use them to run one or more other virtual cores from the same guest when certain conditions are met. This applies on POWER7, and on POWER8 to guests with one thread per virtual core. (It doesn't apply to POWER8 guests with multiple threads per vcore because they require a 1-1 virtual to physical thread mapping in order to be able to use msgsndp and the TIR.) The idea is that we maintain a list of preempted vcores for each physical cpu (i.e. each core, since the host runs single-threaded). Then, when a vcore is about to run, it checks to see if there are any vcores on the list for its physical cpu that could be piggybacked onto this vcore's execution. If so, those additional vcores are put into state VCORE_PIGGYBACK and their runnable VCPU threads are started as well as the original vcore, which is called the master vcore. After the vcores have exited the guest, the extra ones are put back onto the preempted list if any of their VCPUs are still runnable and not idle. This means that vcpu->arch.ptid is no longer necessarily the same as the physical thread that the vcpu runs on. In order to make it easier for code that wants to send an IPI to know which CPU to target, we now store that in a new field in struct vcpu_arch, called thread_cpu. Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Tested-by: NLaurent Vivier <lvivier@redhat.com> Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Thomas Huth 提交于
When compiling the KVM code for POWER with "make C=1", sparse complains about functions missing proper prototypes and a 64-bit constant missing the ULL prefix. Let's fix this by making the functions static or by including the proper header with the prototypes, and by appending a ULL prefix to the constant PPC_MPPE_ADDRESS_MASK. Signed-off-by: NThomas Huth <thuth@redhat.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 20 8月, 2015 1 次提交
-
-
由 Anshuman Khandual 提交于
This patch registers the following two new OPAL interfaces calls for the platform LED subsystem. With the help of these new OPAL calls, the kernel will be able to get or set the state of various individual LEDs on the system at any given location code which is passed through the LED specific device tree nodes. (1) OPAL_LEDS_GET_INDICATOR opal_leds_get_ind (2) OPAL_LEDS_SET_INDICATOR opal_leds_set_ind Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: NVasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com> Tested-by: NStewart Smith <stewart@linux.vnet.ibm.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 18 8月, 2015 5 次提交
-
-
由 Gavin Shan 提交于
pcibios_set_pcie_reset_state() could be called to complete reset request when passing through PCI device, flag EEH_PE_ISOLATED is set before saving the PCI config sapce. On some Broadcom adapters, EEH_PE_CFG_BLOCKED is automatically set when the flag EEH_PE_ISOLATED is marked. It caused bogus data saved from the PCI config space, which will be restored to the PCI adapter after the reset. Eventually, the hardware can't work with corrupted data in PCI config space. The patch fixes the issue with eeh_pe_state_mark_no_cfg(), which doesn't set EEH_PE_CFG_BLOCKED when seeing EEH_PE_ISOLATED on the PE, in order to avoid the bogus data saved and restored to the PCI config space. Reported-by: NRajanikanth H. Adaveeshaiah <rajanikanth.ha@in.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Andrew Donnellan 提交于
Simplify the dma_get_required_mask call chain by moving it from pnv_phb to pci_controller_ops, similar to commit 763d2d8d ("powerpc/powernv: Move dma_set_mask from pnv_phb to pci_controller_ops"). Previous call chain: 0) call dma_get_required_mask() (kernel/dma.c) 1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that points to pnv_dma_get_required_mask() (platforms/powernv/setup.c) 2) device is PCI, therefore call pnv_pci_dma_get_required_mask() (platforms/powernv/pci.c) 3) call phb->dma_get_required_mask if it exists 4) it only exists in the ioda case, where it points to pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c) New call chain: 0) call dma_get_required_mask() (kernel/dma.c) 1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask if it exists 2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c) In the p5ioc2 case, the call chain remains the same - dma_get_required_mask() does not find either a ppc_md call or pci_controller_ops call, so it calls __dma_get_required_mask(). Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Now that support for 64k pages with a 4K kernel is removed, this code is unreachable. CONFIG_PPC_HAS_HASH_64K can only be true when CONFIG_PPC_64K_PAGES is also true. But when CONFIG_PPC_64K_PAGES is true we include pte-hash64.h which includes pte-hash64-64k.h, which defines both pte_pagesize_index() and crucially __real_pte, which means this definition can never be used. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
由 Michael Ellerman 提交于
Back in the olden days we added support for using 64K pages to map the SPU (Synergistic Processing Unit) local store on Cell, when the main kernel was using 4K pages. This was useful at the time because distros were using 4K pages, but using 64K pages on the SPUs could reduce TLB pressure there. However these days the number of Cell users is approaching zero, and supporting this option adds unpleasant complexity to the memory management code. So drop the option, CONFIG_SPU_FS_64K_LS, and all related code. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Acked-by: NJeremy Kerr <jk@ozlabs.org>
-
由 Michael Ellerman 提交于
The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K PAGE_SIZE. However when built with a 4K PAGE_SIZE there is an additional config option which can be enabled, PPC_HAS_HASH_64K, which means the kernel also knows how to hash a 64K page even though the base PAGE_SIZE is 4K. This is used in one obscure configuration, to support 64K pages for SPU local store on the Cell processor when the rest of the kernel is using 4K pages. In this configuration, pte_pagesize_index() is defined to just pass through its arguments to get_slice_psize(). However pte_pagesize_index() is called for both user and kernel addresses, whereas get_slice_psize() only knows how to handle user addresses. This has been broken forever, however until recently it happened to work. That was because in get_slice_psize() the large kernel address would cause the right shift of the slice mask to return zero. However in commit 7aa0727f ("powerpc/mm: Increase the slice range to 64TB"), the get_slice_psize() code was changed so that instead of a right shift we do an array lookup based on the address. When passed a kernel address this means we index way off the end of the slice array and return random junk. That is only fatal if we happen to hit something non-zero, but when we do return a non-zero value we confuse the MMU code and eventually cause a check stop. This fix is ugly, but simple. When we're called for a kernel address we return 4K, which is always correct in this configuration, otherwise we use the slice mask. Fixes: 7aa0727f ("powerpc/mm: Increase the slice range to 64TB") Reported-by: NCyril Bur <cyrilbur@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
- 14 8月, 2015 1 次提交
-
-
由 Gautham R. Shenoy 提交于
Section 3.7 of Version 1.2 of the Power8 Processor User's Manual prescribes that updates to HID0 be preceded by a SYNC instruction and followed by an ISYNC instruction (Page 91). Create an inline function name update_power8_hid0() which follows this recipe and invoke it from the static split core path. Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: NSam Bobroff <sam.bobroff@au1.ibm.com> Tested-by: NSam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 08 8月, 2015 5 次提交
-
-
由 Scott Wood 提交于
In CoreNet systems it is not allowed to mix M and non-M mappings to the same memory, and coherent DMA accesses are considered to be M mappings for this purpose. Ignoring this has been observed to cause hard lockups in non-SMP kernels on e6500. Furthermore, e6500 implements the LRAT (logical to real address table) which allows KVM guests to control the WIMGE bits. This means that KVM cannot force the M bit on the way it usually does, so the guest had better set it itself. Signed-off-by: NScott Wood <scottwood@freescale.com>
-
由 Scott Wood 提交于
map_kernel() doesn't catch all places that create kernel PTEs. In particular, vmalloc() calls set_pte_at() directly. This causes a crash when booting a non-SMP kernel on e6500. Move the sync to __set_pte(), to be executed only for kernel addresses. Signed-off-by: NScott Wood <scottwood@freescale.com>
-
由 Scott Wood 提交于
__flush_dcache_icache_phys() requires the ability to access the memory with the MMU disabled, which means that on a 32-bit system any memory above 4 GiB is inaccessible. In particular, mpc86xx is 32-bit and can have more than 4 GiB of RAM. Signed-off-by: NScott Wood <scottwood@freescale.com>
-
由 LEROY Christophe 提交于
The C version of csum_add() as defined in include/net/checksum.h gives the following assembly in ppc32: 0: 7c 04 1a 14 add r0,r4,r3 4: 7c 64 00 10 subfc r3,r4,r0 8: 7c 63 19 10 subfe r3,r3,r3 c: 7c 63 00 50 subf r3,r3,r0 and the following in ppc64: 0xc000000000001af8 <+0>: add r3,r3,r4 0xc000000000001afc <+4>: cmplw cr7,r3,r4 0xc000000000001b00 <+8>: mfcr r4 0xc000000000001b04 <+12>: rlwinm r4,r4,29,31,31 0xc000000000001b08 <+16>: add r3,r4,r3 0xc000000000001b0c <+20>: clrldi r3,r3,32 0xc000000000001b10 <+24>: blr include/net/checksum.h also offers the possibility to define an arch specific function. This patch provides a specific csum_add() inline function. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <scottwood@freescale.com>
-
由 LEROY Christophe 提交于
csum_tcpudp_magic() is only a few instructions, and does modify really few registers. So it is not worth having it as a separate function and suffer function branching and saving of volatile registers. This patch makes it inline by use of the already existing csum_tcpudp_nofold() function. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <scottwood@freescale.com>
-
- 06 8月, 2015 3 次提交
-
-
由 Naveen N. Rao 提交于
Add a new powerpc-specific trace clock using the timebase register, similar to x86-tsc. This gives us - a fast, monotonic, hardware clock source for trace entries, and - a clock that can be used to correlate events across cpus as well as across hypervisor and guests. Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Mahesh Salgaonkar 提交于
On non-recoverable MCE errors in kernel space, Linux kernel panics and system reboots. On BMC based system opal-prd runs as a daemon in the host. Hence, kernel crash may prevent opal-prd to detect and analyze this MCE error. This may land us in a situation where the faulty memory never gets de-configured and Linux would keep hitting same MCE error again and again. If this happens in early stage of kernel initialization, then Linux will keep crashing and rebooting in a loop. This patch fixes this issue by invoking new opal_cec_reboot2() call with reboot type OPAL_REBOOT_PLATFORM_ERROR to inform BMC/OCC about this error, so that BMC can collect relevant data for error analysis and decide what component to de-configure before rebooting. This patch is dependent on OPAL patchset posted on skiboot mailing list at https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html that introduces opal_cec_reboot2() opal call. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Mahesh Salgaonkar 提交于
The V2 version of HMI event now carries additional information for Malfunction Alert. It now contains error information about CORE and NX checkstop. This patch checks and displays the check stop reason before panic. Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 04 8月, 2015 1 次提交
-
-
由 Paul E. McKenney 提交于
RCU is the only thing that uses smp_mb__after_unlock_lock(), and is likely the only thing that ever will use it, so this commit makes this macro private to RCU. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
-
- 03 8月, 2015 2 次提交
-
-
由 Peter Zijlstra 提交于
There are various problems and short-comings with the current static_key interface: - static_key_{true,false}() read like a branch depending on the key value, instead of the actual likely/unlikely branch depending on init value. - static_key_{true,false}() are, as stated above, tied to the static_key init values STATIC_KEY_INIT_{TRUE,FALSE}. - we're limited to the 2 (out of 4) possible options that compile to a default NOP because that's what our arch_static_branch() assembly emits. So provide a new static_key interface: DEFINE_STATIC_KEY_TRUE(name); DEFINE_STATIC_KEY_FALSE(name); Which define a key of different types with an initial true/false value. Then allow: static_branch_likely() static_branch_unlikely() to take a key of either type and emit the right instruction for the case. This means adding a second arch_static_branch_jump() assembly helper which emits a JMP per default. In order to determine the right instruction for the right state, encode the branch type in the LSB of jump_entry::key. This is the final step in removing the naming confusion that has led to a stream of avoidable bugs such as: a833581e ("x86, perf: Fix static_key bug in load_mm_cr4()") ... but it also allows new static key combinations that will give us performance enhancements in the subsequent patches. Tested-by: Rabin Vincent <rabin@rab.in> # arm Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andrey Konovalov 提交于
Replace ACCESS_ONCE() macro in smp_store_release() and smp_load_acquire() with WRITE_ONCE() and READ_ONCE() on x86, arm, arm64, ia64, metag, mips, powerpc, s390, sparc and asm-generic since ACCESS_ONCE() does not work reliably on non-scalar types. WRITE_ONCE() and READ_ONCE() were introduced in the following commits: 230fa253 ("kernel: Provide READ_ONCE and ASSIGN_ONCE") 43239cbe ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)") Signed-off-by: NAndrey Konovalov <andreyknvl@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDavidlohr Bueso <dbueso@suse.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: NRalf Baechle <ralf@linux-mips.org> Cc: Alexander Duyck <alexander.h.duyck@redhat.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@suse.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1438528264-714-1-git-send-email-andreyknvl@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 7月, 2015 3 次提交
-
-
由 Luis R. Rodriguez 提交于
This adds ioremap_uc() only for architectures that do not include asm-generic.h/io.h as that already provides a default definition for them for both cases where you have CONFIG_MMU and you do not, and because of this, the number of architectures this patch address is less than the architectures that the ioremap_wt() patch addressed, "arch/*/io.h: Add ioremap_wt() to all architectures"). In order to reduce the number of architectures we have to modify by adding new architecture IO APIs we'll have to review the architectures in this patch, see why they can't add asm-generic.h/io.h or issues that would be created by doing so and then spread a consistent inclusion of this header towards the end of their own header. For instance arch/metag includes the asm-generic/io.h *before* the ioremap*() definitions, this should be the other way around but only once we have guard wrappers for the non-MMU case also for asm-generic/io.h. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Cc: Abhilash Kesavan <a.kesavan@samsung.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Howells <dhowells@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Kyle McMartin <kyle@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-am33-list@redhat.com Cc: linux-arch@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-sh@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20150728181713.GB30479@wotan.suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Michael Ellerman 提交于
SIG_SYS was added in commit a0727e8c "signal, x86: add SIGSYS info and make it synchronous." Because we use the asm-generic struct siginfo, we got support for SIG_SYS for free as part of that commit. However there was no compat handling added for powerpc. That means we've been advertising the existence of signfo._sifields._sigsys to compat tasks, but not actually filling in the fields correctly. Luckily it looks like no one has noticed, presumably because the only user of SIGSYS in the kernel is seccomp filter, which we don't support yet. So before we enable seccomp filter, add compat handling for SIGSYS. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NKees Cook <keescook@chromium.org>
-
由 Michael Ellerman 提交于
The documentation for syscall_get_nr() in asm-generic says: Note this returns int even on 64-bit machines. Only 32 bits of system call number can be meaningful. If the actual arch value is 64 bits, this truncates to 32 bits so 0xffffffff means -1. However our implementation was never updated to reflect this. Generally it's not important, but there is once case where it matters. For seccomp filter with SECCOMP_RET_TRACE, the tracer will set regs->gpr[0] to -1 to reject the syscall. When the task is a compat task, this means we end up with 0xffffffff in r0 because ptrace will zero extend the 32-bit value. If syscall_get_nr() returns an unsigned long, then a 64-bit kernel will see a positive value in r0 and will incorrectly allow the syscall through seccomp. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NKees Cook <keescook@chromium.org>
-