- 08 Oct 2014, 5 commits
-
-
By Ian Munsie

This adds a new function hash_page_mm() based on the existing hash_page(). This version allows any struct mm to be passed in, rather than assuming current. This is useful for servicing co-processor faults which are not in the context of the currently running process. We need to be careful here, as the current hash_page() assumes current in a few places.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
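A minimal sketch of the split, assuming the wrapper picks an mm by region (the VMALLOC special case here is an assumption, not confirmed by the log):

    int hash_page_mm(struct mm_struct *mm, unsigned long ea,
                     unsigned long access, unsigned long trap)
    {
            /* as the old hash_page(), but every reference to
             * current->mm is replaced by the mm argument */
            return 0;       /* sketch only */
    }

    int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
    {
            struct mm_struct *mm = current->mm;

            if (REGION_ID(ea) == VMALLOC_REGION_ID)
                    mm = &init_mm;  /* assumed: kernel faults use init_mm */

            return hash_page_mm(mm, ea, access, trap);
    }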
-
By Ian Munsie

Export mmu_kernel_ssize and mmu_linear_psize. These are needed by the cxl driver, which has its own MMU; to set up that MMU, cxl needs access to these.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
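A sketch of what the export amounts to (whether the plain or _GPL export variant was used is an assumption):

    /* in arch/powerpc/mm/hash_utils_64.c */
    int mmu_kernel_ssize = MMU_SEGSIZE_256M;
    EXPORT_SYMBOL_GPL(mmu_kernel_ssize);

    int mmu_linear_psize = MMU_PAGE_4K;
    EXPORT_SYMBOL_GPL(mmu_linear_psize);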
-
By Ian Munsie

This moves spu_flush_all_slbs() into a generic call, copro_flush_all_slbs(). This will be useful when we add cxl, which also needs a similar SLB flush call.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
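A sketch of the generic wrapper; the cxl hook is shown as the eventual second consumer and is an assumption based on the description:

    void copro_flush_all_slbs(struct mm_struct *mm)
    {
    #ifdef CONFIG_SPU_BASE
            spu_flush_all_slbs(mm);         /* existing Cell SPU flush */
    #endif
            cxl_slbia(mm);                  /* assumed future cxl hook */
    }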
-
By Ian Munsie

__spu_trap_data_seg() currently contains code to determine the VSID and ESID required for a particular EA and mm struct. This code is generically useful for other co-processors. This moves the code out of the cell platform so it can be used by other powerpc code. It also adds 1TB segment handling, which Cell didn't support. The new function is called copro_calculate_slb().

This also moves the internal struct spu_slb to a generic struct copro_slb, which is now used in the Cell and copro code. We use this new struct instead of passing around esid and vsid parameters.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
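A sketch of the new generic pieces; the field layout, flag names, and the user-segment-only shortcut are assumptions (the real helper would also need to handle kernel regions and reject bad EAs):

    struct copro_slb {
            u64 esid;
            u64 vsid;
    };

    int copro_calculate_slb(struct mm_struct *mm, u64 ea,
                            struct copro_slb *slb)
    {
            int ssize = user_segment_size(ea);      /* 256M or 1T segment */
            u64 vsid = get_vsid(mm->context.id, ea, ssize);

            slb->esid = (ea & ESID_MASK) | SLB_ESID_V;
            slb->vsid = (vsid << slb_vsid_shift(ssize)) | SLB_VSID_USER;
            return 0;
    }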
-
By Ian Munsie

Currently spu_handle_mm_fault() is in the cell platform. This code is generically useful for other non-cell co-processors on powerpc. This patch moves the function out of the cell platform into arch/powerpc/mm so that others may use it.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 02 Oct 2014, 4 commits
-
-
By Anton Blanchard

There is no need for yet another copy of the command line; just use boot_command_line like everyone else.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Anton Blanchard

Fill in the si_addr_lsb siginfo field so the hwpoison code can pass to userspace the length of memory that has been corrupted.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
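A sketch of the siginfo fill; the helper name is hypothetical, and lsb would be PAGE_SHIFT, or the huge page shift for a huge mapping:

    static void fill_hwpoison_siginfo(siginfo_t *info,
                                      unsigned long addr, unsigned int lsb)
    {
            info->si_signo = SIGBUS;
            info->si_errno = 0;
            info->si_code  = BUS_MCEERR_AR;         /* action required */
            info->si_addr  = (void __user *)addr;
            info->si_addr_lsb = lsb;                /* extent of corruption */
    }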
-
By Anton Blanchard

do_page_fault was missing knowledge of HWPOISON, and we would oops if userspace tried to access a poisoned page:

    kernel BUG at arch/powerpc/mm/fault.c:180!

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
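Roughly, the fault path gains a branch like the following sketch before the catch-all BUG; that do_sigbus() takes the fault code is an assumption:

    if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
            /* deliver SIGBUS/BUS_MCEERR_AR instead of hitting the
             * BUG_ON for unhandled fault flags */
            do_sigbus(regs, address, fault);
            return 0;
    }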
-
By Anton Blanchard

Exit early for a kernel fault, avoiding the indenting of most of the function.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 25 Sep 2014, 8 commits
-
-
By Michael Ellerman

We can unindent the bulk of htab_dt_scan_page_sizes() by returning early if the property is not found. That is nice in and of itself, but also has the advantage of making it clear that we always return success once we have found the ibm,segment-page-sizes property.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
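A sketch of the early-return shape (parsing elided):

    static int htab_dt_scan_page_sizes(unsigned long node, const char *uname,
                                       int depth, void *data)
    {
            const __be32 *prop;
            int size;

            prop = of_get_flat_dt_prop(node, "ibm,segment-page-sizes", &size);
            if (!prop)
                    return 0;       /* property absent: keep scanning */

            /* ... parse the page size encodings in the property ... */

            return 1;               /* found and parsed: stop the scan */
    }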
-
By Li Zhong

This patch changes some error handling logic in numa_setup_cpu() for when the cpu node is not found: if the cpu is possible but not present, -1 is kept in numa_cpu_lookup_table, so that if the cpu is later added we can set correct NUMA information for it; if the cpu is present, we set the first online node in numa_cpu_lookup_table instead of 0 (since 0 might not be an online node).

Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
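A sketch of the resulting logic, assuming internal helpers along the lines of those in arch/powerpc/mm/numa.c:

    static int numa_setup_cpu(unsigned long lcpu)
    {
            int nid = -1;
            struct device_node *cpu = of_get_cpu_node(lcpu, NULL);

            if (!cpu) {
                    if (cpu_present(lcpu))
                            nid = first_online_node; /* present, node unknown */
                    goto out;       /* possible-only cpu: leave -1 behind */
            }

            nid = of_node_to_nid_single(cpu);
            of_node_put(cpu);
    out:
            map_cpu_to_node(lcpu, nid);
            return nid;
    }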
-
By Li Zhong

As Nish suggested, it makes more sense to init the NUMA node information for present cpus at boot time, which also avoids the WARN_ON(1) in numa_setup_cpu().

With this change, we also need to change smp_prepare_cpus() to set up NUMA information only on present cpus. For those possible but not present cpus, their NUMA information will be set up after they are started, as the original code did before commit 2fabf084.

Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Li Zhong

With commit 2fabf084 ("powerpc: reorder per-cpu NUMA information's initialization"), during boot time, cpu_numa_callback() is called earlier (before the cpus are online) for each cpu, and verify_cpu_node_mapping() uses cpu_to_node() to check whether siblings are in the same node.

It skips the checking for siblings that are not online yet. So the only check done here is for the boot cpu, which is online at that time. But the per-cpu numa_node that cpu_to_node() uses hasn't been set up yet (it will be set up in smp_prepare_cpus()). So I saw something like the following reported:

    [    0.000000] CPU thread siblings 1/2/3 and 0 don't belong to the same node!

As we don't actually do the checking during this early stage, maybe we could directly call numa_setup_cpu() in do_init_bootmem().

Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Anton Blanchard

A recent patch added a function prototype for htab_remove_mapping in C code. Fix it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Anton Blanchard

Fix a number of places where global functions were not including their prototype. This ensures the prototype and the function match.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Anton Blanchard

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Anton Blanchard

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 20 Sep 2014, 1 commit
-
-
By Scott Wood

Commit 1c98025c ("powerpc: Dynamic DMA zone limits") updated how zones are created in paging_init(), but missed the NUMA version of paging_init(). This was noticed via a linker error, since dma_pfn_limit_to_zone() was, like the non-NUMA paging_init(), limited by #ifndef CONFIG_NEED_MULTIPLE_NODES.

It turns out that the NUMA paging_init() was not actually doing anything different from the standard paging_init(), other than a couple of debug prints, a couple of 32-bit-only ifdef sections, and a call to mark_nonram_nosave(). It's not clear whether mark_nonram_nosave() is inherently wrong to do for NUMA, or just not useful on targets that have NUMA, but for now I'm preserving the existing behavior.

Fixes: 1c98025c ("powerpc: Dynamic DMA zone limits")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
- 04 Sep 2014, 1 commit
-
-
By Scott Wood

Platform code can call limit_zone_pfn() to set appropriate limits for ZONE_DMA and ZONE_DMA32, and dma_direct_alloc_coherent() will select a suitable zone based on a device's mask and the pfn limits that platform code has configured.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
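For example, platform code might cap ZONE_DMA at 2 GiB; the specific limit here is illustrative, not taken from the patch:

    limit_zone_pfn(ZONE_DMA, 1UL << (31 - PAGE_SHIFT));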
-
- 13 Aug 2014, 9 commits
-
-
By Aneesh Kumar K.V

Add a tracepoint to track hugepage invalidate. This helps us in debugging difficult-to-track bugs.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Aneesh Kumar K.V

We could get wrong results if the compiler recomputed old_pmd. Avoid that by using ACCESS_ONCE.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
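The idiom, sketched as a fragment (surrounding update loop elided):

    pmd_t old_pmd = ACCESS_ONCE(*pmdp);     /* force a single load */

    if (pmd_trans_huge(old_pmd)) {
            /* all decisions here are made against one snapshot; a bare
             * *pmdp could legally be re-read by the compiler, and the
             * two reads could disagree */
    }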
-
By Aneesh Kumar K.V

As per the ISA, for 4k base page size we compare bits 14..65 of the VA specified with the entry VA in the TLB. That implies we need to make sure we do a tlbie with all the possible 4k VAs used to access the 16MB hugepage. With 64k base page size we compare bits 14..57 of the VA, hence we cannot ignore the lower 24 bits of the VA while doing tlbie. We also cannot invalidate a 16MB TLB entry with just one tlbie instruction, because we don't track which VA was used to instantiate the TLB entry.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
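In effect the invalidation has to walk every 4k slice of the 16MB page, as in this sketch; tlbie_4k() is a hypothetical stand-in for the real tlbie sequence:

    unsigned long va;

    /* 16M huge page with 4k base page size: 4096 distinct 4k VAs may
     * each have their own TLB entry */
    for (va = start; va < start + (1UL << 24); va += (1UL << 12))
            tlbie_4k(va, ssize);    /* hypothetical helper */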
-
By Aneesh Kumar K.V

If we changed the base page size of the segment, either via sub_page_protect or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash table entries. We instead do a lazy hash page table flush for all mapped pages in the demoted segment. This happens when we handle a hash page fault for these pages.

We use the _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a pte is backed by a 4K hash pte. If we find _PAGE_COMBO not set on the pte, that implies that we could possibly have older 64K hash pte entries in the hash page table, and we need to invalidate those entries.

Use _PAGE_COMBO to determine the page size with which we should invalidate the hash table entries on unmap.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
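The check, sketched with the exact flush arguments assumed:

    if ((pte_val(old_pte) & _PAGE_HASHPTE) &&
        !(pte_val(old_pte) & _PAGE_COMBO)) {
            /* hashed while the segment was still 64K: kill the stale
             * 64K hash entry before inserting 4K ones */
            flush_hash_page(vpn, rpte, MMU_PAGE_64K, ssize, flags);
    }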
-
By Aneesh Kumar K.V

If we changed the base page size of the segment, either via sub_page_protect or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash table entries. We instead do a lazy hash page table flush for all mapped pages in the demoted segment. This happens when we handle a hash page fault for these pages.

We use the _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a pte is backed by a 4K hash pte. If we find _PAGE_COMBO not set on the pte, that implies that we could possibly have older 64K hash pte entries in the hash page table, and we need to invalidate those entries.

Handle this correctly for 16M pages.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Aneesh Kumar K.V

The segment identifier and segment size will remain the same in the loop, so we can compute them outside. We also change the hugepage_invalidate interface so that we can use it in a later patch.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Aneesh Kumar K.V

With hugepages, we store the hpte valid information in the pte page, whose address is stored in the second half of the PMD. Use a write barrier to make sure that clearing the pmd busy bit and updating the hpte valid info are ordered properly.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
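The ordering, sketched with assumed field names:

    hpte_slot_array[index] = hidx;  /* 1: publish hpte valid info */
    smp_wmb();                      /* 2: order it before the pmd update */
    *pmdp = __pmd(pmd_val(*pmdp) & ~_PAGE_BUSY);    /* 3: clear busy */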
-
By Nishanth Aravamudan

There is an issue currently where NUMA information is used on powerpc (and possibly ia64) before it has been read from the device-tree, which leads to large slab consumption with CONFIG_SLUB and memoryless nodes.

NUMA powerpc non-boot CPUs' cpu_to_node/cpu_to_mem is only accurate after start_secondary(), similar to ia64, which is invoked via smp_init().

Commit 6ee0578b ("workqueue: mark init_workqueues() as early_initcall()") made init_workqueues() be invoked via do_pre_smp_initcalls(), which is obviously before the secondary processors are online. Additionally, the following commits changed init_workqueues() to use cpu_to_node to determine the node to use for kthread_create_on_node:

bce90380 ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]")
f3f90ad4 ("workqueue: determine NUMA node of workers accourding to the allowed cpumask")

Therefore, when init_workqueues() runs, it sees all CPUs as being on Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to a high number of slab deactivations (http://www.spinics.net/lists/linux-mm/msg67489.html).

Fix this by initializing the powerpc-specific CPU<->node/local memory node mapping as early as possible, which on powerpc is do_init_bootmem(). Currently that function initializes the mapping for the boot CPU, but we extend it to set up the mapping for all possible CPUs. Then, in smp_prepare_cpus(), we can correspondingly set the per-cpu values for all possible CPUs. That ensures that before the early_initcalls run (and really as early as possible), the per-cpu NUMA mapping is accurate.

While testing memoryless nodes on PowerKVM guests, with a fix to the workqueue logic to use cpu_to_mem() instead of cpu_to_node(), and with a guest topology of:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 1 size: 16336 MB
node 1 free: 15329 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

the slab consumption decreases from

Slab:             932416 kB
SUnreclaim:       902336 kB

to

Slab:             395264 kB
SUnreclaim:       359424 kB

And we see a corresponding increase in slab efficiency from

slab                     mem     objs    slabs
                        used   active   active
------------------------------------------------------------
kmalloc-16384         337 MB   11.28%  100.00%
task_struct           288 MB    9.93%  100.00%

to

slab                     mem     objs    slabs
                        used   active   active
------------------------------------------------------------
kmalloc-16384          37 MB  100.00%  100.00%
task_struct            31 MB  100.00%  100.00%

Powerpc didn't support memoryless nodes until recently (64bb80d8 "powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and 8c272261 "powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also helped improve memory consumption with these kinds of environments.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
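The boot-time shape, sketched with helper names as used elsewhere in this log:

    /* in do_init_bootmem(): map every possible cpu, not just the boot cpu */
    for_each_possible_cpu(cpu)
            numa_setup_cpu(cpu);

    /* later, in smp_prepare_cpus(): mirror it into the per-cpu values */
    for_each_possible_cpu(cpu) {
            set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
            set_cpu_numa_mem(cpu,
                    local_memory_node(numa_cpu_lookup_table[cpu]));
    }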
-
By Scott Wood

__early_init_mmu() does some things that are really only needed by the boot cpu. On FSL BookE, this includes calling memblock_enforce_memory_limit(), which is labelled __init. Secondary cpu init code can't be __init, as that would break CPU hotplug.

While it's probably a bug that memblock_enforce_memory_limit() isn't __init_memblock instead, there's no reason why we should be doing this stuff for secondary cpus in the first place.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
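A sketch of the split, with helper names assumed:

    void __init early_init_mmu(void)
    {
            early_init_mmu_global();   /* one-time work (memblock limits
                                        * etc.); safe to keep __init */
            early_init_this_mmu();     /* per-cpu setup */
    }

    void early_init_mmu_secondary(void)     /* hotplug path: not __init */
    {
            early_init_this_mmu();
    }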
-
- 09 Aug 2014, 1 commit
-
-
By Laura Abbott

Rather than have architectures #define ARCH_HAS_SG_CHAIN in an architecture-specific scatterlist.h, make it a proper Kconfig option and use that instead. At the same time, remove the header files that are now mostly useless and just include asm-generic/scatterlist.h.

[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 07 Aug 2014, 1 commit
-
-
By Wang Nan

This patch introduces zone_for_memory() to arch_add_memory() on powerpc, to ensure that new, higher memory is added into ZONE_MOVABLE if the movable zone has already been set up.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Mel Gorman" <mgorman@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
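A sketch of the arch hook after the change; linear-mapping setup is elided, and the four-argument zone_for_memory() signature is assumed from this era:

    int arch_add_memory(int nid, u64 start, u64 size)
    {
            struct pglist_data *pgdata = NODE_DATA(nid);
            struct zone *zone;
            unsigned long start_pfn = start >> PAGE_SHIFT;
            unsigned long nr_pages = size >> PAGE_SHIFT;

            /* ZONE_MOVABLE if it already covers this range, else NORMAL */
            zone = pgdata->node_zones +
                    zone_for_memory(nid, start, size, ZONE_NORMAL);

            return __add_pages(nid, zone, start_pfn, nr_pages);
    }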
-
- 05 Aug 2014, 6 commits
-
-
By Li Zhong

vmemmap_populated() checks whether the [start, start + page_size) range has valid pfn numbers, to know whether a vmemmap mapping has been created that includes this range.

Some range before end might not be checked by this loop:

    sec11start......start11..sec11end/sec12start..end....start12..sec12end

As shown above, for start11 (section 11), it checks [sec11start, sec11end), and the loop ends as the next start (start12) is bigger than end. However, [sec11end/sec12start, end) is not checked here. So before the loop, adjust start to be the start of the section, so we don't miss ranges like the above.

After we adjust start to be the start of the section, it also means start is aligned with vmemmap in units of sizeof(struct page), so we can use page_to_pfn directly in the loop.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
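A sketch of the adjusted walk inside vmemmap_populated(); the alignment helper and loop step are assumptions about the exact form:

    unsigned long end = start + page_size;

    /* align down to the section start so the slice between the section
     * start and the original 'start' is also checked */
    start = _ALIGN_DOWN(start, page_size);

    for (; start < end; start += page_size)
            if (pfn_valid(page_to_pfn((struct page *)start)))
                    return 1;       /* already populated */
    return 0;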
-
By Li Zhong

vmemmap_free() does the opposite of vmemmap_populate(). This patch also puts vmemmap_free() and vmemmap_list_free() under CONFIG_MEMORY_HOTPLUG.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Li Zhong

This is to be called in vmemmap_free(); leave the implementation on BOOK3E empty, as before.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Li Zhong

This patch implements vmemmap_list_free() for vmemmap_free(). The freed entries will be removed from vmemmap_list and formed into a freed list, with next as the head. The next position in the last allocated page is kept at the list tail.

On allocation, if there are freed entries left, take one from the freed list; if no freed entries are left, take one as before from the last allocated page.

With this change, realmode_pfn_to_page() also needs to be changed to walk all the entries in vmemmap_list, as the virt_addr of the entries might not be stored in order anymore.

This helps to reuse the memory when repeatedly doing memory hot-plug/remove operations, but it doesn't reclaim the pages already allocated, so memory usage will only increase, though it won't exceed the value for the largest memory configuration.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
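A sketch of the reuse on allocation; vmemmap_list_alloc_fresh() is a hypothetical helper standing in for the bump allocation over the last allocated page:

    static struct vmemmap_backing *next;    /* head of the freed list */

    static struct vmemmap_backing *vmemmap_list_alloc(int node)
    {
            struct vmemmap_backing *vmem;

            if (next) {                     /* reuse a freed entry first */
                    vmem = next;
                    next = next->list;
                    return vmem;
            }
            /* otherwise carve a fresh entry out of the last allocated
             * page, as before */
            return vmemmap_list_alloc_fresh(node);  /* hypothetical */
    }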
-
By Scott Wood

This silences a section mismatch warning. early_alloc_pgtable() is called from map_kernel_page(), which cannot be __init, but only when slab_is_available() returns false, which can only happen during early boot.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Andrey Utkin

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81631

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 30 Jul 2014, 1 commit
-
-
By Scott Wood

Erratum A-008139 can cause duplicate TLB entries if an indirect entry is overwritten using tlbwe while the other thread is using it to do a lookup. Work around this by using tlbilx to invalidate prior to overwriting.

To avoid the need to save another register to hold MAS1 during the workaround code, TID clearing has been moved from tlb_miss_kernel_e6500 until after the SMT section.

Signed-off-by: Scott Wood <scottwood@freescale.com>
-
- 28 Jul 2014, 3 commits
-
-
By Michael Ellerman

There are still a few occurrences where it remains, because it helps to explain something that persists.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Michael Ellerman

Now that we have dropped power3 support, we can remove CONFIG_POWER3. The usage in pgtable_32.c was already dead code, as CONFIG_POWER3 was not selectable on PPC32.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Michael Ellerman

We now only support cpus that use an SLB, so we don't need an MMU feature to indicate that.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-