- 14 10月, 2018 9 次提交
-
-
由 Aneesh Kumar K.V 提交于
Currently we limit the max addressable memory to 128TB. This patch increase the limit to 2PB. We can have devices like nvdimm which adds memory above 512TB limit. We still don't support regular system ram above 512TB. One of the challenge with that is the percpu allocator, that allocates per node memory and use the max distance between them as the percpu offsets. This means with large gap in address space ( system ram above 1PB) we will run out of vmalloc space to map the percpu allocation. In order to support addressable memory above 512TB, kernel should be able to linear map this range. To do that with hash translation we now add 4 context to kernel linear map region. Our per context addressable range is 512TB. We still keep VMALLOC and VMEMMAP region to old size. SLB miss handlers is updated to validate these limit. We also limit this update to SPARSEMEM_VMEMMAP and SPARSEMEM_EXTREME Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
We will be adding get_kernel_context later. Update function name to indicate this handle context allocation user space address. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
slb_flush_and_rebolt() is misleading, it is called in virtual mode, so it can not possibly change the stack, so it should not be touching the shadow area. And since vmalloc is no longer bolted, it should not change any bolted mappings at all. Change the name to slb_flush_and_restore_bolted(), and have it just load the kernel stack from what's currently in the shadow SLB area. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
This will be used by the SLB code in the next patch, but for now this sets the slb_addr_limit to the correct size for 32-bit tasks. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
In the same spirit as already done in pte query helpers, this patch changes pte setting helpers to perform endian conversions on the constants rather than on the pte value. In the meantime, it changes pte_access_permitted() to use pte helpers for the same reason. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
__P and __S flags are the same for all platform and should remain as is in the future, so avoid duplication. Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
The following page flags in pte-common.h can be dropped: _PAGE_ENDIAN is only used in mm/fsl_booke_mmu.c and is defined in asm/nohash/32/pte-fsl-booke.h _PAGE_4K_PFN is nowhere defined nor used _PAGE_READ, _PAGE_WRITE and _PAGE_PTE are only defined and used in book3s/64 The following page flags in book3s/64/pgtable.h can be dropped as they are not used on this platform nor by common code. _PAGE_NA, _PAGE_RO, _PAGE_USER and _PAGE_PSIZE Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
In order to avoid using generic _PAGE_XXX flags in powerpc core functions, define helpers for all needed flags: - pte_mkuser() and pte_mkprivileged() to set/unset and/or unset/set _PAGE_USER and/or _PAGE_PRIVILEGED - pte_hashpte() to check if _PAGE_HASHPTE is set. - pte_ci() check if cache is inhibited (already existing on book3s/64) - pte_exprotect() to protect against execution - pte_exec() and pte_mkexec() to query and set page execution - pte_mkpte() to set _PAGE_PTE flag. - pte_hw_valid() to check _PAGE_PRESENT since pte_present does something different on book3s/64. On book3s/32 there is no exec protection, so pte_mkexec() and pte_exprotect() are nops and pte_exec() returns always true. Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
In order to avoid multiple conversions, handover directly a pgprot_t to map_kernel_page() as already done for radix. Do the same for __ioremap_caller() and __ioremap_at(). Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 03 10月, 2018 5 次提交
-
-
由 Aneesh Kumar K.V 提交于
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
We need to make sure pmd_trans_huge returns false for a pmd migration entry. We mark the migration entry by clearing the _PAGE_PRESENT bit. We keep the _PAGE_PTE bit set to indicate a leaf page table entry. Hence we need to make sure we check for pmd_present() so that pmd_trans_huge won't return true on pmd migration entry. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
This make hugetlb directory pointer similar to other page able entries. A hugepd entry is identified by lack of _PAGE_PTE bit set and directory size stored in HUGEPD_SHIFT_MASK. We update that to also look at _PAGE_PRESENT Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
With this patch we use 0x8000000000000000UL (_PAGE_PRESENT) to indicate a valid pgd/pud/pmd entry. We also switch the p**_present() to look at this bit. With pmd_present, we have a special case. We need to make sure we consider a pmd marked invalid during THP split as present. Right now we clear the _PAGE_PRESENT bit during a pmdp_invalidate. Inorder to consider this special case we add a new pte bit _PAGE_INVALID (mapped to _RPAGE_SW0). This bit is only used with _PAGE_PRESENT cleared. Hence we are not really losing a pte bit for this special case. pmd_present is also updated to look at _PAGE_INVALID. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
This reverts commits: 5e46e29e ("powerpc/64s/hash: convert SLB miss handlers to C") 8fed04d0 ("powerpc/64s/hash: remove user SLB data from the paca") 655deecf ("powerpc/64s/hash: SLB allocation status bitmaps") 2e162674 ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup") 89ca4e12 ("powerpc/64s/hash: Add a SLB preload cache") This series had a few bugs, and the fixes are not all trivial. So revert most of it for now. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 19 9月, 2018 4 次提交
-
-
由 Nicholas Piggin 提交于
This will be used by the SLB code in the next patch, but for now this sets the slb_addr_limit to the correct size for 32-bit tasks. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
User SLB mappig data is copied into the PACA from the mm->context so it can be accessed by the SLB miss handlers. After the C conversion, SLB miss handlers now run with relocation on, and user SLB misses are able to take recursive kernel SLB misses, so the user SLB mapping data can be removed from the paca and accessed directly. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
Remove the vmalloc segment from bolted SLBEs. This is not required to be bolted, and seems like it was added to help pre-load the SLB on context switch. However there are now other segments like the vmemmap segment and non-zero node memory that often take misses after a context switch, so it is better to solve this in a more general way. A subsequent change will track free SLB entries and uses those rather than round-robin overwrite valid entries, which makes it far less likely for kernel SLBEs to be evicted after they are installed. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Mahesh Salgaonkar 提交于
If we get a machine check exceptions due to SLB errors then dump the current SLB contents which will be very much helpful in debugging the root cause of SLB errors. Introduce an exclusive buffer per cpu to hold faulty SLB entries. In real mode mce handler saves the old SLB contents into this buffer accessible through paca and print it out later in virtual mode. With this patch the console will log SLB contents like below on SLB MCE errors: [ 507.297236] SLB contents of cpu 0x1 [ 507.297237] Last SLB entry inserted at slot 16 [ 507.297238] 00 c000000008000000 400ea1b217000500 [ 507.297239] 1T ESID= c00000 VSID= ea1b217 LLP:100 [ 507.297240] 01 d000000008000000 400d43642f000510 [ 507.297242] 1T ESID= d00000 VSID= d43642f LLP:110 [ 507.297243] 11 f000000008000000 400a86c85f000500 [ 507.297244] 1T ESID= f00000 VSID= a86c85f LLP:100 [ 507.297245] 12 00007f0008000000 4008119624000d90 [ 507.297246] 1T ESID= 7f VSID= 8119624 LLP:110 [ 507.297247] 13 0000000018000000 00092885f5150d90 [ 507.297247] 256M ESID= 1 VSID= 92885f5150 LLP:110 [ 507.297248] 14 0000010008000000 4009e7cb50000d90 [ 507.297249] 1T ESID= 1 VSID= 9e7cb50 LLP:110 [ 507.297250] 15 d000000008000000 400d43642f000510 [ 507.297251] 1T ESID= d00000 VSID= d43642f LLP:110 [ 507.297252] 16 d000000008000000 400d43642f000510 [ 507.297253] 1T ESID= d00000 VSID= d43642f LLP:110 [ 507.297253] ---------------------------------- [ 507.297254] SLB cache ptr value = 3 [ 507.297254] Valid SLB cache entries: [ 507.297255] 00 EA[0-35]= 7f000 [ 507.297256] 01 EA[0-35]= 1 [ 507.297257] 02 EA[0-35]= 1000 [ 507.297257] Rest of SLB cache entries: [ 507.297258] 03 EA[0-35]= 7f000 [ 507.297258] 04 EA[0-35]= 1 [ 507.297259] 05 EA[0-35]= 1000 [ 507.297260] 06 EA[0-35]= 12 [ 507.297260] 07 EA[0-35]= 7f000 Suggested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Suggested-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 23 8月, 2018 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
When splitting a huge pmd pte, we need to mark the pmd entry invalid. We can do that by clearing _PAGE_PRESENT bit. But then that will be taken as a swap pte. In order to differentiate between the two use a software pte bit when invalidating. For regular pte, due to bd5050e3 ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang") we need to mark the pte entry invalid when relaxing access permission. Instead of marking pte_none which can result in different page table walk routines possibly skipping this pte entry, invalidate it but still keep it marked present. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 13 8月, 2018 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
Add statistics that show how memory is mapped within the kernel linear mapping. This is similar to commit 37cd944c ("s390/pgtable: add mapping statistics") We don't do this with Hash translation mode. Hash uses one size (mmu_linear_psize) to map the kernel linear mapping and we print the linear psize during boot as below. "Page orders: linear mapping = 24, virtual = 16, io = 16, vmemmap = 24" A sample output looks like: DirectMap4k: 0 kB DirectMap64k: 18432 kB DirectMap2M: 1030144 kB DirectMap1G: 11534336 kB Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 10 8月, 2018 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
Avoid coverity false warnings like: *** CID 187347: Control flow issues (UNREACHABLE) /arch/powerpc/mm/hash_native_64.c: 819 in native_flush_hash_range() 813 slot += hidx & _PTEIDX_GROUP_IX; 814 hptep = htab_address + slot; 815 want_v = hpte_encode_avpn(vpn, psize, ssize); 816 hpte_v = hpte_get_old_v(hptep); 817 818 if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID)) >>> CID 187347: Control flow issues (UNREACHABLE) Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
The machine check code that flushes and restores bolted segments in real mode belongs in mm/slb.c. This will also be used by pseries machine check and idle code in future changes. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 30 7月, 2018 2 次提交
-
-
由 Michael Ellerman 提交于
Paul Menzel reported that kmemleak was producing reports such as: unreferenced object 0xc0000000f8b80000 (size 16384): comm "init", pid 1, jiffies 4294937416 (age 312.240s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d997deb7>] __pud_alloc+0x80/0x190 [<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0 [<00000000091e51c2>] shift_arg_pages+0xc0/0x210 [<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0 [<0000000060871529>] load_elf_binary+0x41c/0x1648 [<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280 [<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940 [<000000005f953a6e>] sys_execve+0x58/0x70 [<000000009700a858>] system_call+0x5c/0x70 Indicating that a PUD was being leaked. However what's really happening is that kmemleak is not able to recognise the references from the PGD to the PUD, because they are not fully qualified pointers. We can confirm that in xmon, eg: Find the task struct for pid 1 "init": 0:mon> P task_struct ->thread.ksp PID PPID S P CMD c0000001fe7c0000 c0000001fe803960 1 0 S 13 systemd Dump virtual address 0 to find the PGD: 0:mon> dv 0 c0000001fe7c0000 pgd @ 0xc0000000f8b01000 Dump the memory of the PGD: 0:mon> d c0000000f8b01000 c0000000f8b01000 00000000f8b90000 0000000000000000 |................| c0000000f8b01010 0000000000000000 0000000000000000 |................| c0000000f8b01020 0000000000000000 0000000000000000 |................| c0000000f8b01030 0000000000000000 00000000f8b80000 |................| ^^^^^^^^^^^^^^^^ There we can see the reference to our supposedly leaked PUD. But because it's missing the leading 0xc, kmemleak won't recognise it. We can confirm it's still in use by translating an address that is mapped via it: 0:mon> dv 7fff94000000 c0000001fe7c0000 pgd @ 0xc0000000f8b01000 pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <-- pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000 pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000 ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386 Maps physical address = 0x00000001d5ce0000 Flags = Accessed Dirty Read Write The fix is fairly simple. We need to tell kmemleak to ignore PUD allocations and never report them as leaks. We can also tell it not to scan the PGD, because it will never find pointers in there. However it will still notice if we allocate a PGD and then leak it. Reported-by: NPaul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Tested-by: NPaul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
This patch moves ASM_CONST() and stringify_in_c() into dedicated asm-const.h, then cleans all related inclusions. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> [mpe: asm-compat.h should include asm-const.h] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 24 7月, 2018 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
No functional change Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 16 7月, 2018 1 次提交
-
-
由 Nicholas Piggin 提交于
POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NMichael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 20 6月, 2018 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
With 4k page size for hugetlb we allocate hugepage directories from its on slab cache. With patch 0c4d2680 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free") we missed to free these allocated hugepd tables. Update pgtable_free to handle hugetlb hugepd directory table. Fixes: 0c4d2680 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Add CONFIG_HUGETLB_PAGE guard to fix build break] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 08 6月, 2018 1 次提交
-
-
由 Laurent Dufour 提交于
Currently the PTE special supports is turned on in per architecture header files. Most of the time, it is defined in arch/*/include/asm/pgtable.h depending or not on some other per architecture static definition. This patch introduce a new configuration variable to manage this directly in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. Here notes for some architecture where the definition of __HAVE_ARCH_PTE_SPECIAL is not obvious: arm __HAVE_ARCH_PTE_SPECIAL which is currently defined in arch/arm/include/asm/pgtable-3level.h which is included by arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. powerpc __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: - arch/powerpc/include/asm/book3s/64/pgtable.h - arch/powerpc/include/asm/pte-common.h The first one is included if (PPC_BOOK3S & PPC64) while the second is included in all the other cases. So select ARCH_HAS_PTE_SPECIAL all the time. sparc: __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && defined(__arch64__) which are defined through the compiler in sparc/Makefile if !SPARC32 which I assume to be if SPARC64. So select ARCH_HAS_PTE_SPECIAL if SPARC64 There is no functional change introduced by this patch. Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.comSigned-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: NJerome Glisse <jglisse@redhat.com> Reviewed-by: NJerome Glisse <jglisse@redhat.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <albert@sifive.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Christophe LEROY <christophe.leroy@c-s.fr> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 6月, 2018 6 次提交
-
-
由 Nicholas Piggin 提交于
Implementing pte_update with pte_xchg (which uses cmpxchg) is inefficient. A single larx/stcx. works fine, no need for the less efficient cmpxchg sequence. Then remove the memory barriers from the operation. There is a requirement for TLB flushing to load mm_cpumask after the store that reduces pte permissions, which is moved into the TLB flush code. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
The ISA suggests ptesync after setting a pte, to prevent a table walk initiated by a subsequent access from missing that store and causing a spurious fault. This is an architectual allowance that allows an implementation's page table walker to be incoherent with the store queue. However there is no correctness problem in taking a spurious fault in userspace -- the kernel copes with these at any time, so the updated pte will be found eventually. Spurious kernel faults on vmap memory must be avoided, so a ptesync is put into flush_cache_vmap. On POWER9 so far I have not found a measurable window where this can result in more minor faults, so as an optimisation, remove the costly ptesync from pte updates. If an implementation benefits from ptesync, it would be better to add it back in update_mmu_cache, so it's not done for things like fork(2). fork --fork --exec benchmark improved 5.2% (12400->13100). Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
This matches other architectures, when we know there will be no further accesses to the address (e.g., for teardown), page table entries can be cleared non-atomically. The comments about NMMU are bogus: all MMU notifiers (including NMMU) are released at this point, with their TLBs flushed. An NMMU access at this point would be a bug. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
In the case of a spurious fault (which can happen due to a race with another thread that changes the page table), the default Linux mm code calls flush_tlb_page for that address. This is not required because the pte will be re-fetched. Hash does not wire this up to a hardware TLB flush for this reason. This patch avoids the flush for radix. >From Power ISA v3.0B, p.1090: Setting a Reference or Change Bit or Upgrading Access Authority (PTE Subject to Atomic Hardware Updates) If the only change being made to a valid PTE that is subject to atomic hardware updates is to set the Refer- ence or Change bit to 1 or to add access authorities, a simpler sequence suffices because the translation hardware will refetch the PTE if an access is attempted for which the only problems were reference and/or change bits needing to be set or insufficient access authority. The nest MMU on POWER9 does not re-fetch the PTE after such an access attempt before faulting, so address spaces with a coprocessor attached will continue to flush in these cases. This reduces tlbies for a kernel compile workload from 0.95M to 0.90M. fork --fork --exec benchmark improved 0.5% (12300->12400). Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
In later patch, we use the vma and psize to do tlb flush. Do the prototype update in separate patch to make the review easy. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
In later patch we will update them which require them to be moved to pgtable-radix.c. Keeping the function in radix.h results in compile warning as below. ./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’: ./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’ struct mm_struct *mm = vma->vm_mm; ^~ ./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration] atomic_read(&mm->context.copros) > 0) { ^~~~~~~~~~~ __atomic_load ./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’ atomic_read(&mm->context.copros) > 0) { Instead of fixing header dependencies, we move the function to pgtable-radix.c Also the function is now large to be a static inline . Doing the move in separate patch helps in review. No functional change in this patch. Only code movement. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 17 5月, 2018 1 次提交
-
-
由 Nicholas Piggin 提交于
Implement a local TLB flush for invalidating an LPID with variants for process or partition scope. And a global TLB flush for invalidating a partition scoped page of an LPID. These will be used by KVM in subsequent patches. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 15 5月, 2018 4 次提交
-
-
由 Aneesh Kumar K.V 提交于
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Instead of encoding shift in the table address, use an enumerated index value. This allow us to do different things in the callback for pte and pmd. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
4K config use one full page at level 4 of the pagetable. Add support for single fragment allocation in pagetable fragment code and and use that for 4K config. This makes both 4k and 64k use the same code path. Later we will switch pmd to use the page table fragment code. This is done only for 64bit platforms which is using page table fragment support. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 04 4月, 2018 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
With split PTL (page table lock) config, we allocate the level 4 (leaf) page table using pte fragment framework instead of slab cache like other levels. This was done to enable us to have split page table lock at the level 4 of the page table. We use page->plt backing the all the level 4 pte fragment for the lock. Currently with Radix, we use only 16 fragments out of the allocated page. In radix each fragment is 256 bytes which means we use only 4k out of the allocated 64K page wasting 60k of the allocated memory. This was done earlier to keep it closer to hash. This patch update the pte fragment count to 256, thereby using the full 64K page and reducing the memory usage. Performance tests shows really low impact even with THP disabled. With THP disabled we will be contenting further less on level 4 ptl and hence the impact should be further low. 256 threads: without patch (10 runs of ./ebizzy -m -n 1000 -s 131072 -S 100) median = 15678.5 stdev = 42.1209 with patch: median = 15354 stdev = 194.743 This is with THP disabled. With THP enabled the impact of the patch will be less. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-