- 17 7月, 2016 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
As per ISA, we need to do this only for architecture version 2.02 and earlier. This continued to work even for 2.07. But let's not do this for anything after 2.02. ISA 3.0 requires these top bits to be not cleared. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NBalbir Singh <bsingharora@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
PowerISA 3.0 requires the MMU mode (radix vs. hash) of the hypervisor to be mirrored in the LPCR register, in addition to the partition table. This is done to avoid fetching from the table when deciding, among other things, how to perform transitions to HV mode on some interrupts. So let's set it up appropriately Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NBalbir Singh <bsingharora@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Balbir Singh 提交于
The .longs with the shifts are harder to read, use more meaningful names for the opcodes. PPC_TLBIE_5 is introduced for the 5 opcode variation of the instruction due to an existing op-code for the 2 opcode variant. Signed-off-by: NBalbir Singh <bsingharora@gmail.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 07 7月, 2016 2 次提交
-
-
由 Michael Ellerman 提交于
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Benjamin Herrenschmidt 提交于
For memory hotplug to work, the MMU code needs to provide the functions create_section_mapping() and remove_section_mapping() to respectively map and unmap portions of the linear mapping. At the moment only hash64 provides these, so we provide weak stubs that just error out. This fixes the build with configurations such as 64-bit BookE with CONFIG_MEMORY_HOTPLUG enabled. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 05 7月, 2016 1 次提交
-
-
由 Oliver O'Halloran 提交于
This patch adds the kernel command line parameter "no_tb_segs" which forces the kernel to use 256MB rather than 1TB segments. Forcing the use of 256MB segments makes it considerably easier to test code that depends on an SLB miss occurring. Suggested-by: NMichael Neuling <mikey@neuling.org> Suggested-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 30 6月, 2016 1 次提交
-
-
由 Darren Stevens 提交于
Commit d6a9996e ("powerpc/mm: vmalloc abstraction in preparation for radix") turned kernel memory and IO addresses from #defined constants to variables initialised at runtime. On PA6T (pasemi) systems the setup_arch() machine call initialises the onboard PCI-e root-ports, and uses pci_io_base to do this, which is now before its value has been set, resulting in a panic early in boot before console IO is initialised. Move the pci_io_base initialisation to the same place as vmalloc ranges are set (hash__early_init_mmu()/radix__early_init_mmu()) - this is the earliest possible place we can initialise it. Fixes: d6a9996e ("powerpc/mm: vmalloc abstraction in preparation for radix") Reported-by: NChristian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: NDarren Stevens <darren@stevens-zone.net> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Add #ifdef CONFIG_PCI, massage change log slightly] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 17 6月, 2016 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
ISA 3.0 updated it to be encoded as Radix tree size = 2^(RTS + 31). We have it encoded as 2^(RTS + 28). Add a helper with the correct encoding and use it instead of opencoding. Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines") Reviewed-by: NBalbir Singh <bsingharora@gmail.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
H_ENTER hcall handling in qemu had assumptions that a cache inhibited hpte entry won't have memory conference set. Also older kernel mentioned that some version of pHyp required this (the code removed by the below commit says: /* Make pHyp happy */ if ((rflags & _PAGE_NO_CACHE) && !(rflags & _PAGE_WRITETHRU)) hpte_r &= ~HPTE_R_M; But with older kernel we had some inconsistent memory conherence mapping. We always enabled memory conherence in the page fault path and removed memory conherence is _PAGE_NO_CACHE was set when we mapped the page via htab_bolt_mapping. The commit mentioned below tried to consolidate that by always enabling memory conherence. But as mentioned above that breaks Qemu H_ENTER handling. This patch update this such that we enable memory conherence only if cache inhibited is not set and bring fault handling, lpar and bolt mapping in sync. Fixes: commit 30bda41a("powerpc/mm: Drop WIMG in favour of new constant") Reported-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 16 6月, 2016 1 次提交
-
-
由 Oliver O'Halloran 提交于
The mm zone mechanism was traditionally used by arch specific code to partition memory into allocation zones. However there are several zones that are managed by the mm subsystem rather than the architecture. Most architectures set the max PFN of these special zones to zero, however on powerpc we set them to ~0ul. This, in conjunction with a bug in free_area_init_nodes() results in all of system memory being placed in ZONE_DEVICE when enabled. Device memory cannot be used for regular kernel memory allocations so this will cause a kernel panic at boot. Given the planned addition of more mm managed zones (ZONE_CMA) we should aim to be consistent with every other architecture and set the max PFN for these zones to zero. Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Reviewed-by: NBalbir Singh <bsingharora@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 14 6月, 2016 4 次提交
-
-
由 Bharata B Rao 提交于
memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum addressable memory by referring to ibm,dyanamic-memory property. There are three problems with the current approach: 1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes all the LMBs of the guest, but that is not true for PowerKVM which populates only DR LMBs (LMBs that can be hotplugged/removed) in that property. 2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive at the max possible address. Since ibm,dynamic-memory doesn't include RMA LMBs, the address thus obtained will be less than the actual max address. For example, if max possible memory size is 32G, with lmb-size of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA which won't be present here). hot_add_drconf_memory_max() would then return the max addressable memory as 127 * 256MB = 31.75GB, the max address should have been 32G which is what ibm,lrdr-capacity shows. 3 In PowerKVM, there can be a gap between the end of boot time RAM and beginning of hotplug RAM area. So just multiplying lmb-count with lmb-size will not provide the correct max possible address for PowerKVM. This patch fixes 1 by using ibm,lrdr-capacity property to return the max addressable memory whenever the property is present. Then it fixes 2 & 3 by fetching the address of the last LMB in ibm,dynamic-memory property. Fixes: cd34206e ("powerpc: Add memory_hotplug_max()") Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Bharata B Rao 提交于
Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Signed-off-by: NAndrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
With commit e58e87ad "powerpc/mm: Update _PAGE_KERNEL_RO" we now use all the three PPP bits. The top bit is now used to have a PPP value of 0b110 which will be mapped to kernel read only. When updating the hpte entry use right mask such that we update the 63rd bit (top 'P' bit) too. Prior to e58e87ad we didn't support KERNEL_RO at all (it was == KERNEL_RW), so this isn't a regression as such. Fixes: e58e87ad ("powerpc/mm: Update _PAGE_KERNEL_RO") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 10 6月, 2016 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
Even though a tlb_flush() does a flush with invalidate all cache, we can end up doing an RCU page table free before calling tlb_flush(). That means we can have page walk cache entries even after we free the page table pages. This can result in us doing wrong page table walk. Avoid this by doing pwc flush on every page table free. We can't batch the pwc flush, because the rcu call back function where we free the page table pages doesn't have information of the mmu gather. Thus we have to do a pwc on every page table page freed. Note: I also removed the dummy tlb_flush_pgtable call functions for hash 32. Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Radix invalidate control (RIC) is used to control which cache to flush using tlb instructions. When doing a PID flush, we currently flush everything including page walk cache. For address range flush, we flush only the TLB. In the next patch, we add support for flushing only the page walk cache. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 08 6月, 2016 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
PowerISA 3.0 encodes the segment size in the second half of hash page table entry. Update hpte_decode() accordingly. Fixes: 50de596d ("powerpc/mm/hash: Add support for Power9 Hash") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
In some of the radix TLB flush routines, we use a local to store the mm->context.id, AKA the PID. Currently we use an int, but the PID is unsigned long, so large values of PID will be truncated. In particular MMU_NO_CONTEXT is -1, which means all our comparisons against that value can never be true. This means we'll issue TLB flushes when we shouldn't on radix enabled machines. Fix it by using an unsigned long for the local. Discovered by Coverity. Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: NBalbir Singh <bsingharora@gmail.com> [mpe: Write change log] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 01 6月, 2016 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
This should not have any impact on hash, because hash does tlb invalidate with every pte update and we don't implement flush_tlb_* functions for hash. With radix we should make an explicit call to flush tlb outside pte update. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
When we converted the asm routines to C functions, we missed updating HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower bits from pte via. andi. r3,r30,0x1fe /* Get basic set of flags */ We also update the code such that we won't update the Change bit ('C' bit) always. This was added by commit c5cf0e30 ("powerpc: Fix buglet with MMU hash management"). With hash64, we need to make sure that hardware doesn't do a pte update directly. This is because we do end up with entries in TLB with no hash page table entry. This happens because when we find a hash bucket full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". For more info look at commit 0608d692("powerpc/mm: Always invalidate tlb on hpte invalidate and update"). Thus it's critical that valid hash PTEs always have reference bit set and writeable ones have change bit set. We do this by hashing a non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and thus R) when hashing anything else in. Any attempt by Linux at clearing those bits also removes the corresponding hash entry. Commit 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always. We don't really need to do that because we never map a RW pte entry without setting 'C' bit. On READ fault on a RW pte entry, we still map it READ only, hence a store update in the page will still cause a hash pte fault. This patch reverts the part of commit c5cf0e30 ("[PATCH] powerpc: Fix buglet with MMU hash management") and retain the updatepp part. - If we hit the updatepp path on native, the old code without that commit, would fail to set C bcause native_hpte_updatepp() was implemented to filter the same bits as H_PROTECT and not let C through thus we would "upgrade" a RO HPTE to RW without setting C thus causing the bug. So the real fix in that commit was the change to native_hpte_updatepp Fixes: 89ff7250 ("powerpc/mm: Convert __hash_page_64K to C") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
LPCR cannot be updated when running in guest mode. Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 20 5月, 2016 1 次提交
-
-
由 Vaishali Thakkar 提交于
Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported hugepage size is found. Signed-off-by: NVaishali Thakkar <vaishali.thakkar@oracle.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 5月, 2016 18 次提交
-
-
由 Gavin Shan 提交于
The function is used to update the MMU with software PTE. It can be called by data access exception handler (0x300) or instruction access exception handler (0x400). If the function is called by 0x400 handler, the local variable @access is set to _PAGE_EXEC to indicate the software PTE should have that flag set. When the function is called by 0x300 handler, @access is set to zero. This improves the readability of the function by replacing if statements with switch. No logical changes introduced. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Oliver O'Halloran 提交于
The zone that contains the top of memory will be either ZONE_NORMAL or ZONE_HIGHMEM depending on the kernel config. There are two functions that require this information and both of them use an #ifdef to set a local variable (top_zone). This is a little silly so lets just make it a constant. Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Cc: linux-mm@kvack.org Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
With Linux page size of 64K and hardware only supporting 4K HPTE, if we use subpage protection, we always fail for the subpage 0 as shown below (using the selftest subpage_prot test): 520175565: (4520111850): Failed at 0x3fffad4b0000 (p=13,sp=0,w=0), want=fault, got=pass ! 4520890210: (4520826495): Failed at 0x3fffad5b0000 (p=29,sp=0,w=0), want=fault, got=pass ! 4521574251: (4521510536): Failed at 0x3fffad6b0000 (p=45,sp=0,w=0), want=fault, got=pass ! 4522258324: (4522194609): Failed at 0x3fffad7b0000 (p=61,sp=0,w=0), want=fault, got=pass ! This is because hash preload wrongly inserts the HPTE entry for subpage 0 without looking at the subpage protection information. Fix it by teaching should_hash_preload() not to preload if we have subpage protection configured for that range. It appears this has been broken since it was introduced in 2008. Fixes: fa28237c ("[POWERPC] Provide a way to protect 4k subpages when using 64k pages") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Rework into should_hash_preload() to avoid build fails w/SLICES=n] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Currently we have a check in hash_preload() against the psize, which is only included when CONFIG_PPC_MM_SLICES is enabled. We want to expand this check in a subsequent patch, so factor it out to allow that. As a bonus it removes the #ifdef in the C code. Unfortunately we can't put this in the existing CONFIG_PPC_MM_SLICES block because it would require a forward declaration. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
The usage in mm mmu_context_nohash.c is bogus, because we set the context.id value to MMU_NO_CONTEXT 4 lines previously in the same function, meaning slice_mm_new_context() will always be true. The book3s 64 usage was removed in the previous commit. So remove it as unused. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
As part of the radix support we switched Book3s64 to use a value of ~0 for MMU_NO_CONTEXT. That is because id 0 is special on radix. However that broke the logic in init_new_context(). The code there needs to differentiate between a newly allocated context and one inherited via fork. Previously it worked because a newly allocated context has an id of zero (because it was just memset() to zero), which used to match MMU_NO_CONTEXT, and therefore slice_mm_new_context() did the right thing. Instead check against a context.id value of zero instead of using slice_mm_new_context(). Without this patch we never call slice_set_user_psize(), and end up with a slice psize value of zero and we always end up using 4K HPTE. Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
The deposited pgtable_t is a pte fragment hence we cannot use page->lru for linking then together. We use the first two 64 bits for pte fragment as list_head type to link all deposited fragments together. On withdraw we properly zero then out. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Only code movement in this patch. No functionality change. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Radix doesn't use the slice framework to find the page size. Hence use vma to find the page size. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
In this patch we make the number of pte fragments per level 4 page table page a variable. Radix level 4 table size is 256 bytes and hence we can have 256 fragments per level 4 page. We don't update the fragment count in this patch. We need to do performance measurements to find the right value for fragment count. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
With radix there is no MMU cache. Hence we don't need to do anything in update_mmu_cache(). Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NBalbir Singh <bsingharora@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
The vmalloc range differs between hash and radix config. Hence make VMALLOC_START and related constants a variable which will be runtime initialized depending on whether hash or radix mode is active. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fix missing init of ioremap_bot in pgtable_64.c for ppc64e] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NBalbir Singh <bsingharora@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
This patch switches 4K Linux page size config to use pte_t * type instead of struct page * for pgtable_t. This simplifies the code a lot and helps in consolidating both 64K and 4K page allocator routines. The changes should not have any impact, because we already store physical address in the upper level page table tree and that implies we already do struct page * to physical address conversion. One change to note here is we move the pgtable_page_dtor() call for nohash to pte_fragment_free_mm(). The nohash related change is due to the related changes in pgtable_64.c. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Only code cleanup. No functionality change. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-