1. 03 June 2018, 1 commit
    • powerpc/mm/hugetlb: Update hugetlb related locks · ed515b68
      Committed by Aneesh Kumar K.V
      With the split pmd page table lock enabled, we don't use mm->page_table_lock
      when updating pmd entries. This patch updates the hugetlb path to use the
      right lock when inserting huge page directory entries into the page table.

      For example: if we are using hugepd and inserting a hugepd entry at the pmd
      level, we use pmd_lockptr(), which depending on config can be the split pmd
      lock.

      For updating the huge page directory entries themselves we use
      mm->page_table_lock. We have a helper huge_pte_lockptr() for that.
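
      A minimal sketch of that dispatch, patterned on the generic helper of
      that era (abbreviated from memory, so treat the details as illustrative):

          /* Pick the lock guarding a hugetlb entry: the (possibly split)
           * pmd lock when the huge page maps at PMD_SIZE, and
           * mm->page_table_lock otherwise, e.g. for hugepd entries. */
          static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
                                                     struct mm_struct *mm,
                                                     pte_t *pte)
          {
                  if (huge_page_size(h) == PMD_SIZE)
                          return pmd_lockptr(mm, (pmd_t *)pte);
                  return &mm->page_table_lock;
          }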
      
      Fixes: 675d9952 ("powerpc/book3s64: Enable split pmd ptlock")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 03 May 2018, 1 commit
    • powerpc/fadump: Do not use hugepages when fadump is active · 85975387
      Committed by Hari Bathini
      The FADump capture kernel boots in a restricted memory environment,
      preserving the context of the previous kernel in order to save the vmcore.
      Supporting hugepages in such an environment makes things unnecessarily
      complicated, as hugepages need memory set aside for them, meaning most of
      the capture kernel's memory is used in supporting hugepages. In most
      cases this results in out-of-memory issues while booting the FADump
      capture kernel. But hugepages are not of much use in a capture kernel
      whose only job is to save the vmcore. So disabling hugepage support when
      fadump is active is a reliable solution to the out-of-memory issues.
      Introduce a flag variable to disable HugeTLB support when fadump is active.
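
      A minimal sketch of the idea, assuming a flag along these lines (the
      exact name and wiring are illustrative, not the verbatim diff):

          /* set early during boot when this is a fadump capture kernel */
          bool hugetlb_disabled;

          static int __init hugetlbpage_init(void)
          {
                  if (hugetlb_disabled)
                          return 0;       /* skip registering hugepage sizes */
                  /* ... normal hugetlb setup ... */
                  return 0;
          }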
      Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 06 April 2018, 1 commit
    • mm, powerpc: use vma_kernel_pagesize() in vma_mmu_pagesize() · 09135cc5
      Committed by Dan Williams
      Patch series "mm, smaps: MMUPageSize for device-dax", v3.
      
      Similar to commit 31383c68 ("mm, hugetlbfs: introduce ->split() to
      vm_operations_struct") here is another occasion where we want
      special-case hugetlbfs/hstate enabling to also apply to device-dax.
      
      This prompts the question of what other hstate conversions we might do
      beyond ->split() and ->pagesize(), but this appears to be the last of
      the usages of hstate_vma() in generic / non-hugetlbfs-specific code paths.
      
      This patch (of 3):
      
      The current powerpc definition of vma_mmu_pagesize() open codes looking
      up the page size via hstate.  It is identical to the generic
      vma_kernel_pagesize() implementation.
      
      Now, vma_kernel_pagesize() is growing support for determining the page
      size of Device-DAX vmas in addition to the existing Hugetlbfs page size
      determination.
      
      Ideally, if the powerpc vma_mmu_pagesize() used vma_kernel_pagesize() it
      would automatically benefit from any new vma-type support that is added
      to vma_kernel_pagesize().  However, the powerpc vma_mmu_pagesize() is
      prevented from calling vma_kernel_pagesize() due to a circular header
      dependency that requires vma_mmu_pagesize() to be defined before
      including <linux/hugetlb.h>.
      
      Break this circular dependency by defining the default vma_mmu_pagesize()
      as a __weak symbol to be overridden by the powerpc version.
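
      A sketch of the __weak pattern described (illustrative, not the exact
      diff; the powerpc body is assumed from the surrounding description):

          /* generic default, e.g. in mm/hugetlb.c: overridable at link time */
          unsigned long __weak vma_mmu_pagesize(struct vm_area_struct *vma)
          {
                  return vma_kernel_pagesize(vma);
          }

          /* powerpc override: the strong definition wins at link time, with
           * no header-ordering constraint on <linux/hugetlb.h> */
          unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
          {
                  return 1UL << mmu_psize_to_shift(
                                  get_slice_psize(vma->vm_mm, vma->vm_start));
          }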
      
      Link: http://lkml.kernel.org/r/151996254179.27922.2213728278535578744.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Jane Chu <jane.chu@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 04 April 2018, 1 commit
  5. 13 March 2018, 1 commit
  6. 06 March 2018, 1 commit
    • powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx · aa0ab02b
      Committed by Christophe Leroy
      On the 8xx, the page size is set in the PMD entry and applies to
      all pages of the page table pointed to by that PMD entry.

      When an app has some regular pages allocated (e.g. see below) and tries
      to mmap() a huge page at a hint address covered by the same PMD entry,
      the kernel accepts the hint although the 8xx cannot handle different
      page sizes in the same PMD entry.
      
      10000000-10001000 r-xp 00000000 00:0f 2597 /root/malloc
      10010000-10011000 rwxp 00000000 00:0f 2597 /root/malloc
      
      mmap(0x10080000, 524288, PROT_READ|PROT_WRITE,
           MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x10080000
      
      This results in the app remaining forever in do_page_fault()/hugetlb_fault()
      and, when interrupting that app, we get the following warning:
      
      [162980.035629] WARNING: CPU: 0 PID: 2777 at arch/powerpc/mm/hugetlbpage.c:354 hugetlb_free_pgd_range+0xc8/0x1e4
      [162980.035699] CPU: 0 PID: 2777 Comm: malloc Tainted: G W       4.14.6 #85
      [162980.035744] task: c67e2c00 task.stack: c668e000
      [162980.035783] NIP:  c000fe18 LR: c00e1eec CTR: c00f90c0
      [162980.035830] REGS: c668fc20 TRAP: 0700   Tainted: G W        (4.14.6)
      [162980.035854] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24044224 XER: 20000000
      [162980.036003]
      [162980.036003] GPR00: c00e1eec c668fcd0 c67e2c00 00000010 c6869410 10080000 00000000 77fb4000
      [162980.036003] GPR08: ffff0001 0683c001 00000000 ffffff80 44028228 10018a34 00004008 418004fc
      [162980.036003] GPR16: c668e000 00040100 c668e000 c06c0000 c668fe78 c668e000 c6835ba0 c668fd48
      [162980.036003] GPR24: 00000000 73ffffff 74000000 00000001 77fb4000 100fffff 10100000 10100000
      [162980.036743] NIP [c000fe18] hugetlb_free_pgd_range+0xc8/0x1e4
      [162980.036839] LR [c00e1eec] free_pgtables+0x12c/0x150
      [162980.036861] Call Trace:
      [162980.036939] [c668fcd0] [c00f0774] unlink_anon_vmas+0x1c4/0x214 (unreliable)
      [162980.037040] [c668fd10] [c00e1eec] free_pgtables+0x12c/0x150
      [162980.037118] [c668fd40] [c00eabac] exit_mmap+0xe8/0x1b4
      [162980.037210] [c668fda0] [c0019710] mmput.part.9+0x20/0xd8
      [162980.037301] [c668fdb0] [c001ecb0] do_exit+0x1f0/0x93c
      [162980.037386] [c668fe00] [c001f478] do_group_exit+0x40/0xcc
      [162980.037479] [c668fe10] [c002a76c] get_signal+0x47c/0x614
      [162980.037570] [c668fe70] [c0007840] do_signal+0x54/0x244
      [162980.037654] [c668ff30] [c0007ae8] do_notify_resume+0x34/0x88
      [162980.037744] [c668ff40] [c000dae8] do_user_signal+0x74/0xc4
      [162980.037781] Instruction dump:
      [162980.037821] 7fdff378 81370000 54a3463a 80890020 7d24182e 7c841a14 712a0004 4082ff94
      [162980.038014] 2f890000 419e0010 712a0ff0 408200e0 <0fe00000> 54a9000a 7f984840 419d0094
      [162980.038216] ---[ end trace c0ceeca8e7a5800a ]---
      [162980.038754] BUG: non-zero nr_ptes on freeing mm: 1
      [162985.363322] BUG: non-zero nr_ptes on freeing mm: -1
      
      In order to fix this, this patch uses the address space "slices"
      implemented for BOOK3S/64 and enhanced to support PPC32 by the
      preceding patch.
      
      This patch modifies the context.id on the 8xx to be in the range
      [1:16] instead of [0:15] in order to identify context.id == 0 as
      an uninitialised context, as done on BOOK3S.

      This patch activates CONFIG_PPC_MM_SLICES when CONFIG_HUGETLB_PAGE is
      selected for the 8xx.

      Although we could in theory have as many slices as PMD entries, the
      current slice implementation limits the number of low slices to 16.
      This limitation does not prevent us from fixing the initial issue,
      although it is suboptimal. It will be cured in a subsequent patch.
      
      Fixes: 4b914286 ("powerpc/8xx: Implement support of hugepages")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 19 January 2018, 2 commits
  8. 16 January 2018, 1 commit
    • powerpc/8xx: Remove _PAGE_USER and handle user access at PMD level · de0f9387
      Committed by Christophe Leroy
      As the Linux kernel separates KERNEL and USER address spaces, there is
      no need to flag USER access at page level.

      Today, the 8xx TLB handlers already handle user access in the L1 entry
      through Access Protection Groups; it is then natural to move user
      access handling to the PMD level, once _PAGE_NA allows handling
      PAGE_NONE protection without _PAGE_USER.

      In the meantime, as we free up one bit in the PTE, we can use it to
      include SPS (the page size flag) in the PTE and avoid handling it at
      every TLB miss, hence removing the special handling based on compiled
      page size.

      For _PAGE_EXEC, we rework it to use the PP PTE bits, avoiding the copy
      of the _PAGE_EXEC bit into the L1 entry. Unfortunately we are not
      able to put it at the correct location as it conflicts with
      NA/RO/RW bits for data entries.
      
      Upper bits of the APG in the L1 entry overlap with the PMD base address.
      In order to avoid having to filter that out, we set up all groups so that
      the upper bits can have any value.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  9. 22 December 2017, 1 commit
  10. 16 November 2017, 1 commit
  11. 23 August 2017, 1 commit
  12. 17 August 2017, 1 commit
    • powerpc/mm: Rename find_linux_pte_or_hugepte() · 94171b19
      Committed by Aneesh Kumar K.V
      Add newer helpers to make the function usage simpler. It is always
      recommended to use find_current_mm_pte() for walking the page table.
      If we cannot use find_current_mm_pte(), it should be documented why
      that use of __find_linux_pte() is safe against a parallel THP
      split.
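
      A sketch of the guard the new helper adds (abbreviated; treat the
      details as illustrative):

          /* Walk current->mm's page table. Safe against a parallel THP
           * split only because the caller runs with interrupts disabled. */
          static inline pte_t *find_current_mm_pte(pgd_t *pgdir, unsigned long ea,
                                                   bool *is_thp, unsigned *hshift)
          {
                  VM_WARN(!arch_irqs_disabled(),
                          "%s called with irq enabled\n", __func__);
                  VM_WARN(pgdir != current->mm->pgd,
                          "%s called with wrong pgdir\n", __func__);
                  return __find_linux_pte(pgdir, ea, is_thp, hshift);
          }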
      
      For now we have KVM code using __find_linux_pte(). This is because kvm
      code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
      with PACA soft_enabled = 1. We may want to fix that later and make
      sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
      we can switch kvm to use find_linux_pte().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  13. 16 August 2017, 1 commit
    • powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel command line · 79cc38de
      Committed by Aneesh Kumar K.V
      With commit aa888a74 ("hugetlb: support larger than MAX_ORDER") we added
      support for allocating gigantic hugepages via the kernel command line.
      Switch the ppc64 arch-specific code to use that.

      W.r.t. FSL support, we now limit our allocation range using BOOTMEM_ALLOC_ACCESSIBLE.

      We use the kernel command line to reserve hugetlb pages on powernv
      platforms. In pseries hash MMU mode the supported gigantic huge page size
      is 16GB, and it can only be allocated with hypervisor assistance. For
      pseries the command line option doesn't do the allocation; instead,
      pseries does gigantic hugepage allocation based on a hypervisor hint that
      is specified via the "ibm,expected#pages" property of the memory node.
      
      Cc: Scott Wood <oss@buserror.net>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 15 August 2017, 1 commit
    • powerpc/hugetlb: fix page rights verification in gup_hugepte() · ca8afd40
      Committed by Christophe Leroy
      gup_hugepte() checks if pages are present and readable, and
      when 'write' is set, also checks if the pages are writable.
      
      Initially this was done by checking if _PAGE_PRESENT and
      _PAGE_READ were set. In addition, _PAGE_WRITE was verified for write
      accesses.
      
      The problem is that we have to handle the three following cases:
      1/ The target defines _PAGE_READ and _PAGE_WRITE
      2/ The target defines _PAGE_RW
      3/ The target defines _PAGE_RO

      In case 1/, this is obvious.
      In case 2/, _PAGE_READ is defined as 0 and _PAGE_WRITE as _PAGE_RW,
      so it works as well.
      But in case 3/, _PAGE_RW is defined as 0, which means _PAGE_WRITE is 0
      and then the test returns true (page writable) in all cases.
      
      A first correction was attempted in commit 6b8cb66a ("powerpc: Fix
      usage of _PAGE_RO in hugepage"), but that fix is wrong:
      instead of checking that the page is writable when write is requested,
      it checks that the page is NOT writable when write is NOT requested.
      
      This patch adds a new pte_read() helper to check whether a page is
      readable or not. This avoids handling all possible cases in
      gup_hugepte().
      
      Then gup_hugepte() is modified to use pte_present(), pte_read()
      and pte_write() instead of the raw flags.
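
      A sketch of the resulting check in gup_hugepte(), as described above
      (illustrative, not the verbatim diff):

          pte_t pte = READ_ONCE(*ptep);

          /* must be present and readable; for a write access it must also
           * be writable -- no raw _PAGE_* flag tests */
          if (!pte_present(pte) || !pte_read(pte))
                  return 0;
          if (write && !pte_write(pte))
                  return 0;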
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 07 July 2017, 4 commits
  16. 02 July 2017, 2 commits
  17. 05 June 2017, 1 commit
  18. 31 March 2017, 1 commit
    • powerpc/mm/hugetlb: Filter out hugepage size not supported by page table layout · a525108c
      Committed by Aneesh Kumar K.V
      Without this, if firmware reports 1MB page size support we will crash
      trying to use 1MB as the hugetlb page size.
      
      echo 300 > /sys/kernel/mm/hugepages/hugepages-1024kB/nr_hugepages
      
      kernel BUG at ./arch/powerpc/include/asm/hugetlb.h:19!
      .....
      ....
      [c0000000e2c27b30] c00000000029dae8 .hugetlb_fault+0x638/0xda0
      [c0000000e2c27c30] c00000000026fb64 .handle_mm_fault+0x844/0x1d70
      [c0000000e2c27d70] c00000000004805c .do_page_fault+0x3dc/0x7c0
      [c0000000e2c27e30] c00000000000ac98 handle_page_fault+0x10/0x30
      
      With the fix, we don't enable 1MB as a hugepage size.
      
      bash-4.2# cd /sys/kernel/mm/hugepages/
      bash-4.2# ls
      hugepages-16384kB  hugepages-16777216kB
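
      A sketch of the kind of init-time filter involved (illustrative, not
      the exact diff; the validation point is assumed to be the powerpc
      add_huge_page_size() path):

          for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
                  if (!mmu_psize_defs[psize].shift)
                          continue;       /* size unknown to this MMU */
                  /* reject sizes the page table layout cannot map */
                  if (add_huge_page_size(1ULL << mmu_psize_defs[psize].shift) < 0)
                          continue;
                  /* ... register the hstate ... */
          }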
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  19. 18 January 2017, 3 commits
  20. 10 December 2016, 2 commits
    • powerpc/8xx: Implement support of hugepages · 4b914286
      Committed by Christophe Leroy
      The 8xx uses a two-level page table and supports two different Linux page
      sizes (4k and 16k). The 8xx also supports two different hugepage sizes,
      512k and 8M. In order to support them on Linux we define two different
      page table layouts.
      
      The page size is encoded in the PGD entry, using the PS field (bits 28-29):
      00 : Small pages (4k or 16k)
      01 : 512k pages
      10 : reserved
      11 : 8M pages
      
      For the 512K hugepage size a pgd entry has the format
      [<hugepte address>0101]. The hugepte table allocated will contain 8
      entries pointing to 512K huge ptes in 4k pages mode and 64 entries in
      16k pages mode.

      For 8M in 16k mode, a pgd entry has the format
      [<hugepte address>1101]. The hugepte table allocated will contain 8
      entries pointing to 8M huge ptes.

      For 8M in 4k mode, multiple pgd entries point to the same hugepte
      address and each pgd entry has the format [<hugepte address>1101].
      The hugepte table allocated will only have one entry.
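
      A sketch of how those formats decompose into PMD flag masks (names and
      values are assumptions patterned on the 8xx headers, not the diff):

          #define _PMD_PRESENT    0x0001
          #define _PMD_PAGE_512K  0x0004   /* PS = 01 */
          #define _PMD_PAGE_8M    0x000c   /* PS = 11 */

          /* [<hugepte address>0101] == addr | _PMD_PAGE_512K | _PMD_PRESENT
           * [<hugepte address>1101] == addr | _PMD_PAGE_8M   | _PMD_PRESENT */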
      
      For the time being, we do not support the CPU15 ERRATA when HUGETLB is
      selected.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> (v3, for the generic bits)
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc: get hugetlbpage handling more generic · 03bb2d65
      Committed by Christophe Leroy
      Today there are two implementations of hugetlbpages which are managed
      by exclusive #ifdefs:
      * FSL_BOOKE: several directory entries point to the same single hugepage
      * BOOK3S: one upper-level directory entry points to a table of hugepages
      
      In preparation for implementing hugepage support on the 8xx, we need a
      mix of the two solutions above, because the 8xx needs both cases
      depending on the size of pages:
      * In 4k page size mode, each PGD entry covers a 4M byte area. This means
      that 2 PGD entries will be necessary to cover an 8M hugepage, while a
      single PGD entry will cover 8x 512k hugepages.
      * In 16k page size mode, each PGD entry covers a 64M byte area. This means
      that 8x 8M hugepages will be covered by one PGD entry and 64x 512k
      hugepages will be covered by one PGD entry.
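
      To make the 4k-mode arithmetic explicit: a PGD entry spans 1024 PTEs x
      4k = 4M, so an 8M hugepage needs 8M / 4M = 2 consecutive PGD entries,
      while a single PGD entry holds 4M / 512k = 8 x 512k hugepages.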
      
      This patch:
      * removes #ifdefs in favor of if/else based on the range sizes
      * merges the two huge_pte_alloc() functions as they are pretty similar
      * merges the two hugetlbpage_init() functions as they are pretty similar
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> (v3)
      Signed-off-by: Scott Wood <oss@buserror.net>
  21. 23 September 2016, 1 commit
  22. 21 July 2016, 1 commit
  23. 25 June 2016, 1 commit
  24. 20 May 2016, 1 commit
  25. 11 May 2016, 2 commits
  26. 01 May 2016, 2 commits
  27. 29 March 2016, 1 commit
    • powerpc/mm: Fixup preempt underflow with huge pages · 08a5bb29
      Committed by Sebastian Siewior
      hugepd_free() used __get_cpu_var() once. Nothing ensured that the code
      accessing the variable did not migrate from one CPU to another and soon
      this was noticed by Tiejun Chen in 94b09d75 ("powerpc/hugetlb:
      Replace __get_cpu_var with get_cpu_var"). So we had it fixed.
      
      Christoph Lameter was doing his __get_cpu_var() replacements and forgot
      PowerPC. Then he noticed this and sent his fixed-up batch again, which
      got applied as 69111bac ("powerpc: Replace __get_cpu_var uses").

      The careful reader will notice one little detail: get_cpu_var() got
      replaced with this_cpu_ptr(). So now we have a put_cpu_var(), which does
      a preempt_enable(), and nothing that does a preempt_disable(), so we
      underflow the preempt counter.
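
      A sketch of the broken vs. balanced pattern (the per-cpu variable name
      is assumed from the powerpc hugepd free path; treat it as illustrative):

          /* broken: nothing disables preemption, yet put_cpu_var() enables it */
          batchp = this_cpu_ptr(&hugepd_freelist_cur);
          /* ... queue the hugepd for freeing ... */
          put_cpu_var(hugepd_freelist_cur);       /* preempt count underflows */

          /* balanced: get_cpu_var() disables preemption, put_cpu_var() re-enables */
          batchp = &get_cpu_var(hugepd_freelist_cur);
          /* ... queue the hugepd for freeing ... */
          put_cpu_var(hugepd_freelist_cur);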
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  28. 29 February 2016, 1 commit
  29. 16 January 2016, 2 commits