1. 05 May 2022, 2 commits
  2. 12 Feb 2022, 1 commit
    • powerpc/mm: Update default hugetlb size early · 2354ad25
      Committed by Aneesh Kumar K.V
      Commit d9c23400 ("Do not depend on MAX_ORDER when grouping pages by mobility")
      introduced pageblock_order which will be used to group pages better.
      The kernel now groups pages based on the value of HPAGE_SHIFT. Hence HPAGE_SHIFT
      should be set before we call set_pageblock_order.
      
      set_pageblock_order happens early in the boot and default hugetlb page size
      should be initialized before that to compute the right pageblock_order value.
      
      Currently, the default hugetlb page size is set via arch_initcalls, which happens
      late in the boot as shown via the below callstack:
      
      [c000000007383b10] [c000000001289328] hugetlbpage_init+0x2b8/0x2f8
      [c000000007383bc0] [c0000000012749e4] do_one_initcall+0x14c/0x320
      [c000000007383c90] [c00000000127505c] kernel_init_freeable+0x410/0x4e8
      [c000000007383da0] [c000000000012664] kernel_init+0x30/0x15c
      [c000000007383e10] [c00000000000cf14] ret_from_kernel_thread+0x5c/0x64
      
      and the pageblock_order initialization is done early during the boot.
      
      [c0000000018bfc80] [c0000000012ae120] set_pageblock_order+0x50/0x64
      [c0000000018bfca0] [c0000000012b3d94] sparse_init+0x188/0x268
      [c0000000018bfd60] [c000000001288bfc] initmem_init+0x28c/0x328
      [c0000000018bfe50] [c00000000127b370] setup_arch+0x410/0x480
      [c0000000018bfed0] [c00000000127401c] start_kernel+0xb8/0x934
      [c0000000018bff90] [c00000000000d984] start_here_common+0x1c/0x98
      
      Delaying default hugetlb page size initialization implies the kernel will
      initialize pageblock_order to (MAX_ORDER - 1) which is not an optimal
      value for mobility grouping. IIUC we always had this issue. But it was not
      a problem for hash translation mode because (MAX_ORDER - 1) is the same as
      HUGETLB_PAGE_ORDER (8) in the case of hash (16MB). With radix,
      HUGETLB_PAGE_ORDER will be 5 (2M size) and hence pageblock_order should be
      5 instead of 8.
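      
      As a rough, self-contained model of the ordering problem (the constants
      below are example values, not taken from the kernel sources), the grouping
      order only reflects the hugetlb size if HPAGE_SHIFT is initialized before
      the pageblock order is computed:
      
      	#include <stdio.h>
      
      	#define PAGE_SHIFT	16	/* 64k base pages (example) */
      	#define MAX_ORDER	9	/* example value */
      	static unsigned int HPAGE_SHIFT;	/* 0 until hugetlb init runs */
      	#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
      
      	static unsigned int compute_pageblock_order(void)
      	{
      		/* mirrors the choice described above: use the hugetlb order if
      		 * it is known, otherwise fall back to MAX_ORDER - 1 */
      		return (HPAGE_SHIFT > PAGE_SHIFT) ? HUGETLB_PAGE_ORDER : MAX_ORDER - 1;
      	}
      
      	int main(void)
      	{
      		printf("late hugetlb init : pageblock_order = %u\n", compute_pageblock_order());
      		HPAGE_SHIFT = 21;	/* 2M radix hugepage, set early */
      		printf("early hugetlb init: pageblock_order = %u\n", compute_pageblock_order());
      		return 0;	/* prints 8, then 5, matching the numbers above */
      	}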
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220211065215.101767-1-aneesh.kumar@linux.ibm.com
  3. 09 Dec 2021, 1 commit
  4. 07 Nov 2021, 1 commit
  5. 06 May 2021, 1 commit
    • hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share() · aec44e0f
      Committed by Peter Xu
      Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4.
      
      This series tries to disable huge pmd unshare of hugetlbfs backed memory
      for uffd-wp.  Although uffd-wp for hugetlbfs is still at the RFC stage,
      the idea of this series may be needed for multiple tasks (Axel's uffd
      minor fault series, and Mike's soft dirty series), so I picked it out
      from the larger series.
      
      This patch (of 4):
      
      This is preparatory work to allow the per-architecture huge_pte_alloc() to
      behave differently according to different VMA attributes.
      
      Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call.
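      
      A rough sketch of the resulting interface change (prototypes simplified;
      the kernel types struct mm_struct and struct vm_area_struct are assumed):
      
      	/* before */
      	pte_t *huge_pte_alloc(struct mm_struct *mm,
      			      unsigned long addr, unsigned long sz);
      
      	/* after: the VMA is passed down, so huge_pmd_share() can use it
      	 * directly instead of calling find_vma() */
      	pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
      			      unsigned long addr, unsigned long sz);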
      
      [peterx@redhat.com: build fix]
        Link: https://lkml.kernel.org/r/20210304164653.GB397383@xz-x1
      Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com
      
      Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Adam Ruprecht <ruprecht@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Cannon Matthews <cannonmatthews@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michal Koutn" <mkoutny@suse.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shawn Anastasio <shawn@anastas.io>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 11 Feb 2021, 1 commit
  7. 30 Jan 2021, 1 commit
  8. 15 Dec 2020, 1 commit
    • powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range() · 2198d493
      Committed by Christophe Leroy
      Commit 7bfe54b5 ("powerpc/mm: Refactor the floor/ceiling check in
      hugetlb range freeing functions") inadvertently removed the mask
      applied to the start parameter in those two functions, leading to the
      following crash on power9.
      
        LTP: starting hugemmap05_1 (hugemmap05 -m)
        ------------[ cut here ]------------
        kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:387!
        Oops: Exception in kernel mode, sig: 5 [#1]
        LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV
        ...
        CPU: 99 PID: 308 Comm: ksoftirqd/99 Tainted: G           O      5.10.0-rc7-next-20201211 #1
        NIP:  c00000000005dbec LR: c0000000003352f4 CTR: 0000000000000000
        REGS: c00020000bb6f830 TRAP: 0700   Tainted: G           O       (5.10.0-rc7-next-20201211)
        MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002284  XER: 20040000
        GPR00: c0000000003352f4 c00020000bb6fad0 c000000007f70b00 c0002000385b3ff0
        GPR04: 0000000000000000 0000000000000003 c00020000bb6f8b4 0000000000000001
        GPR08: 0000000000000001 0000000000000009 0000000000000008 0000000000000002
        GPR12: 0000000024002488 c000201fff649c00 c000000007f2a20c 0000000000000000
        GPR16: 0000000000000007 0000000000000000 c000000000194d10 c000000000194d10
        GPR24: 0000000000000014 0000000000000015 c000201cc6e72398 c000000007fac4b4
        GPR28: c000000007f2bf80 c000000007fac2f8 0000000000000008 c000200033870000
        NIP [c00000000005dbec] __tlb_remove_table+0x1dc/0x1e0
                               pgtable_free at arch/powerpc/mm/book3s64/pgtable.c:387
                               (inlined by) __tlb_remove_table at arch/powerpc/mm/book3s64/pgtable.c:405
        LR [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0
        Call Trace:
          __tlb_remove_table+0x13c/0x1e0 (unreliable)
          tlb_remove_table_rcu+0x54/0xa0
          __tlb_remove_table_free at mm/mmu_gather.c:101
          (inlined by) tlb_remove_table_rcu at mm/mmu_gather.c:156
          rcu_core+0x35c/0xbb0
          rcu_do_batch at kernel/rcu/tree.c:2502
          (inlined by) rcu_core at kernel/rcu/tree.c:2737
          __do_softirq+0x480/0x704
          run_ksoftirqd+0x74/0xd0
          run_ksoftirqd at kernel/softirq.c:651
          (inlined by) run_ksoftirqd at kernel/softirq.c:642
          smpboot_thread_fn+0x278/0x320
          kthread+0x1c4/0x1d0
          ret_from_kernel_thread+0x5c/0x80
      
      Properly apply the masks before calling pmd_free_tlb() and
      pud_free_tlb() respectively.
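      
      In sketch form, the fix re-applies the masks roughly like this (the exact
      masks and surrounding code live in arch/powerpc/mm/hugetlbpage.c and may
      differ in detail):
      
      	/* hugetlb_free_pmd_range(): realign start before freeing the PMD table */
      	start &= PUD_MASK;
      	...
      	pmd_free_tlb(tlb, pmd, start);
      
      	/* hugetlb_free_pud_range(): likewise, one level up */
      	start &= PGDIR_MASK;
      	...
      	pud_free_tlb(tlb, pud, start);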
      
      Fixes: 7bfe54b5 ("powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions")
      Reported-by: Qian Cai <qcai@redhat.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/56feccd7b6fcd98e353361a233fa7bb8e67c3164.1607780469.git.christophe.leroy@csgroup.eu
  9. 09 Dec 2020, 1 commit
  10. 15 Sep 2020, 2 commits
  11. 29 Jul 2020, 1 commit
  12. 10 Jun 2020, 1 commit
    • mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Committed by Mike Rapoport
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definitions of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version there is always
      possibility to override the generic version with the usual ifdefs magic.
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are removed with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 05 Jun 2020, 1 commit
  14. 04 Jun 2020, 3 commits
    • hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate · 38237830
      Committed by Mike Kravetz
      hugetlb_add_hstate() prints a warning if the hstate already exists.  This
      was originally done as part of kernel command line parsing.  If
      'hugepagesz=' was specified more than once, the warning
      
      	pr_warn("hugepagesz= specified twice, ignoring\n");
      
      would be printed.
      
      Some architectures want to enable all huge page sizes.  They would call
      hugetlb_add_hstate for all supported sizes.  However, this was done after
      command line processing and as a result hstates could have already been
      created for some sizes.  To make sure no warnings were printed, there would
      often be code like:
      
      	if (!size_to_hstate(size))
      		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT)
      
      The only time we want to print the warning is as the result of command
      line processing.  So, remove the warning from hugetlb_add_hstate and add
      it to the single arch independent routine processing "hugepagesz=".  After
      this, calls to size_to_hstate() in arch specific code can be removed and
      hugetlb_add_hstate can be called without worrying about warning messages.
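      
      A simplified sketch of the resulting arch independent handler (modelled on
      mm/hugetlb.c; message strings and details are illustrative):
      
      	static int __init hugepagesz_setup(char *s)
      	{
      		unsigned long size = (unsigned long)memparse(s, NULL);
      
      		if (!arch_hugetlb_valid_size(size)) {
      			pr_err("HugeTLB: unsupported hugepagesz=%s\n", s);
      			return 0;
      		}
      
      		if (size_to_hstate(size)) {
      			/* the warning removed from hugetlb_add_hstate() now lives
      			 * here, so it only fires for command line duplicates */
      			pr_warn("HugeTLB: hugepagesz=%s specified twice, ignoring\n", s);
      			return 0;
      		}
      
      		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
      		return 1;
      	}
      	__setup("hugepagesz=", hugepagesz_setup);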
      
      [mike.kravetz@oracle.com: fix hugetlb initialization]
        Link: http://lkml.kernel.org/r/4c36c6ce-3774-78fa-abc4-b7346bf24348@oracle.com
        Link: http://lkml.kernel.org/r/20200428205614.246260-5-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Anders Roxell <anders.roxell@linaro.org>
      Acked-by: Mina Almasry <almasrymina@google.com>
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
      Acked-by: Will Deacon <will@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Longpeng <longpeng2@huawei.com>
      Cc: Nitesh Narayan Lal <nitesh@redhat.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200417185049.275845-4-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20200428205614.246260-4-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlbfs: move hugepagesz= parsing to arch independent code · 359f2544
      Committed by Mike Kravetz
      Now that architectures provide arch_hugetlb_valid_size(), parsing of
      "hugepagesz=" can be done in architecture independent code.  Create a
      single routine to handle hugepagesz= parsing and remove all arch specific
      routines.  We can also remove the interface hugetlb_bad_size() as this is
      no longer used outside arch independent code.
      
      This also provides consistent behavior of hugetlbfs command line options.
      The hugepagesz= option should only be specified once for a specific size,
      but some architectures allow multiple instances.  This appears to be more
      of an oversight when code was added by some architectures to set up ALL
      huge page sizes.
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Sandipan Das <sandipan@linux.ibm.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Acked-by: Mina Almasry <almasrymina@google.com>
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
      Acked-by: Will Deacon <will@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Longpeng <longpeng2@huawei.com>
      Cc: Nitesh Narayan Lal <nitesh@redhat.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Anders Roxell <anders.roxell@linaro.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200417185049.275845-3-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20200428205614.246260-3-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlbfs: add arch_hugetlb_valid_size · ae94da89
      Committed by Mike Kravetz
      Patch series "Clean up hugetlb boot command line processing", v4.
      
      Longpeng(Mike) reported a weird message from hugetlb command line
      processing and proposed a solution [1].  While the proposed patch does
      address the specific issue, there are other related issues in command line
      processing.  As hugetlbfs evolved, updates to command line processing have
      been made to meet immediate needs and not necessarily in a coordinated
      manner.  The result is that some processing is done in arch specific code,
      some is done in arch independent code and coordination is problematic.
      Semantics can vary between architectures.
      
      The patch series does the following:
      - Define arch specific arch_hugetlb_valid_size routine used to validate
        passed huge page sizes.
      - Move hugepagesz= command line parsing out of arch specific code and into
        an arch independent routine.
      - Clean up command line processing to follow desired semantics and
        document those semantics.
      
      [1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpeng2@huawei.com
      
      This patch (of 3):
      
      The architecture independent routine hugetlb_default_setup sets up the
      default huge page size.  It has no way to verify if the passed value is
      valid, so it accepts it and attempts to validate at a later time.  This
      requires undocumented cooperation between the arch specific and arch
      independent code.
      
      For architectures that support more than one huge page size, provide a
      routine arch_hugetlb_valid_size to validate a huge page size.
      hugetlb_default_setup can use this to validate passed values.
      
      arch_hugetlb_valid_size will also be used in a subsequent patch to move
      processing of the "hugepagesz=" in arch specific code to a common routine
      in arch independent code.
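      
      A sketch of the new hook and its generic fallback (the weak default is
      modelled on mm/hugetlb.c; per-arch overrides replace it):
      
      	/* generic fallback: only the compile-time default size is valid */
      	bool __init __weak arch_hugetlb_valid_size(unsigned long size)
      	{
      		return size == HPAGE_SIZE;
      	}
      
      	/* architectures supporting several huge page sizes provide their own
      	 * version, checking the passed size against what the MMU supports */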
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
      Acked-by: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Longpeng <longpeng2@huawei.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Nitesh Narayan Lal <nitesh@redhat.com>
      Cc: Anders Roxell <anders.roxell@linaro.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200428205614.246260-1-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20200428205614.246260-2-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20200417185049.275845-1-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20200417185049.275845-2-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 26 May 2020, 3 commits
    • powerpc/8xx: Only 8M pages are hugepte pages now · d4870b89
      Committed by Christophe Leroy
      512k pages are now standard pages, so only 8M pages
      are hugepte.
      
      No more handling of normal page tables through hugepd allocation
      and freeing, and hugepte helpers can also be simplified.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/2c6135d57fb76eebf70673fbac3dc9e740767879.1589866984.git.christophe.leroy@csgroup.eu
    • powerpc/8xx: Manage 512k huge pages as standard pages. · b250c8c0
      Committed by Christophe Leroy
      At the time being, 512k huge pages are handled through hugepd page
      tables. The PMD entry is flagged as a hugepd pointer and it
      means that only 512k hugepages can be managed in that 4M block.
      However, the hugepd table has the same size as a normal page
      table, and 512k entries can therefore be nested with normal pages.
      
      On the 8xx, TLB loading is performed by software and although the
      page tables are organised to match the L1 and L2 level defined by
      the HW, all TLB entries have both L1 and L2 independent entries.
      It means that even if two TLB entries are associated with the same
      PMD entry, they can be loaded with different values in L1 part.
      
      The L1 entry contains the page size (PS field):
      - 00 for 4k and 16k pages
      - 01 for 512k pages
      - 11 for 8M pages
      
      By adding a flag for hugepages in the PTE (_PAGE_HUGE) and copying it
      into the lower bit of PS, we can then manage 512k pages with normal
      page tables:
      - PMD entry has PS=11 for 8M pages
      - PMD entry has PS=00 for other pages.
      
      As a PMD entry covers 4M areas, a PMD will either point to a hugepd
      table having a single entry to an 8M page, or the PMD will point to
      a standard page table which will have either entries to 4k or 16k or
      512k pages. For 512k pages, as the L1 entry will not know it is a
      512k page before the PTE is read, there will be 128 entries in the
      PTE as if it was 4k pages. But when loading the TLB, it will be
      flagged as a 512k page.
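      
      A self-contained toy model of the PS trick described above (the PS values
      follow the list above; _PAGE_HUGE's bit position and the helper name are
      made up for illustration):
      
      	#include <stdio.h>
      
      	#define PS_4K_16K	0x0	/* PS = 00 */
      	#define PS_512K		0x1	/* PS = 01 */
      	#define PS_8M		0x3	/* PS = 11 */
      	#define _PAGE_HUGE	0x1	/* example bit, not the real layout */
      
      	/* copy _PAGE_HUGE into the low bit of PS: a normal PTE keeps PS=00,
      	 * a 512k PTE becomes PS=01, without the L1 entry knowing in advance */
      	static unsigned int ps_from_pte(unsigned long pte)
      	{
      		return (pte & _PAGE_HUGE) ? PS_512K : PS_4K_16K;
      	}
      
      	int main(void)
      	{
      		printf("normal PTE -> PS=%u, 512k PTE -> PS=%u\n",
      		       ps_from_pte(0x0), ps_from_pte(_PAGE_HUGE));
      		return 0;
      	}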
      
      Note that we can't use pmd_ptr() in asm/nohash/32/pgtable.h because
      it is not defined yet.
      
      In ITLB miss, we keep the possibility to opt it out as when kernel
      text is pinned and no user hugepages are used, we can save several
      instruction by not using r11.
      
      In DTLB miss, that's just one instruction so it's not worth bothering
      with it.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/002819e8e166bf81d24b24782d98de7c40905d8f.1589866984.git.christophe.leroy@csgroup.eu
    • powerpc/mm: Reduce hugepd size for 8M hugepages on 8xx · b12c07a4
      Committed by Christophe Leroy
      Commit 55c8fc3f ("powerpc/8xx: reintroduce 16K pages with HW
      assistance") redefined pte_t as a struct of 4 pte_basic_t, because
      in 16K pages mode there are four identical entries in the page table.
      But hugepd entries for 8M pages require only one entry of size
      pte_basic_t. So there is no point in creating a cache for 4 entries
      page tables.
      
      Calculate PTE_T_ORDER using the size of pte_basic_t instead of pte_t.
      
      Define specific huge_pte helpers (set_huge_pte_at(), huge_pte_clear(),
      huge_ptep_set_wrprotect()) to write the pte in a single entry instead
      of using set_pte_at() which writes 4 identical entries in 16k pages
      mode. Also make sure that __ptep_set_access_flags() properly handle
      the huge_pte case.
      
      Define set_pte_filter() inline otherwise GCC doesn't inline it anymore
      because it is now used twice, and that gives a pretty suboptimal code
      because of pte_t being a struct of 4 entries.
      
      Those functions are also used for 512k pages which only require one
      entry as well, although replicating it four times was harmless as 512k
      page entries are spread every 128 bytes in the table.
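      
      A small, self-contained illustration of the size difference driving this
      change (the type definitions are a model, not the real headers):
      
      	#include <stdio.h>
      
      	typedef unsigned long pte_basic_t;			/* one HW entry */
      	typedef struct { pte_basic_t pte[4]; } pte_t;		/* 16k mode: 4 copies */
      
      	int main(void)
      	{
      		/* sizing the hugepd cache by pte_t would make it four times
      		 * larger than the single pte_basic_t an 8M hugepd entry needs */
      		printf("sizeof(pte_t) = %zu, sizeof(pte_basic_t) = %zu\n",
      		       sizeof(pte_t), sizeof(pte_basic_t));
      		return 0;
      	}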
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/43050d1a0c2d6e1541cab9c1126fc80bc7015ebd.1589866984.git.christophe.leroy@csgroup.eu
  16. 15 May 2020, 1 commit
    • powerpc/mm: Replace zero-length array with flexible-array · 02bddf21
      Committed by Gustavo A. R. Silva
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
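      
      For contrast, the zero-length form being replaced looks like:
      
      struct foo {
              int stuff;
              struct boo array[0];	/* GNU zero-length array extension */
      };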
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get completely rid of those sorts of issues.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200507185755.GA15014@embeddedor
  17. 17 Feb 2020, 1 commit
  18. 25 Sep 2019, 1 commit
  19. 13 Jul 2019, 1 commit
  20. 04 Jul 2019, 3 commits
  21. 15 May 2019, 1 commit
    • powerpc/mm: Fix crashes with hugepages & 4K pages · 7338874c
      Committed by Michael Ellerman
      The recent commit to clean up ifdefs in the hugepage initialisation led
      to crashes when using 4K pages as reported by Sachin:
      
        BUG: Kernel NULL pointer dereference at 0x0000001c
        Faulting instruction address: 0xc000000001d1e58c
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        ...
        CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: G        W  O      5.1.0-next-20190507-autotest #1
        NIP:  c000000001d1e58c LR: c000000001d1e54c CTR: 0000000000000000
        REGS: c000000004937890 TRAP: 0300
        MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22424822  XER: 00000000
        CFAR: c00000000183e9e0 DAR: 000000000000001c DSISR: 40000000 IRQMASK: 0
        ...
        NIP kmem_cache_alloc+0xbc/0x5a0
        LR  kmem_cache_alloc+0x7c/0x5a0
        Call Trace:
          huge_pte_alloc+0x580/0x950
          hugetlb_fault+0x9a0/0x1250
          handle_mm_fault+0x490/0x4a0
          __do_page_fault+0x77c/0x1f00
          do_page_fault+0x28/0x50
          handle_page_fault+0x18/0x38
      
      This is caused by us trying to allocate from a NULL kmem cache in
      __hugepte_alloc(). The kmem cache is NULL because it was never
      allocated in hugetlbpage_init(), because add_huge_page_size() returned
      an error.
      
      The reason add_huge_page_size() returned an error is a simple typo: we
      are calling check_and_get_huge_psize(size) when we should be passing
      shift instead.
      
      The fact that we're able to trigger this path when the kmem caches are
      NULL is a separate bug, ie. we should not advertise any hugepage sizes
      if we haven't setup the required caches for them.
      
      This was only seen with 4K pages, with 64K pages we don't need to
      allocate any extra kmem caches because the 16M hugepage just occupies
      a single entry at the PMD level.
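      
      In diff form, the typo fix described above looks roughly like this (the
      context from add_huge_page_size() is paraphrased, not quoted):
      
      	-	mmu_psize = check_and_get_huge_psize(size);
      	+	mmu_psize = check_and_get_huge_psize(shift);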
      
      Fixes: 723f268f ("powerpc/mm: cleanup ifdef mess in add_huge_page_size()")
      Reported-by: Sachin Sant <sachinp@linux.ibm.com>
      Tested-by: Sachin Sant <sachinp@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
  22. 06 May 2019, 1 commit
  23. 02 May 2019, 8 commits
  24. 04 Dec 2018, 2 commits
    • powerpc/8xx: Enable 512k hugepage support with HW assistance · 3fb69c6a
      Committed by Christophe Leroy
      For using 512k pages with hardware assistance, the PTEs have to be spread
      every 128 bytes in the L2 table.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: fix a warning when a cache is common to PGD and hugepages · 1e03c7e2
      Committed by Christophe Leroy
      While implementing TLB miss HW assistance on the 8xx, the following
      warning was encountered:
      
      [  423.732965] WARNING: CPU: 0 PID: 345 at mm/slub.c:2412 ___slab_alloc.constprop.30+0x26c/0x46c
      [  423.733033] CPU: 0 PID: 345 Comm: mmap Not tainted 4.18.0-rc8-00664-g2dfff9121c55 #671
      [  423.733075] NIP:  c0108f90 LR: c0109ad0 CTR: 00000004
      [  423.733121] REGS: c455bba0 TRAP: 0700   Not tainted  (4.18.0-rc8-00664-g2dfff9121c55)
      [  423.733147] MSR:  00021032 <ME,IR,DR,RI>  CR: 24224848  XER: 20000000
      [  423.733319]
      [  423.733319] GPR00: c0109ad0 c455bc50 c4521910 c60053c0 007080c0 c0011b34 c7fa41e0 c455be30
      [  423.733319] GPR08: 00000001 c00103a0 c7fa41e0 c49afcc4 24282842 10018840 c079b37c 00000040
      [  423.733319] GPR16: 73f00000 00210d00 00000000 00000001 c455a000 00000100 00000200 c455a000
      [  423.733319] GPR24: c60053c0 c0011b34 007080c0 c455a000 c455a000 c7fa41e0 00000000 00009032
      [  423.734190] NIP [c0108f90] ___slab_alloc.constprop.30+0x26c/0x46c
      [  423.734257] LR [c0109ad0] kmem_cache_alloc+0x210/0x23c
      [  423.734283] Call Trace:
      [  423.734326] [c455bc50] [00000100] 0x100 (unreliable)
      [  423.734430] [c455bcc0] [c0109ad0] kmem_cache_alloc+0x210/0x23c
      [  423.734543] [c455bcf0] [c0011b34] huge_pte_alloc+0xc0/0x1dc
      [  423.734633] [c455bd20] [c01044dc] hugetlb_fault+0x408/0x48c
      [  423.734720] [c455bdb0] [c0104b20] follow_hugetlb_page+0x14c/0x44c
      [  423.734826] [c455be10] [c00e8e54] __get_user_pages+0x1c4/0x3dc
      [  423.734919] [c455be80] [c00e9924] __mm_populate+0xac/0x140
      [  423.735020] [c455bec0] [c00db14c] vm_mmap_pgoff+0xb4/0xb8
      [  423.735127] [c455bf00] [c00f27c0] ksys_mmap_pgoff+0xcc/0x1fc
      [  423.735222] [c455bf40] [c000e0f8] ret_from_syscall+0x0/0x38
      [  423.735271] Instruction dump:
      [  423.735321] 7cbf482e 38fd0008 7fa6eb78 7fc4f378 4bfff5dd 7fe3fb78 4bfffe24 81370010
      [  423.735536] 71280004 41a2ff88 4840c571 4bffff80 <0fe00000> 4bfffeb8 81340010 712a0004
      [  423.735757] ---[ end trace e9b222919a470790 ]---
      
      This warning occurs when calling kmem_cache_zalloc() on a
      cache having a constructor.
      
      In this case it happens because the PGD cache and the 512k hugepte cache
      are the same size (4k). While a cache with a constructor is created for
      the PGD, hugepages create their cache without a constructor and use
      kmem_cache_zalloc(). As both expect a cache of the same size, the
      hugepages reuse the cache created for the PGD, hence the conflict.
      
      In order to avoid this conflict, this patch:
      - modifies pgtable_cache_add() so that a zeroising constructor is
      added for any cache size.
      - replaces calls to kmem_cache_zalloc() with kmem_cache_alloc(), as
      sketched below.
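      
      A sketch of the resulting pattern (names and sizes are illustrative, not
      the exact powerpc code):
      
      	/* pgtable_cache_add(): every cache, whatever its size, now gets a
      	 * zeroising constructor, so a PGD cache and a same-sized hugepte
      	 * cache can be shared safely */
      	static void pgtable_ctor(void *addr)
      	{
      		memset(addr, 0, CACHE_OBJECT_SIZE);	/* size illustrative */
      	}
      
      	/* callers then rely on the constructor and use plain allocation */
      	hugepte = kmem_cache_alloc(cachep, GFP_KERNEL);	/* was kmem_cache_zalloc() */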
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>