1. 14 Oct 2018, 9 commits
  2. 03 Oct 2018, 5 commits
  3. 19 Sep 2018, 4 commits
    • powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup · 2e162674
      Nicholas Piggin authored
      This will be used by the SLB code in the next patch, but for now this
      sets the slb_addr_limit to the correct size for 32-bit tasks.
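
      As a rough sketch of what such a hook ends up doing (illustrative only;
      is_32bit_task() and DEFAULT_MAP_WINDOW are existing powerpc symbols, but
      the body below is a simplified assumption, not the exact patch):

      void arch_setup_new_exec(void)
      {
              if (!is_32bit_task())
                      return;

              /* for 32-bit tasks, DEFAULT_MAP_WINDOW is the 4GB limit */
              current->mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW;
      }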
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: remove user SLB data from the paca · 8fed04d0
      Nicholas Piggin authored
      User SLB mapping data is copied into the PACA from the mm->context so
      it can be accessed by the SLB miss handlers.
      
      After the C conversion, SLB miss handlers now run with relocation on,
      and user SLB misses are able to take recursive kernel SLB misses, so
      the user SLB mapping data can be removed from the paca and accessed
      directly.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: remove the vmalloc segment from the bolted SLB · 85376e2a
      Nicholas Piggin authored
      Remove the vmalloc segment from the bolted SLBEs. It is not required
      to be bolted, and it seems it was added to help pre-load the SLB on
      context switch. However there are now other segments like the vmemmap
      segment and non-zero node memory that often take misses after a context
      switch, so it is better to solve this in a more general way.
      
      A subsequent change will track free SLB entries and use those rather
      than round-robin overwriting valid entries, which makes it far less
      likely for kernel SLBEs to be evicted after they are installed.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Dump the SLB contents on SLB MCE errors. · c6d15258
      Mahesh Salgaonkar authored
      If we get a machine check exception due to SLB errors, dump the
      current SLB contents, which will be very helpful in debugging the
      root cause of the SLB errors. Introduce an exclusive per-cpu buffer to
      hold the faulty SLB entries. In real mode, the MCE handler saves the
      old SLB contents into this buffer, accessible through the paca, and
      prints them out later in virtual mode.
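
      A minimal sketch of the real-mode save step described above (the struct
      and function names here are illustrative assumptions; slbmfee/slbmfev and
      mmu_slb_size are the actual instructions/symbol used to read SLB entries):

      struct slb_entry {
              u64     esid;
              u64     vsid;
      };

      /* real mode: copy the live SLB into the per-cpu buffer in the paca */
      static void mce_save_slb_contents(struct slb_entry *slb)
      {
              int i;

              for (i = 0; i < mmu_slb_size; i++) {
                      asm volatile("slbmfee %0,%1" : "=r" (slb[i].esid) : "r" (i));
                      asm volatile("slbmfev %0,%1" : "=r" (slb[i].vsid) : "r" (i));
              }
      }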
      
      With this patch the console will log SLB contents like below on SLB MCE
      errors:
      
      [  507.297236] SLB contents of cpu 0x1
      [  507.297237] Last SLB entry inserted at slot 16
      [  507.297238] 00 c000000008000000 400ea1b217000500
      [  507.297239]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
      [  507.297240] 01 d000000008000000 400d43642f000510
      [  507.297242]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297243] 11 f000000008000000 400a86c85f000500
      [  507.297244]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
      [  507.297245] 12 00007f0008000000 4008119624000d90
      [  507.297246]   1T  ESID=       7f  VSID=      8119624 LLP:110
      [  507.297247] 13 0000000018000000 00092885f5150d90
      [  507.297247]  256M ESID=        1  VSID=   92885f5150 LLP:110
      [  507.297248] 14 0000010008000000 4009e7cb50000d90
      [  507.297249]   1T  ESID=        1  VSID=      9e7cb50 LLP:110
      [  507.297250] 15 d000000008000000 400d43642f000510
      [  507.297251]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297252] 16 d000000008000000 400d43642f000510
      [  507.297253]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297253] ----------------------------------
      [  507.297254] SLB cache ptr value = 3
      [  507.297254] Valid SLB cache entries:
      [  507.297255] 00 EA[0-35]=    7f000
      [  507.297256] 01 EA[0-35]=        1
      [  507.297257] 02 EA[0-35]=     1000
      [  507.297257] Rest of SLB cache entries:
      [  507.297258] 03 EA[0-35]=    7f000
      [  507.297258] 04 EA[0-35]=        1
      [  507.297259] 05 EA[0-35]=     1000
      [  507.297260] 06 EA[0-35]=       12
      [  507.297260] 07 EA[0-35]=    7f000
      Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 23 Aug 2018, 1 commit
  5. 13 Aug 2018, 1 commit
  6. 10 Aug 2018, 2 commits
  7. 30 Jul 2018, 2 commits
    • powerpc/mm: Don't report PUDs as memory leaks when using kmemleak · a984506c
      Michael Ellerman authored
      Paul Menzel reported that kmemleak was producing reports such as:
      
        unreferenced object 0xc0000000f8b80000 (size 16384):
          comm "init", pid 1, jiffies 4294937416 (age 312.240s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<00000000d997deb7>] __pud_alloc+0x80/0x190
            [<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0
            [<00000000091e51c2>] shift_arg_pages+0xc0/0x210
            [<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0
            [<0000000060871529>] load_elf_binary+0x41c/0x1648
            [<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280
            [<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
            [<000000005f953a6e>] sys_execve+0x58/0x70
            [<000000009700a858>] system_call+0x5c/0x70
      
      Indicating that a PUD was being leaked.
      
      However what's really happening is that kmemleak is not able to
      recognise the references from the PGD to the PUD, because they are not
      fully qualified pointers.
      
      We can confirm that in xmon, eg:
      
      Find the task struct for pid 1 "init":
        0:mon> P
             task_struct     ->thread.ksp    PID   PPID S  P CMD
        c0000001fe7c0000 c0000001fe803960      1      0 S 13 systemd
      
      Dump virtual address 0 to find the PGD:
        0:mon> dv 0 c0000001fe7c0000
        pgd  @ 0xc0000000f8b01000
      
      Dump the memory of the PGD:
        0:mon> d c0000000f8b01000
        c0000000f8b01000 00000000f8b90000 0000000000000000  |................|
        c0000000f8b01010 0000000000000000 0000000000000000  |................|
        c0000000f8b01020 0000000000000000 0000000000000000  |................|
        c0000000f8b01030 0000000000000000 00000000f8b80000  |................|
                                          ^^^^^^^^^^^^^^^^
      
      There we can see the reference to our supposedly leaked PUD. But
      because it's missing the leading 0xc, kmemleak won't recognise it.
      
      We can confirm it's still in use by translating an address that is
      mapped via it:
        0:mon> dv 7fff94000000 c0000001fe7c0000
        pgd  @ 0xc0000000f8b01000
        pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <--
        pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000
        pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000
        ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386
        Maps physical address = 0x00000001d5ce0000
        Flags = Accessed Dirty Read Write
      
      The fix is fairly simple. We need to tell kmemleak to ignore PUD
      allocations and never report them as leaks. We can also tell it not to
      scan the PGD, because it will never find pointers in there. However it
      will still notice if we allocate a PGD and then leak it.
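
      A sketch of the kind of fix described (kmemleak_ignore() and
      kmemleak_no_scan() are the standard kmemleak hints; the allocation
      details below are simplified assumptions):

      static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
      {
              pud_t *pud = kmem_cache_alloc(PGT_CACHE(PUD_CACHE_INDEX),
                                            pgtable_gfp_flags(mm, GFP_KERNEL));

              /*
               * The PGD refers to PUDs by physical address, which kmemleak
               * cannot recognise as a pointer, so tell it never to scan a PUD
               * or report it as leaked. (kmemleak_no_scan() on the PGD stops
               * the pointless scan of the PGD itself.)
               */
              kmemleak_ignore(pud);
              return pud;
      }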
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: move ASM_CONST and stringify_in_c() into asm-const.h · ec0c464c
      Christophe Leroy authored
      This patch moves ASM_CONST() and stringify_in_c() into a
      dedicated asm-const.h, then cleans up all related inclusions.
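
      For reference, a sketch of what the new header holds (the familiar
      definitions of these macros; exact layout in the patch may differ):

      /* arch/powerpc/include/asm/asm-const.h (sketch) */
      #ifdef __ASSEMBLY__
      #  define ASM_CONST(x)          x
      #  define stringify_in_c(...)   __VA_ARGS__
      #else
      /* In C, constants need a UL suffix and asm templates need strings */
      #  define __ASM_CONST(x)        x##UL
      #  define ASM_CONST(x)          __ASM_CONST(x)
      #  define __stringify_in_c(...) #__VA_ARGS__
      #  define stringify_in_c(...)   __stringify_in_c(__VA_ARGS__) " "
      #endif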
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      [mpe: asm-compat.h should include asm-const.h]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 24 Jul 2018, 1 commit
  9. 16 Jul 2018, 1 commit
  10. 20 Jun 2018, 1 commit
  11. 08 Jun 2018, 1 commit
    • mm: introduce ARCH_HAS_PTE_SPECIAL · 3010a5ea
      Laurent Dufour authored
      Currently, PTE special support is turned on in per-architecture header
      files.  Most of the time, it is defined in
      arch/*/include/asm/pgtable.h, depending (or not) on some other
      per-architecture static definition.

      This patch introduces a new configuration variable to manage this
      directly in the Kconfig files.  It will later replace
      __HAVE_ARCH_PTE_SPECIAL.
      
      Here are notes for some architectures where the definition of
      __HAVE_ARCH_PTE_SPECIAL is not obvious:
      
      arm
      __HAVE_ARCH_PTE_SPECIAL is currently defined in
      arch/arm/include/asm/pgtable-3level.h, which is included by
      arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
      So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.
      
      powerpc
      __HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
       - arch/powerpc/include/asm/book3s/64/pgtable.h
       - arch/powerpc/include/asm/pte-common.h
      The first one is included if (PPC_BOOK3S & PPC64) while the second is
      included in all the other cases.
      So select ARCH_HAS_PTE_SPECIAL all the time.
      
      sparc:
      __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
      defined(__arch64__) which are defined through the compiler in
      sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
      So select ARCH_HAS_PTE_SPECIAL if SPARC64
      
      There is no functional change introduced by this patch.
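
      A compressed sketch of the Kconfig plumbing being described (only the
      new lines are shown; not the full patch):

      # mm/Kconfig: the new symbol, to be selected by architectures
      config ARCH_HAS_PTE_SPECIAL
              bool

      # arch/powerpc/Kconfig: powerpc selects it unconditionally
      config PPC
              select ARCH_HAS_PTE_SPECIAL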
      
      Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Suggested-by: Jerome Glisse <jglisse@redhat.com>
      Reviewed-by: Jerome Glisse <jglisse@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <albert@sifive.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Christophe LEROY <christophe.leroy@c-s.fr>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 03 Jun 2018, 6 commits
    • powerpc/64s/radix: optimise pte_update · 85bcfaf6
      Nicholas Piggin authored
      Implementing pte_update with pte_xchg (which uses cmpxchg) is
      inefficient. A single larx/stcx. works fine, no need for the less
      efficient cmpxchg sequence.
      
      Then remove the memory barriers from the operation. There is a
      requirement for TLB flushing to load mm_cpumask after the store
      that reduces pte permissions, which is moved into the TLB flush
      code.
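
      A sketch of the kind of sequence this enables (the operand layout is
      illustrative; the real helper differs in detail):

      static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr,
                                                     unsigned long set)
      {
              unsigned long old_pte, new_pte;

              __asm__ __volatile__(
              "1:     ldarx   %0,0,%3         # load pte with reservation\n"
              "       andc    %1,%0,%4        # clear the requested bits\n"
              "       or      %1,%1,%5        # set the requested bits\n"
              "       stdcx.  %1,0,%3         # store iff still reserved\n"
              "       bne-    1b"
              : "=&r" (old_pte), "=&r" (new_pte), "=m" (*ptep)
              : "r" (ptep), "r" (clr), "r" (set)
              : "cc");

              return old_pte;
      }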
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags · f1cb8f9b
      Nicholas Piggin authored
      The ISA suggests ptesync after setting a pte, to prevent a table walk
      initiated by a subsequent access from missing that store and causing a
      spurious fault. This is an architectural allowance that allows an
      implementation's page table walker to be incoherent with the store
      queue.
      
      However there is no correctness problem in taking a spurious fault in
      userspace -- the kernel copes with these at any time, so the updated
      pte will be found eventually. Spurious kernel faults on vmap memory
      must be avoided, so a ptesync is put into flush_cache_vmap.
      
      On POWER9 so far I have not found a measurable window where this can
      result in more minor faults, so as an optimisation, remove the costly
      ptesync from pte updates. If an implementation benefits from ptesync,
      it would be better to add it back in update_mmu_cache, so it's not
      done for things like fork(2).
      
      fork --fork --exec benchmark improved 5.2% (12400->13100).
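
      A sketch of the vmap side of this, per the description above (close to
      what the patch adds for 64-bit Book3S; treat it as illustrative):

      static inline void flush_cache_vmap(unsigned long start, unsigned long end)
      {
              /*
               * Spurious kernel faults on vmap space must be avoided, so
               * order the new ptes against the page table walker here rather
               * than with a ptesync after every pte set.
               */
              asm volatile("ptesync" ::: "memory");
      }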
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case · f569bd94
      Nicholas Piggin authored
      This matches other architectures: when we know there will be no
      further accesses to the address (e.g., for teardown), page table
      entries can be cleared non-atomically.
      
      The comments about NMMU are bogus: all MMU notifiers (including NMMU)
      are released at this point, with their TLBs flushed. An NMMU access at
      this point would be a bug.
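
      Roughly, the helper gains a non-atomic path for the full (teardown)
      case; a sketch under the reasoning above (the radix__pte_update
      argument list is an assumption):

      static inline pte_t radix__ptep_get_and_clear_full(struct mm_struct *mm,
                                                         unsigned long addr,
                                                         pte_t *ptep, int full)
      {
              unsigned long old_pte;

              if (full) {
                      /* teardown: no concurrent users, plain load/store is enough */
                      old_pte = pte_val(*ptep);
                      *ptep = __pte(0);
              } else {
                      old_pte = radix__pte_update(mm, addr, ptep, ~0ul, 0, 0);
              }

              return __pte(old_pte);
      }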
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: do not flush TLB on spurious fault · 6d8278c4
      Nicholas Piggin authored
      In the case of a spurious fault (which can happen due to a race with
      another thread that changes the page table), the default Linux mm code
      calls flush_tlb_page for that address. This is not required because
      the pte will be re-fetched. Hash does not wire this up to a hardware
      TLB flush for this reason. This patch avoids the flush for radix.
      
      From Power ISA v3.0B, p.1090:
      
          Setting a Reference or Change Bit or Upgrading Access Authority
          (PTE Subject to Atomic Hardware Updates)
      
          If the only change being made to a valid PTE that is subject to
          atomic hardware updates is to set the Reference or Change bit to
          1 or to add access authorities, a simpler sequence suffices
          because the translation hardware will refetch the PTE if an access
          is attempted for which the only problems were reference and/or
          change bits needing to be set or insufficient access authority.
      
      The nest MMU on POWER9 does not re-fetch the PTE after such an access
      attempt before faulting, so address spaces with a coprocessor
      attached will continue to flush in these cases.
      
      This reduces tlbies for a kernel compile workload from 0.95M to 0.90M.
      
      fork --fork --exec benchmark improved 0.5% (12300->12400).
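
      A sketch of the resulting hook (hook name per the generic mm interface;
      the copros check reflects the nest MMU caveat above):

      static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
                                                      unsigned long address)
      {
              /*
               * The hardware walker re-fetches the pte on retry, so no flush
               * is needed -- unless a coprocessor (nest MMU) is attached,
               * since it does not re-fetch before faulting.
               */
              if (atomic_read(&vma->vm_mm->context.copros) > 0)
                      flush_tlb_page(vma, address);
      }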
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Change function prototype · e4c1112c
      Aneesh Kumar K.V authored
      In a later patch, we use the vma and psize to do the TLB flush. Do the
      prototype update in a separate patch to make the review easy.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Move function from radix.h to pgtable-radix.c · 044003b5
      Aneesh Kumar K.V authored
      In a later patch we will update these functions, which requires them to
      be moved to pgtable-radix.c. Keeping the function in radix.h results in
      compile errors like those below.
      
      ./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’:
      ./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’
        struct mm_struct *mm = vma->vm_mm;
                                  ^~
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration]
            atomic_read(&mm->context.copros) > 0) {
            ^~~~~~~~~~~
            __atomic_load
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’
            atomic_read(&mm->context.copros) > 0) {
      
      Instead of fixing the header dependencies, we move the function to
      pgtable-radix.c. Also, the function is now too large to be a static
      inline. Doing the move in a separate patch helps the review.
      
      No functional change in this patch. Only code movement.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  13. 17 May 2018, 1 commit
  14. 15 May 2018, 4 commits
  15. 04 Apr 2018, 1 commit
    • powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix · fb4e5dbd
      Aneesh Kumar K.V authored
      With the split PTL (page table lock) config, we allocate the level
      4 (leaf) page table using the pte fragment framework instead of a slab
      cache like the other levels. This was done to enable a split page table
      lock at level 4 of the page table. We use the page->ptl backing all the
      level 4 pte fragments for the lock.
      
      Currently with Radix, we use only 16 fragments out of the allocated
      page. In radix, each fragment is 256 bytes, which means we use only 4K
      out of the allocated 64K page, wasting 60K of the allocated memory.
      This was done earlier to keep it closer to hash.
      
      This patch updates the pte fragment count to 256, thereby using the
      full 64K page and reducing memory usage. Performance tests show
      really low impact even with THP disabled. With THP enabled we will be
      contending even less on the level 4 ptl, and hence the impact should be
      even lower.
      
        256 threads:
          without patch (10 runs of ./ebizzy  -m -n 1000 -s 131072 -S 100)
            median = 15678.5
            stdev = 42.1209
      
          with patch:
            median = 15354
            stdev = 194.743
      
      This is with THP disabled. With THP enabled the impact of the patch
      will be less.
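
      Spelling out the arithmetic (macro names here are illustrative; the
      values come from the text above):

      #define RADIX_PTE_FRAG_SIZE     256     /* bytes per level-4 fragment */
      /* before: 16 fragments used  ->  16 * 256 =  4K of the 64K page   */
      /* after: 256 fragments used  -> 256 * 256 = 64K, the whole page   */
      #define RADIX_PTE_FRAG_NR       (PAGE_SIZE >> 8)  /* 64K / 256 = 256 */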
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>