1. 03 June 2018: 6 commits
    • powerpc/64s/radix: optimise pte_update · 85bcfaf6
      Authored by Nicholas Piggin
      Implementing pte_update with pte_xchg (which uses cmpxchg) is
      inefficient. A single larx/stcx. works fine, no need for the less
      efficient cmpxchg sequence.
      
      Then remove the memory barriers from the operation. There is a
      requirement for TLB flushing to load mm_cpumask after the store
      that reduces pte permissions, which is moved into the TLB flush
      code.
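      
      A minimal sketch of the single larx/stcx. sequence this describes,
      assuming a simplified native-endian PTE and a hypothetical helper
      name (the kernel's real routine differs in detail); note there are
      no barriers here, ordering is left to the TLB flush path:
      
        static inline unsigned long pte_update_sketch(unsigned long *ptep,
                                                      unsigned long clr,
                                                      unsigned long set)
        {
                unsigned long old, tmp;
      
                __asm__ __volatile__(
                "1:     ldarx   %0,0,%3  \n"    /* load PTE, take reservation */
                "       andc    %1,%0,%4 \n"    /* clear the requested bits */
                "       or      %1,%1,%5 \n"    /* set the requested bits */
                "       stdcx.  %1,0,%3  \n"    /* store iff reservation held */
                "       bne-    1b       \n"    /* reservation lost: retry */
                : "=&r" (old), "=&r" (tmp), "=m" (*ptep)
                : "r" (ptep), "r" (clr), "r" (set), "m" (*ptep)
                : "cc");
      
                return old;
        }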
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags · f1cb8f9b
      Authored by Nicholas Piggin
      The ISA suggests ptesync after setting a pte, to prevent a table walk
      initiated by a subsequent access from missing that store and causing a
      spurious fault. This is an architectural allowance that allows an
      implementation's page table walker to be incoherent with the store
      queue.
      
      However there is no correctness problem in taking a spurious fault in
      userspace -- the kernel copes with these at any time, so the updated
      pte will be found eventually. Spurious kernel faults on vmap memory
      must be avoided, so a ptesync is put into flush_cache_vmap.
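      
      A minimal sketch of that placement, assuming the generic
      flush_cache_vmap() hook is the interception point:
      
        /* Make the new vmap PTEs visible to the table walker before any
         * kernel access, so no spurious kernel fault can be taken. */
        #define flush_cache_vmap(start, end)  asm volatile("ptesync" ::: "memory")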
      
      On POWER9 so far I have not found a measurable window where this can
      result in more minor faults, so as an optimisation, remove the costly
      ptesync from pte updates. If an implementation benefits from ptesync,
      it would be better to add it back in update_mmu_cache, so it's not
      done for things like fork(2).
      
      fork --fork --exec benchmark improved 5.2% (12400->13100).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case · f569bd94
      Authored by Nicholas Piggin
      This matches other architectures: when we know there will be no
      further accesses to the address (e.g., for teardown), page table
      entries can be cleared non-atomically.
      
      The comments about NMMU are bogus: all MMU notifiers (including NMMU)
      are released at this point, with their TLBs flushed. An NMMU access at
      this point would be a bug.
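      
      A sketch of the resulting shape, with a hypothetical helper name and
      pte_update() standing in for the architecture's atomic primitive:
      
        static inline pte_t get_and_clear_full_sketch(struct mm_struct *mm,
                                                      unsigned long addr,
                                                      pte_t *ptep, int full)
        {
                unsigned long old_pte;
      
                if (full) {
                        /* Teardown: no further accesses, plain load/store is safe */
                        old_pte = pte_val(*ptep);
                        *ptep = __pte(0);
                } else {
                        /* Otherwise the atomic update is still required */
                        old_pte = pte_update(mm, addr, ptep, ~0UL, 0, 0);
                }
                return __pte(old_pte);
        }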
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: do not flush TLB on spurious fault · 6d8278c4
      Authored by Nicholas Piggin
      In the case of a spurious fault (which can happen due to a race with
      another thread that changes the page table), the default Linux mm code
      calls flush_tlb_page for that address. This is not required because
      the pte will be re-fetched. Hash does not wire this up to a hardware
      TLB flush for this reason. This patch avoids the flush for radix.
      
      From Power ISA v3.0B, p.1090:
      
          Setting a Reference or Change Bit or Upgrading Access Authority
          (PTE Subject to Atomic Hardware Updates)
      
          If the only change being made to a valid PTE that is subject to
          atomic hardware updates is to set the Reference or Change bit to
          1 or to add access authorities, a simpler sequence suffices
          because the translation hardware will refetch the PTE if an access
          is attempted for which the only problems were reference and/or
          change bits needing to be set or insufficient access authority.
      
      The nest MMU on POWER9 does not re-fetch the PTE after such an access
      attempt before faulting, so address spaces with a coprocessor
      attached will continue to flush in these cases.
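      
      A sketch of the resulting policy (macro body illustrative, following
      the description above):
      
        /* Skip the flush on a spurious fault unless a coprocessor (nest MMU)
         * is attached, since the NMMU will not re-fetch the PTE. */
        #define flush_tlb_fix_spurious_fault(vma, address)                   \
                do {                                                         \
                        if (atomic_read(&(vma)->vm_mm->context.copros) > 0)  \
                                flush_tlb_page(vma, address);                \
                } while (0)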
      
      This reduces tlbies for a kernel compile workload from 0.95M to 0.90M.
      
      fork --fork --exec benchmark improved 0.5% (12300->12400).
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Change function prototype · e4c1112c
      Authored by Aneesh Kumar K.V
      In a later patch, we use the vma and psize to do the TLB flush. The
      prototype update is done in a separate patch to make the review easy.
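      
      Roughly the shape of the change (signatures illustrative, not the
      exact diff):
      
        /* Before: only the mm reaches the flush path */
        extern void radix__ptep_set_access_flags(struct mm_struct *mm, pte_t *ptep,
                                                 pte_t entry, unsigned long address);
      
        /* After: the vma (and with it the psize) travels with the call */
        extern void radix__ptep_set_access_flags(struct vm_area_struct *vma,
                                                 pte_t *ptep, pte_t entry,
                                                 unsigned long address, int psize);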
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Move function from radix.h to pgtable-radix.c · 044003b5
      Authored by Aneesh Kumar K.V
      In a later patch we will update this function, which requires moving
      it to pgtable-radix.c. Keeping the function in radix.h results in
      compile errors as below.
      
      ./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’:
      ./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’
        struct mm_struct *mm = vma->vm_mm;
                                  ^~
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration]
            atomic_read(&mm->context.copros) > 0) {
            ^~~~~~~~~~~
            __atomic_load
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’
            atomic_read(&mm->context.copros) > 0) {
      
      Instead of fixing the header dependencies, we move the function to
      pgtable-radix.c. The function has also grown too large to be a static
      inline. Doing the move in a separate patch helps with review.
      
      No functional change in this patch. Only code movement.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 17 May 2018: 1 commit
  3. 15 May 2018: 4 commits
  4. 04 April 2018: 1 commit
    • powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix · fb4e5dbd
      Authored by Aneesh Kumar K.V
      With the split PTL (page table lock) config, we allocate the level
      4 (leaf) page table using the pte fragment framework instead of a
      slab cache as for the other levels. This was done to enable a split
      page table lock at level 4 of the page table. We use the page->ptl
      of the backing page as the lock for all its level 4 pte fragments.
      
      Currently with Radix, we use only 16 fragments out of the allocated
      page. In radix each fragment is 256 bytes, which means we use only 4K
      of the allocated 64K page, wasting 60K of the allocated memory.
      This was done earlier to keep it closer to hash.
      
      This patch updates the pte fragment count to 256, thereby using the
      full 64K page and reducing the memory usage. Performance tests show
      really low impact even with THP disabled. With THP enabled we will be
      contending even less on the level 4 ptl, and hence the impact should
      be lower still.
      
        256 threads:
          without patch (10 runs of ./ebizzy  -m -n 1000 -s 131072 -S 100)
            median = 15678.5
            stdev = 42.1209
      
          with patch:
            median = 15354
            stdev = 194.743
      
      This is with THP disabled. With THP enabled the impact of the patch
      will be less.
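      
      A sketch of the sizing arithmetic behind the change (macro names
      illustrative):
      
        /* A radix leaf table is 2^RADIX_PTE_INDEX_SIZE entries of 8 bytes,
         * i.e. 256 bytes per fragment on a 64K-page kernel. */
        #define RADIX_PTE_FRAG_SIZE_SHIFT  (RADIX_PTE_INDEX_SIZE + 3)
        /* Fragments per page: 65536 / 256 = 256, up from 16 */
        #define RADIX_PTE_FRAG_NR          (PAGE_SIZE >> RADIX_PTE_FRAG_SIZE_SHIFT)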
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 30 March 2018: 5 commits
  6. 23 March 2018: 2 commits
  7. 13 March 2018: 1 commit
    • powerpc/mm/slice: implement a slice mask cache · 5709f7cf
      Authored by Nicholas Piggin
      Calculating the slice mask can become a significant overhead for
      get_unmapped_area. This patch adds a struct slice_mask for
      each page size in the mm_context, and keeps these in sync with
      the slices psize arrays and slb_addr_limit.
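      
      A sketch of the cached mask, with field names following the
      description (the low slices fit a u64; the high slices need a
      bitmap, assuming the usual SLICE_NUM_HIGH constant):
      
        struct slice_mask {
                u64 low_slices;                              /* one bit per low slice */
                DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH); /* the rest */
        };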
      
      On Book3S/64 this adds 288 bytes to the mm_context_t for the
      slice mask caches.
      
      On POWER8, this increases vfork+exec+exit performance by 9.9%
      and reduces time to mmap+munmap a 64kB page by 28%.
      
      Reduces time to mmap+munmap by about 10% on 8xx.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 06 March 2018: 2 commits
  9. 13 February 2018: 4 commits
  10. 01 February 2018: 2 commits
  11. 30 January 2018: 1 commit
    • powerpc/mm/radix: Fix build error when RADIX_MMU=n · 015eb1b8
      Authored by Michael Ellerman
      The recent TLB flush rework broke the build when the Radix MMU is
      disabled at build time, eg:
      
        (.text+0x264): undefined reference to `.radix__tlbiel_all'
      
      We could add an empty version, but if we ever called it by accident
      that would indicate a bad bug, so add a stub that just WARNs if we do.
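      
      A sketch of such a stub (config guard and exact signature assumed):
      
        #ifdef CONFIG_PPC_RADIX_MMU
        extern void radix__tlbiel_all(unsigned int action);
        #else
        static inline void radix__tlbiel_all(unsigned int action)
        {
                WARN_ONCE(1, "radix__tlbiel_all() called with RADIX_MMU=n");
        }
        #endif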
      
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  12. 20 January 2018: 6 commits
  13. 19 January 2018: 1 commit
    • powerpc/64s: Fix ps3 build error due to tlbiel_all() · 7a074fc0
      Authored by Michael Ellerman
      The recent changes to TLB handling broke the PS3 build:
      
        arch/powerpc/include/asm/book3s/64/tlbflush.h:30: undefined reference to `.hash__tlbiel_all'
      
      Fix it by adding a fallback version of tlbiel_all() for non-native
      builds. It should never be called, due to checks in its callers, so
      it just calls BUG(). We should probably clean it up further but this
      will suffice for now.
      
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 17 January 2018: 1 commit
    • powerpc/64s: Improve local TLB flush for boot and MCE on POWER9 · d4748276
      Authored by Nicholas Piggin
      There are several cases outside the normal address space management
      where a CPU's entire local TLB is to be flushed:
      
        1. Booting the kernel, in case something has left stale entries in
           the TLB (e.g., kexec).
      
        2. Machine check, to clean corrupted TLB entries.
      
      One other place where the TLB is flushed is waking from deep idle
      states. That flush is a side-effect of calling ->cpu_restore with the
      intention of re-setting various SPRs. The flush itself is unnecessary
      there because, unlike in the first case above, the TLB should not
      acquire new corrupted entries as part of sleep/wake (though they may
      be lost).
      
      This type of TLB flush is coded inflexibly, several times for each CPU
      type, and they have a number of problems with ISA v3.0B:
      
      - The current radix mode of the MMU is not taken into account: it is
        always done as a hash flush. For IS=2 (LPID-matching flush from host)
        and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if
        the R field does not match the current radix mode.
      
      - ISA v3.0B hash must flush the partition and process table caches as
        well.
      
      - ISA v3.0B radix must flush partition and process scoped translations,
        partition and process table caches, and also the page walk cache.
      
      So consolidate the flushing code and implement it in C and inline asm
      under the mm/ directory with the rest of the flush code. Add ISA v3.0B
      cases for radix and hash, and use the radix flush in radix environment.
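      
      A sketch of the consolidated entry point, assuming an
      early_radix_enabled()-style mode test and a flush-scope argument:
      
        /* Flush this CPU's entire local TLB with the flavour matching the
         * MMU mode the kernel is actually running in. */
        static inline void tlbiel_all(void)
        {
                if (early_radix_enabled())
                        radix__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL);
                else
                        hash__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL);
        }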
      
      Provide a way for IS=2 (LPID flush) to specify the radix mode of the
      partition. Have KVM pass in the radix mode of the guest.
      
      Take the flushes out of the early cputable/dt_cpu_ftrs detection
      hooks, and move them later in the boot process, after the MMU
      registers are set up and before relocation is first turned on.
      
      The TLB flush is no longer called when restoring from deep idle
      states. This could not be done as a separate step because booting
      secondaries uses the same cpu_restore as idle restore, which needs
      the TLB flush.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 16 January 2018: 2 commits
    • powerpc/mm: Introduce _PAGE_NA · 35175033
      Authored by Christophe Leroy
      Today, PAGE_NONE is defined as a page not having _PAGE_USER.
      In some circumstances, when the CPU supports it, it might be
      better to be able to flag a page with NO ACCESS.
      
      In a following patch, the 8xx will switch to flagging user access in
      the PMD, so it will no longer be possible to use _PAGE_USER to flag
      a page with no access.
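      
      A sketch of the intended use, assuming _PAGE_NA defaults to 0 on
      CPUs without a no-access bit:
      
        #ifndef _PAGE_NA
        #define _PAGE_NA  0     /* CPUs without a no-access bit */
        #endif
      
        /* PAGE_NONE can now carry an explicit no-access flag instead of
         * relying solely on the absence of _PAGE_USER. */
        #define PAGE_NONE  __pgprot(_PAGE_BASE | _PAGE_NA)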
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: extend _PAGE_PRIVILEGED to all CPUs · 812fadcb
      Authored by Christophe Leroy
      commit ac29c640 ("powerpc/mm: Replace _PAGE_USER with
      _PAGE_PRIVILEGED") introduced _PAGE_PRIVILEGED for BOOK3S/64.
      
      This patch generalises _PAGE_PRIVILEGED to all CPUs, allowing a
      platform to have either _PAGE_PRIVILEGED or _PAGE_USER or both.
      
      PPC_8xx has a _PAGE_SHARED flag which is set for, and only for, all
      non-user pages. Let's rename it _PAGE_PRIVILEGED to remove the
      confusion, as it has nothing to do with Linux shared pages.
      
      On BookE, there is a _PAGE_BAP_SR bit which has to be set for kernel
      pages: defining _PAGE_PRIVILEGED as _PAGE_BAP_SR makes this generic.
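      
      A sketch of how the per-platform definitions could line up (bit
      values illustrative):
      
        #if defined(CONFIG_PPC_8xx)
        #define _PAGE_PRIVILEGED  0x0004        /* the former _PAGE_SHARED bit */
        #elif defined(CONFIG_PPC_BOOK3E_64)
        #define _PAGE_PRIVILEGED  _PAGE_BAP_SR  /* supervisor-read permission */
        #else
        #define _PAGE_PRIVILEGED  0             /* no such bit on this CPU */
        #endif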
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 22 December 2017: 1 commit