1. 05 Nov, 2019 (2 commits)
  2. 24 Sep, 2019 (2 commits)
  3. 05 Sep, 2019 (3 commits)
  4. 04 Jul, 2019 (1 commit)
  5. 03 Jul, 2019 (3 commits)
  6. 31 May, 2019 (1 commit)
  7. 15 May, 2019 (2 commits)
    • powerpc/mm/radix: mark __tlbie_pid() and friends as __always_inline · efc344c5
      Committed by Masahiro Yamada
      This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common
      place.  We need to eliminate potential issues beforehand.
      
      If it is enabled for powerpc, the following errors are reported:
      
        arch/powerpc/mm/tlb-radix.c: In function '__tlbie_lpid':
        arch/powerpc/mm/tlb-radix.c:148:2: warning: asm operand 3 probably doesn't match constraints
          asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
          ^~~
        arch/powerpc/mm/tlb-radix.c:148:2: error: impossible constraint in 'asm'
        arch/powerpc/mm/tlb-radix.c: In function '__tlbie_pid':
        arch/powerpc/mm/tlb-radix.c:118:2: warning: asm operand 3 probably doesn't match constraints
          asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
          ^~~
        arch/powerpc/mm/tlb-radix.c: In function '__tlbiel_pid':
        arch/powerpc/mm/tlb-radix.c:104:2: warning: asm operand 3 probably doesn't match constraints
          asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
          ^~~
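      
      The root cause is that these helpers feed compile-time constants into
      "i" asm constraints; if the compiler chooses not to inline them, the
      operand is no longer a constant at the asm site. A minimal reduction
      of the pattern (illustrative, not the kernel code):
      
        /* The "i" constraint demands a compile-time constant. With plain
         * `inline` under CONFIG_OPTIMIZE_INLINING, GCC may emit this
         * function out of line; `spr` then becomes a runtime parameter and
         * the build fails with "impossible constraint in 'asm'".
         * __always_inline keeps the constant visible at the asm site. */
        static __always_inline void write_spr(const int spr, unsigned long val)
        {
                asm volatile("mtspr %0,%1" : : "i"(spr), "r"(val));
        }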
      
      Link: http://lkml.kernel.org/r/20190423034959.13525-11-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boris Brezillon <bbrezillon@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Marek Vasut <marek.vasut@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      efc344c5
    • powerpc/mm/radix: mark __radix__flush_tlb_range_psize() as __always_inline · e12d6d7d
      Committed by Masahiro Yamada
      This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common
      place.  We need to eliminate potential issues beforehand.
      
      If it is enabled for powerpc, the following error is reported:
      
        arch/powerpc/mm/tlb-radix.c: In function '__radix__flush_tlb_range_psize':
        arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably doesn't match constraints [-Werror]
          asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
          ^~~
        arch/powerpc/mm/tlb-radix.c:104:2: error: impossible constraint in 'asm'
      
      Link: http://lkml.kernel.org/r/20190423034959.13525-10-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boris Brezillon <bbrezillon@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Marek Vasut <marek.vasut@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e12d6d7d
  8. 02 May, 2019 (1 commit)
  9. 20 Oct, 2018 (1 commit)
  10. 09 Oct, 2018 (1 commit)
    • KVM: PPC: Book3S HV: Handle page fault for a nested guest · fd10be25
      Committed by Suraj Jitindar Singh
      Consider a normal (L1) guest running under the main hypervisor (L0),
      and then a nested guest (L2) running under the L1 guest which is acting
      as a nested hypervisor. L0 has page tables to map the address space for
      L1 providing the translation from L1 real address -> L0 real address;
      
      	L1
      	|
      	| (L1 -> L0)
      	|
      	----> L0
      
      There are also page tables in L1 used to map the address space for L2,
      providing the translation from L2 real address -> L1 real address. Since
      the hardware can only walk a single level of page table, we need to
      maintain in L0 a "shadow_pgtable" for L2 which provides the translation
      from L2 real address -> L0 real address, which looks like:
      
      	L2				L2
      	|				|
      	| (L2 -> L1)			|
      	|				|
      	----> L1			| (L2 -> L0)
      	      |				|
      	      | (L1 -> L0)		|
      	      |				|
      	      ----> L0			--------> L0
      
      When a page fault occurs while running a nested (L2) guest we need to
      insert a pte into this "shadow_pgtable" for the L2 -> L0 mapping. To
      do this we need to:
      
      1. Walk the pgtable in L1 memory to find the L2 -> L1 mapping, and
         provide a page fault to L1 if this mapping doesn't exist.
      2. Use our L1 -> L0 pgtable to convert this L1 address to an L0 address,
         or try to insert a pte for that mapping if it doesn't exist.
      3. Now that we have an L2 -> L0 mapping, insert it into our shadow_pgtable.
      
      Once this mapping exists we can take rc faults when hardware is unable
      to automatically set the reference and change bits in the pte. On these
      we need to:
      
      1. Check that the rc bits on the L2 -> L1 pte permit the access, and
         otherwise reflect the fault down to L1.
      2. Set the rc bits in the L1 -> L0 pte which corresponds to the same
         host page.
      3. Set the rc bits in the L2 -> L0 pte.
      
      Since we reuse a large number of functions in book3s_64_mmu_radix.c
      for this, we also needed to refactor several of them to take an lpid
      parameter so that the correct lpid is used for TLB invalidations.
      The functionality, however, remains the same.
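      
      A hedged sketch of the three-step fill path described above; all
      function names here are illustrative, not the actual KVM HV code:
      
        /* Illustrative only: runs in L0 on a page fault taken from L2. */
        static long fill_shadow_pte(struct kvm_nested_guest *gp,
                                    unsigned long l2_gpa)
        {
                unsigned long l1_gpa, l0_ra;
      
                /* 1. Walk the L1-maintained table for the L2 -> L1 mapping;
                 *    if it is absent, deliver the fault to L1 instead. */
                if (!walk_l1_pgtable(gp, l2_gpa, &l1_gpa))
                        return reflect_fault_to_l1(gp, l2_gpa);
      
                /* 2. Translate L1 -> L0 using our own page table, inserting
                 *    a pte for that mapping if one does not exist yet. */
                if (!map_l1_to_l0(gp, l1_gpa, &l0_ra))
                        return -EAGAIN;         /* retry the access */
      
                /* 3. Insert the combined L2 -> L0 translation into the
                 *    shadow_pgtable, invalidating with the nested LPID. */
                return insert_shadow_pte(gp, l2_gpa, l0_ra);
        }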
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      fd10be25
  11. 04 Oct, 2018 (1 commit)
  12. 16 Jul, 2018 (1 commit)
  13. 20 Jun, 2018 (1 commit)
  14. 19 Jun, 2018 (1 commit)
  15. 03 Jun, 2018 (2 commits)
    • powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask · 0cef77c7
      Committed by Nicholas Piggin
      When a single-threaded process has a non-local mm_cpumask, use that as
      an opportunity to flush the TLBs out of the other CPUs in the cpumask.
      
      An IPI is used for clearing remote CPUs for a few reasons:
      - An IPI can end lazy TLB use of the mm, which is required to prevent
        TLB entries being created on the remote CPU. The alternative is to
        drop lazy TLB switching completely, which costs 7.5% in a context
        switch ping-pong test between a process and a kernel idle thread.
      - An IPI can have remote CPUs flush the entire PID, but the local CPU
        can flush a specific VA. tlbie would require over-flushing of the
        local CPU (where the process is running).
      - A single threaded process that is migrated to a different CPU is
        likely to have a relatively small mm_cpumask, so IPI is reasonable.
      
      No other thread can concurrently switch to this mm, because it must
      have been given a reference to mm_users by the current thread before it
      can use_mm. mm_users can be asynchronously incremented (by
      mm_activate or mmget_not_zero), but those users must use remote mm
      access and can't use_mm or access user address space. Existing code
      already makes this assumption; for example, sparc64 has reset
      mm_cpumask using this condition since the start of history, see
      arch/sparc/kernel/smp_64.c.
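      
      A minimal sketch of the IPI approach, assuming standard kernel SMP
      primitives; the helper names are illustrative, not the exact
      implementation:
      
        /* Runs on each remote CPU still set in mm_cpumask(mm). */
        static void exit_lazy_tlb_ipi(void *arg)
        {
                struct mm_struct *mm = arg;
      
                /* End lazy-TLB use of mm so no new entries can be created
                 * here, flush this CPU's TLB for the PID, and clear
                 * ourselves from the cpumask. */
                flush_local_pid(mm);                    /* hypothetical */
                cpumask_clear_cpu(smp_processor_id(), mm_cpumask(mm));
        }
      
        /* Called by the single remaining user of mm; afterwards, local
         * tlbiel flushes suffice. */
        static void trim_mm_cpumask(struct mm_struct *mm)
        {
                smp_call_function_many(mm_cpumask(mm),
                                       exit_lazy_tlb_ipi, mm, 1);
        }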
      
      This reduces tlbies for a kernel compile workload from 0.90M to 0.12M.
      tlbiels increase significantly, due to the PID flushing used for
      cleaning up remote CPUs and the increased local flushes (a PID flush
      takes 128 tlbiels vs 1 tlbie).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0cef77c7
    • powerpc/64s/radix: optimise pte_update · 85bcfaf6
      Committed by Nicholas Piggin
      Implementing pte_update with pte_xchg (which uses cmpxchg) is
      inefficient. A single larx/stcx. works fine, no need for the less
      efficient cmpxchg sequence.
      
      Then remove the memory barriers from the operation. There is a
      requirement for TLB flushing to load mm_cpumask after the store
      that reduces pte permissions, which is moved into the TLB flush
      code.
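      
      A hedged sketch of the single larx/stcx. form (close in spirit to,
      but not verbatim, the kernel's __radix_pte_update; note the absence
      of barriers, per the commit):
      
        static inline unsigned long radix_pte_update(unsigned long *ptep,
                                                     unsigned long clr,
                                                     unsigned long set)
        {
                unsigned long old, tmp;
      
                /* One ldarx/stdcx. pair, retried if the reservation is
                 * lost. No lwsync/ptesync: ordering against mm_cpumask is
                 * handled in the TLB flush code instead. */
                asm volatile(
                "1:     ldarx   %0,0,%3         # pte_update\n"
                "       andc    %1,%0,%4\n"
                "       or      %1,%1,%5\n"
                "       stdcx.  %1,0,%3\n"
                "       bne-    1b"
                : "=&r" (old), "=&r" (tmp), "+m" (*ptep)
                : "r" (ptep), "r" (clr), "r" (set)
                : "cc");
      
                return old;
        }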
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      85bcfaf6
  16. 17 May, 2018 (1 commit)
  17. 12 Apr, 2018 (1 commit)
    • powerpc/mm/radix: Fix checkstops caused by invalid tlbiel · 2675c13b
      Committed by Michael Ellerman
      In tlbiel_radix_set_isa300() we use the PPC_TLBIEL() macro to
      construct tlbiel instructions. The instruction takes 5 fields, two of
      which are registers, and the others are constants. But because it's
      constructed with inline asm the compiler doesn't know that.
      
      We got the constraint wrong on the 'r' field: using "r" tells the
      compiler to put the value in a register. The value the macro then
      sees is the *register number*, not the value of the field.
      
      That means when we mask the register number with 0x1 we get 0 or 1
      depending on which register the compiler happens to put the constant
      in, eg:
      
        li      r10,1
        tlbiel  r8,r9,2,0,0
      
        li      r7,1
        tlbiel  r10,r6,0,0,1
      
      If we're unlucky we might generate an invalid instruction form, for
      example RIC=0, PRS=1 and R=0 (tlbiel r8,r7,0,1,0), which has been
      observed to cause machine checks:
      
        Oops: Machine check, sig: 7 [#1]
        CPU: 24 PID: 0 Comm: swapper
        NIP:  00000000000385f4 LR: 000000000100ed00 CTR: 000000000000007f
        REGS: c00000000110bb40 TRAP: 0200
        MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48002222  XER: 20040000
        CFAR: 00000000000385d0 DAR: 0000000000001c00 DSISR: 00000200 SOFTE: 1
      
      If the machine check happens early in boot while we have MSR_ME=0 it
      will escalate into a checkstop and kill the box entirely.
      
      To fix it we could change the inline asm constraint to "i", which
      tells the compiler the value is a constant. But a better fix is to just
      pass a literal 1 into the macro, which bypasses any problems with inline
      asm constraints.
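      
      The mechanism reduces to a small illustration (hypothetical macros,
      not PPC_TLBIEL itself): on powerpc, %0 for an "r" operand prints as
      the bare register number, so arithmetic in the template uses
      whichever register was allocated rather than the intended value,
      while an "i" operand prints the constant itself:
      
        /* Hypothetical field encoder in the style of the kernel macros.
         * With "r"(1), %0 may expand to e.g. "10" (the register number),
         * so `(%0) & 0x1` encodes a garbage field; with "i"(1) it
         * expands to the literal 1. */
        #define FIELD_WRONG(v)  asm volatile(".long ((%0) & 0x1) << 21" : : "r"(v))
        #define FIELD_RIGHT(v)  asm volatile(".long ((%0) & 0x1) << 21" : : "i"(v))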
      
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      2675c13b
  18. 01 Apr, 2018 (1 commit)
  19. 30 Mar, 2018 (1 commit)
  20. 27 Mar, 2018 (1 commit)
  21. 23 Mar, 2018 (4 commits)
  22. 22 Jan, 2018 (1 commit)
  23. 17 Jan, 2018 (1 commit)
    • powerpc/64s: Improve local TLB flush for boot and MCE on POWER9 · d4748276
      Committed by Nicholas Piggin
      There are several cases outside the normal address space management
      where a CPU's entire local TLB is to be flushed:
      
        1. Booting the kernel, in case something has left stale entries in
           the TLB (e.g., kexec).
      
        2. Machine check, to clean corrupted TLB entries.
      
      One other place where the TLB is flushed is when waking from deep idle
      states. The flush is a side-effect of calling ->cpu_restore with the
      intention of re-setting various SPRs. The flush itself is unnecessary
      because, in the first case, the TLB should not acquire new corrupted
      entries as part of sleep/wake (though they may be lost).
      
      This type of TLB flush is coded inflexibly, several times for each CPU
      type, and they have a number of problems with ISA v3.0B:
      
      - The current radix mode of the MMU is not taken into account, it is
        always done as a hash flush. For IS=2 (LPID-matching flush from host)
        and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if
        the R field does not match the current radix mode.
      
      - ISA v3.0B hash must flush the partition and process table caches as
        well.
      
      - ISA v3.0B radix must flush partition and process scoped translations,
        partition and process table caches, and also the page walk cache.
      
      So consolidate the flushing code and implement it in C and inline asm
      under the mm/ directory with the rest of the flush code. Add ISA v3.0B
      cases for radix and hash, and use the radix flush in radix environment.
      
      Provide a way for IS=2 (LPID flush) to specify the radix mode of the
      partition. Have KVM pass in the radix mode of the guest.
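      
      A minimal sketch of the consolidated local flush, assuming POWER9's
      128 TLB sets; both helpers are hypothetical names:
      
        /* Flush this CPU's entire TLB, plus the caches ISA v3.0B
         * requires, by iterating over every congruence-class set. */
        static void tlbiel_all_isa300(unsigned int num_sets, bool radix)
        {
                unsigned int set;
      
                asm volatile("ptesync" : : : "memory");
                for (set = 0; set < num_sets; set++)
                        tlbiel_one_set(set, radix);      /* hypothetical */
                /* ISA v3.0B also wants the partition/process table caches
                 * flushed (and the page walk cache, for radix). */
                tlbiel_table_caches(radix);              /* hypothetical */
                asm volatile("ptesync" : : : "memory");
        }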
      
      Take out the flushes from the early cputable/dt_cpu_ftrs detection
      hooks, and move them later in the boot process, after the MMU
      registers are set up and before relocation is first turned on.
      
      The TLB flush is no longer called when restoring from deep idle states.
      This was not done as a separate step because booting secondaries
      uses the same cpu_restore as idle restore, which needs the TLB flush.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d4748276
  24. 10 Nov, 2017 (5 commits)
    • powerpc/64s/radix: Improve TLB flushing for page table freeing · 0b2f5a8a
      Committed by Nicholas Piggin
      Unmaps that free page tables always flush the entire PID, which is
      sub-optimal. Provide TLB range flushing with an additional PWC flush
      that can be used for va range invalidations that must also flush the
      page walk cache.
      
           Time to munmap N pages of memory including last level page table
           teardown (after mmap, touch), local invalidate:
           N           1       2      4      8     16     32     64
           vanilla  3.2us  3.3us  3.4us  3.6us  4.1us  5.2us  7.2us
           patched  1.4us  1.5us  1.7us  1.9us  2.6us  3.7us  6.2us
      
           Global invalidate:
           N           1       2      4      8     16      32     64
           vanilla  2.2us  2.3us  2.4us  2.6us  3.2us   4.1us  6.2us
           patched  2.1us  2.5us  3.4us  5.2us  8.7us  15.7us  6.2us
      
      Local invalidates get much better across the board. Global ones have
      the same issue, where multiple tlbies for a va flush get slower than
      the single tlbie needed to invalidate the PID. None of these tests
      captures the TLB benefits of avoiding killing everything.
      
      Global gets worse, but it is brought into line with the global
      invalidate for munmap()s that do not free page tables.
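      
      A hedged sketch of the new flavour: a va-range invalidation that also
      takes out the page walk cache when page tables are freed (helper
      names follow the style of the surrounding code but are assumptions
      here):
      
        static void tlbiel_va_range_pwc(unsigned long start, unsigned long end,
                                        unsigned long pid, unsigned long page_size)
        {
                asm volatile("ptesync" : : : "memory");
                /* One PWC flush for the freed tables... */
                __tlbiel_pid(pid, 0, RIC_FLUSH_PWC);     /* assumed helper */
                /* ...then per-page invalidations over the range. */
                __tlbiel_va_range(start, end, pid, page_size);
                asm volatile("ptesync" : : : "memory");
        }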
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0b2f5a8a
    • powerpc/64s/radix: Introduce local single page ceiling for TLB range flush · f6f27951
      Committed by Nicholas Piggin
      The single page flush ceiling is the cut-off point at which we switch
      from invalidating individual pages, to invalidating the entire process
      address space in response to a range flush.
      
      Introduce a local variant of this heuristic because local and global
      tlbie have significantly different properties:
      - Local tlbiel requires 128 instructions to invalidate a PID, global
        tlbie only 1 instruction.
      - Global tlbie instructions are expensive broadcast operations.
      
      The local ceiling has been made much higher, 2x the number of
      instructions required to invalidate the entire PID (i.e., 256 pages).
      
           Time to mprotect N pages of memory (after mmap, touch), local invalidate:
           N           32     34      64     128     256     512
           vanilla  7.4us  9.0us  14.6us  26.4us  50.2us  98.3us
           patched  7.4us  7.8us  13.8us  26.4us  51.9us  98.3us
      
      The behaviour of both is identical at N=32 and N=512. Between there,
      the vanilla kernel does a PID invalidate and the patched kernel does
      a va range invalidate.
      
      At N=128, these require the same number of tlbiel instructions, so
      the patched version can be seen to be cheaper when N < 128, and more
      expensive when N > 128. However, this does not capture well the cost
      of the invalidated TLB entries.
      
      The additional cost at 256 pages does not seem prohibitive. It may
      be the case that increasing the limit further would continue to be
      beneficial to avoid invalidating all of the process's TLB entries.
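      
      A hedged sketch of how the two ceilings slot into the range-flush
      decision (the constants follow the commit text; the names are
      assumptions):
      
        /* Illustrative decision logic for flushing [start, end) in pid. */
        static void flush_range(unsigned long start, unsigned long end,
                                unsigned long pid, unsigned long page_size,
                                unsigned long nr_pages, bool local)
        {
                if (local) {
                        if (nr_pages > local_single_page_flush_ceiling)
                                _tlbiel_pid(pid, RIC_FLUSH_TLB); /* 128 tlbiels */
                        else
                                _tlbiel_va_range(start, end, pid, page_size);
                } else {
                        if (nr_pages > single_page_flush_ceiling)
                                _tlbie_pid(pid, RIC_FLUSH_TLB);  /* 1 tlbie */
                        else
                                _tlbie_va_range(start, end, pid, page_size);
                }
        }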
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f6f27951
    • powerpc/64s/radix: Optimize flush_tlb_range · cbf09c83
      Committed by Nicholas Piggin
      Currently for radix, flush_tlb_range flushes the entire PID, because
      the Linux mm code does not tell us about page size here for THP vs
      regular pages. This is quite sub-optimal for small mremap / mprotect
      / change_protection.
      
      So implement va range flushes with two flush passes, one for each
      page size (regular and THP). The second flush has an order of magnitude
      fewer tlbie instructions than the first, so it is a relatively small
      additional cost.
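      
      A hedged sketch of the two-pass structure (illustrative; the helper
      and the hstart/hend sub-range bounds are assumptions):
      
        /* Pass 1: flush the whole range at the base page size. */
        _tlbie_va_range(start, end, pid, PAGE_SIZE);
        /* Pass 2: flush the 2MB-aligned sub-range at the THP step; this
         * issues ~512x fewer tlbies, so it adds relatively little cost. */
        if (range_may_contain_thp)
                _tlbie_va_range(hstart, hend, pid, PMD_SIZE);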
      
      There is still room for improvement here with some changes to generic
      APIs, particularly if there are mostly THP pages to be invalidated,
      the small page flushes could be reduced.
      
      Time to mprotect 1 page of memory (after mmap, touch):
               local   global
      vanilla  2.9us   1.8us
      patched  1.2us   1.6us
      
      Time to mprotect 30 pages of memory (after mmap, touch):
               local   global
      vanilla  8.2us   7.2us
      patched  6.9us  17.9us
      
      Time to mprotect 34 pages of memory (after mmap, touch):
               local   global
      vanilla  9.1us   8.0us
      patched  9.0us   8.0us
      
      34 pages is the point at which the invalidation switches from va
      to entire PID, which tlbie can do in a single instruction. This is
      why in the case of 30 pages, the new code runs slower for this test.
      This is a deliberate tradeoff already present in the unmap and THP
      promotion code; the idea is that the benefit comes from avoiding
      flushing the entire TLB for this PID on all threads in the system.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      cbf09c83
    • powerpc/64s/radix: Implement _tlbie(l)_va_range flush functions · d665767e
      Committed by Nicholas Piggin
      Move the barriers and range iteration down into the _tlbie* level,
      which improves readability.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d665767e
    • powerpc/64s/radix: Optimize TLB range flush barriers · 14001c60
      Committed by Nicholas Piggin
      Short range flushes issue a sequence of tlbie(l) instructions for
      individual effective addresses. These do not all require individual
      barrier sequences, only one covering all tlbie(l) instructions.
      
      Commit f7327e0b ("powerpc/mm/radix: Remove unnecessary ptesync")
      made a similar optimization for tlbiel for PID flushing.
      
      For tlbie, the ISA says:
      
          The tlbsync instruction provides an ordering function for the
          effects of all tlbie instructions executed by the thread executing
          the tlbsync instruction, with respect to the memory barrier
          created by a subsequent ptesync instruction executed by the same
          thread.
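      
      In effect, the barrier pair is hoisted out of the per-page loop,
      roughly as follows (illustrative; __tlbie_va() and `ap` are
      assumptions here):
      
        static void tlbie_va_range(unsigned long start, unsigned long end,
                                   unsigned long pid, unsigned long page_size,
                                   unsigned long ap)
        {
                unsigned long addr;
      
                /* One ptesync before the whole sequence... */
                asm volatile("ptesync" : : : "memory");
                for (addr = start; addr < end; addr += page_size)
                        __tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
                /* ...and one completion sequence ordering every tlbie above. */
                asm volatile("eieio; tlbsync; ptesync" : : : "memory");
        }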
      
      Time to munmap 30 pages of memory (after mmap, touch):
               local   global
      vanilla  10.9us  22.3us
      patched   3.4us  14.4us
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      14001c60
  25. 06 Nov, 2017 (1 commit)
    • powerpc/64s/radix: Fix process table entry cache invalidation · 30b49ec7
      Committed by Nicholas Piggin
      According to the architecture, the process table entry cache must be
      flushed with tlbie RIC=2.
      
      Currently the process table entry is set to invalid right before the
      PID is returned to the allocator, with no invalidation. This works on
      existing implementations that are known to not cache the process table
      entry for any except the current PIDR.
      
      It is architecturally correct and cleaner to invalidate with RIC=2
      after clearing the process table entry and before the PID is returned
      to the allocator. This can be done in arch_exit_mmap(), which runs
      before the final flush, by ensuring the final (fullmm) flush is
      always a RIC=2 variant.
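      
      A minimal sketch of the resulting ordering (process_tb and
      RIC_FLUSH_ALL as in the kernel; the allocator hook is hypothetical):
      
        /* On final teardown, after the last user of the mm is gone: */
        process_tb[pid].prtb0 = 0;       /* clear the process table entry */
        _tlbie_pid(pid, RIC_FLUSH_ALL);  /* RIC=2: TLB + PWC + table caches */
        free_pid(pid);                   /* hypothetical: only now reusable */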
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      30b49ec7