1. 23 Mar 2018, 1 commit
  2. 17 Jan 2018, 1 commit
    • N
      powerpc/64s: Improve local TLB flush for boot and MCE on POWER9 · d4748276
      Nicholas Piggin authored
      There are several cases outside the normal address space management
      where a CPU's entire local TLB is to be flushed:
      
        1. Booting the kernel, in case something has left stale entries in
           the TLB (e.g., kexec).
      
        2. Machine check, to clean corrupted TLB entries.
      
      One other place where the TLB is flushed is when waking from deep idle
      states. The flush is a side-effect of calling ->cpu_restore with the
      intention of re-setting various SPRs. The flush itself is unnecessary
      because in the first case, the TLB should not acquire new corrupted
      TLB entries as part of sleep/wake (though they may be lost).
      
      This type of TLB flush is coded inflexibly, several times for each CPU
      type, and they have a number of problems with ISA v3.0B:
      
      - The current radix mode of the MMU is not taken into account: it is
        always done as a hash flush. For IS=2 (LPID-matching flush from host)
        and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if
        the R field does not match the current radix mode.
      
      - ISA v3.0B hash must flush the partition and process table caches as
        well.
      
      - ISA v3.0B radix must flush partition and process scoped translations,
        partition and process table caches, and also the page walk cache.
      
      So consolidate the flushing code and implement it in C and inline asm
      under the mm/ directory with the rest of the flush code. Add ISA v3.0B
      cases for radix and hash, and use the radix flush in radix environment.
      
      Provide a way for IS=2 (LPID flush) to specify the radix mode of the
      partition. Have KVM pass in the radix mode of the guest.
      
      Take out the flushes from early cputable/dt_cpu_ftrs detection hooks,
      and move it later in the boot process, after the MMU registers are set
      up and before relocation is first turned on.
      
      The TLB flush is no longer called when restoring from deep idle states.
      This was not done as a separate step because booting secondaries
      uses the same cpu_restore as idle restore, which needs the TLB flush.
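
      As a rough illustration of the consolidated helper (not the literal
      patch: the tlbiel_set() wrapper, the set counts and the exact operand
      plumbing below are assumptions made for the example), the per-set
      flush loop looks roughly like this:

        /*
         * Hypothetical sketch: flush the CPU's entire local TLB on ISA
         * v3.0B by iterating over every congruence-class set with tlbiel.
         * Set 0 uses RIC=2 so the partition/process table caches and the
         * page walk cache are flushed too, as the ISA requires; the
         * remaining sets only need their TLB entries invalidated.
         * tlbiel_set() is an assumed helper that builds RB/RS and issues
         * a single tlbiel with the given RIC/PRS/R operands.
         */
        #define RIC_FLUSH_TLB  0        /* invalidate TLB entries only */
        #define RIC_FLUSH_ALL  2        /* TLB + PWC + table caches */

        static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is,
                                      bool radix)
        {
                unsigned int set;

                asm volatile("ptesync" : : : "memory");

                tlbiel_set(0, is, RIC_FLUSH_ALL, /* prs */ 0, /* r */ radix);
                for (set = 1; set < num_sets; set++)
                        tlbiel_set(set, is, RIC_FLUSH_TLB, 0, radix);

                asm volatile("ptesync" : : : "memory");
        }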
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d4748276
  3. 23 Nov 2017, 1 commit
  4. 02 Jul 2017, 2 commits
  5. 23 Jun 2017, 1 commit
    • B
      powerpc/mm: Trace tlbie(l) instructions · 0428491c
      Balbir Singh authored
      Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate
      Entry (Local)) instructions.
      
      The tlbie instruction has changed over the years, so not all versions
      accept the same operands. Use the ISA v3 field operands because they are
      the most verbose; we may change them in future.
      
      Example output:
      
        qemu-system-ppc-5371  [016]  1412.369519: tlbie:
        	tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0
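
      The tracepoint itself can be declared with the standard TRACE_EVENT()
      machinery. A minimal sketch is below; the field list simply follows
      the example output above, the actual header layout in the kernel may
      differ, and the usual TRACE_SYSTEM/define_trace boilerplate of a real
      trace header is omitted:

        #include <linux/tracepoint.h>

        TRACE_EVENT(tlbie,
                TP_PROTO(unsigned long lpid, unsigned long local,
                         unsigned long rb, unsigned long rs,
                         unsigned long ric, unsigned long prs,
                         unsigned long r),
                TP_ARGS(lpid, local, rb, rs, ric, prs, r),
                TP_STRUCT__entry(
                        __field(unsigned long, lpid)
                        __field(unsigned long, local)
                        __field(unsigned long, rb)
                        __field(unsigned long, rs)
                        __field(unsigned long, ric)
                        __field(unsigned long, prs)
                        __field(unsigned long, r)
                ),
                TP_fast_assign(
                        __entry->lpid = lpid;
                        __entry->local = local;
                        __entry->rb = rb;
                        __entry->rs = rs;
                        __entry->ric = ric;
                        __entry->prs = prs;
                        __entry->r = r;
                ),
                TP_printk("tlbie with lpid %lu, local %lu, rb=%lx, rs=%lx, ric=%lu prs=%lu r=%lu",
                          __entry->lpid, __entry->local, __entry->rb,
                          __entry->rs, __entry->ric, __entry->prs, __entry->r)
        );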
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Add some missing trace_tlbie()s, reword change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0428491c
  6. 05 Apr 2017, 1 commit
  7. 16 Nov 2016, 1 commit
    • P
      powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format · 6b243fcf
      Paul Mackerras authored
      This changes the way that we support the new ISA v3.00 HPTE format.
      Instead of adapting everything that uses HPTE values to handle either
      the old format or the new format, depending on which CPU we are on,
      we now convert explicitly between old and new formats if necessary
      in the low-level routines that actually access HPTEs in memory.
      This limits the amount of code that needs to know about the new
      format and makes the conversions explicit.  This is OK because the
      old format contains all the information that is in the new format.
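
      In sketch form, the pattern is that callers keep producing old-format
      HPTE values and only the low-level accessor converts when running on
      a v3.00 CPU. The conversion helpers and the writer below are assumed
      names used purely for illustration:

        /* Sketch only: convert an old-format HPTE to the new format just
         * before it is written to the hash table on ISA v3.00 CPUs. */
        static long native_hpte_insert_sketch(unsigned long hpte_group,
                                              unsigned long hpte_v,
                                              unsigned long hpte_r)
        {
                if (cpu_has_feature(CPU_FTR_ARCH_300)) {
                        hpte_r = hpte_old_to_new_r(hpte_v, hpte_r); /* assumed */
                        hpte_v = hpte_old_to_new_v(hpte_v);         /* assumed */
                }
                return write_hpte(hpte_group, hpte_v, hpte_r);      /* assumed */
        }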
      
      This also fixes operation under a hypervisor, because the H_ENTER
      hypercall (and other hypercalls that deal with HPTEs) will continue
      to require the HPTE value to be supplied in the old format.  At
      present the kernel will not boot in HPT mode on POWER9 under a
      hypervisor.
      
      This fixes and partially reverts commit 50de596d
      ("powerpc/mm/hash: Add support for Power9 Hash", 2016-04-29).
      
      Fixes: 50de596d ("powerpc/mm/hash: Add support for Power9 Hash")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6b243fcf
  8. 14 Nov 2016, 1 commit
    • B
      powerpc/hash64: Be more careful when generating tlbiel · f923efbc
      Balbir Singh authored
      In ISA v2.05, the tlbiel instruction takes two arguments, RB and L:
      
      tlbiel RB,L
      
      +---------+---------+----+---------+---------+---------+----+
      |    31   |    /    | L  |    /    |    RB   |   274   | /  |
      | 31 - 26 | 25 - 22 | 21 | 20 - 16 | 15 - 11 |  10 - 1 | 0  |
      +---------+---------+----+---------+---------+---------+----+
      
      In ISA v2.06 tlbiel takes only one argument, RB:
      
      tlbiel RB
      
      +---------+---------+---------+---------+---------+----+
      |    31   |    /    |    /    |    RB   |   274   | /  |
      | 31 - 26 | 25 - 21 | 20 - 16 | 15 - 11 |  10 - 1 | 0  |
      +---------+---------+---------+---------+---------+----+
      
      And in ISA v3.00 tlbiel takes five arguments:
      
      tlbiel RB,RS,RIC,PRS,R
      
      +---------+---------+----+---------+----+----+---------+---------+----+
      |    31   |    RS   | /  |   RIC   |PRS | R  |    RB   |   274   | /  |
      | 31 - 26 | 25 - 21 | 20 | 19 - 18 | 17 | 16 | 15 - 11 |  10 - 1 | 0  |
      +---------+---------+----+---------+----+----+---------+---------+----+
      
      However the assembler also accepts "tlbiel RB", and generates
      "tlbiel RB,r0,0,0,0".
      
      As you can see above the L field from the v2.05 encoding overlaps with the
      reserved field of the v2.06 encoding, and the low bit of the RS field of the
      v3.00 encoding.
      
      Currently in __tlbiel() we generate two tlbiel instructions manually using hex
      constants. In the first case, for MMU_PAGE_4K, we generate "tlbiel RB,0", which
      is safe in all cases, because the L bit is zero.
      
      However in the default case we generate "tlbiel RB,1", therefore setting bit 21
      to 1.
      
      This is not an actual bug on v2.06 processors, because the CPU ignores the value
      of the reserved field. However software is supposed to encode the reserved
      fields as zero to enable forward compatibility.
      
      On v3.00 processors setting bit 21 to 1 and no other bits of RS, means we are
      using r1 for the value of RS.
      
      Although it's not obvious, the code sets the IS field (bits 10-11) to 0 (by
      omission), and L=1, in the va value, which is passed as RB. We also pass R=0 in
      the instruction.
      
      The combination of IS=0, L=1 and R=0 means the value of RS is not used, so even
      on ISA v3.00 there is no actual bug.
      
      We should still fix it, as setting a reserved bit on v2.06 is naughty, and we
      are only avoiding a bug on v3.00 by accident rather than design. Use
      ASM_FTR_IFSET() to generate the single argument form on ISA v2.06 and later, and
      the two argument form on pre v2.06.
      
      Although there may be very old toolchains which don't understand tlbiel, we have
      other code in the tree which has been using tlbiel for over five years, and no
      one has reported any build failures, so just let the assembler generate the
      instructions.
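
      Concretely, the feature-section form is roughly the following (a
      sketch of the pattern described above; the exact operand wiring in
      the final patch may differ):

        /* Emit "tlbiel RB" on ISA v2.06 and later, and the old two-operand
         * "tlbiel RB,1" on earlier CPUs, letting the assembler generate
         * the encoding instead of hand-built hex constants. */
        asm volatile(ASM_FTR_IFSET("tlbiel %0", "tlbiel %0,1", %1)
                     : : "r" (va), "i" (CPU_FTR_ARCH_206)
                     : "memory");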
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Rewrite change log, use IFSET instead of IFCLR]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f923efbc
  9. 09 Sep 2016, 1 commit
    • P
      powerpc/mm: Speed up computation of base and actual page size for a HPTE · 0eeede0c
      Paul Mackerras authored
      This replaces a 2-D search through an array with a simple 8-bit table
      lookup for determining the actual and/or base page size for a HPT entry.
      
      The encoding in the second doubleword of the HPTE is designed to encode
      the actual and base page sizes without using any more bits than would be
      needed for a 4k page number, by using between 1 and 8 low-order bits of
      the RPN (real page number) field to encode the page sizes.  A single
      "large page" bit in the first doubleword indicates that these low-order
      bits are to be interpreted like this.
      
      We can determine the page sizes by using the low-order 8 bits of the RPN
      to look up a 256-entry table.  For actual page sizes less than 1MB, some
      of the upper bits of these 8 bits are going to be real address bits, but
      we can cope with that by replicating the entries for those smaller page
      sizes.
      
      While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
      functions from a KVM-specific header to a header for 64-bit HPT systems,
      since this computation doesn't have anything specifically to do with KVM.
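
      A sketch of the resulting lookup is below; the table name, the way
      base and actual sizes are packed into one byte, and the helper are
      illustrative assumptions rather than the kernel's actual definitions:

        /* Hypothetical 256-entry table: one byte per possible value of the
         * low 8 RPN bits, valid only when the "large page" bit is set in
         * the first doubleword. Entries for page sizes smaller than 1MB
         * are replicated so stray real-address bits do not matter. */
        extern const unsigned char hpte_page_sizes_tbl[256];

        static unsigned int hpte_actual_psize_sketch(unsigned long hpte_r)
        {
                unsigned char enc = hpte_page_sizes_tbl[hpte_r & 0xff];

                return enc >> 4;        /* assumed: actual size index in the
                                           high nibble, base size in the low */
        }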
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      0eeede0c
  10. 04 Aug 2016, 1 commit
    • M
      powerpc/mm: Move register_process_table() out of ppc_md · eea8148c
      Michael Ellerman authored
      We want to initialise register_process_table() before ppc_md is setup,
      so that it can be called as part of MMU init (at least on Radix ATM).
      
      That no longer works because probe_machine() requires that ppc_md be
      empty before it's called, and we now do probe_machine() much later.
      
      So make register_process_table a global for now. It will probably move
      into a mmu_radix_ops struct at some point in the future.
      
      This was broken by me when applying commit 7025776e "powerpc/mm:
      Move hash table ops to a separate structure" due to conflicts with other
      patches.
      
      Fixes: 7025776e ("powerpc/mm: Move hash table ops to a separate structure")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      eea8148c
  11. 01 Aug 2016, 1 commit
  12. 21 Jul 2016, 1 commit
  13. 17 Jul 2016, 2 commits
  14. 14 Jun 2016, 2 commits
  15. 08 Jun 2016, 1 commit
  16. 01 May 2016, 1 commit
    • A
      powerpc/mm/hash: Add support for Power9 Hash · 50de596d
      Aneesh Kumar K.V authored
      PowerISA 3.0 adds a partition table indexed by LPID. The partition
      table allows us to specify the MMU model that will be used for guest
      and host translation.
      
      This patch adds support for the SLB based hash model (UPRT = 0). What
      is required with this model is to support the new hash page table
      entry format and also to set up the partition table such that we use
      the hash table for address translation.
      
      We don't have segment table support yet.
      
      In order to make sure we don't load KVM module on Power9 (since we don't
      have kvm support yet) this patch also disables KVM on Power9.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      50de596d
  17. 14 Dec 2015, 1 commit
  18. 09 Oct 2015, 1 commit
    • C
      powerpc: Fix checkstop in native_hpte_clear() with lockdep · fdf880a6
      Cyril Bur authored
      native_hpte_clear() is called in real mode from two places:
      - Early in boot during htab initialisation if firmware assisted dump is
        active.
      - Late in the kexec path.
      
      In both contexts there is no need to disable interrupts as they are
      already disabled. Furthermore, locking around the tlbie() is only required
      for pre POWER5 hardware.
      
      On POWER5 or newer hardware concurrent tlbie()s work as expected, while on
      pre-POWER5 hardware concurrent tlbie()s could result in deadlock. This code
      would only be executed at crashdump time, during which all bets are off:
      concurrent tlbie()s are unlikely and taking locks is unsafe, therefore the
      best course of action is to simply do nothing. Concurrent tlbie()s are not
      possible in the first case as secondary CPUs have not come up yet.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      fdf880a6
  19. 03 Jun 2015, 1 commit
  20. 05 Dec 2014, 1 commit
    • A
      powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault · aefa5688
      Aneesh Kumar K.V authored
      updatepp can get called for a nohpte fault when we find from the linux
      page table that the translation was hashed before. In that case
      we are sure that there is no existing translation, hence we could
      avoid doing tlbie.
      
      We could possibly race with a parallel fault filling the TLB. But
      that should be ok because updatepp is only ever relaxing permissions.
      We also look at linux pte permission bits when filling hash pte
      permission bits. We also hold the linux pte busy bits while
      inserting/updating a hashpte entry, hence a parallel update of the
      linux pte is not possible. On the other hand mprotect involves
      ptep_modify_prot_start which causes a hpte invalidate and not updatepp.
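
      The shape of the change, sketched with assumed names (the flag value,
      the flag name and the helpers below are illustrative, not necessarily
      the patch's exact identifiers): the fault path tells updatepp that no
      HPTE existed, and updatepp only issues tlbie when that flag is clear.

        #define HPTE_NOHPTE_UPDATE  0x2  /* assumed: no prior HPTE existed */

        static long native_hpte_updatepp_sketch(unsigned long slot,
                                                unsigned long newpp,
                                                unsigned long vpn, int bpsize,
                                                int apsize, int ssize,
                                                unsigned long flags)
        {
                /* relax the protection bits in the HPTE (assumed helper) */
                long ret = hpte_modify_pp(slot, newpp, vpn, bpsize, apsize, ssize);

                /* Nothing can be cached in the TLB if no HPTE ever existed,
                 * so the expensive broadcast tlbie can be skipped. */
                if (!(flags & HPTE_NOHPTE_UPDATE))
                        tlbie(vpn, bpsize, apsize, ssize, 0);

                return ret;
        }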
      
      Performance number:
      We use random_access_bench written by Anton.
      
      Kernel with THP disabled and smaller hash page table size.
      
          86.60%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_updatepp
           2.10%  random_access_b  random_access_bench              [.] doit
           1.99%  random_access_b  [kernel.kallsyms]                [k] .do_raw_spin_lock
           1.85%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           1.26%  random_access_b  [kernel.kallsyms]                [k] .native_flush_hash_range
           1.18%  random_access_b  [kernel.kallsyms]                [k] .__delay
           0.69%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           0.37%  random_access_b  [kernel.kallsyms]                [k] .clear_user_page
           0.34%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           0.32%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           0.30%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
      
      With Fix:
      
          27.54%  random_access_b  random_access_bench              [.] doit
          22.90%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           5.76%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           5.20%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           5.12%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           4.80%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
           3.31%  random_access_b  [kernel.kallsyms]                [k] data_access_common
           1.84%  random_access_b  [kernel.kallsyms]                [k] .trace_hardirqs_on_caller
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      aefa5688
  21. 02 Dec 2014, 2 commits
  22. 03 Nov 2014, 1 commit
    • C
      powerpc: Replace __get_cpu_var uses · 69111bac
      Christoph Lameter authored
      This still has not been merged and now powerpc is the only arch that does
      not have this change. Sorry about missing linuxppc-dev before.
      
      V2->V2
        - Fix up to work against 3.18-rc1
      
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non -x86
      arches as well in order to optimize per cpu access by f.e.  using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      [mpe: Fix build errors caused by set/or_softirq_pending(), and rework
            assignment in __set_breakpoint() to use memcpy().]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      69111bac
  23. 08 Oct 2014, 1 commit
    • I
      powerpc/mm: Add hooks for cxl · 4c6d9acc
      Ian Munsie authored
      This adds hooks into the core powerpc mm code for cxl.
      
      The core powerpc code sometimes uses local tlbie. Unfortunately this won't
      work with the current cxl driver as it relies on snooping tlbie broadcasts.
      
      The cxl hardware can have TLB entries invalidated via MMIO but this is not
      currently supported by the driver. In future we can make local tlbie smarter so
      that it invalidates cxl contexts via MMIO when it needs to, but for now we have
      this workaround.
      
      This workaround checks for any active cxl contexts and if so, disables local
      tlbie.
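
      In sketch form (cxl_ctx_in_use() is the assumed query here; call sites
      that would otherwise pick a local tlbie consult it first):

        /* Never use local tlbie while any cxl context is active, so the
         * cxl hardware still sees a broadcast invalidation it can snoop. */
        static inline int use_local_tlbie(int want_local)
        {
                if (want_local && cxl_ctx_in_use())
                        return 0;       /* fall back to global tlbie */
                return want_local;
        }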
      
      This also adds a hook for when SLBs are invalidated. This ensures any
      corresponding SLBs in cxl are also invalidated at the same time. This is
      required for segment demotion.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      4c6d9acc
  24. 27 Aug 2014, 2 commits
    • T
      Revert "powerpc: Replace __get_cpu_var uses" · 23f66e2d
      Tejun Heo authored
      This reverts commit 5828f666 due to
      build failure after merging with pending powerpc changes.
      
      Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      23f66e2d
    • C
      powerpc: Replace __get_cpu_var uses · 5828f666
      Christoph Lameter authored
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non -x86
      arches as well in order to optimize per cpu access by f.e.  using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      tj: Folded a fix patch.
          http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      5828f666
  25. 13 Aug 2014, 2 commits
  26. 28 Apr 2014, 1 commit
  27. 11 Oct 2013, 1 commit
  28. 24 Jul 2013, 2 commits
  29. 21 Jun 2013, 2 commits
  30. 01 Jun 2013, 1 commit
  31. 30 Apr 2013, 2 commits