1. 14 Nov, 2018 1 commit
  2. 30 Jul, 2018 1 commit
  3. 19 Jun, 2018 1 commit
  4. 03 May, 2018 2 commits
    • powerpc64/ftrace: Delay enabling ftrace on secondary cpus · d1039786
      Committed by Naveen N. Rao
      On the boot cpu, though we enable paca->ftrace_enabled in early_setup()
      (via cpu_ready_for_interrupts()), we don't start tracing until much
      later since ftrace is not initialized yet and since we only support
      DYNAMIC_FTRACE on powerpc. However, it is possible that ftrace has been
      initialized by the time some of the secondary cpus start up. In this
      case, we will try to trace some of the early boot code which can cause
      problems.
      
      To address this, move setting paca->ftrace_enabled from
      cpu_ready_for_interrupts() to early_setup() for the boot cpu, and towards
      the end of start_secondary() for secondary cpus.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths · ea678ac6
      Committed by Naveen N. Rao
      We have some C code that we call into from real mode where we cannot
      take any exceptions. Though the C functions themselves are mostly safe,
      if these functions are traced, there is a possibility that we may take
      an exception. For instance, in certain conditions, the ftrace code uses
      WARN(), which uses a 'trap' to do its job.
      
      For such scenarios, introduce a new field in paca 'ftrace_enabled',
      which is checked on ftrace entry before continuing. This field can then
      be set to zero to disable/pause ftrace, and set to a non-zero value to
      resume ftrace.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
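The pause/resume idea described above can be sketched in plain userspace C. This is only an illustrative model, not the kernel code: `struct paca_model`, `ftrace_entry_model()` and `unsafe_path_model()` are hypothetical names, and only the `ftrace_enabled` field comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU paca; only the new field is modelled. */
struct paca_model {
	unsigned char ftrace_enabled;	/* zero: tracing paused, non-zero: tracing allowed */
};

static struct paca_model this_paca = { .ftrace_enabled = 1 };
static unsigned long traced_calls;	/* illustration only: calls the tracer acted on */

/* Model of the check on ftrace entry: return before doing any tracing work. */
static bool ftrace_entry_model(void)
{
	if (!this_paca.ftrace_enabled)
		return false;
	traced_calls++;
	return true;
}

/* Model of a code path that must not take exceptions. */
static void unsafe_path_model(void)
{
	unsigned char saved = this_paca.ftrace_enabled;

	this_paca.ftrace_enabled = 0;		/* pause ftrace */
	ftrace_entry_model();			/* a traced function is entered: nothing is recorded */
	this_paca.ftrace_enabled = saved;	/* resume ftrace */
}
```

Saving and restoring the previous value, rather than unconditionally writing 1, is a design choice of this sketch so that the path stays correct if tracing was already paused by the caller.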
  5. 17 Apr, 2018 1 commit
  6. 10 Apr, 2018 1 commit
    • powerpc/64s: Fix section mismatch warnings from setup_rfi_flush() · 501a78cb
      Committed by Michael Ellerman
      The recent LPM changes to setup_rfi_flush() are causing some section
      mismatch warnings because we removed the __init annotation on
      setup_rfi_flush():
      
        The function setup_rfi_flush() references
        the function __init ppc64_bolted_size().
        the function __init memblock_alloc_base().
      
      The references are actually in init_fallback_flush(), but that is
      inlined into setup_rfi_flush().
      
      These references are safe because:
       - only pseries calls setup_rfi_flush() at runtime
       - pseries always passes L1D_FLUSH_FALLBACK at boot
       - so the fallback flush area will always be allocated
       - so the check in init_fallback_flush() will always return early:
         /* Only allocate the fallback flush area once (at boot time). */
         if (l1d_flush_fallback_area)
         	return;
      
       - and therefore we won't actually call the freed init routines.
      
      We should rework the code to make it safer by default rather than
      relying on the above, but for now as a quick-fix just add a __ref
      annotation to squash the warning.
      
      Fixes: abf110f3 ("powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
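The "allocate once, then return early" argument above can be modelled in a few lines of standalone C. This is a sketch under stated assumptions: `init_fallback_flush_model()` and `boot_only_alloc_model()` are hypothetical, `malloc()` stands in for the boot-time allocator, and only the `l1d_flush_fallback_area` check is taken from the quoted snippet.

```c
#include <stdlib.h>

static void *l1d_flush_fallback_area;	/* name taken from the quoted snippet */
static int alloc_calls;			/* illustration only: counts allocations */

/* Stand-in for the boot-time-only allocator (the kernel uses __init helpers here). */
static void *boot_only_alloc_model(size_t size)
{
	alloc_calls++;
	return malloc(size);
}

static void init_fallback_flush_model(size_t size)
{
	/* Only allocate the fallback flush area once (at boot time). */
	if (l1d_flush_fallback_area)
		return;

	l1d_flush_fallback_area = boot_only_alloc_model(size);
}
```

Calling `init_fallback_flush_model()` a second time never reaches the allocator, which is the property the commit relies on when it says the freed init routines are not actually called at runtime.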
  7. 30 Mar, 2018 4 commits
  8. 27 Mar, 2018 4 commits
  9. 23 Jan, 2018 1 commit
    • powerpc/64s: Improve RFI L1-D cache flush fallback · bdcb1aef
      Committed by Nicholas Piggin
      The fallback RFI flush is used when firmware does not provide a way
      to flush the cache. It's a "displacement flush" that evicts useful
      data by displacing it with an uninteresting buffer.
      
      The flush has to take care to work with implementation-specific cache
      replacement policies, so the recipe has been in flux. The initial
      slow but conservative approach is to touch all lines of a congruence
      class, with dependencies between each load. It has since been
      determined that a linear pattern of loads without dependencies is
      sufficient, and is significantly faster.
      
      Measuring the speed of a null syscall with RFI fallback flush enabled
      gives the relative improvement:
      
      P8 - 1.83x
      P9 - 1.75x
      
      The flush also becomes simpler and more adaptable to different cache
      geometries.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
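The "linear pattern of loads without dependencies" can be illustrated with a small C model. This is a hedged sketch only: the real flush is not written this way, running this loop in userspace gives no guarantee about what is evicted from any cache, and `displacement_flush_model()` together with its buffer and line-size parameters is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Reads one byte from every line_size-sized chunk of buf, in increasing
 * address order, and returns the number of loads issued. The address of
 * each load depends only on the loop counter, never on a value returned
 * by an earlier load.
 */
static size_t displacement_flush_model(const volatile unsigned char *buf,
				       size_t size, size_t line_size)
{
	size_t loads = 0;
	size_t off;

	assert(line_size > 0);	/* a zero stride would never terminate */

	for (off = 0; off < size; off += line_size) {
		(void)buf[off];	/* volatile read: the load is actually performed */
		loads++;
	}
	return loads;
}
```

For a 4096-byte buffer and an assumed 128-byte line, the loop issues 32 loads. The earlier, more conservative recipe described above would instead chain the loads so that each address depends on the previous result.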
  10. 19 Jan, 2018 3 commits
  11. 18 Jan, 2018 1 commit
    • powerpc/64s: Relax PACA address limitations · 1af19331
      Committed by Nicholas Piggin
      Book3S PACA memory allocation is restricted by the RMA limit and also
      must not take SLB faults when accessed in virtual mode. Currently a
      fixed 256MB limit is used for this, which is imprecise and sub-optimal.
      
      Update the paca allocation limits to use the ppc64_rma_size for RMA
      limit, and share the safe_stack_limit() that is currently used for stack
      allocations that must not take virtual mode faults.
      
      The safe_stack_limit() name is changed to ppc64_bolted_size() to match
      ppc64_rma_size and some comments are updated. We also need to use
      early_mmu_has_feature() because we are now calling this function prior
      to the jump label patching that enables mmu_has_feature().
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Change mmu_has_feature() to early_mmu_has_feature()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  12. 17 Jan, 2018 2 commits
  13. 10 Jan, 2018 2 commits
    • powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti · bc9c9304
      Committed by Michael Ellerman
      Because there may be some performance overhead of the RFI flush, add
      kernel command line options to disable it.
      
      We add a sensibly named 'no_rfi_flush' option, but we also hijack the
      x86 option 'nopti'. The RFI flush is not the same as KPTI, but if we
      see 'nopti' we can guess that the user is trying to avoid any overhead
      of Meltdown mitigations, and it means we don't have to educate everyone
      about a different command line option.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
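The effect of accepting both option names can be sketched as a standalone C model. It does not use the kernel's own command line handling; `cmdline_has_option()` and `rfi_flush_enabled_model()` are hypothetical helpers, and the assumption that options are separated by single spaces is a simplification made for this example.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the space-separated command line contains opt as a whole word. */
static bool cmdline_has_option(const char *cmdline, const char *opt)
{
	size_t len = strlen(opt);
	const char *p = cmdline;

	while ((p = strstr(p, opt)) != NULL) {
		bool starts_word = (p == cmdline) || (p[-1] == ' ');
		bool ends_word = (p[len] == '\0') || (p[len] == ' ');

		if (starts_word && ends_word)
			return true;
		p += len;	/* keep searching after this partial match */
	}
	return false;
}

/* Either option switches the flush off; with neither present it stays on. */
static bool rfi_flush_enabled_model(const char *cmdline)
{
	return !cmdline_has_option(cmdline, "no_rfi_flush") &&
	       !cmdline_has_option(cmdline, "nopti");
}
```

With this model, a command line such as `quiet nopti` disables the flush in the same way as `quiet no_rfi_flush`, while an unrelated word that merely contains `nopti` is ignored.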
    • powerpc/64s: Add support for RFI flush of L1-D cache · aa8a5e00
      Committed by Michael Ellerman
      On some CPUs we can prevent the Meltdown vulnerability by flushing the
      L1-D cache on exit from kernel to user mode, and from hypervisor to
      guest.
      
      This is known to be the case on at least Power7, Power8 and Power9. At
      this time we do not know the status of the vulnerability on other CPUs
      such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale
      CPUs. As more information comes to light we can enable this, or other
      mechanisms on those CPUs.
      
      The vulnerability occurs when the load of an architecturally
      inaccessible memory region (eg. userspace load of kernel memory) is
      speculatively executed to the point where its result can influence the
      address of a subsequent speculatively executed load.
      
      In order for that to happen, the first load must hit in the L1,
      because before the load is sent to the L2 the permission check is
      performed. Therefore if no kernel addresses hit in the L1 the
      vulnerability can not occur. We can ensure that is the case by
      flushing the L1 whenever we return to userspace. Similarly for
      hypervisor vs guest.
      
      In order to flush the L1-D cache on exit, we add a section of nops at
      each (h)rfi location that returns to a lower privileged context, and
      patch that with some sequence. Newer firmwares are able to advertise
      to us that there is a special nop instruction that flushes the L1-D.
      If we do not see that advertised, we fall back to doing a displacement
      flush in software.
      
      For guest kernels we support migration between some CPU versions, and
      different CPUs may use different flush instructions. So that we are
      prepared to migrate to a machine with a different flush instruction
      activated, we may have to patch more than one flush instruction at
      boot if the hypervisor tells us to.
      
      In the end this patch is mostly the work of Nicholas Piggin and
      Michael Ellerman. However a cast of thousands contributed to analysis
      of the issue, earlier versions of the patch, backports, testing etc.
      Many thanks to all of them.
      Tested-by: Jon Masters <jcm@redhat.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 11 Dec, 2017 1 commit
  15. 10 Nov, 2017 2 commits
  16. 31 Aug, 2017 2 commits
  17. 13 Jul, 2017 2 commits
  18. 23 Jun, 2017 1 commit
    • powerpc/64: Initialise thread_info for emergency stacks · 34f19ff1
      Committed by Nicholas Piggin
      Emergency stacks have their thread_info mostly uninitialised, which in
      particular means garbage preempt_count values.
      
      Emergency stack code runs with interrupts disabled entirely, and is
      used very rarely, so this has gone unnoticed so far. It was found by a
      proposed new powerpc watchdog that takes a soft-NMI directly from the
      masked_interrupt handler and uses the emergency stack. That crashed
      at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be
      garbage.
      
      To fix this, zero the entire THREAD_SIZE allocation, and initialize
      the thread_info.
      
      Cc: stable@vger.kernel.org
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Move it all into setup_64.c, use a function not a macro. Fix
            crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  19. 06 Jun, 2017 1 commit
    • powerpc/numa: Fix percpu allocations to be NUMA aware · ba4a648f
      Committed by Michael Ellerman
      In commit 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
      switched to the generic implementation of cpu_to_node(), which uses a percpu
      variable to hold the NUMA node for each CPU.
      
      Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
      of our percpu areas, leading to a chicken and egg problem. In practice what
      happens is when we are setting up the percpu areas, cpu_to_node() reports that
      all CPUs are on node 0, so we allocate all percpu areas on node 0.
      
      This is visible in the dmesg output, as all pcpu allocs being in group 0:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
        pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
        pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
      
      To fix it we need an early_cpu_to_node() which can run prior to percpu being
      set up. We already have the numa_cpu_lookup_table we can use, so just plumb it
      in. With the patch dmesg output shows two groups, 0 and 1:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
        pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
        pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
      
      We can also check the data_offset in the paca of various CPUs, with the fix we
      see:
      
        CPU 0:  data_offset = 0x0ffe8b0000
        CPU 24: data_offset = 0x1ffe5b0000
      
      And we can see from dmesg that CPU 24 has an allocation on node 1:
      
        node   0: [mem 0x0000000000000000-0x0000000fffffffff]
        node   1: [mem 0x0000001000000000-0x0000001fffffffff]
      
      Cc: stable@vger.kernel.org # v3.16+
      Fixes: 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
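The table-based lookup that avoids the chicken-and-egg problem can be sketched as follows. This is a userspace model, not the kernel implementation: the `_model` names, the 48-CPU table size and the fallback to node 0 for unset entries are assumptions made for illustration, with the CPU-to-node split chosen to match the dmesg output quoted above.

```c
#include <assert.h>

#define NR_CPUS_MODEL 48	/* assumption: matches the 48 CPUs in the dmesg output */

/* Model of a plain array filled in early, before any percpu areas exist. */
static int numa_cpu_lookup_table_model[NR_CPUS_MODEL];

/* Fill the table as in the example system: CPUs 0-23 on node 0, 24-47 on node 1. */
static void fill_table_model(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		numa_cpu_lookup_table_model[cpu] = (cpu < 24) ? 0 : 1;
}

/* Lookup that needs no percpu data, so it is usable while percpu is being set up. */
static int early_cpu_to_node_model(int cpu)
{
	int nid;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	nid = numa_cpu_lookup_table_model[cpu];

	/* Assumption of this sketch: treat an unset (negative) entry as node 0. */
	return (nid < 0) ? 0 : nid;
}
```

Because the lookup reads an ordinary array rather than a percpu variable, the percpu allocator can ask for the node of CPU 24 and get 1 before any percpu area has been allocated.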
  20. 09 May, 2017 1 commit
  21. 28 Apr, 2017 1 commit
    • powerpc/64s: Dedicated system reset interrupt stack · b1ee8a3d
      Committed by Nicholas Piggin
      The system reset interrupt is used for crash/debug situations, so it is
      desirable to have as little impact on the normal state of the system as
      possible.
      
      Currently it uses the current kernel stack to process the exception.
      This stores into the stack which may be involved with the crash. The
      stack pointer may be corrupted, or it may have overflowed.
      
      Avoid or minimise these problems by creating a dedicated NMI stack for
      the system reset interrupt to use.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  22. 28 Mar, 2017 2 commits
    • powerpc: Disable HFSCR[TM] if TM is not supported · 7ed23e1b
      Committed by Benjamin Herrenschmidt
      On Power8 & Power9 the early CPU initialisation in __init_HFSCR()
      turns on HFSCR[TM] (Hypervisor Facility Status and Control Register
      [Transactional Memory]), but that doesn't take into account that TM
      might be disabled by CPU features, or disabled by the kernel being built
      with CONFIG_PPC_TRANSACTIONAL_MEM=n.
      
      So later in boot, when we have set up the CPU features, clear HFSCR[TM] if
      the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account
      for the CONFIG_PPC_TRANSACTIONAL_MEM=n case.
      
      Without this a KVM guest might try to use TM, even if told not to, and
      cause an oops in the host kernel. Typically the oops is seen in
      __kvmppc_vcore_entry() and may or may not be fatal to the host, but is
      always bad news.
      
      In practice all shipping CPU revisions do support TM, and all host
      kernels we are aware of build with TM support enabled, so no one should
      actually be able to hit this in the wild.
      
      Fixes: 2a3563b0 ("powerpc: Setup in HFSCR for POWER8")
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      [mpe: Rewrite change log with input from Sam, add Fixes/stable]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Don't use early_cpu_has_feature() in cpu_ready_for_interrupts() · 5511a45f
      Committed by Michael Ellerman
      cpu_ready_for_interrupts() is called after feature patching, so there's
      no need to use early_cpu_has_feature().
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  23. 06 Mar, 2017 1 commit
  24. 15 Feb, 2017 1 commit
  25. 06 Feb, 2017 1 commit