1. 27 3月, 2018 4 次提交
  2. 23 1月, 2018 1 次提交
    • N
      powerpc/64s: Improve RFI L1-D cache flush fallback · bdcb1aef
      Nicholas Piggin 提交于
      The fallback RFI flush is used when firmware does not provide a way
      to flush the cache. It's a "displacement flush" that evicts useful
      data by displacing it with an uninteresting buffer.
      
      The flush has to take care to work with implementation specific cache
      replacment policies, so the recipe has been in flux. The initial
      slow but conservative approach is to touch all lines of a congruence
      class, with dependencies between each load. It has since been
      determined that a linear pattern of loads without dependencies is
      sufficient, and is significantly faster.
      
      Measuring the speed of a null syscall with RFI fallback flush enabled
      gives the relative improvement:
      
      P8 - 1.83x
      P9 - 1.75x
      
      The flush also becomes simpler and more adaptable to different cache
      geometries.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bdcb1aef
  3. 19 1月, 2018 3 次提交
  4. 18 1月, 2018 1 次提交
    • N
      powerpc/64s: Relax PACA address limitations · 1af19331
      Nicholas Piggin 提交于
      Book3S PACA memory allocation is restricted by the RMA limit and also
      must not take SLB faults when accessed in virtual mode. Currently a
      fixed 256MB limit is used for this, which is imprecise and sub-optimal.
      
      Update the paca allocation limits to use use the ppc64_rma_size for RMA
      limit, and share the safe_stack_limit() that is currently used for stack
      allocations that must not take virtual mode faults.
      
      The safe_stack_limit() name is changed to ppc64_bolted_size() to match
      ppc64_rma_size and some comments are updated. We also need to use
      early_mmu_has_feature() because we are now calling this function prior
      to the jump label patching that enables mmu_has_feature().
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Change mmu_has_feature() to early_mmu_has_feature()]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1af19331
  5. 17 1月, 2018 2 次提交
  6. 10 1月, 2018 2 次提交
    • M
      powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti · bc9c9304
      Michael Ellerman 提交于
      Because there may be some performance overhead of the RFI flush, add
      kernel command line options to disable it.
      
      We add a sensibly named 'no_rfi_flush' option, but we also hijack the
      x86 option 'nopti'. The RFI flush is not the same as KPTI, but if we
      see 'nopti' we can guess that the user is trying to avoid any overhead
      of Meltdown mitigations, and it means we don't have to educate every
      one about a different command line option.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bc9c9304
    • M
      powerpc/64s: Add support for RFI flush of L1-D cache · aa8a5e00
      Michael Ellerman 提交于
      On some CPUs we can prevent the Meltdown vulnerability by flushing the
      L1-D cache on exit from kernel to user mode, and from hypervisor to
      guest.
      
      This is known to be the case on at least Power7, Power8 and Power9. At
      this time we do not know the status of the vulnerability on other CPUs
      such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale
      CPUs. As more information comes to light we can enable this, or other
      mechanisms on those CPUs.
      
      The vulnerability occurs when the load of an architecturally
      inaccessible memory region (eg. userspace load of kernel memory) is
      speculatively executed to the point where its result can influence the
      address of a subsequent speculatively executed load.
      
      In order for that to happen, the first load must hit in the L1,
      because before the load is sent to the L2 the permission check is
      performed. Therefore if no kernel addresses hit in the L1 the
      vulnerability can not occur. We can ensure that is the case by
      flushing the L1 whenever we return to userspace. Similarly for
      hypervisor vs guest.
      
      In order to flush the L1-D cache on exit, we add a section of nops at
      each (h)rfi location that returns to a lower privileged context, and
      patch that with some sequence. Newer firmwares are able to advertise
      to us that there is a special nop instruction that flushes the L1-D.
      If we do not see that advertised, we fall back to doing a displacement
      flush in software.
      
      For guest kernels we support migration between some CPU versions, and
      different CPUs may use different flush instructions. So that we are
      prepared to migrate to a machine with a different flush instruction
      activated, we may have to patch more than one flush instruction at
      boot if the hypervisor tells us to.
      
      In the end this patch is mostly the work of Nicholas Piggin and
      Michael Ellerman. However a cast of thousands contributed to analysis
      of the issue, earlier versions of the patch, back ports testing etc.
      Many thanks to all of them.
      Tested-by: NJon Masters <jcm@redhat.com>
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      aa8a5e00
  7. 11 12月, 2017 1 次提交
  8. 10 11月, 2017 2 次提交
  9. 31 8月, 2017 2 次提交
  10. 13 7月, 2017 2 次提交
  11. 23 6月, 2017 1 次提交
    • N
      powerpc/64: Initialise thread_info for emergency stacks · 34f19ff1
      Nicholas Piggin 提交于
      Emergency stacks have their thread_info mostly uninitialised, which in
      particular means garbage preempt_count values.
      
      Emergency stack code runs with interrupts disabled entirely, and is
      used very rarely, so this has been unnoticed so far. It was found by a
      proposed new powerpc watchdog that takes a soft-NMI directly from the
      masked_interrupt handler and using the emergency stack. That crashed
      at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be
      garbage.
      
      To fix this, zero the entire THREAD_SIZE allocation, and initialize
      the thread_info.
      
      Cc: stable@vger.kernel.org
      Reported-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Move it all into setup_64.c, use a function not a macro. Fix
            crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      34f19ff1
  12. 06 6月, 2017 1 次提交
    • M
      powerpc/numa: Fix percpu allocations to be NUMA aware · ba4a648f
      Michael Ellerman 提交于
      In commit 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
      switched to the generic implementation of cpu_to_node(), which uses a percpu
      variable to hold the NUMA node for each CPU.
      
      Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
      of our percpu areas, leading to a chicken and egg problem. In practice what
      happens is when we are setting up the percpu areas, cpu_to_node() reports that
      all CPUs are on node 0, so we allocate all percpu areas on node 0.
      
      This is visible in the dmesg output, as all pcpu allocs being in group 0:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
        pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
        pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
      
      To fix it we need an early_cpu_to_node() which can run prior to percpu being
      setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
      in. With the patch dmesg output shows two groups, 0 and 1:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
        pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
        pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
      
      We can also check the data_offset in the paca of various CPUs, with the fix we
      see:
      
        CPU 0:  data_offset = 0x0ffe8b0000
        CPU 24: data_offset = 0x1ffe5b0000
      
      And we can see from dmesg that CPU 24 has an allocation on node 1:
      
        node   0: [mem 0x0000000000000000-0x0000000fffffffff]
        node   1: [mem 0x0000001000000000-0x0000001fffffffff]
      
      Cc: stable@vger.kernel.org # v3.16+
      Fixes: 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ba4a648f
  13. 09 5月, 2017 1 次提交
  14. 28 4月, 2017 1 次提交
    • N
      powerpc/64s: Dedicated system reset interrupt stack · b1ee8a3d
      Nicholas Piggin 提交于
      The system reset interrupt is used for crash/debug situations, so it is
      desirable to have as little impact on the normal state of the system as
      possible.
      
      Currently it uses the current kernel stack to process the exception.
      This stores into the stack which may be involved with the crash. The
      stack pointer may be corrupted, or it may have overflowed.
      
      Avoid or minimise these problems by creating a dedicated NMI stack for
      the system reset interrupt to use.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b1ee8a3d
  15. 28 3月, 2017 2 次提交
    • B
      powerpc: Disable HFSCR[TM] if TM is not supported · 7ed23e1b
      Benjamin Herrenschmidt 提交于
      On Power8 & Power9 the early CPU inititialisation in __init_HFSCR()
      turns on HFSCR[TM] (Hypervisor Facility Status and Control Register
      [Transactional Memory]), but that doesn't take into account that TM
      might be disabled by CPU features, or disabled by the kernel being built
      with CONFIG_PPC_TRANSACTIONAL_MEM=n.
      
      So later in boot, when we have setup the CPU features, clear HSCR[TM] if
      the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account
      for the CONFIG_PPC_TRANSACTIONAL_MEM=n case.
      
      Without this a KVM guest might try use TM, even if told not to, and
      cause an oops in the host kernel. Typically the oops is seen in
      __kvmppc_vcore_entry() and may or may not be fatal to the host, but is
      always bad news.
      
      In practice all shipping CPU revisions do support TM, and all host
      kernels we are aware of build with TM support enabled, so no one should
      actually be able to hit this in the wild.
      
      Fixes: 2a3563b0 ("powerpc: Setup in HFSCR for POWER8")
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      [mpe: Rewrite change log with input from Sam, add Fixes/stable]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7ed23e1b
    • M
      powerpc/64: Don't use early_cpu_has_feature() in cpu_ready_for_interrupts() · 5511a45f
      Michael Ellerman 提交于
      cpu_ready_for_interrupts() is called after feature patching, so there's
      no need to use early_cpu_has_feature().
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5511a45f
  16. 06 3月, 2017 1 次提交
  17. 15 2月, 2017 1 次提交
  18. 06 2月, 2017 8 次提交
  19. 30 11月, 2016 1 次提交
  20. 15 11月, 2016 1 次提交
  21. 10 8月, 2016 1 次提交
  22. 01 8月, 2016 1 次提交