1. 13 Feb 2018 (3 commits)
  2. 08 Feb 2018 (5 commits)
    • powerpc/mm/radix: Split linear mapping on hot-unplug · 4dd5f8a9
      Balbir Singh authored
      This patch splits the linear mapping if the hot-unplug range is
      smaller than the mapping size. The code detects if the mapping needs
      to be split into a smaller size and if so, uses the stop machine
      infrastructure to clear the existing mapping and then remap the
      remaining range using a smaller page size.
      
      The code will skip any region of the mapping that overlaps with kernel
      text and warn about it once. We don't want to remove a mapping where
      the kernel text and the LMB we intend to remove overlap in the same
      TLB mapping as it may affect the currently executing code.
      
      I've tested these changes under a KVM guest with 2 vcpus from a
      split-mapping point of view; some of the caveats mentioned above
      applied to the testing I did.
      
      Fixes: 4b5d62ca ("powerpc/mm: add radix__remove_section_mapping()")
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Tweak change log to match updated behaviour]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
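      
      A minimal sketch of the mechanism described above. The helper names
      remove_pte_mapping(), create_physical_mapping() and
      overlaps_kernel_text() are illustrative assumptions standing in for
      the real radix code; stop_machine() is the real kernel API:
      
        struct change_mapping_params {
                pte_t *pte;
                unsigned long start, end;       /* hot-unplug range */
                unsigned long aligned_start;    /* large-page bounds */
                unsigned long aligned_end;
        };
        
        /* Runs with all other CPUs held in stop_machine(), so nothing
         * can execute through the hole while the mapping is rebuilt. */
        static int stop_machine_change_mapping(void *data)
        {
                struct change_mapping_params *p = data;
        
                remove_pte_mapping(p->pte);
                /* Remap what remains of the large page at a smaller size. */
                create_physical_mapping(p->aligned_start, p->start);
                create_physical_mapping(p->end, p->aligned_end);
                return 0;
        }
        
        static void split_kernel_mapping(unsigned long addr, unsigned long end,
                                         unsigned long size, pte_t *pte)
        {
                struct change_mapping_params params = {
                        .pte = pte,
                        .start = addr,
                        .end = end,
                        .aligned_start = addr & ~(size - 1),
                        .aligned_end = ALIGN(end, size),
                };
        
                /* Never split a mapping the kernel text lives in:
                 * warn once and skip it instead. */
                if (overlaps_kernel_text(params.aligned_start,
                                         params.aligned_end)) {
                        WARN_ONCE(1, "hot-unplug range shares a mapping with kernel text\n");
                        return;
                }
        
                stop_machine(stop_machine_change_mapping, &params, NULL);
        }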
    • powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID · eeb715c3
      Nicholas Piggin authored
      This change restores and formalises the behaviour that access to NULL
      or other user addresses by the kernel during boot should fault rather
      than succeed and modify memory. This was inadvertently broken when
      fixing another bug, because it was previously not well defined and
      only worked by chance.
      
      powerpc/64s/radix uses high address bits to select an address space
      "quadrant", which determines which PID and LPID are used to translate
      the rest of the address (effective PID, effective LPID). The kernel
      mapping at 0xC... selects quadrant 3, which uses PID=0 and LPID=0. So
      the kernel page tables are installed in the PID 0 process table entry.
      
      An address at 0x0... selects quadrant 0, which uses PID=PIDR for
      translating the rest of the address (that is, it uses the value of the
      PIDR register as the effective PID). If PIDR=0, then the translation
      is performed with the PID 0 process table entry page tables. This is
      the kernel mapping, so we effectively get another copy of the kernel
      address space at 0. A NULL pointer access will access physical memory
      address 0.
      
      To prevent duplicating the kernel address space in quadrant 0, this
      patch allocates a guard PID containing no translations, and
      initializes PIDR with this during boot, before the MMU is switched on.
      Any kernel access to quadrant 0 will use this guard PID for
      translation and find no valid mappings, and therefore fault.
      
      After boot, this PID will be switched away to user context PIDs,
      but those contain user mappings (and usually NULL pointer
      protection) rather than the kernel mapping, which is much safer
      (and by design). In future this may be tightened further, and the
      guard PID could be used for that.
      
      Commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers
      before setting partition table") introduced this problem because it
      zeroes PIDR at boot. Previously the value was inherited from
      firmware or kexec, which is not robust and can be zero (e.g.,
      mambo).
      
      Fixes: 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table")
      Cc: stable@vger.kernel.org # v4.15+
      Reported-by: Florian Weimer <fweimer@redhat.com>
      Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
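      
      A hedged sketch of the idea: SPRN_PID is the real special-purpose
      register, but the init function and the choice of guard PID value
      are illustrative assumptions, not the commit's exact code:
      
        static unsigned int guard_pid = 1;  /* assumption: any reserved PID */
        
        /* Called early in boot, before the MMU is switched on.  The
         * process table entry for guard_pid holds no translations, so
         * any kernel access to quadrant 0 (a NULL or other user
         * address) finds no valid mapping and faults. */
        static void __init radix_init_guard_pid(void)
        {
                mtspr(SPRN_PID, guard_pid);
                isync();
        }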
    • powerpc/64s: Fix may_hard_irq_enable() for PMI soft masking · 6cc3f91b
      Nicholas Piggin authored
      The soft IRQ masking code has to hard-disable interrupts in cases
      where the exception is not cleared by the masked handler. External
      interrupts have long used this approach for soft masking, and
      recently PMU interrupts started doing the same.
      
      The soft IRQ masking code additionally allowed for interrupt handlers
      to hard-enable interrupts after soft-disabling them. The idea is to
      allow PMU interrupts through to profile interrupt handlers.
      
      So when interrupts are being replayed and a pending interrupt
      requires hard-disabling, there is a test to prevent those handlers
      from hard-enabling interrupts while a pending external interrupt
      exists. may_hard_irq_enable() handles this.
      
      After f442d004 ("powerpc/64s: Add support to mask perf interrupts
      and replay them"), may_hard_irq_enable() could prematurely enable
      MSR[EE] when a PMU exception exists, which would result in the
      interrupt firing again while masked, and MSR[EE] being disabled again.
      
      I haven't seen that this could cause a serious problem, but it's
      more consistent to handle these soft-masked interrupts in the same
      way. So introduce a define for all types of interrupts that require
      MSR[EE] masking in their soft-disable handlers, and use that in
      may_hard_irq_enable().
      
      Fixes: f442d004 ("powerpc/64s: Add support to mask perf interrupts and replay them")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
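      
      Roughly, the fix is one mask covering every interrupt type that
      must keep MSR[EE] disabled while soft-masked, tested in
      may_hard_irq_enable(). A hedged sketch; the exact bit names should
      be checked against arch/powerpc/include/asm/hw_irq.h:
      
        /* All interrupt sources whose soft-masked handler must also
         * keep MSR[EE] clear until the interrupt is replayed. */
        #define PACA_IRQ_MUST_HARD_MASK (PACA_IRQ_EE | PACA_IRQ_PMI)
        
        static inline bool may_hard_irq_enable(void)
        {
                /* Only hard-enable when no such interrupt is pending. */
                return !(get_paca()->irq_happened & PACA_IRQ_MUST_HARD_MASK);
        }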
    • powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro · 5c11d1e5
      Madhavan Srinivasan authored
      Commit f14e953b ("powerpc/64s: Add support to take additional
      parameter in MASKABLE_* macro") messed up the
      MASKABLE_RELON_EXCEPTION_HV_OOL macro by adding the wrong SOFTEN
      test, which caused a guest kernel crash at boot. Fix the macro to
      use SOFTEN_TEST_HV instead of SOFTEN_NOTEST_HV.
      
      Fixes: f14e953b ("powerpc/64s: Add support to take additional parameter in MASKABLE_* macro")
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Fix-Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
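      
      The shape of the fix, sketched from the description above; the
      macro body is abridged and approximate, see
      arch/powerpc/include/asm/exception-64s.h for the real definition:
      
        /* was SOFTEN_NOTEST_HV, which skipped the soft-mask test */
        #define MASKABLE_RELON_EXCEPTION_HV_OOL(vec, label, bitmask)     \
                MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV,  \
                                            vec, bitmask);               \
                EXCEPTION_PROLOG_PSERIES_1(label, EXC_HV)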
    • powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove · 1d9a0907
      Nathan Fontenot authored
      When DLPAR removes a CPU, the unmapping of the CPU from a node in
      unmap_cpu_from_node() should also invalidate the CPU's entry in the
      numa_cpu_lookup_table. There is no guarantee that on a subsequent
      DLPAR add of the CPU the associativity will be the same, so the CPU
      could end up in a different node. Invalidating the entry in the
      numa_cpu_lookup_table causes the associativity to be read from the
      device tree at the time of the add.
      
      The current behavior of not invalidating the CPU's entry in the
      numa_cpu_lookup_table can result in scenarios where the topology
      layout of CPUs in the partition does not match the device tree
      or the topology reported by the HMC.
      
      This bug looks like it was introduced in 2004 in the commit titled
      "ppc64: cpu hotplug notifier for numa", which is 6b15e4e87e32 in the
      linux-fullhist tree. Hence tag it for all stable releases.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
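      
      A hedged sketch of the change; numa_cpu_lookup_table and
      node_to_cpumask_map are the real powerpc arrays, but the function
      body is abridged:
      
        static void unmap_cpu_from_node(unsigned long cpu)
        {
                int node = numa_cpu_lookup_table[cpu];
        
                if (cpumask_test_cpu(cpu, node_to_cpumask_map[node]))
                        cpumask_clear_cpu(cpu, node_to_cpumask_map[node]);
        
                /* The fix: forget the node, so a later DLPAR add
                 * re-reads the associativity from the device tree. */
                numa_cpu_lookup_table[cpu] = -1;
        }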
  3. 06 Feb 2018 (2 commits)
    • membarrier: Provide GLOBAL_EXPEDITED command · c5f58bd5
      Mathieu Desnoyers authored
      Allow expedited membarrier to be used for data shared between processes
      through shared memory.
      
      Processes wishing to receive the membarriers register with
      MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. Those that want to issue
      a membarrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED.
      
      This allows an extremely simple kernel-level implementation: we
      have almost everything we need with the PRIVATE_EXPEDITED barrier
      code. All we need to do is add a flag in the mm_struct that will be
      used to check whether we need to send the IPI to the current thread
      of each CPU.
      
      There is a slight downside to this approach compared to targeting
      specific shared memory users: when performing a membarrier operation,
      all registered "global" receivers will get the barrier, even if they
      don't share a memory mapping with the sender issuing
      MEMBARRIER_CMD_GLOBAL_EXPEDITED.
      
      This registration approach seems to fit the requirement of not
      disturbing processes that really deeply care about real-time: they
      simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.
      
      In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED"
      command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of
      MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward
      compatibility.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-5-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
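      
      A hedged sketch of the per-CPU IPI decision; the mm flag and
      helper names follow the description above (the real code lives in
      kernel/sched/membarrier.c and may differ in detail):
      
        static void ipi_mb(void *info)
        {
                smp_mb();   /* the barrier each targeted CPU executes */
        }
        
        static int membarrier_global_expedited(void)
        {
                cpumask_var_t tmpmask;
                int cpu;
        
                if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
                        return -ENOMEM;
        
                cpus_read_lock();
                for_each_online_cpu(cpu) {
                        struct task_struct *p;
        
                        rcu_read_lock();
                        p = task_rcu_dereference(&cpu_rq(cpu)->curr);
                        /* IPI only CPUs running a task whose mm has
                         * registered for global expedited membarrier. */
                        if (p && p->mm &&
                            (atomic_read(&p->mm->membarrier_state) &
                             MEMBARRIER_STATE_GLOBAL_EXPEDITED))
                                __cpumask_set_cpu(cpu, tmpmask);
                        rcu_read_unlock();
                }
                smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
                cpus_read_unlock();
                free_cpumask_var(tmpmask);
                return 0;
        }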
    • powerpc, membarrier: Skip memory barrier in switch_mm() · 3ccfebed
      Mathieu Desnoyers authored
      Allow PowerPC to skip the full memory barrier in switch_mm(), and
      only issue the barrier when scheduling into a task belonging to a
      process that has registered to use expedited private membarrier
      commands.
      
      Threads targeting the same VM but belonging to different thread
      groups are a tricky case. It has a few consequences:
      
      It turns out that we cannot rely on get_nr_threads(p) to count the
      number of threads using a VM. We can use
      (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
      instead to skip the synchronize_sched() for cases where the VM only has
      a single user, and that user only has a single thread.
      
      It also turns out that we cannot use for_each_thread() to set
      thread flags in all threads using a VM, as it only iterates on the
      thread group.
      
      Therefore, test the membarrier state variable directly rather than
      relying on thread flags. This means
      membarrier_register_private_expedited() needs to set the
      MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
      only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
      private expedited membarrier commands to succeed.
      membarrier_arch_switch_mm() now tests for the
      MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
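      
      A hedged sketch of the arch hook described above; the flag test is
      per the commit text, though the exact code may differ:
      
        static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
                                                     struct mm_struct *next,
                                                     struct task_struct *tsk)
        {
                /* Skip the full barrier unless the incoming mm has
                 * registered for private expedited membarrier. */
                if (likely(!(atomic_read(&next->membarrier_state) &
                             MEMBARRIER_STATE_PRIVATE_EXPEDITED) || !prev))
                        return;
        
                /* membarrier requires a full barrier after storing to
                 * rq->curr, before returning to user-space. */
                smp_mb();
        }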
  4. 01 Feb 2018 (3 commits)
  5. 30 Jan 2018 (1 commit)
    • powerpc/mm/radix: Fix build error when RADIX_MMU=n · 015eb1b8
      Michael Ellerman authored
      The recent TLB flush rework broke the build when the Radix MMU is
      disabled at build time, eg:
      
        (.text+0x264): undefined reference to `.radix__tlbiel_all'
      
      We could add an empty version, but if we ever called it by accident
      that would indicate a bad bug, so add a stub that just WARNs if we do.
      
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
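      
      The stub described above, sketched:
      
        #ifdef CONFIG_PPC_RADIX_MMU
        extern void radix__tlbiel_all(unsigned int action);
        #else
        static inline void radix__tlbiel_all(unsigned int action)
        {
                WARN_ON(1);  /* reaching here with RADIX_MMU=n is a bad bug */
        }
        #endif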
  6. 28 Jan 2018 (4 commits)
    • powerpc/watchdog: Print the NIP in soft_nmi_interrupt() · 0bc00914
      Michael Ellerman authored
      When a CPU detects it's locked up via soft_nmi_interrupt() we have
      pt_regs, so print regs->nip, which points to where we took the
      soft-NMI.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
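      
      Roughly, the lockup message gains the NIP; a sketch with an
      approximate format string:
      
        /* %pS prints symbol+offset for regs->nip, i.e. where the
         * soft-NMI interrupted this CPU. */
        pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n",
                 cpu, (void *)regs->nip);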
    • powerpc/watchdog: regs can't be null in soft_nmi_interrupt() · 3ba45b7e
      Michael Ellerman authored
      soft_nmi_interrupt() is called directly from the asm exception
      handling code, which passes regs as a pointer to the stack. So regs
      can't be NULL; it may be full of junk, but that's a separate
      problem.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/watchdog: Tweak watchdog printks · d8fa82e0
      Michael Ellerman authored
      Use pr_fmt() in the watchdog code, so we don't have to say "Watchdog"
      so many times.
      
      Rather than "CPU:%d" just spell it "CPU %d", "Hard" doesn't need a
      capital in the middle of a sentence, and "LOCKUP other CPUS" should be
      "LOCKUP on other CPUS".
      
      Also make it clear when a CPU self-detects a lockup by spelling it
      out.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
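      
      The pr_fmt() idiom referred to above, sketched (the exact prefix
      string is an assumption):
      
        /* Defined at the top of the file, before any printk-using
         * include, so every pr_*() call is automatically prefixed. */
        #define pr_fmt(fmt) "watchdog: " fmt
        #include <linux/printk.h>
        
        /* pr_emerg("CPU %d Hard LOCKUP\n", cpu) now prints:
         *   watchdog: CPU 11 Hard LOCKUP */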
    • powerpc/cell: Remove axonram driver · 1d65b1c8
      Michael Ellerman authored
      The QS21/22 IBM Cell blades had a southbridge chip called Axon. It
      could have DDR DIMMs attached, though they were not directly usable
      as RAM; instead they could serve as a sort of buffer, if
      applications were written specifically to use the block device
      provided by the driver.
      
      Although the driver supposedly had direct access support, it was
      apparently never tested (see commit 91117a20 ("axonram: Fix bug in
      direct_access")).
      
      These machines have not been available for over 5 years, and were
      never widely in use. It seems highly unlikely anyone is using this
      driver.
      
      In general we're happy to leave old drivers in the tree, but
      because DAX is involved this driver is caught up in the ongoing
      work in that area, and none of the DAX folks are able to test it.
      
      So remove the driver; if anyone *is* using it, we'll be happy to
      put it back.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 27 Jan 2018 (15 commits)
  8. 25 Jan 2018 (2 commits)
  9. 24 Jan 2018 (5 commits)