1. 23 March 2018, 5 commits
    • powerpc/powernv: Provide a way to force a core into SMT4 mode · 7672691a
      Committed by Paul Mackerras
      POWER9 processors up to and including "Nimbus" v2.2 have hardware
      bugs relating to transactional memory and thread reconfiguration.
      One of these bugs has a workaround which is to get the core into
      SMT4 state temporarily.  This workaround is only needed when
      running bare-metal.
      
      This patch provides a function which gets the core into SMT4 mode
      by preventing threads from going to a stop state, and waking up
      those which are already in a stop state.  Once at least 3 threads
      are not in a stop state, the core will be in SMT4 and we can
      continue.
      
      To do this, we add a "dont_stop" flag to the paca to tell the
      thread not to go into a stop state.  If this flag is set,
      power9_idle_stop() just returns immediately with a return value
      of 0.  The pnv_power9_force_smt4_catch() function does the following:
      
      1. Set the dont_stop flag for each thread in the core, except
         ourselves (in fact we use an atomic_inc() in case more than
         one thread is calling this function concurrently).
      2. See how many threads are awake, indicated by their
         requested_psscr field in the paca being 0.  If this is at
         least 3, we are done; skip the remaining steps.
      3. Send a doorbell interrupt to each thread that was seen as
         being in a stop state in step 2.
      4. Until at least 3 threads are awake, scan the threads to which
         we sent a doorbell interrupt and check if they are awake now.
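      
      A rough sketch of steps 1-4 in C (illustrative only: paca, dont_stop
      and requested_psscr are used as described above, and
      doorbell_core_ipi() stands in for whatever wake-up primitive the
      real code uses):
      
          void pnv_power9_force_smt4_catch(void)     /* sketch, not the real code */
          {
                  int cpu = smp_processor_id();
                  int cpu0 = cpu & ~(threads_per_core - 1);  /* first thread in core */
                  int thr, awake = 0;
      
                  /* Step 1: forbid the sibling threads from entering a stop state */
                  for (thr = 0; thr < threads_per_core; ++thr)
                          if (cpu0 + thr != cpu)
                                  atomic_inc(&paca[cpu0 + thr].dont_stop);
      
                  /* Step 2: a thread whose requested_psscr is 0 is awake */
                  for (thr = 0; thr < threads_per_core; ++thr)
                          if (!paca[cpu0 + thr].requested_psscr)
                                  ++awake;
      
                  /* Steps 3-4: doorbell the sleepers, then poll until >= 3 awake */
                  if (awake < threads_per_core / 2 + 1)
                          for (thr = 0; thr < threads_per_core; ++thr)
                                  if (paca[cpu0 + thr].requested_psscr)
                                          doorbell_core_ipi(cpu0 + thr);
                  while (awake < threads_per_core / 2 + 1) {
                          awake = 0;
                          for (thr = 0; thr < threads_per_core; ++thr)
                                  if (!paca[cpu0 + thr].requested_psscr)
                                          ++awake;
                  }
          }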
      
      This relies on the following properties:
      
      - Once dont_stop is non-zero, requested_psscr can't go from zero to
        non-zero, except transiently (and without the thread doing stop).
      - requested_psscr being zero guarantees that the thread isn't in
        a state-losing stop state where thread reconfiguration could occur.
      - Doing stop with a PSSCR value of 0 won't be a state-losing stop
        and thus won't allow thread reconfiguration.
      - Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
        must be in SMT4 mode, since SMT modes are powers of 2.
      
      This does add a sync to power9_idle_stop(), which is necessary to
      provide the correct ordering between setting requested_psscr and
      checking dont_stop.  The overhead of the sync should be unnoticeable
      compared to the latency of going into and out of a stop state.
      
      Because some objected to incurring this extra latency on systems where
      the XER[SO] bug is not relevant, I have put the test in
      power9_idle_stop inside a feature section.  This means that
      pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems
      without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will
      probably hang the system.
      
      In order to cater for uses where the caller has an operation that
      has to be done while the core is in SMT4, the core continues to be
      kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
      until the pnv_power9_force_smt4_release() function is called.
      It undoes the effect of step 1 above and allows the other threads
      to go into a stop state.
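      
      pnv_power9_force_smt4_release() is then just the inverse of step 1;
      a minimal sketch under the same assumptions as above:
      
          void pnv_power9_force_smt4_release(void)   /* sketch, not the real code */
          {
                  int cpu = smp_processor_id();
                  int cpu0 = cpu & ~(threads_per_core - 1);
                  int thr;
      
                  /* undo step 1: the siblings may enter a stop state again */
                  for (thr = 0; thr < threads_per_core; ++thr)
                          if (cpu0 + thr != cpu)
                                  atomic_dec(&paca[cpu0 + thr].dont_stop);
          }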
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2 · b5af4f27
      Committed by Paul Mackerras
      This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2
      processors which will be used to enable the hypervisor to assist
      hardware with the handling of checkpointed register values while the
      CPU is in suspend state, in order to work around hardware bugs.  The
      hardware assistance for these workarounds introduced a new hardware
      bug relating to the XER[SO] bit.  We add a separate feature bit for
      this bug in case future chips fix it while still requiring the
      hypervisor assistance with suspend state.
      
      When the dt_cpu_ftrs subsystem is in use, the software assistance can
      be enabled using a "tm-suspend-hypervisor-assist" node in the device
      tree, and a "tm-suspend-xer-so-bug" node enables the workarounds for
      the XER[SO] bug.  In the absence of such nodes, a quirk enables both
      for POWER9 "Nimbus" DD2.2 processors.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Free up CPU feature bits on 64-bit machines · 9bbf0b57
      Committed by Paul Mackerras
      This moves all the CPU feature bits that are only used on 32-bit
      machines to the top 20 bits of the CPU feature word and arranges
      for them to be defined only in 32-bit builds.  The features that
      are common to 32-bit and 64-bit machines are moved to bits 0-11
      of the CPU feature word.  This means that for 64-bit platforms,
      bits 44-63 can now be used for new features that only exist on
      64-bit machines.  (These bit numbers are counting from the right,
      i.e. the LSB is bit 0.)
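      
      Illustratively, the partitioning amounts to masks like these (the
      values show the layout only, not the actual kernel constants):
      
          /* bit 0 is the LSB, as noted above */
          #define CPU_FTRS_COMMON_BITS  0x0000000000000fffUL  /* bits 0-11, both builds */
          #ifdef CONFIG_PPC64
          #define CPU_FTRS_64BIT_FREE   0xfffff00000000000UL  /* bits 44-63, 64-bit only */
          #else
          #define CPU_FTRS_32BIT_ONLY   0xfffff000UL  /* top 20 bits of the 32-bit word */
          #endif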
      
      Because CPU_FTR_L3_DISABLE_NAP moved from the low 16 bits to the high
      16 bits, we have to adjust some assembly code.  Also, CPU_FTR_EMB_HV
      moved from the high 16 bits to the low 16 bits.
      
      Note that CPU_FTR_REAL_LE only applies to 64-bit chips, because only
      64-bit chips (POWER6, 7, 8, 9) have a true little-endian mode that is
      a CPU execution mode as opposed to being a page attribute.
      
      With this we now have 20 free CPU feature bits on 64-bit machines.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Book E: Remove unused CPU_FTR_L2CSR bit · dd0efb3f
      Committed by Paul Mackerras
      The CPU_FTR_L2CSR bit is never tested anywhere, so let's reclaim the
      bit.
      
      The last usage was removed in 86d63363 ("powerpc/e500mc: Remove
      dead L2 flushing code in idle_e500.S") (Jun 2015).
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Use feature bit for RTC presence rather than timebase presence · c0d64cf9
      Committed by Paul Mackerras
      All PowerPC CPUs other than the original PPC601 have a timebase
      register rather than the "real-time clock" (RTC) register that the
      PPC601 (and the original POWER and POWER2 CPUs) had.  Currently
      we have a CPU feature bit to indicate the presence of the timebase,
      but it makes more sense to use a bit to indicate the unusual
      situation rather than the common situation.  This therefore defines
      a CPU_FTR_USE_RTC bit in place of the CPU_FTR_USE_TB bit, and
      arranges for it to be set on PPC601 systems.
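      
      Time-reading code can then treat the PPC601 as the odd one out,
      along these lines (a sketch; read_rtc() is a hypothetical stand-in
      for reading the RTCU/RTCL registers):
      
          static u64 get_ticks(void)      /* sketch */
          {
                  if (cpu_has_feature(CPU_FTR_USE_RTC))
                          return read_rtc();      /* PPC601: RTCU/RTCL */
                  return mftb();                  /* everything else: timebase */
          }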
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 12 February 2018, 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Committed by Linus Torvalds
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
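      
      The near-identity of the two constant families can be spot-checked
      from userspace; a small program (assumes glibc's poll.h and
      sys/epoll.h):
      
          #include <poll.h>
          #include <stdio.h>
          #include <sys/epoll.h>
      
          int main(void)
          {
                  /* identical on most architectures; the point of this series */
                  printf("POLLIN  %#6x  EPOLLIN  %#6x\n", POLLIN,  (unsigned)EPOLLIN);
                  printf("POLLOUT %#6x  EPOLLOUT %#6x\n", POLLOUT, (unsigned)EPOLLOUT);
                  printf("POLLHUP %#6x  EPOLLHUP %#6x\n", POLLHUP, (unsigned)EPOLLHUP);
                  return 0;
          }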
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 11 February 2018, 1 commit
  4. 09 February 2018, 4 commits
    • KVM: PPC: Book3S: Add MMIO emulation for VMX instructions · 09f98496
      Committed by Jose Ricardo Ziviani
      This patch provides MMIO emulation for the vector indexed (X-form)
      load/store instructions.
      
      Instructions implemented:
      lvx: the quadword in storage addressed by the result of EA &
      0xffff_ffff_ffff_fff0 is loaded into VRT.
      
      stvx: the contents of VRS are stored into the quadword in storage
      addressed by the result of EA & 0xffff_ffff_ffff_fff0.
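      
      Both instructions share the same quadword alignment of the effective
      address; in the emulation this boils down to masking off the low
      four bits, roughly:
      
          static unsigned long vmx_ea(unsigned long ea)   /* sketch */
          {
                  return ea & ~0xfUL;     /* EA & 0xffff_ffff_ffff_fff0 */
          }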
      Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
      Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
      Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Branch inside feature section · d20fe50a
      Committed by Alexander Graf
      We ended up with code that did a conditional branch inside a feature
      section to code outside of the feature section. Depending on how the
      object file gets organized, that might mean we exceed the 14-bit
      relocation limit for conditional branches:
      
        arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4
      
      So instead of doing a conditional branch outside of the feature section,
      let's just jump to the end of that same section, making the branch very
      short.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 · 790a9df5
      Committed by David Gibson
      This adds code to enable the HPT resizing code to work on POWER9,
      which uses a slightly modified HPT entry format compared to POWER8.
      On POWER9, we convert HPTEs read from the HPT from the new format to
      the old format so that the rest of the HPT resizing code can work as
      before.  HPTEs written to the new HPT are converted to the new format
      as the last step before writing them into the new HPT.
      
      This takes out the checks added by commit bcd3bb63 ("KVM: PPC:
      Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18),
      now that HPT resizing works on POWER9.
      
      On POWER9, when we pivot to the new HPT, we now call
      kvmppc_setup_partition_table() to update the partition table in order
      to make the hardware use the new HPT.
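      
      A sketch of the read-side conversion, assuming helpers along the
      lines of the kernel's hpte_new_to_old_v()/hpte_new_to_old_r()
      (argument details are from memory and may differ from the actual
      hunk):
      
          /* sketch: normalize POWER9-format HPTEs before the resize logic runs */
          if (cpu_has_feature(CPU_FTR_ARCH_300)) {        /* POWER9 */
                  rpte = hpte_new_to_old_r(rpte);
                  vpte = hpte_new_to_old_v(vpte, rpte);
          }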
      
      [paulus@ozlabs.org - added kvmppc_setup_partition_table() call,
       wrote commit message.]
      Tested-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code · 05f2bb03
      Committed by Paul Mackerras
      This fixes the computation of the HPTE index to use when the HPT
      resizing code encounters a bolted HPTE which is stored in its
      secondary HPTE group.  The code inverts the HPTE group number, which
      is correct, but doesn't then mask it with new_hash_mask.  As a result,
      new_pteg will be effectively negative, resulting in new_hptep
      pointing before the new HPT, which will corrupt memory.
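      
      In other words, the fix is to mask after inverting, roughly
      (new_pteg and new_hash_mask as above; "hash" names the computed
      hash value):
      
          /* buggy: new_pteg = ~hash;  the index can fall outside the new HPT */
          /* fixed: keep the inverted group number within the new table */
          new_pteg = ~hash & new_hash_mask;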
      
      In addition, this removes two BUG_ON statements.  The condition that
      the BUG_ONs were testing -- that we have computed the hash value
      incorrectly -- has never been observed in testing, and if it did
      occur, would only affect the guest, not the host.  Given that
      BUG_ON should only be used in conditions where the kernel (i.e.
      the host kernel, in this case) can't possibly continue execution,
      it is not appropriate here.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  5. 08 February 2018, 1 commit
  6. 06 February 2018, 2 commits
    • membarrier: Provide GLOBAL_EXPEDITED command · c5f58bd5
      Committed by Mathieu Desnoyers
      Allow expedited membarrier to be used for data shared between processes
      through shared memory.
      
      Processes wishing to receive the barriers register with
      MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED; those which want to issue
      a barrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED.
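      
      From userspace the pairing looks like this (there is no libc
      wrapper, so the raw syscall is used; assumes Linux 4.16+ UAPI
      headers):
      
          #include <linux/membarrier.h>
          #include <stdio.h>
          #include <sys/syscall.h>
          #include <unistd.h>
      
          int main(void)
          {
                  /* receiver: register once per process that wants the barriers */
                  if (syscall(__NR_membarrier,
                              MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, 0))
                          perror("register");
      
                  /* sender: expedited barrier covering all registered processes */
                  if (syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0))
                          perror("barrier");
                  return 0;
          }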
      
      This allows extremely simple kernel-level implementation: we have almost
      everything we need with the PRIVATE_EXPEDITED barrier code. All we need
      to do is to add a flag in the mm_struct that will be used to check
      whether we need to send the IPI to the current thread of each CPU.
      
      There is a slight downside to this approach compared to targeting
      specific shared memory users: when performing a membarrier operation,
      all registered "global" receivers will get the barrier, even if they
      don't share a memory mapping with the sender issuing
      MEMBARRIER_CMD_GLOBAL_EXPEDITED.
      
      This registration approach seems to fit the requirement of not
      disturbing processes that really deeply care about real-time: they
      simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.
      
      In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED"
      command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of
      MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward
      compatibility.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-5-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • powerpc, membarrier: Skip memory barrier in switch_mm() · 3ccfebed
      Committed by Mathieu Desnoyers
      Allow PowerPC to skip the full memory barrier in switch_mm(), and
      only issue the barrier when scheduling into a task belonging to a
      process that has registered to use expedited private.
      
      The case of threads which target the same VM but belong to different
      thread groups is tricky. It has a few consequences:
      
      It turns out that we cannot rely on get_nr_threads(p) to count the
      number of threads using a VM. We can use
      (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
      instead to skip the synchronize_sched() for cases where the VM only has
      a single user, and that user only has a single thread.
      
      It also turns out that we cannot use for_each_thread() to set
      thread flags in all threads using a VM, as it only iterates on the
      thread group.
      
      Therefore, test the membarrier state variable directly rather than
      relying on thread flags. This means
      membarrier_register_private_expedited() needs to set the
      MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
      only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
      private expedited membarrier commands to succeed.
      membarrier_arch_switch_mm() now tests for the
      MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.
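      
      The resulting arch hook is essentially the following (a sketch
      close to, but not necessarily identical to, the actual hunk):
      
          static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
                                                       struct mm_struct *next,
                                                       struct task_struct *tsk)
          {
                  /* only pay for the barrier if the incoming mm registered for it */
                  if (likely(!(atomic_read(&next->membarrier_state) &
                               MEMBARRIER_STATE_PRIVATE_EXPEDITED) || !prev))
                          return;
      
                  smp_mb();       /* order the mm switch against prior accesses */
          }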
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 01 February 2018, 5 commits
  8. 30 January 2018, 1 commit
    • powerpc/mm/radix: Fix build error when RADIX_MMU=n · 015eb1b8
      Committed by Michael Ellerman
      The recent TLB flush rework broke the build when the Radix MMU is
      disabled at build time, eg:
      
        (.text+0x264): undefined reference to `.radix__tlbiel_all'
      
      We could add an empty version, but if we ever called it by accident
      that would indicate a bad bug, so add a stub that just WARNs if we do.
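      
      The stub has roughly this shape (a sketch of the idea, not the
      exact hunk):
      
          #ifndef CONFIG_PPC_RADIX_MMU
          static inline void radix__tlbiel_all(unsigned int action)
          {
                  WARN_ON(1);     /* radix is compiled out; getting here is a bug */
          }
          #endif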
      
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  9. 28 January 2018, 4 commits
    • powerpc/watchdog: Print the NIP in soft_nmi_interrupt() · 0bc00914
      Committed by Michael Ellerman
      When a CPU detects that it's locked up via soft_nmi_interrupt() we have
      pt_regs, so print the regs->nip, which points to where we took the
      soft-NMI.
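      
      Something along these lines (illustrative; %pS makes printk render
      the address as symbol+offset):
      
          /* sketch: say where the soft-NMI interrupted the locked-up CPU */
          pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n",
                   cpu, (void *)regs->nip);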
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/watchdog: regs can't be null in soft_nmi_interrupt() · 3ba45b7e
      Committed by Michael Ellerman
      soft_nmi_interrupt() is called directly from the asm exception
      handling code, which passes regs as a pointer to the stack. So regs
      can't be NULL, it may be full of junk, but that's a separate problem.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/watchdog: Tweak watchdog printks · d8fa82e0
      Committed by Michael Ellerman
      Use pr_fmt() in the watchdog code, so we don't have to say "Watchdog"
      so many times.
      
      Rather than "CPU:%d" just spell it "CPU %d", "Hard" doesn't need a
      capital in the middle of a sentence, and "LOCKUP other CPUS" should be
      "LOCKUP on other CPUS".
      
      Also make it clear when a CPU self detects a lockup by spelling it
      out.
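      
      The pr_fmt() mechanism is the standard one: defined at the top of
      the file, before any includes, it prefixes every pr_*() call in
      that file (the exact prefix string here is illustrative):
      
          /* at the very top of watchdog.c, before any #include */
          #define pr_fmt(fmt) "watchdog: " fmt
      
          /* pr_warn("CPU %d Hard LOCKUP\n", cpu)
           * now prints "watchdog: CPU 0 Hard LOCKUP"  */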
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/cell: Remove axonram driver · 1d65b1c8
      Committed by Michael Ellerman
      The QS21/22 IBM Cell blades had a southbridge chip called Axon. This
      could have DDR DIMMs attached to it, though they were not directly
      usable as RAM, instead they could be used as some sort of buffer, if
      applications were written specifically to use the block device
      provided by the driver.
      
      Although the driver supposedly had direct access support, it was
      apparently never tested (see commit 91117a20 ("axonram: Fix bug in
      direct_access")).
      
      These machines have not been available for over 5 years, and were
      never widely in use. It seems highly unlikely anyone is using this
      driver.
      
      In general we're happy to leave old drivers in the tree, but because
      DAX is involved this driver is caught up in the ongoing work in that
      area, but none of the DAX folks are able to test it.
      
      So remove the driver, if any one *is* using it, we'll be happy to put
      it back.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 27 January 2018, 15 commits
  11. 25 January 2018, 1 commit