1. 23 March 2018, 5 commits
    • KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9 · 4bb3c7a0
      Committed by Paul Mackerras
      POWER9 has hardware bugs relating to transactional memory and thread
      reconfiguration (changes to hardware SMT mode).  Specifically, the core
      does not have enough storage to store a complete checkpoint of all the
      architected state for all four threads.  The DD2.2 version of POWER9
      includes hardware modifications designed to allow hypervisor software
      to implement workarounds for these problems.  This patch implements
      those workarounds in KVM code so that KVM guests see a full, working
      transactional memory implementation.
      
      The problems center around the use of TM suspended state, where the
      CPU has a checkpointed state but execution is not transactional.  The
      workaround is to implement a "fake suspend" state, which looks to the
      guest like suspended state but the CPU does not store a checkpoint.
      In this state, any instruction that would cause a transition to
      transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
      checkpointed state (treclaim) causes a "soft patch" interrupt (vector
      0x1500) to the hypervisor so that it can be emulated.  The trechkpt
      instruction also causes a soft patch interrupt.
      
      On POWER9 DD2.2, we avoid returning to the guest in any state which
      would require a checkpoint to be present.  The trechkpt in the guest
      entry path which would normally create that checkpoint is replaced by
      either a transition to fake suspend state, if the guest is in suspend
      state, or a rollback to the pre-transactional state if the guest is in
      transactional state.  Fake suspend state is indicated by a flag in the
      PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only and
      reads back as 0.
      
      On exit from the guest, if the guest is in fake suspend state, we still
      do the treclaim instruction as we would in real suspend state, in order
      to get into non-transactional state, but we do not save the resulting
      register state since there was no checkpoint.
      
      Emulation of the instructions that cause a softpatch interrupt is
      handled in two paths.  If the guest is in real suspend mode, we call
      kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
      transitioning to transactional state.  This is called before we do the
      treclaim in the guest exit path; because we haven't done treclaim, we
      can get back to the guest with the transaction still active.  If the
      instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
      handle, or if the guest is in fake suspend state, then we proceed to
      do the complete guest exit path and subsequently call
      kvmhv_p9_tm_emulation() in host context with the MMU on.  This handles
      all the cases including the cases that generate program interrupts
      (illegal instruction or TM Bad Thing) and facility unavailable
      interrupts.
      
      The emulation is reasonably straightforward and is mostly concerned
      with checking for exception conditions and updating the state of
      registers such as MSR and CR0.  The treclaim emulation takes care to
      ensure that the TEXASR register gets updated as if it were the guest
      treclaim instruction that had done failure recording, not the treclaim
      done in hypervisor state in the guest exit path.
      
      With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
      transactional memory is not available to host userspace.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
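
      As an aside, the guest-entry decision described above (enter fake
      suspend if the guest is in suspend state, roll the transaction back if
      the guest is transactional) can be modelled in a few lines of plain C.
      This is only an illustrative sketch with invented names (toy_vcpu,
      toy_guest_entry); it is not the arch/powerpc/kvm code.

          #include <stdio.h>

          /* Architected MSR[TS] encoding: 0 = non-transactional,
           * 1 = suspended, 2 = transactional. */
          enum msr_ts { TS_NONE = 0, TS_SUSPENDED = 1, TS_TRANSACTIONAL = 2 };

          struct toy_vcpu {
                  enum msr_ts ts;
                  int fake_suspend;   /* models the PACA flag / new PSSCR bit */
          };

          /* What the DD2.2 guest-entry path does instead of trechkpt. */
          static void toy_guest_entry(struct toy_vcpu *v)
          {
                  if (v->ts == TS_SUSPENDED) {
                          v->fake_suspend = 1;    /* no checkpoint is created */
                  } else if (v->ts == TS_TRANSACTIONAL) {
                          /* Rollback to the pre-transactional state, modelled
                           * here simply as the transaction failing and TS
                           * returning to non-transactional. */
                          v->ts = TS_NONE;
                  }
          }

          int main(void)
          {
                  struct toy_vcpu v = { .ts = TS_SUSPENDED };

                  toy_guest_entry(&v);
                  printf("suspended guest     -> fake_suspend=%d\n", v.fake_suspend);

                  v = (struct toy_vcpu){ .ts = TS_TRANSACTIONAL };
                  toy_guest_entry(&v);
                  printf("transactional guest -> ts=%d (rolled back)\n", v.ts);
                  return 0;
          }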
    • powerpc/powernv: Provide a way to force a core into SMT4 mode · 7672691a
      Committed by Paul Mackerras
      POWER9 processors up to and including "Nimbus" v2.2 have hardware
      bugs relating to transactional memory and thread reconfiguration.
      One of these bugs has a workaround which is to get the core into
      SMT4 state temporarily.  This workaround is only needed when
      running bare-metal.
      
      This patch provides a function which gets the core into SMT4 mode
      by preventing threads from going to a stop state, and waking up
      those which are already in a stop state.  Once at least 3 threads
      are not in a stop state, the core will be in SMT4 and we can
      continue.
      
      To do this, we add a "dont_stop" flag to the paca to tell the
      thread not to go into a stop state.  If this flag is set,
      power9_idle_stop() just returns immediately with a return value
      of 0.  The pnv_power9_force_smt4_catch() function does the following:
      
      1. Set the dont_stop flag for each thread in the core, except
         ourselves (in fact we use an atomic_inc() in case more than
         one thread is calling this function concurrently).
      2. See how many threads are awake, indicated by their
         requested_psscr field in the paca being 0.  If this is at
         least 3, skip to step 5.
      3. Send a doorbell interrupt to each thread that was seen as
         being in a stop state in step 2.
      4. Until at least 3 threads are awake, scan the threads to which
         we sent a doorbell interrupt and check if they are awake now.
      
      This relies on the following properties:
      
      - Once dont_stop is non-zero, requested_psscr can't go from zero to
        non-zero, except transiently (and without the thread doing stop).
      - requested_psscr being zero guarantees that the thread isn't in
        a state-losing stop state where thread reconfiguration could occur.
      - Doing stop with a PSSCR value of 0 won't be a state-losing stop
        and thus won't allow thread reconfiguration.
      - Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
        must be in SMT4 mode, since SMT modes are powers of 2.
      
      This does add a sync to power9_idle_stop(), which is necessary to
      provide the correct ordering between setting requested_psscr and
      checking dont_stop.  The overhead of the sync should be unnoticeable
      compared to the latency of going into and out of a stop state.
      
      Because some objected to incurring this extra latency on systems where
      the XER[SO] bug is not relevant, I have put the test in
      power9_idle_stop inside a feature section.  This means that
      pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems
      without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will
      probably hang the system.
      
      In order to cater for uses where the caller has an operation that
      has to be done while the core is in SMT4, the core continues to be
      kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
      until the pnv_power9_force_smt4_release() function is called.
      It undoes the effect of step 1 above and allows the other threads
      to go into a stop state.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
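
      The counting argument in this commit message (threads whose
      requested_psscr is 0 are awake, and threads_per_core/2 + 1 = 3 awake
      threads force SMT4 because SMT modes are powers of two) is easy to
      model in standalone C.  The names and structure below are illustrative
      only, not the powernv idle code.

          #include <stdio.h>

          #define THREADS_PER_CORE 4

          /* Toy per-thread state mirroring the two paca fields the commit
           * describes. */
          struct toy_thread {
                  unsigned long requested_psscr;  /* 0 => thread is awake */
                  int dont_stop;                  /* non-zero => refuse stop */
          };

          static int awake_threads(const struct toy_thread *t)
          {
                  int i, n = 0;

                  for (i = 0; i < THREADS_PER_CORE; i++)
                          if (t[i].requested_psscr == 0)
                                  n++;
                  return n;
          }

          /* SMT modes are powers of two, so threads_per_core/2 + 1 awake
           * threads leave no room for any mode narrower than SMT4. */
          static int core_forced_to_smt4(const struct toy_thread *t)
          {
                  return awake_threads(t) >= THREADS_PER_CORE / 2 + 1;
          }

          int main(void)
          {
                  struct toy_thread core[THREADS_PER_CORE] = {
                          { 0, 1 }, { 0x330, 1 }, { 0, 1 }, { 0, 1 },
                  };

                  printf("awake=%d smt4=%d\n", awake_threads(core),
                         core_forced_to_smt4(core));
                  return 0;
          }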
    • powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2 · b5af4f27
      Committed by Paul Mackerras
      This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2
      processors which will be used to enable the hypervisor to assist
      hardware with the handling of checkpointed register values while the
      CPU is in suspend state, in order to work around hardware bugs.  The
      hardware assistance for these workarounds introduced a new hardware
      bug relating to the XER[SO] bit.  We add a separate feature bit for
      this bug in case future chips fix it while still requiring the
      hypervisor assistance with suspend state.
      
      When the dt_cpu_ftrs subsystem is in use, the software assistance can
      be enabled using a "tm-suspend-hypervisor-assist" node in the device
      tree, and a "tm-suspend-xer-so-bug" node enables the workarounds for
      the XER[SO] bug.  In the absence of such nodes, a quirk enables both
      for POWER9 "Nimbus" DD2.2 processors.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
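
      A rough sketch of how a node-name-to-feature-bit mapping of this kind
      can look, as standalone C.  The table layout, struct name and bit
      values here are assumptions chosen for illustration; the real
      definitions live in the kernel's dt_cpu_ftrs and cputable code.

          #include <stdio.h>
          #include <string.h>

          /* Illustrative bit values only; not the real CPU_FTR_* masks. */
          #define TOY_FTR_TM_HV_ASSIST   0x1UL
          #define TOY_FTR_TM_XER_SO_BUG  0x2UL

          struct toy_dt_feature {
                  const char *node;
                  unsigned long ftr;
          };

          static const struct toy_dt_feature toy_table[] = {
                  { "tm-suspend-hypervisor-assist", TOY_FTR_TM_HV_ASSIST },
                  { "tm-suspend-xer-so-bug",        TOY_FTR_TM_XER_SO_BUG },
          };

          /* Accumulate feature bits for the device tree nodes present. */
          static unsigned long toy_scan(const char **nodes, int n)
          {
                  unsigned long ftrs = 0;
                  size_t j;
                  int i;

                  for (i = 0; i < n; i++)
                          for (j = 0; j < sizeof(toy_table) / sizeof(toy_table[0]); j++)
                                  if (!strcmp(nodes[i], toy_table[j].node))
                                          ftrs |= toy_table[j].ftr;
                  return ftrs;
          }

          int main(void)
          {
                  const char *nodes[] = { "tm-suspend-hypervisor-assist" };

                  printf("features = %#lx\n", toy_scan(nodes, 1));
                  return 0;
          }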
    • powerpc: Free up CPU feature bits on 64-bit machines · 9bbf0b57
      Committed by Paul Mackerras
      This moves all the CPU feature bits that are only used on 32-bit
      machines to the top 20 bits of the CPU feature word and arranges
      for them to be defined only in 32-bit builds.  The features that
      are common to 32-bit and 64-bit machines are moved to bits 0-11
      of the CPU feature word.  This means that for 64-bit platforms,
      bits 44-63 can now be used for new features that only exist on
      64-bit machines.  (These bit numbers are counting from the right,
      i.e. the LSB is bit 0.)
      
      Because CPU_FTR_L3_DISABLE_NAP moved from the low 16 bits to the high
      16 bits, we have to adjust some assembly code.  Also, CPU_FTR_EMB_HV
      moved from the high 16 bits to the low 16 bits.
      
      Note that CPU_FTR_REAL_LE only applies to 64-bit chips, because only
      64-bit chips (POWER6, 7, 8, 9) have a true little-endian mode that is
      a CPU execution mode as opposed to being a page attribute.
      
      With this we now have 20 free CPU feature bits on 64-bit machines.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
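
      The bit layout described above can be summarised with a few macros.
      The feature names and bit positions below are placeholders chosen to
      match the description (common bits 0-11, 32-bit-only bits in the top
      20 bits of the word, bits 44-63 free on 64-bit); they are not the
      actual cputable.h values.

          #include <stdio.h>

          #define TOY_FTR(bit)            (1ULL << (bit))

          /* Bits 0-11: features common to 32-bit and 64-bit CPUs. */
          #define TOY_FTR_COMMON_EXAMPLE  TOY_FTR(0)

          #ifdef CONFIG_PPC32
          /* Top 20 bits of the 32-bit feature word: 32-bit-only features. */
          #define TOY_FTR_32_ONLY_EXAMPLE TOY_FTR(31)
          #else
          /* Bits 44-63 are now free for 64-bit-only features (LSB = bit 0). */
          #define TOY_FTR_64_ONLY_EXAMPLE TOY_FTR(44)
          #endif

          int main(void)
          {
                  printf("common example:      %#llx\n", TOY_FTR_COMMON_EXAMPLE);
          #ifdef CONFIG_PPC32
                  printf("32-bit-only example: %#llx\n", TOY_FTR_32_ONLY_EXAMPLE);
          #else
                  printf("64-bit-only example: %#llx\n", TOY_FTR_64_ONLY_EXAMPLE);
          #endif
                  return 0;
          }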
    • powerpc: Use feature bit for RTC presence rather than timebase presence · c0d64cf9
      Committed by Paul Mackerras
      All PowerPC CPUs other than the original PPC601 have a timebase
      register rather than the "real-time clock" (RTC) register that the
      PPC601 (and the original POWER and POWER2 CPUs) had.  Currently
      we have a CPU feature bit to indicate the presence of the timebase,
      but it makes more sense to use a bit to indicate the unusual
      situation rather than the common situation.  This therefore defines
      a CPU_FTR_USE_RTC bit in place of the CPU_FTR_USE_TB bit, and
      arranges for it to be set on PPC601 systems.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
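
      A tiny standalone sketch of what the inversion means for callers: the
      rare PPC601/RTC case now tests for a feature bit and the timebase
      becomes the untested default.  cpu_has_rtc and the values returned are
      stand-ins for illustration, not kernel APIs.

          #include <stdio.h>

          static int cpu_has_rtc;  /* stands in for cpu_has_feature(CPU_FTR_USE_RTC) */

          static unsigned long long toy_get_ticks(void)
          {
                  if (cpu_has_rtc)
                          return 601;   /* would read the PPC601 RTCU/RTCL pair */
                  return 42;            /* would read the timebase (mftb) */
          }

          int main(void)
          {
                  printf("default (timebase) path: %llu\n", toy_get_ticks());
                  cpu_has_rtc = 1;
                  printf("PPC601 (RTC) path:       %llu\n", toy_get_ticks());
                  return 0;
          }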
  2. 12 February 2018, 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Committed by Linus Torvalds
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
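
      The script turns, for example, "return POLLIN | POLLRDNORM;" in a
      driver's poll method into "return EPOLLIN | EPOLLRDNORM;".  The "same
      values on almost all architectures" claim can be spot-checked from
      userspace on a common architecture (x86-64 shown below; Sparc is one
      of the exceptions the commit mentions):

          #include <poll.h>
          #include <stdio.h>
          #include <sys/epoll.h>

          int main(void)
          {
                  /* On most architectures these print identical pairs. */
                  printf("POLLIN=%#x  EPOLLIN=%#x\n",
                         (unsigned)POLLIN, (unsigned)EPOLLIN);
                  printf("POLLHUP=%#x EPOLLHUP=%#x\n",
                         (unsigned)POLLHUP, (unsigned)EPOLLHUP);
                  printf("POLLPRI=%#x EPOLLPRI=%#x\n",
                         (unsigned)POLLPRI, (unsigned)EPOLLPRI);
                  return 0;
          }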
  3. 11 February 2018, 1 commit
  4. 28 January 2018, 3 commits
  5. 27 January 2018, 5 commits
  6. 23 January 2018, 3 commits
  7. 22 January 2018, 4 commits
    • powerpc/pseries, ps3: panic flush kernel messages before halting system · 35adacd6
      Committed by Nicholas Piggin
      Platforms with a panic handler that halts the system can have problems
      getting kernel messages out, because the panic notifiers are called
      before kernel/panic.c does its flushing of printk buffers an console
      etc.
      
      An earlier attempt to solve this, commit a3b2cb30 ("powerpc: Do not
      call ppc_md.panic in fadump panic notifier"), wasn't the right
      approach, caused other problems, and was reverted by commit
      ab9dbf77.
      
      Instead, the powernv shutdown paths have already had a similar
      problem, fixed by taking the message flushing sequence from
      kernel/panic.c. That's a little bit ugly, but while we have the code
      duplicated, it will work for this case as well. So have ppc panic
      handlers do the same flushing before they terminate.
      
      Without this patch, a qemu pseries_le_defconfig guest stops silently
      when issued the nmi command when xmon is off and no crash dumpers
      enabled. Afterwards, an oops is printed by each CPU as expected.
      
      Fixes: ab9dbf77 ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
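
      The ordering fix amounts to "flush pending messages, then let the
      platform handler halt".  A toy model follows; the function names are
      invented stand-ins for the kernel/panic.c flush sequence and for the
      platform's ppc_md.panic() hook, not the actual kernel code.

          #include <stdio.h>

          static void toy_flush_kmsg(void)
          {
                  /* stands in for the kernel/panic.c console-flush sequence */
                  fflush(stdout);
          }

          static void toy_platform_halt(void)
          {
                  /* stands in for a platform panic hook that halts the system */
          }

          static void toy_ppc_panic_handler(const char *msg)
          {
                  printf("panic: %s\n", msg);
                  toy_flush_kmsg();       /* the fix: flush before halting */
                  toy_platform_halt();
          }

          int main(void)
          {
                  toy_ppc_panic_handler("example");
                  return 0;
          }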
    • powerpc/tm: Fix endianness flip on trap · 1c200e63
      Committed by Gustavo Romero
      Currently it's possible that a thread on PPC64 LE has its endianness
      flipped inadvertently to Big-Endian resulting in a crash once the process
      is back from the signal handler.
      
      If giveup_all() is called when regs->msr has the bits MSR.FP and MSR.VEC
      disabled (and hence MSR.VSX disabled too) it returns without calling
      check_if_tm_restore_required() which copies regs->msr to ckpt_regs->msr if
      the process caught a signal whilst in transactional mode. Then once in
      setup_tm_sigcontexts() MSR from ckpt_regs.msr is used, but since
      check_if_tm_restore_required() was not called previously, gp_regs[PT_MSR]
      gets a copy of invalid MSR bits as MSR in ckpt_regs was not updated from
      regs->msr and so is zeroed. Later when leaving the signal handler once in
      sys_rt_sigreturn() the TS bits of gp_regs[PT_MSR] are checked to determine
      if restore_tm_sigcontexts() must be called to pull in the correct MSR state
      into the user context. Because TS bits are zeroed
      restore_tm_sigcontexts() is never called and MSR restored from the user
      context on returning from the signal handler has the MSR.LE (the endianness
      bit) forced to zero (Big-Endian). That leads, for instance, to 'nop' being
      treated as an illegal instruction in the following sequence:
      
      	tbegin.
      	beq	1f
      	trap
      	tend.
      1:	nop
      
      on PPC64 LE machines and the process dies just after returning from the
      signal handler.
      
      PPC64 BE is also affected but in a subtle way since forcing Big-Endian on
      a BE machine does not change the endianness.
      
      This commit fixes the issue described above by ensuring that once in
      setup_tm_sigcontexts() the MSR used is from regs->msr instead of from
      ckpt_regs->msr and by ensuring that we pull in only the MSR.FP, MSR.VEC,
      and MSR.VSX bits from ckpt_regs->msr.
      
      The fix was tested both on LE and BE machines and no regression regarding
      the powerpc/tm selftests was observed.
      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
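
      The final paragraph's fix can be shown as a single mask operation.
      This is a standalone model, not kernel/signal_64.c; the MSR bit
      positions used (FP = bit 13, VEC = bit 25, VSX = bit 23, LE = bit 0)
      are the architected ones.

          #include <stdio.h>

          #define MSR_LE  0x1UL
          #define MSR_FP  (1UL << 13)
          #define MSR_VSX (1UL << 23)
          #define MSR_VEC (1UL << 25)

          int main(void)
          {
                  unsigned long regs_msr = MSR_LE | MSR_FP; /* live MSR, LE set */
                  unsigned long ckpt_msr = MSR_VEC;         /* checkpointed MSR */

                  /* Pull in only FP/VEC/VSX from ckpt_regs->msr; everything
                   * else, including MSR[LE], comes from regs->msr. */
                  unsigned long msr = regs_msr |
                          (ckpt_msr & (MSR_FP | MSR_VEC | MSR_VSX));

                  printf("msr = %#lx (LE preserved: %lu)\n", msr, msr & MSR_LE);
                  return 0;
          }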
    • powerpc: Expose TSCR via sysfs · b6d34eb4
      Committed by Anton Blanchard
      The thread switch control register (TSCR) is a per core register
      that configures how the CPU shares resources between SMT threads.
      
      Exposing it via sysfs allows us to tune it at run time.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
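
      Once exposed, the register can be read (and, with appropriate
      privileges, written) from userspace like any other sysfs attribute.
      The path used below is an assumption based on where the powerpc
      per-CPU SPR attributes normally appear; adjust it for your system.

          #include <stdio.h>

          int main(void)
          {
                  FILE *f = fopen("/sys/devices/system/cpu/cpu0/tscr", "r");
                  char buf[64];

                  if (!f) {
                          perror("open tscr");
                          return 1;
                  }
                  if (fgets(buf, sizeof(buf), f))
                          printf("TSCR: %s", buf);
                  fclose(f);
                  return 0;
          }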
    • powerpc: Use octal numbers for file permissions · 57ad583f
      Committed by Russell Currey
      Symbolic macros are unintuitive and hard to read, whereas octal constants
      are much easier to interpret.  Replace macros for the basic permission
      flags (user/group/other read/write/execute) with numeric constants
      instead, across the whole powerpc tree.
      
      Introducing a significant number of changes across the tree for no runtime
      benefit isn't exactly desirable, but so long as these macros are still
      used in the tree people will keep sending patches that add them.  Not only
      are they hard to parse at a glance, there are multiple ways of coming to
      the same value (as you can see with 0444 and 0644 in this patch) which
      hurts readability.
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
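
      The translation is purely mechanical: for example, S_IRUGO becomes
      0444 and S_IRUGO | S_IWUSR becomes 0644.  A quick standalone check of
      those two equivalences (S_IRUGO itself is kernel-only, so it is
      spelled out from the user/group/other read bits here):

          #include <stdio.h>
          #include <sys/stat.h>

          int main(void)
          {
                  /* S_IRUGO == S_IRUSR | S_IRGRP | S_IROTH == 0444 */
                  printf("read-only:  %#o\n",
                         (unsigned)(S_IRUSR | S_IRGRP | S_IROTH));
                  /* S_IRUGO | S_IWUSR == 0644 */
                  printf("read-write: %#o\n",
                         (unsigned)(S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH));
                  return 0;
          }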
  8. 21 January 2018, 2 commits
  9. 20 January 2018, 4 commits
  10. 19 January 2018, 12 commits