1. 06 11月, 2017 24 次提交
    • C
      powerpc: Don't enable FP/Altivec if not checkpointed · a7771176
      Cyril Bur 提交于
      Lazy save and restore of FP/Altivec means that a userspace process can
      be sent to userspace with FP or Altivec disabled and loaded only as
      required (by way of an FP/Altivec unavailable exception). Transactional
      Memory complicates this situation as a transaction could be started
      without FP/Altivec being loaded up. This causes the hardware to
      checkpoint incorrect registers. Handling FP/Altivec unavailable
      exceptions while a thread is transactional requires a reclaim and
      recheckpoint to ensure the CPU has correct state for both sets of
      registers.
      
      Lazy save and restore of FP/Altivec cannot be done if a process is
      transactional. If a facility was enabled it must remain enabled whenever
      a thread is transactional.
      
      Commit dc16b553 ("powerpc: Always restore FPU/VEC/VSX if hardware
      transactional memory in use") ensures that the facilities are always
      enabled if a thread is transactional. A bug in the introduced code may
      cause it to inadvertently enable a facility that was (and should remain)
      disabled. The problem with this extraneous enablement is that the
      registers for the erroneously enabled facility have not been correctly
      recheckpointed - the recheckpointing code assumed the facility would
      remain disabled.
      
      Further compounding the issue, the transactional {fp,altivec,vsx}
      unavailable code has been incorrectly using the MSR to enable
      facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply
      means if the registers are live on the CPU, not if the kernel should
      load them before returning to userspace. This has worked due to the bug
      mentioned above.
      
      This causes transactional threads which return to their failure handler
      to observe incorrect checkpointed registers. Perhaps an example will
      help illustrate the problem:
      
      A userspace process is running and uses both FP and Altivec registers.
      This process then continues to run for some time without touching
      either sets of registers. The kernel subsequently disables the
      facilities as part of lazy save and restore. The userspace process then
      performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
      registers. The process then performs a floating point instruction
      triggering a fp unavailable exception in the kernel.
      
      The kernel then loads the FP registers - and only the FP registers.
      Since the thread is transactional it must perform a reclaim and
      recheckpoint to ensure both the checkpointed registers and the
      transactional registers are correct. It then (correctly) enables
      MSR[FP] for the process. Later (on exception exist) the kernel also
      (inadvertently) enables MSR[VEC]. The process is then returned to
      userspace.
      
      Since the act of loading the FP registers doomed the transaction we know
      CPU will fail the transaction, restore its checkpointed registers, and
      return the process to its failure handler. The problem is that we're
      now running with Altivec enabled and the 'junk' checkpointed registers
      are restored. The kernel had only recheckpointed FP.
      
      This patch solves this by only activating FP/Altivec if userspace was
      using them when it entered the kernel and not simply if the process is
      transactional.
      
      Fixes: dc16b553 ("powerpc: Always restore FPU/VEC/VSX if hardware
      transactional memory in use")
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a7771176
    • C
      powerpc/powernv: Add OPAL_BUSY to opal_error_code() · 77adbd22
      Cyril Bur 提交于
      Also export opal_error_code() so that it can be used in modules
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      77adbd22
    • C
      powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async · 9aab2449
      Cyril Bur 提交于
      This patch adds an _interruptible version of opal_async_wait_response().
      This is useful when a long running OPAL call is performed on behalf of
      a userspace thread, for example, the opal_flash_{read,write,erase}
      functions performed by the powernv-flash MTD driver.
      
      It is foreseeable that these functions would take upwards of two
      minutes causing the wait_event() to block long enough to cause hung
      task warnings. Furthermore, wait_event_interruptible() is preferable
      as otherwise there is no way for signals to stop the process which is
      going to be confusing in userspace.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9aab2449
    • S
      powernv/opal-sensor: remove not needed lock · 95e1bc1d
      Stewart Smith 提交于
      Parallel sensor reads could run out of async tokens due to
      opal_get_sensor_data grabbing tokens but then doing the sensor
      read behind a mutex, essentially serializing the (possibly
      asynchronous and relatively slow) sensor read.
      
      It turns out that the mutex isn't needed at all, not only
      should the OPAL interface allow concurrent reads, the implementation
      is certainly safe for that, and if any sensor we were reading
      from somewhere isn't, doing the mutual exclusion in the kernel
      is the wrong place to do it, OPAL should be doing it for the kernel.
      
      So, remove the mutex.
      
      Additionally, we shouldn't be printing out an error when we don't
      get a token as the only way this should happen is if we've been
      interrupted in down_interruptible() on the semaphore.
      Reported-by: NRobert Lippert <rlippert@google.com>
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      95e1bc1d
    • C
      powerpc/opal: Rework the opal-async interface · 86cd6d98
      Cyril Bur 提交于
      Future work will add an opal_async_wait_response_interruptible()
      which will call wait_event_interruptible(). This work requires extra
      token state to be tracked as wait_event_interruptible() can return and
      the caller could release the token before OPAL responds.
      
      Currently token state is tracked with two bitfields which are 64 bits
      big but may not need to be as OPAL informs Linux how many async tokens
      there are. It also uses an array indexed by token to store response
      messages for each token.
      
      The bitfields make it difficult to add more state and also provide a
      hard maximum as to how many tokens there can be - it is possible that
      OPAL will inform Linux that there are more than 64 tokens.
      
      Rather than add a bitfield to track the extra state, rework the
      internals slightly.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      [mpe: Fix __opal_async_get_token() when no tokens are free]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      86cd6d98
    • C
      powerpc/opal: Make __opal_async_{get, release}_token() static · 59cf9a1c
      Cyril Bur 提交于
      There are no callers of both __opal_async_get_token() and
      __opal_async_release_token().
      
      This patch also removes the possibility of "emergency through
      synchronous call to __opal_async_get_token()" as such it makes more
      sense to initialise opal_sync_sem for the maximum number of async
      tokens.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      59cf9a1c
    • W
      powerpc/opal: Fix EBUSY bug in acquiring tokens · 71e24d77
      William A. Kennington III 提交于
      The current code checks the completion map to look for the first token
      that is complete. In some cases, a completion can come in but the
      token can still be on lease to the caller processing the completion.
      If this completed but unreleased token is the first token found in the
      bitmap by another tasks trying to acquire a token, then the
      __test_and_set_bit call will fail since the token will still be on
      lease. The acquisition will then fail with an EBUSY.
      
      This patch reorganizes the acquisition code to look at the
      opal_async_token_map for an unleased token. If the token has no lease
      it must have no outstanding completions so we should never see an
      EBUSY, unless we have leased out too many tokens. Since
      opal_async_get_token_inrerruptible is protected by a semaphore, we
      will practically never see EBUSY anymore.
      
      Fixes: 8d724823 ("powerpc/powernv: Infrastructure to support OPAL async completion")
      Signed-off-by: NWilliam A. Kennington III <wak@google.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      71e24d77
    • A
      powerpc/eeh: Stop using do_gettimeofday() · edfd17ff
      Arnd Bergmann 提交于
      This interface is inefficient and deprecated because of the y2038
      overflow.
      
      ktime_get_seconds() is an appropriate replacement here, since it
      has sufficient granularity but is more efficient and uses monotonic
      time.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      edfd17ff
    • S
      bpf: take advantage of stack_depth tracking in powerpc JIT · ac0761eb
      Sandipan Das 提交于
      Take advantage of stack_depth tracking, originally introduced for
      x64, in powerpc JIT as well. Round up allocated stack by 16 bytes
      to make sure it stays aligned for functions called from JITed bpf
      program.
      Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
      Reviewed-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ac0761eb
    • M
      powerpc/tm: Don't check for WARN in TM Bad Thing handling · 632f0574
      Michael Ellerman 提交于
      Currently when we take a TM Bad Thing program check exception, we
      search the bug table to see if the program check was generated by a
      WARN/WARN_ON etc.
      
      That makes no sense, the WARN macros use trap instructions, which
      should never generate a TM Bad Thing exception. If they ever did that
      would be a bug and we should oops.
      
      We do have some hand-coded bugs in tm.S, using EMIT_BUG_ENTRY, but
      those are all BUGs not WARNs, and they all use trap instructions
      anyway. Almost certainly this check was incorrectly copied from the
      REASON_TRAP handling in the same function.
      
      Remove it.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-By: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      632f0574
    • M
      powerpc/mm: Add a CONFIG option to choose if radix is used by default · 1fd6c022
      Michael Ellerman 提交于
      Currently if the hardware supports the radix MMU we will use
      it, *unless* "disable_radix" is passed on the kernel command line.
      
      However some users would like the reverse semantics. ie. The kernel
      uses the hash MMU by default, unless radix is explicitly requested on
      the command line.
      
      So add a CONFIG option to choose whether we use radix by default or
      not, and expand the disable_radix command line option to allow
      "disable_radix=no" which *enables* radix.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1fd6c022
    • M
      powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 · 4e003747
      Michael Ellerman 提交于
      CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
      on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
      found in "server" processors, from IBM mainly.
      
      Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
      annoying to have two symbols that always have the same value, it's not
      quite annoying enough to bother removing one.
      
      However with the arrival of Power9, we now have the situation where
      CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
      Radix MMU - *not* the "standard" MMU. So it is now actively confusing
      to use it, because it implies that code is disabled or inactive when
      the Radix MMU is in use, however that is not necessarily true.
      
      So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
      formatting updates of some of the affected lines.
      
      This will be a pain for backports, but c'est la vie.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4e003747
    • M
      powerpc/64: Free up CPU_FTR_ICSWX · c1807e3f
      Michael Ellerman 提交于
      The last user of CPU_FTR_ICSWX was removed in commit
      6ff4d3e9 ("powerpc: Remove old unused icswx based coprocessor
      support"), so free the bit up for future use.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c1807e3f
    • A
      powerpc/mm/hash: Add pr_fmt() to hash_utils64.c · 7f142661
      Aneesh Kumar K.V 提交于
      Make the printks look a bit nicer by adding a prefix.
      
      Radix config now do
       radix-mmu: Page sizes from device-tree:
       radix-mmu: Page size shift = 12 AP=0x0
       radix-mmu: Page size shift = 16 AP=0x5
       radix-mmu: Page size shift = 21 AP=0x1
       radix-mmu: Page size shift = 30 AP=0x2
      
      This patch update hash config to do similar dmesg output. With the patch we have
      
       hash-mmu: Page sizes from device-tree:
       hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
       hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
       hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
       hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
       hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
       hash-mmu: base_shift=20: shift=20, sllp=0x0111, avpnm=0x00000000, tlbiel=0, penc=2
       hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
       hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7f142661
    • C
      powerpc/ipic: Fix status get and status clear · 6b148a7c
      Christophe Leroy 提交于
      IPIC Status is provided by register IPIC_SERSR and not by IPIC_SERMR
      which is the mask register.
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6b148a7c
    • A
      powerpc/powernv: Reserve a hole which appears after enabling IOV · d6f934fd
      Alexey Kardashevskiy 提交于
      In order to make generic IOV code work, the physical function IOV BAR
      should start from offset of the first VF. Since M64 segments share
      PE number space across PHB, and some PEs may be in use at the time
      when IOV is enabled, the existing code shifts the IOV BAR to the index
      of the first PE/VF. This creates a hole in IOMEM space which can be
      potentially taken by some other device.
      
      This reserves a temporary hole on a parent and releases it when IOV is
      disabled; the temporary resources are stored in pci_dn to avoid
      kmalloc/free.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d6f934fd
    • T
      powerpc/pseries/vio: Dispose of virq mapping on vdevice unregister · b8f89fea
      Tyrel Datwyler 提交于
      When a vdevice is DLPAR removed from the system the vio subsystem
      doesn't bother unmapping the virq from the irq_domain. As a result we
      have a virq mapped to a hardware irq that is no longer valid for the
      irq_domain. A side effect is that we are left with /proc/irq/<irq#>
      affinity entries, and attempts to modify the smp_affinity of the irq
      will fail.
      
      In the following observed example the kernel log is spammed by
      ics_rtas_set_affinity errors after the removal of a VSCSI adapter.
      This is a result of irqbalance trying to adjust the affinity every 10
      seconds.
      
        rpadlpar_io: slot U8408.E8E.10A7ACV-V5-C25 removed
        ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
        ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
      
      This patch fixes the issue by calling irq_dispose_mapping() on the
      virq of the viodev on unregister.
      
      Fixes: f2ab6219 ("powerpc/pseries: Add PFO support to the VIO bus")
      Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b8f89fea
    • N
      powerpc/64s/radix: Fix process table entry cache invalidation · 30b49ec7
      Nicholas Piggin 提交于
      According to the architecture, the process table entry cache must be
      flushed with tlbie RIC=2.
      
      Currently the process table entry is set to invalid right before the
      PID is returned to the allocator, with no invalidation. This works on
      existing implementations that are known to not cache the process table
      entry for any except the current PIDR.
      
      It is architecturally correct and cleaner to invalidate with RIC=2
      after clearing the process table entry and before the PID is returned
      to the allocator. This can be done in arch_exit_mmap that runs before
      the final flush, and to ensure the final flush (fullmm) is always a
      RIC=2 variant.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      30b49ec7
    • N
      powerpc/64s/radix: Improve preempt handling in TLB code · dffe8449
      Nicholas Piggin 提交于
      Preempt should be consistently disabled for mm_is_thread_local tests,
      so bring the rest of these under preempt_disable().
      
      Preempt does not need to be disabled for the mm->context.id tests,
      which allows simplification and removal of gotos.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dffe8449
    • N
      powerpc/powernv: Use FIXUP_ENDIAN_HV in OPAL return · 63c9d8a4
      Nicholas Piggin 提交于
      Close the recoverability gap for OPAL calls by using FIXUP_ENDIAN_HV
      in the return path.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      63c9d8a4
    • N
      powerpc/book3s: Add an HV variant of FIXUP_ENDIAN that is recoverable · 8ca9c08d
      Nicholas Piggin 提交于
      Add an HV variant of FIXUP_ENDIAN which uses HSRR[01] and does not
      clear MSR[RI], which improves recoverability.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8ca9c08d
    • N
    • N
      powerpc/64: Fix latency tracing for lazy irq replay · ff967900
      Nicholas Piggin 提交于
      When returning from an exception to a soft-enabled context, pending
      IRQs are replayed but IRQ tracing is not reset, so a number of them
      can get chained together into the same IRQ-disabled trace.
      
      Fix this by having __check_irq_replay re-set IRQ trace. This is
      conceptually where we respond to the next interrupt, so it fits the
      semantics of the IRQ tracer.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ff967900
    • N
      KVM: PPC: Book3S HV: Handle host system reset in guest mode · 6de6638b
      Nicholas Piggin 提交于
      If the host takes a system reset interrupt while a guest is running,
      the CPU must exit the guest before processing the host exception
      handler.
      
      After this patch, taking a sysrq+x with a CPU running in a guest
      gives a trace like this:
      
         cpu 0x27: Vector: 100 (System Reset) at [c000000fdf5776f0]
             pc: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
             lr: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
             sp: c000000fdf577850
            msr: 9000000002803033
           current = 0xc000000fdf4b1e00
           paca    = 0xc00000000fd4d680	 softe: 3	 irq_happened: 0x01
             pid   = 6608, comm = qemu-system-ppc
         Linux version 4.14.0-rc7-01489-g47e1893a404a-dirty #26 SMP
         [c000000fdf577a00] c008000010159dd4 kvmppc_vcpu_run_hv+0x3dc/0x12d0 [kvm_hv]
         [c000000fdf577b30] c0080000100a537c kvmppc_vcpu_run+0x44/0x60 [kvm]
         [c000000fdf577b60] c0080000100a1ae0 kvm_arch_vcpu_ioctl_run+0x118/0x310 [kvm]
         [c000000fdf577c00] c008000010093e98 kvm_vcpu_ioctl+0x530/0x7c0 [kvm]
         [c000000fdf577d50] c000000000357bf8 do_vfs_ioctl+0xd8/0x8c0
         [c000000fdf577df0] c000000000358448 SyS_ioctl+0x68/0x100
         [c000000fdf577e30] c00000000000b220 system_call+0x58/0x6c
         --- Exception: c01 (System Call) at 00007fff76868df0
         SP (7fff7069baf0) is in userspace
      
      Fixes: e36d0a2e ("powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET")
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6de6638b
  2. 22 10月, 2017 11 次提交
  3. 21 10月, 2017 2 次提交
    • M
      powerpc/tm: P9 disable transactionally suspended sigcontexts · 92fb8690
      Michael Neuling 提交于
      Unfortunately userspace can construct a sigcontext which enables
      suspend. Thus userspace can force Linux into a path where trechkpt is
      executed.
      
      This patch blocks this from happening on POWER9 by sanity checking
      sigcontexts passed in.
      
      ptrace doesn't have this problem as only MSR SE and BE can be changed
      via ptrace.
      
      This patch also adds a number of WARN_ON()s in case we ever enter
      suspend when we shouldn't. This should not happen, but if it does the
      symptoms are soft lockup warnings which are not obviously TM related,
      so the WARN_ON()s should make it obvious what's happening.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      92fb8690
    • M
      powerpc/powernv: Enable TM without suspend if possible · 54820530
      Michael Ellerman 提交于
      Some Power9 revisions can run in a mode where TM operates without
      suspended state. If we find ourself on a CPU that might be in this
      mode, we query OPAL to check, and if so we reenable TM in CPU
      features, and enable a new user feature to signal to userspace that we
      are in this mode.
      
      We do not enable the "normal" user feature, PPC_FEATURE2_HTM, but we
      do enable PPC_FEATURE2_HTM_NOSC because that indicates to userspace
      that the kernel will abort transactions on syscall entry, which is
      true regardless of the suspend mode.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      54820530
  4. 20 10月, 2017 3 次提交
    • M
      powerpc: Add PPC_FEATURE2_HTM_NO_SUSPEND · cba6ac48
      Michael Ellerman 提交于
      Some CPUs can operate in a mode where TM (Transactional Memory) is
      enabled but the suspended state of TM is disabled. In this mode
      tsuspend does not enter suspended state, instead the transaction is
      aborted. Similarly any other event that would lead to suspended state
      instead aborts the transaction.
      
      There is also an ABI change, in that in this mode processes are not
      allowed to sigreturn with an MSR that would lead to suspended state,
      Linux will instead return an error to the sigreturn syscall.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cba6ac48
    • C
      powerpc/tm: Add commandline option to disable hardware transactional memory · 07fd1761
      Cyril Bur 提交于
      Currently the kernel relies on firmware to inform it whether or not the
      CPU supports HTM and as long as the kernel was built with
      CONFIG_PPC_TRANSACTIONAL_MEM=y then it will allow userspace to make
      use of the facility.
      
      There may be situations where it would be advantageous for the kernel
      to not allow userspace to use HTM, currently the only way to achieve
      this is to recompile the kernel with CONFIG_PPC_TRANSACTIONAL_MEM=n.
      
      This patch adds a simple commandline option so that HTM can be
      disabled at boot time.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      [mpe: Simplify to a bool, move to prom.c, put doco in the right place.
       Always disable, regardless of initial state, to avoid user confusion.]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      07fd1761
    • M
      KVM: PPC: Tie KVM_CAP_PPC_HTM to the user-visible TM feature · 2a3d6553
      Michael Ellerman 提交于
      Currently we use CPU_FTR_TM to decide if the CPU/kernel can support
      TM (Transactional Memory), and if it's true we advertise that to
      Qemu (or similar) via KVM_CAP_PPC_HTM.
      
      PPC_FEATURE2_HTM is the user-visible feature bit, which indicates that
      the CPU and kernel can support TM. Currently CPU_FTR_TM and
      PPC_FEATURE2_HTM always have the same value, either true or false, so
      using the former for KVM_CAP_PPC_HTM is correct.
      
      However some Power9 CPUs can operate in a mode where TM is enabled but
      TM suspended state is disabled. In this mode CPU_FTR_TM is true, but
      PPC_FEATURE2_HTM is false. Instead a different PPC_FEATURE2 bit is
      set, to indicate that this different mode of TM is available.
      
      It is not safe to let guests use TM as-is, when the CPU is in this
      mode. So to prevent that from happening, use PPC_FEATURE2_HTM to
      determine the value of KVM_CAP_PPC_HTM.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2a3d6553