1. 06 11月, 2017 3 次提交
    • M
      powerpc/64: Free up CPU_FTR_ICSWX · c1807e3f
      Michael Ellerman 提交于
      The last user of CPU_FTR_ICSWX was removed in commit
      6ff4d3e9 ("powerpc: Remove old unused icswx based coprocessor
      support"), so free the bit up for future use.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c1807e3f
    • N
      powerpc/64: Fix latency tracing for lazy irq replay · ff967900
      Nicholas Piggin 提交于
      When returning from an exception to a soft-enabled context, pending
      IRQs are replayed but IRQ tracing is not reset, so a number of them
      can get chained together into the same IRQ-disabled trace.
      
      Fix this by having __check_irq_replay re-set IRQ trace. This is
      conceptually where we respond to the next interrupt, so it fits the
      semantics of the IRQ tracer.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ff967900
    • N
      KVM: PPC: Book3S HV: Handle host system reset in guest mode · 6de6638b
      Nicholas Piggin 提交于
      If the host takes a system reset interrupt while a guest is running,
      the CPU must exit the guest before processing the host exception
      handler.
      
      After this patch, taking a sysrq+x with a CPU running in a guest
      gives a trace like this:
      
         cpu 0x27: Vector: 100 (System Reset) at [c000000fdf5776f0]
             pc: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
             lr: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
             sp: c000000fdf577850
            msr: 9000000002803033
           current = 0xc000000fdf4b1e00
           paca    = 0xc00000000fd4d680	 softe: 3	 irq_happened: 0x01
             pid   = 6608, comm = qemu-system-ppc
         Linux version 4.14.0-rc7-01489-g47e1893a404a-dirty #26 SMP
         [c000000fdf577a00] c008000010159dd4 kvmppc_vcpu_run_hv+0x3dc/0x12d0 [kvm_hv]
         [c000000fdf577b30] c0080000100a537c kvmppc_vcpu_run+0x44/0x60 [kvm]
         [c000000fdf577b60] c0080000100a1ae0 kvm_arch_vcpu_ioctl_run+0x118/0x310 [kvm]
         [c000000fdf577c00] c008000010093e98 kvm_vcpu_ioctl+0x530/0x7c0 [kvm]
         [c000000fdf577d50] c000000000357bf8 do_vfs_ioctl+0xd8/0x8c0
         [c000000fdf577df0] c000000000358448 SyS_ioctl+0x68/0x100
         [c000000fdf577e30] c00000000000b220 system_call+0x58/0x6c
         --- Exception: c01 (System Call) at 00007fff76868df0
         SP (7fff7069baf0) is in userspace
      
      Fixes: e36d0a2e ("powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET")
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6de6638b
  2. 22 10月, 2017 2 次提交
    • M
      powerpc: Disable the fast-endian switch syscall by default · 727f1361
      Michael Ellerman 提交于
      Back in 2008 we added support for "fast little-endian switch" in the
      syscall path. This added a special case syscall number 0x1ebe, which
      is caught very early in the system call exception and switches endian
      with as little overhead as possible. See commit 745a14cc
      ("[POWERPC] Add fast little-endian switch system call") for full
      details.
      
      Although it is fast, it's also completely non standard. The "syscall
      number" is out of the range of normal syscalls, it can't be traced or
      audited, and it's a bit of a wart. To the best of our knowledge it was
      only used by one program, now long since discontinued.
      
      So in an effort to shake out any current users, put it behind a config
      option, and make it default n. If anyone *is* using it they can
      quickly reinstate it with a rebuild, and we can flip it to default y.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      727f1361
    • M
      powerpc/64s: Move the two FAST_ENDIAN macros next to each other · 5c2511bf
      Michael Ellerman 提交于
      So we can #ifdef them in the next patch.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5c2511bf
  3. 21 10月, 2017 2 次提交
    • M
      powerpc/tm: P9 disable transactionally suspended sigcontexts · 92fb8690
      Michael Neuling 提交于
      Unfortunately userspace can construct a sigcontext which enables
      suspend. Thus userspace can force Linux into a path where trechkpt is
      executed.
      
      This patch blocks this from happening on POWER9 by sanity checking
      sigcontexts passed in.
      
      ptrace doesn't have this problem as only MSR SE and BE can be changed
      via ptrace.
      
      This patch also adds a number of WARN_ON()s in case we ever enter
      suspend when we shouldn't. This should not happen, but if it does the
      symptoms are soft lockup warnings which are not obviously TM related,
      so the WARN_ON()s should make it obvious what's happening.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      92fb8690
    • M
      powerpc/powernv: Enable TM without suspend if possible · 54820530
      Michael Ellerman 提交于
      Some Power9 revisions can run in a mode where TM operates without
      suspended state. If we find ourself on a CPU that might be in this
      mode, we query OPAL to check, and if so we reenable TM in CPU
      features, and enable a new user feature to signal to userspace that we
      are in this mode.
      
      We do not enable the "normal" user feature, PPC_FEATURE2_HTM, but we
      do enable PPC_FEATURE2_HTM_NOSC because that indicates to userspace
      that the kernel will abort transactions on syscall entry, which is
      true regardless of the suspend mode.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      54820530
  4. 20 10月, 2017 1 次提交
    • C
      powerpc/tm: Add commandline option to disable hardware transactional memory · 07fd1761
      Cyril Bur 提交于
      Currently the kernel relies on firmware to inform it whether or not the
      CPU supports HTM and as long as the kernel was built with
      CONFIG_PPC_TRANSACTIONAL_MEM=y then it will allow userspace to make
      use of the facility.
      
      There may be situations where it would be advantageous for the kernel
      to not allow userspace to use HTM, currently the only way to achieve
      this is to recompile the kernel with CONFIG_PPC_TRANSACTIONAL_MEM=n.
      
      This patch adds a simple commandline option so that HTM can be
      disabled at boot time.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      [mpe: Simplify to a bool, move to prom.c, put doco in the right place.
       Always disable, regardless of initial state, to avoid user confusion.]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      07fd1761
  5. 19 10月, 2017 1 次提交
  6. 16 10月, 2017 5 次提交
  7. 13 10月, 2017 1 次提交
  8. 06 10月, 2017 1 次提交
  9. 05 10月, 2017 2 次提交
    • N
      powerpc/jprobes: Validate break handler invocation as being due to a jprobe_return() · 3368f569
      Naveen N. Rao 提交于
      Fix a circa 2005 FIXME by implementing a check to ensure that we
      actually got into the jprobe break handler() due to the trap in
      jprobe_return().
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3368f569
    • N
      powerpc/jprobes: Disable preemption when triggered through ftrace · 6baea433
      Naveen N. Rao 提交于
      KPROBES_SANITY_TEST throws the below splat when CONFIG_PREEMPT is
      enabled:
      
        Kprobe smoke test: started
        DEBUG_LOCKS_WARN_ON(val > preempt_count())
        ------------[ cut here ]------------
        WARNING: CPU: 19 PID: 1 at kernel/sched/core.c:3094 preempt_count_sub+0xcc/0x140
        Modules linked in:
        CPU: 19 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc7-nnr+ #97
        task: c0000000fea80000 task.stack: c0000000feb00000
        NIP:  c00000000011d3dc LR: c00000000011d3d8 CTR: c000000000a090d0
        REGS: c0000000feb03400 TRAP: 0700   Not tainted  (4.13.0-rc7-nnr+)
        MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 28000282  XER: 00000000
        CFAR: c00000000015aa18 SOFTE: 0
        <snip>
        NIP preempt_count_sub+0xcc/0x140
        LR  preempt_count_sub+0xc8/0x140
        Call Trace:
          preempt_count_sub+0xc8/0x140 (unreliable)
          kprobe_handler+0x228/0x4b0
          program_check_exception+0x58/0x3b0
          program_check_common+0x16c/0x170
          --- interrupt: 0 at kprobe_target+0x8/0x20
                           LR = init_test_probes+0x248/0x7d0
          kp+0x0/0x80 (unreliable)
          livepatch_handler+0x38/0x74
          init_kprobes+0x1d8/0x208
          do_one_initcall+0x68/0x1d0
          kernel_init_freeable+0x298/0x374
          kernel_init+0x24/0x160
          ret_from_kernel_thread+0x5c/0x70
        Instruction dump:
        419effdc 3d22001b 39299240 81290000 2f890000 409effc8 3c82ffcb 3c62ffcb
        3884bc68 3863bc18 4803d5fd 60000000 <0fe00000> 4bffffa8 60000000 60000000
        ---[ end trace 432dd46b4ce3d29f ]---
        Kprobe smoke test: passed successfully
      
      The issue is that we aren't disabling preemption in
      kprobe_ftrace_handler(). Disable it.
      
      Fixes: ead514d5 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      [mpe: Trim oops a little for formatting]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6baea433
  10. 04 10月, 2017 10 次提交
  11. 27 9月, 2017 1 次提交
    • M
      powerpc/64s: Add workaround for P9 vector CI load issue · 5080332c
      Michael Neuling 提交于
      POWER9 DD2.1 and earlier has an issue where some cache inhibited
      vector load will return bad data. The workaround is two part, one
      firmware/microcode part triggers HMI interrupts when hitting such
      loads, the other part is this patch which then emulates the
      instructions in Linux.
      
      The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and
      lxvh8x.
      
      When an instruction triggers the HMI, all threads in the core will be
      sent to the HMI handler, not just the one running the vector load.
      
      In general, these spurious HMIs are detected by the emulation code and
      we just return back to the running process. Unfortunately, if a
      spurious interrupt occurs on a vector load that's to normal memory we
      have no way to detect that it's spurious (unless we walk the page
      tables, which is very expensive). In this case we emulate the load but
      we need do so using a vector load itself to ensure 128bit atomicity is
      preserved.
      
      Some additional debugfs emulated instruction counters are added also.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5080332c
  12. 26 9月, 2017 1 次提交
    • B
      powerpc/powernv: Rework EEH initialization on powernv · b9fde58d
      Benjamin Herrenschmidt 提交于
      Remove the post_init callback which is only used
      by powernv, we can just call it explicitly from
      the powernv code.
      
      This partially kills the ability to "disable" eeh at
      runtime via debugfs as this was calling that same
      callback again, but this is both unused and broken
      in several ways. If we want to revive it, we need
      to create a dedicated enable/disable callback on the
      backend that does the right thing.
      
      Let the bulk of eeh initialize normally at
      core_initcall() like it does on pseries by removing
      the hack in eeh_init() that delays it.
      
      Instead we make sure our eeh->probe cleanly bails
      out of the PEs haven't been created yet and we force
      a re-probe where we used to call eeh_init() again.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b9fde58d
  13. 21 9月, 2017 1 次提交
  14. 20 9月, 2017 2 次提交
    • N
      powerpc/kprobes: Update optprobes to use emulate_update_regs() · 8afafa6f
      Naveen N. Rao 提交于
      Optprobes depended on an updated regs->nip from analyse_instr() to
      identify the location to branch back from the optprobes trampoline.
      However, since commit 3cdfcbfd ("powerpc: Change analyse_instr so
      it doesn't modify *regs"), analyse_instr() doesn't update the registers
      anymore.  Due to this, we end up branching back from the optprobes
      trampoline to the same branch into the trampoline resulting in a loop.
      
      Fix this by calling out to emulate_update_regs() before using the nip.
      Additionally, explicitly compare the return value from analyse_instr()
      to 1, rather than just checking for !0 so as to guard against any
      future changes to analyse_instr() that may result in -1 being returned
      in more scenarios.
      
      Fixes: 3cdfcbfd ("powerpc: Change analyse_instr so it doesn't modify *regs")
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8afafa6f
    • G
      powerpc/tm: Flush TM only if CPU has TM feature · c1fa0768
      Gustavo Romero 提交于
      Commit cd63f3cf ("powerpc/tm: Fix saving of TM SPRs in core dump")
      added code to access TM SPRs in flush_tmregs_to_thread(). However
      flush_tmregs_to_thread() does not check if TM feature is available on
      CPU before trying to access TM SPRs in order to copy live state to
      thread structures. flush_tmregs_to_thread() is indeed guarded by
      CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel
      was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on
      a CPU without TM feature available, thus rendering the execution
      of TM instructions that are treated by the CPU as illegal instructions.
      
      The fix is just to add proper checking in flush_tmregs_to_thread()
      if CPU has the TM feature before accessing any TM-specific resource,
      returning immediately if TM is no available on the CPU. Adding
      that checking in flush_tmregs_to_thread() instead of in places
      where it is called, like in vsr_get() and vsr_set(), is better because
      avoids the same problem cropping up elsewhere.
      
      Cc: stable@vger.kernel.org # v4.13+
      Fixes: cd63f3cf ("powerpc/tm: Fix saving of TM SPRs in core dump")
      Signed-off-by: NGustavo Romero <gromero@linux.vnet.ibm.com>
      Reviewed-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c1fa0768
  15. 15 9月, 2017 1 次提交
  16. 14 9月, 2017 1 次提交
    • M
      mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4
      Michal Hocko 提交于
      GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
      and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
      primary motivation was to allow users to tell that an allocation is
      short lived and so the allocator can try to place such allocations close
      together and prevent long term fragmentation.  As much as this sounds
      like a reasonable semantic it becomes much less clear when to use the
      highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
      context holding that memory sleep? Can it take locks? It seems there is
      no good answer for those questions.
      
      The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
      __GFP_RECLAIMABLE which in itself is tricky because basically none of
      the existing caller provide a way to reclaim the allocated memory.  So
      this is rather misleading and hard to evaluate for any benefits.
      
      I have checked some random users and none of them has added the flag
      with a specific justification.  I suspect most of them just copied from
      other existing users and others just thought it might be a good idea to
      use without any measuring.  This suggests that GFP_TEMPORARY just
      motivates for cargo cult usage without any reasoning.
      
      I believe that our gfp flags are quite complex already and especially
      those with highlevel semantic should be clearly defined to prevent from
      confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
      replace all existing users to simply use GFP_KERNEL.  Please note that
      SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
      so they will be placed properly for memory fragmentation prevention.
      
      I can see reasons we might want some gfp flag to reflect shorterm
      allocations but I propose starting from a clear semantic definition and
      only then add users with proper justification.
      
      This was been brought up before LSF this year by Matthew [1] and it
      turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
      seems to be a heuristic without any measured advantage for most (if not
      all) its current users.  The follow up discussion has revealed that
      opinions on what might be temporary allocation differ a lot between
      developers.  So rather than trying to tweak existing users into a
      semantic which they haven't expected I propose to simply remove the flag
      and start from scratch if we really need a semantic for short term
      allocations.
      
      [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
      
      [akpm@linux-foundation.org: fix typo]
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: drm/i915: fix up]
        Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ee931c4
  17. 09 9月, 2017 1 次提交
  18. 02 9月, 2017 1 次提交
    • C
      powerpc/xive: add XIVE Exploitation Mode to CAS · ac5e5a54
      Cédric Le Goater 提交于
      On POWER9, the Client Architecture Support (CAS) negotiation process
      determines whether the guest operates in XIVE Legacy compatibility or
      in XIVE exploitation mode. Now that we have initial guest support for
      the XIVE interrupt controller, let's inform the hypervisor what we can
      do.
      
      The platform advertises the XIVE Exploitation Mode support using the
      property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :
      
       - 0b00 XIVE legacy mode Only
       - 0b01 XIVE exploitation mode Only
       - 0b10 XIVE legacy or exploitation mode
      
      The OS asks for XIVE Exploitation Mode support using the property
      "ibm,architecture-vec-5", byte 23 bits 0-1:
      
       - 0b00 XIVE legacy mode Only
       - 0b01 XIVE exploitation mode Only
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ac5e5a54
  19. 01 9月, 2017 3 次提交