1. 14 October 2018 (5 commits)
  2. 13 October 2018 (8 commits)
    • powerpc/32: Add ioremap_wt() and ioremap_coherent() · 86c391bd
      Christophe Leroy committed
      Other arches have ioremap_wt() to map IO areas write-through.
      Implement it on PPC as well in order to avoid drivers using
      __ioremap(_PAGE_WRITETHRU).

      Also implement ioremap_coherent() to avoid drivers using
      __ioremap(_PAGE_COHERENT). (A usage sketch follows this entry.)
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      86c391bd
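      A minimal usage sketch of the new accessor, assuming a hypothetical device
      with a write-through-mappable aperture; FB_PHYS, FB_SIZE and the helper
      names are made up for illustration:

      #include <linux/errno.h>
      #include <linux/io.h>

      #define FB_PHYS 0xc0000000UL    /* hypothetical aperture base */
      #define FB_SIZE 0x00100000UL    /* hypothetical aperture size */

      static void __iomem *fb_base;

      static int fb_map(void)
      {
              /* Replaces __ioremap(FB_PHYS, FB_SIZE, _PAGE_WRITETHRU) in drivers. */
              fb_base = ioremap_wt(FB_PHYS, FB_SIZE);
              if (!fb_base)
                      return -ENOMEM;
              return 0;
      }

      static void fb_unmap(void)
      {
              iounmap(fb_base);
      }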
    • powerpc: Detect the presence of big-cores via "ibm,thread-groups" · 425752c6
      Gautham R. Shenoy committed
      On IBM POWER9, the device tree exposes a property array identified by
      "ibm,thread-groups" which will indicate which groups of threads share
      a particular set of resources.
      
      As of today we only have one form of grouping identifying the group of
      threads in the core that share the L1 cache, translation cache and
      instruction data flow.
      
      This patch adds helper functions to parse the contents of
      "ibm,thread-groups" and populate a per-cpu variable to cache
      information about siblings of each CPU that share the L1, translation
      cache and instruction data-flow.
      
      It also defines a new global variable named "has_big_cores" which
      indicates if the cores on this configuration have multiple groups of
      threads that share L1 cache.
      
      For each online CPU, it maintains a cpu_smallcore_mask, which
      indicates the online siblings which share the L1 cache with it.
      (A parsing sketch follows this entry.)
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      425752c6
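      A simplified parsing sketch, assuming the cell layout described above
      (grouping type, number of groups, threads per group, then the member
      lists). The function name and fixed-size buffer are illustrative; the
      kernel's actual parser in arch/powerpc/kernel/smp.c is more involved.

      #include <linux/errno.h>
      #include <linux/kernel.h>
      #include <linux/of.h>
      #include <linux/printk.h>

      #define THREAD_GROUP_SHARE_L1   1       /* grouping type: L1/translation sharing */

      static int sketch_parse_thread_groups(struct device_node *cpu_node)
      {
              u32 prop[64];
              int len, nr_groups, threads_per_group, i, j, idx = 3;

              len = of_property_read_variable_u32_array(cpu_node, "ibm,thread-groups",
                                                        prop, 3, ARRAY_SIZE(prop));
              if (len < 3 || prop[0] != THREAD_GROUP_SHARE_L1)
                      return -ENODATA;

              nr_groups = prop[1];
              threads_per_group = prop[2];
              if (len < idx + nr_groups * threads_per_group)
                      return -ENODATA;

              for (i = 0; i < nr_groups; i++)
                      for (j = 0; j < threads_per_group; j++)
                              pr_debug("group %d member %d: hw thread id %u\n",
                                       i, j, prop[idx++]);
              return 0;
      }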
    • powerpc/eeh: Cleanup eeh_ops.wait_state() · fef7f905
      Sam Bobroff committed
      The wait_state member of eeh_ops does not need to be platform
      dependent; it's just logic around eeh_ops.get_state(). Therefore,
      merge the two (slightly different!) platform versions into a new
      function, eeh_wait_state() and remove the eeh_ops member.
      
      While doing this, also correct:
      * The wait logic, so that it never waits longer than max_wait.
      * The wait logic, so that it never waits less than
        EEH_STATE_MIN_WAIT_TIME.
      * One call site where the result is treated like a bit field before
        it's checked for negative error values.
      * In pseries_eeh_get_state(), rename the "state" parameter to "delay"
        because that's what it is.
      (A sketch of the merged wait loop follows this entry.)
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      fef7f905
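      A simplified sketch of the merged wait loop; the real eeh_wait_state()
      lives in arch/powerpc/kernel/eeh.c, and the structure below only mirrors
      the rules listed above (it is not the actual implementation):

      #include <linux/delay.h>
      #include <linux/kernel.h>
      #include <asm/eeh.h>

      static int sketch_eeh_wait_state(struct eeh_pe *pe, int max_wait)
      {
              int state, delay;

              for (;;) {
                      state = eeh_ops->get_state(pe, &delay);
                      /* Negative errors and final states are returned unchanged. */
                      if (state != EEH_STATE_UNAVAILABLE)
                              return state;

                      if (max_wait <= 0)
                              return EEH_STATE_NOT_SUPPORT;   /* timed out */

                      /* Wait at least the minimum, but never past max_wait. */
                      delay = max(delay, EEH_STATE_MIN_WAIT_TIME);
                      delay = min(delay, max_wait);
                      max_wait -= delay;
                      msleep(delay);
              }
      }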
    • powerpc/eeh: Cleanup eeh_pe_state_mark() · e762bb89
      Sam Bobroff committed
      Currently, eeh_pe_state_mark() marks a PE (and its children) with a
      state and then performs additional processing if that state includes
      EEH_PE_ISOLATED.
      
      The state parameter is always a constant at the call site, so
      rearrange eeh_pe_state_mark() into two functions and just call the
      appropriate one at each site.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e762bb89
    • powerpc/eeh: Cleanup eeh_enabled() · 54644927
      Sam Bobroff committed
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      54644927
    • powerpc/eeh: Cleanup list_head field names · 80e65b00
      Sam Bobroff committed
      Instances of struct eeh_pe are placed in a tree structure using the
      fields "child_list" and "child", so place these next to each other
      in the definition.
      
      The field "child" is a list entry, so remove the unnecessary and
      misleading use of the list initializer, LIST_HEAD(), on it.
      
      The eeh_dev struct contains two list entry fields, called "list" and
      "rmv_list". Rename them to "entry" and "rmv_entry" and, as above, stop
      initializing them with LIST_HEAD(). (See the abridged layout after this
      entry.)
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      80e65b00
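      An abridged sketch of the resulting layout, keeping only the fields
      discussed above; see arch/powerpc/include/asm/eeh.h for the real,
      complete definitions:

      #include <linux/list.h>

      struct eeh_pe {                         /* abridged */
              struct list_head child_list;    /* list of this PE's child PEs         */
              struct list_head child;         /* entry in the parent PE's child_list */
      };

      struct eeh_dev {                        /* abridged */
              struct list_head entry;         /* was "list": entry in the owning PE's device list */
              struct list_head rmv_entry;     /* was "rmv_list": entry in the removal list        */
      };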
    • powerpc/eeh: Cleanup unused field in eeh_dev · b95a4606
      Sam Bobroff committed
      The 'bus' member of struct eeh_dev is assigned to once but never used,
      so remove it.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b95a4606
    • powerpc/eeh: Cleanup EEH_POSTPONED_PROBE · bffc0176
      Sam Bobroff committed
      Currently a flag, EEH_POSTPONED_PROBE, is used to prevent an incorrect
      message "EEH: No capable adapters found" from being displayed during
      the boot of powernv systems.
      
      It is necessary because, on powernv, the call to eeh_probe_devices()
      made from eeh_init() is too early and EEH can't yet be enabled. A
      second call is made later from eeh_pnv_post_init(), which succeeds.
      
      (On pseries, the first call succeeds because PCI devices are set up
      early enough and no second call is made.)
      
      This can be simplified by moving the early call to eeh_probe_devices()
      from eeh_init() (where it's seen by both platforms) to
      pSeries_final_fixup(), so that each platform only calls
      eeh_probe_devices() once, at a point where it can succeed.
      This is slightly later in the boot sequence, but still early
      enough, and it is now in the same place in the sequence for both
      platforms (the pcibios_fixup hook).
      
      The display of the message can be cleaned up as well, by moving it
      into eeh_probe_devices().
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bffc0176
  3. 03 October 2018 (12 commits)
    • powerpc/tm: Remove msr_tm_active() · 5c784c84
      Breno Leitao committed
      Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if
      CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that
      returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set.
      
      This function is not necessary, since MSR_TM_ACTIVE() does the same
      thing and can be used directly, removing the duplication and
      simplifying the code.

      This patch removes every instance of msr_tm_active() and replaces it
      with MSR_TM_ACTIVE().
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5c784c84
    • powerpc/ptrace: Add support for PTRACE_SYSEMU · 5521eb4b
      Breno Leitao committed
      This patch adds support for the PTRACE_SYSEMU ptrace request on the
      PowerPC architecture.

      When a ptrace(PTRACE_SYSEMU, ...) request is issued, it is handled by
      the arch-independent function ptrace_resume(), which tags the task with
      the TIF_SYSCALL_EMU flag. This flag then needs to be handled on the
      architecture-dependent side, which is what this patch does.

      This patch adds the flag to _TIF_SYSCALL_DOTRACE, the mask used to
      trace syscalls at entry/exit.
      
      Since TIF_SYSCALL_EMU is now part of _TIF_SYSCALL_DOTRACE, a task with
      _TIF_SYSCALL_DOTRACE set will hit do_syscall_trace_enter() at syscall
      entry and do_syscall_trace_leave() at syscall exit.
      do_syscall_trace_enter() needs to handle the TIF_SYSCALL_EMU flag
      properly, aborting syscall execution when it is set. The output values
      must not be changed, i.e. the return value register (r3) should still
      contain the original syscall argument on exit.

      With this flag set, the syscall is never actually executed:
      do_syscall_trace_enter() returns -1, which (treated as unsigned) is
      greater than NR_syscalls, so the syscall dispatch is skipped and
      control returns to userspace. (A userspace sketch of the tracer flow
      follows this entry.)
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5521eb4b
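      A hedged user-space sketch of the resulting tracer flow: a PTRACE_SYSEMU
      continuation stops the child at syscall entry and the kernel skips
      executing the call, so the tracer can emulate it. If your libc headers
      do not define PTRACE_SYSEMU for powerpc, the request value comes from
      the kernel's uapi ptrace.h; error handling is omitted.

      #include <signal.h>
      #include <stdio.h>
      #include <sys/ptrace.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void)
      {
              pid_t child = fork();

              if (child == 0) {
                      ptrace(PTRACE_TRACEME, 0, NULL, NULL);
                      raise(SIGSTOP);         /* stop so the parent can start SYSEMU stepping */
                      getpid();               /* intercepted, not executed */
                      _exit(0);
              }

              waitpid(child, NULL, 0);        /* child stopped by SIGSTOP */
              ptrace(PTRACE_SYSEMU, child, NULL, NULL);
              waitpid(child, NULL, 0);        /* stopped at getpid() entry */
              puts("syscall entry intercepted; the kernel will not run it");
              /* A real tracer would read r0/r3 via PTRACE_GETREGS, emulate the
               * call and write the result back before resuming the child. */
              kill(child, SIGKILL);
              waitpid(child, NULL, 0);
              return 0;
      }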
    • powerpc: Redefine TIF_32BITS thread flag · 16d7c69c
      Breno Leitao committed
      Move TIF_32BIT to use bit 20 instead of bit 4 in the task flags field.

      This change makes room for an upcoming new task flag
      (_TIF_SYSCALL_EMU), which should preferably occupy a bit in the lower
      16-bit half of the word.

      This upcoming flag will be part of a composed mask
      (_TIF_SYSCALL_DOTRACE) which will contain other flags as well, and it
      is preferred that the whole _TIF_SYSCALL_DOTRACE mask only uses the
      lower 16 bits of the word, so it can be handled with immediate
      operations (load immediate, add immediate, ...), where the immediate
      operand (SI) is limited to 16 bits.
      
      Another possible solution would be to use the LOAD_REG_IMMEDIATE()
      macro to load a full 64-bit immediate, but that takes five
      instructions instead of one.

      Redefining TIF_32BIT to use an upper bit is not a problem, since there
      is only one place in the assembly code where TIF_32BIT is used, and it
      can be replaced with a shifted-immediate operation (addis), because it
      is tested alone, i.e. it is not part of a composed mask with other
      bits set, which would require LOAD_REG_IMMEDIATE().

      Tested on a 64-bit big-endian machine running a 32-bit task.
      (A compile-time sketch of the 16-bit constraint follows this entry.)
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      16d7c69c
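      A hypothetical compile-time check (not part of the patch) capturing the
      constraint described above: every flag in _TIF_SYSCALL_DOTRACE must stay
      in the low 16 bits so the mask fits the unsigned immediate of an andi.
      instruction.

      #include <linux/build_bug.h>
      #include <asm/thread_info.h>

      static inline void check_dotrace_mask_fits_andi(void)
      {
              /* andi. takes a 16-bit unsigned immediate (the SI field). */
              BUILD_BUG_ON(_TIF_SYSCALL_DOTRACE & ~0xffffUL);
      }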
    • powerpc/64: add stack protector support · 06ec27ae
      Christophe Leroy committed
      On PPC64, as register r13 points to the paca_struct at all times, this
      patch adds a copy of the canary there, which is refreshed at task
      switch.
      That new canary is then used via the following GCC options:
      -mstack-protector-guard=tls
      -mstack-protector-guard-reg=r13
      -mstack-protector-guard-offset=offsetof(struct paca_struct, canary)
      (A sketch of the paca canary update follows this entry.)
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      06ec27ae
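      An abridged sketch of the mechanism, assuming the canary field added to
      struct paca_struct by this patch; the real update happens in the
      task-switch path and the helper name here is illustrative:

      #include <linux/sched.h>
      #include <asm/paca.h>

      static inline void sketch_update_paca_canary(struct task_struct *next)
      {
              /* GCC then loads the guard from r13 + offsetof(paca_struct, canary). */
              get_paca()->canary = next->stack_canary;
      }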
    • powerpc/32: add stack protector support · c3ff2a51
      Christophe Leroy committed
      This functionality was tentatively added in the past
      (commit 6533b7c1 ("powerpc: Initial stack protector
      (-fstack-protector) support")) but had to be reverted
      (commit f2574030 ("powerpc: Revert the initial stack
      protector support")) because GCC implements it differently
      depending on whether it was built with libc support or not.
      
      Now, GCC offers the possibility to manually set the
      stack-protector mode (global or tls) regardless of libc support.
      
      This time, the patch selects HAVE_STACKPROTECTOR only if
      -mstack-protector-guard=tls is supported by GCC.
      
      On PPC32, as register r2 points to the current task_struct at
      all times, the stack_canary located inside task_struct can be
      used directly via the following GCC options:
      -mstack-protector-guard=tls
      -mstack-protector-guard-reg=r2
      -mstack-protector-guard-offset=offsetof(struct task_struct, stack_canary)
      
      The protector is disabled for prom_init and bootx_init as
      it is too early to handle it properly.
      
       $ echo CORRUPT_STACK > /sys/kernel/debug/provoke-crash/DIRECT
      [  134.943666] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: lkdtm_CORRUPT_STACK+0x64/0x64
      [  134.943666]
      [  134.955414] CPU: 0 PID: 283 Comm: sh Not tainted 4.18.0-s3k-dev-12143-ga3272be41209 #835
      [  134.963380] Call Trace:
      [  134.965860] [c6615d60] [c001f76c] panic+0x118/0x260 (unreliable)
      [  134.971775] [c6615dc0] [c001f654] panic+0x0/0x260
      [  134.976435] [c6615dd0] [c032c368] lkdtm_CORRUPT_STACK_STRONG+0x0/0x64
      [  134.982769] [c6615e00] [ffffffff] 0xffffffff
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c3ff2a51
    • powerpc/traps: merge unrecoverable_exception() and nonrecoverable_exception() · 51423a9c
      Christophe Leroy committed
      PPC32 uses nonrecoverable_exception() while PPC64 uses
      unrecoverable_exception().
      
      Both functions are doing almost the same thing.
      
      This patch removes nonrecoverable_exception().
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      51423a9c
    • powerpc/mm/thp: update pmd_trans_huge to check for pmd_present · 8890e033
      Aneesh Kumar K.V committed
      We need to make sure pmd_trans_huge returns false for a pmd migration
      entry. We mark the migration entry by clearing the _PAGE_PRESENT bit.
      We keep the _PAGE_PTE bit set to indicate a leaf page table entry.
      Hence we need to make sure we check for pmd_present() so that
      pmd_trans_huge won't return true for a pmd migration entry.
      (A sketch of the check follows this entry.)
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8890e033
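      A simplified sketch of the check (radix/hash details elided): a pmd
      migration entry keeps _PAGE_PTE set but has _PAGE_PRESENT cleared, so
      pmd_trans_huge() must also require pmd_present():

      #include <asm/pgtable.h>

      static inline int sketch_pmd_trans_huge(pmd_t pmd)
      {
              if (!pmd_present(pmd))          /* migration or invalid entry */
                      return 0;
              return !!(pmd_val(pmd) & _PAGE_PTE);
      }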
    • powerpc/mm/hugetlb/book3s: add _PAGE_PRESENT to hugepd pointer. · f1981b5b
      Aneesh Kumar K.V committed
      This makes the hugetlb directory pointer similar to other page table
      entries. A hugepd entry is identified by the _PAGE_PTE bit not being
      set and the directory size being stored in HUGEPD_SHIFT_MASK. We
      update that to also look at _PAGE_PRESENT.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f1981b5b
    • powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit · da7ad366
      Aneesh Kumar K.V committed
      With this patch we use 0x8000000000000000UL (_PAGE_PRESENT) to indicate a valid
      pgd/pud/pmd entry. We also switch the p**_present() to look at this bit.
      
      With pmd_present, we have a special case. We need to make sure we
      consider a pmd marked invalid during THP split as present. Right now
      we clear the _PAGE_PRESENT bit during pmdp_invalidate. In order to
      handle this special case we add a new pte bit, _PAGE_INVALID (mapped
      to _RPAGE_SW0). This bit is only used with _PAGE_PRESENT cleared,
      hence we are not really losing a pte bit for this special case.
      pmd_present is also updated to look at _PAGE_INVALID. (See the sketch
      after this entry.)
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      da7ad366
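      A simplified sketch of the updated helper: an entry counts as present if
      _PAGE_PRESENT is set, or if it was temporarily invalidated during a THP
      split, which is flagged with the new _PAGE_INVALID software bit:

      #include <asm/pgtable.h>

      static inline int sketch_pmd_present(pmd_t pmd)
      {
              return !!(pmd_val(pmd) & (_PAGE_PRESENT | _PAGE_INVALID));
      }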
    • powerpc/powernv: Make possible for user to force a full ipl cec reboot · 8139046a
      Vaibhav Jain committed
      Ever since fast reboot was enabled by default in OPAL,
      opal_cec_reboot() uses fast-reset instead of a full IPL to perform a
      system reboot. This leaves the user with no direct way to force a full
      IPL reboot, except changing an nvram setting that persistently
      disables fast-reset for all subsequent reboots.
      
      This patch provides a more direct way for the user to force a one-shot
      full IPL reboot by passing the command line argument 'full' to the
      reboot command. So the user will be able to tweak the reboot behavior
      via:
      
        $ sudo reboot full	# Force a full ipl reboot skipping fast-reset
      
        or
        $ sudo reboot  	# default reboot path (usually fast-reset)
      
      The reboot command passes the unparsed command argument to the kernel
      via the reboot() syscall, which is then passed on to the arch function
      pnv_restart(). The patch updates pnv_restart() to handle this cmd-arg
      and issues opal_cec_reboot2() with OPAL_REBOOT_FULL_IPL to force a
      full IPL reset. (A sketch of the restart hook follows this entry.)
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8139046a
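      A simplified sketch of the restart hook change, with retry and fallback
      handling elided; the real code is in
      arch/powerpc/platforms/powernv/setup.c:

      #include <linux/compiler.h>
      #include <linux/string.h>
      #include <asm/opal.h>

      static void __noreturn sketch_pnv_restart(char *cmd)
      {
              if (cmd && !strcmp(cmd, "full"))
                      opal_cec_reboot2(OPAL_REBOOT_FULL_IPL, NULL);   /* force full IPL */
              else
                      opal_cec_reboot();                              /* usually fast-reset */

              for (;;)
                      ;       /* wait for OPAL to take the system down */
      }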
    • Revert "convert SLB miss handlers to C" and subsequent commits · 54be0b9c
      Michael Ellerman committed
      This reverts commits:
        5e46e29e ("powerpc/64s/hash: convert SLB miss handlers to C")
        8fed04d0 ("powerpc/64s/hash: remove user SLB data from the paca")
        655deecf ("powerpc/64s/hash: SLB allocation status bitmaps")
        2e162674 ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup")
        89ca4e12 ("powerpc/64s/hash: Add a SLB preload cache")
      
      This series had a few bugs, and the fixes are not all trivial. So
      revert most of it for now.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      54be0b9c
  4. 19 September 2018 (12 commits)
    • powerpc: Fix duplicate const clang warning in user access code · e00d93ac
      Anton Blanchard committed
      This re-applies commit b91c1e3e ("powerpc: Fix duplicate const
      clang warning in user access code") (Jun 2015) which was undone in
      commits:
        f2ca8090 ("powerpc/sparse: Constify the address pointer in __get_user_nosleep()") (Feb 2017)
        d466f6c5 ("powerpc/sparse: Constify the address pointer in __get_user_nocheck()") (Feb 2017)
        f84ed59a ("powerpc/sparse: Constify the address pointer in __get_user_check()") (Feb 2017)
      
      We see a large number of duplicate const errors in the user access
      code when building with llvm/clang:
      
        include/linux/pagemap.h:576:8: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
              ret = __get_user(c, uaddr);
      
      The problem is we are doing const __typeof__(*(ptr)), which will hit
      the warning if ptr is marked const.
      
      Removing the const does not seem to have any effect on GCC code
      generation. (A standalone reproduction follows this entry.)
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e00d93ac
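      A standalone reproduction of the warning (not the kernel macro): when
      ptr already points to const data, __typeof__(*(ptr)) is itself
      const-qualified, so the extra const below is a duplicate that clang
      reports under -Wduplicate-decl-specifier:

      #define GET_SKETCH(x, ptr)                              \
      ({                                                      \
              const __typeof__(*(ptr)) *__gs_p = (ptr);      \
              (x) = *__gs_p;                                  \
              0;                                              \
      })

      int demo(const char *uaddr)
      {
              char c;

              return GET_SKETCH(c, uaddr);    /* clang: duplicate 'const' */
      }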
    • powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR request · 063b8b12
      Nathan Fontenot committed
      The updates to powerpc numa and memory hotplug code now use the
      in-kernel LMB array instead of the device tree. This change allows the
      pseries memory DLPAR code to only update the device tree once after
      successfully handling a DLPAR request.
      
      Prior to the in-kernel LMB array, the numa code looked up the affinity
      for memory being added in the device tree, the code now looks this up
      in the LMB array. This change means the memory hotplug code can just
      update the affinity for an LMB in the LMB array instead of updating
      the device tree.
      
      This also provides a savings in kernel memory. When updating the
      device tree, old properties are never freed since there is no
      use count on properties. This behavior leads to a new copy of the
      property being allocated every time an LMB is added or removed (i.e. a
      request to add 100 LMBs creates 100 new copies of the property). With
      this update only a single new property is created when a DLPAR request
      completes successfully.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      063b8b12
    • powerpc/64s/hash: Add a SLB preload cache · 89ca4e12
      Nicholas Piggin committed
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those at context switch
      time. Every 256 context switches, the oldest entry is removed from the
      cache to shrink the cache and require fewer slbmte if they are unused.
      
      Much more could go into this, including into the SLB entry reclaim
      side to track some LRU information etc, which would require a study of
      large memory workloads. But this is a simple thing we can do now that
      is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security measures
      disabled) increases from 140K/s to 178K/s (27%).

      POWER8 does not change much (within 1%); it is unclear why it does not
      see a big gain like POWER9.
      
      Booting to busybox init with 256MB segments has SLB misses go down
      from 945 to 69, and with 1T segments 900 to 21. These could almost all
      be eliminated by preloading a bit more carefully with ELF binary
      loading.
      (A sketch of the preload cache follows this entry.)
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      89ca4e12
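      A simplified sketch of the round-robin insert described above. Sizes,
      the segment shift and all names are illustrative; the real cache lives
      in thread_info and is replayed by the context-switch code:

      #define SID_SHIFT_SKETCH  28    /* 256MB segment shift, illustrative */
      #define SLB_PRELOAD_NR    16    /* illustrative cache size */

      struct slb_preload_sketch {
              unsigned char head;     /* next slot to overwrite, round robin */
              unsigned char nr;       /* number of valid entries             */
              unsigned int esid[SLB_PRELOAD_NR];
      };

      static void sketch_preload_add(struct slb_preload_sketch *c, unsigned long ea)
      {
              unsigned int esid = ea >> SID_SHIFT_SKETCH;
              int i;

              for (i = 0; i < c->nr; i++)     /* already cached? */
                      if (c->esid[i] == esid)
                              return;

              c->esid[c->head] = esid;        /* insert at head (oldest slot once full) */
              c->head = (c->head + 1) % SLB_PRELOAD_NR;
              if (c->nr < SLB_PRELOAD_NR)
                      c->nr++;
      }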
    • powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup · 2e162674
      Nicholas Piggin committed
      This will be used by the SLB code in the next patch, but for now this
      sets the slb_addr_limit to the correct size for 32-bit tasks.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      2e162674
    • powerpc/64s/hash: SLB allocation status bitmaps · 655deecf
      Nicholas Piggin committed
      Add 32-entry bitmaps to track the allocation status of the first 32
      SLB entries, and whether they are user or kernel entries. These are
      used to allocate free SLB entries first, before resorting to the
      round-robin allocator. (A slot-selection sketch follows this entry.)
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      655deecf
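      A simplified sketch of the slot selection this enables, with bitmap
      width and bolted-slot handling elided; names are illustrative:

      #include <linux/bitops.h>

      static long sketch_pick_slb_slot(unsigned long used_bitmap, long n_slots,
                                       long *round_robin)
      {
              long slot;

              if (used_bitmap != ~0UL) {
                      slot = ffz(used_bitmap);        /* lowest free slot */
                      if (slot < n_slots)
                              return slot;
              }

              slot = *round_robin;                    /* all in use: evict a victim */
              *round_robin = (slot + 1) % n_slots;
              return slot;
      }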
    • powerpc/64s/hash: remove user SLB data from the paca · 8fed04d0
      Nicholas Piggin committed
      User SLB mapping data is copied into the PACA from the mm->context so
      it can be accessed by the SLB miss handlers.
      
      After the C conversion, SLB miss handlers now run with relocation on,
      and user SLB misses are able to take recursive kernel SLB misses, so
      the user SLB mapping data can be removed from the paca and accessed
      directly.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8fed04d0
    • powerpc/64s/hash: convert SLB miss handlers to C · 5e46e29e
      Nicholas Piggin committed
      This patch moves SLB miss handlers completely to C, using the standard
      exception handler macros to set up the stack and branch to C.
      
      This can be done because the segment containing the kernel stack is
      always bolted, so accessing it with relocation on will not cause an
      SLB exception.
      
      Arbitrary kernel memory may not be accessed when handling kernel space
      SLB misses, so care should be taken there. However user SLB misses can
      access any kernel memory, which can be used to move some fields out of
      the paca (in later patches).
      
      User SLB misses could quite easily reconcile IRQs and set up a first
      class kernel environment and exit via ret_from_except, however that
      doesn't seem to be necessary at the moment, so we only do that if a
      bad fault is encountered.
      
      [ Credit to Aneesh for bug fixes, error checks, and improvements to bad
        address handling, etc ]
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      
      Since RFC:
      - Added MSR[RI] handling
      - Fixed up a register loss bug exposed by irq tracing (Aneesh)
      - Reject misses outside the defined kernel regions (Aneesh)
      - Added several more sanity checks and error handling (Aneesh); we may
        look at consolidating these tests and tightening up the code, but for
        a first pass we decided it's better to check carefully.
      
      Since v1:
      - Fixed SLB cache corruption (Aneesh)
      - Fixed untidy SLBE allocation "leak" in get_vsid error case
      - Now survives some stress testing on real hardware
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5e46e29e
    • powerpc/64s/hash: remove the vmalloc segment from the bolted SLB · 85376e2a
      Nicholas Piggin committed
      Remove the vmalloc segment from bolted SLBEs. This is not required to
      be bolted, and seems like it was added to help pre-load the SLB on
      context switch. However there are now other segments like the vmemmap
      segment and non-zero node memory that often take misses after a context
      switch, so it is better to solve this in a more general way.
      
      A subsequent change will track free SLB entries and use those rather
      than round-robin overwriting valid entries, which makes it far less
      likely for kernel SLBEs to be evicted after they are installed.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      85376e2a
    • powerpc/pseries: Dump the SLB contents on SLB MCE errors. · c6d15258
      Mahesh Salgaonkar committed
      If we get a machine check exception due to SLB errors, dump the
      current SLB contents, which will be very helpful in debugging the
      root cause of the SLB errors. Introduce an exclusive per-cpu buffer to
      hold the faulty SLB entries. The real-mode MCE handler saves the old
      SLB contents into this buffer, accessible through the paca, and prints
      them out later in virtual mode.
      
      With this patch the console will log SLB contents like below on SLB MCE
      errors:
      
      [  507.297236] SLB contents of cpu 0x1
      [  507.297237] Last SLB entry inserted at slot 16
      [  507.297238] 00 c000000008000000 400ea1b217000500
      [  507.297239]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
      [  507.297240] 01 d000000008000000 400d43642f000510
      [  507.297242]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297243] 11 f000000008000000 400a86c85f000500
      [  507.297244]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
      [  507.297245] 12 00007f0008000000 4008119624000d90
      [  507.297246]   1T  ESID=       7f  VSID=      8119624 LLP:110
      [  507.297247] 13 0000000018000000 00092885f5150d90
      [  507.297247]  256M ESID=        1  VSID=   92885f5150 LLP:110
      [  507.297248] 14 0000010008000000 4009e7cb50000d90
      [  507.297249]   1T  ESID=        1  VSID=      9e7cb50 LLP:110
      [  507.297250] 15 d000000008000000 400d43642f000510
      [  507.297251]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297252] 16 d000000008000000 400d43642f000510
      [  507.297253]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297253] ----------------------------------
      [  507.297254] SLB cache ptr value = 3
      [  507.297254] Valid SLB cache entries:
      [  507.297255] 00 EA[0-35]=    7f000
      [  507.297256] 01 EA[0-35]=        1
      [  507.297257] 02 EA[0-35]=     1000
      [  507.297257] Rest of SLB cache entries:
      [  507.297258] 03 EA[0-35]=    7f000
      [  507.297258] 04 EA[0-35]=        1
      [  507.297259] 05 EA[0-35]=     1000
      [  507.297260] 06 EA[0-35]=       12
      [  507.297260] 07 EA[0-35]=    7f000
      Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c6d15258
    • powerpc/pseries: Display machine check error details. · 8f0b8056
      Mahesh Salgaonkar committed
      Extract the MCE error details from the RTAS extended log and display
      them on the console.
      
      With this patch you should now see mce logs like below:
      
      [  142.371818] Severe Machine check interrupt [Recovered]
      [  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
      [  142.371822]   Initiator: CPU
      [  142.371823]   Error type: SLB [Multihit]
      [  142.371824]     Effective address: d00000000ca70000
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8f0b8056
    • powerpc/pseries: Flush SLB contents on SLB MCE errors. · a43c1590
      Mahesh Salgaonkar committed
      On pseries, as of today the system crashes if we get machine check
      exceptions due to SLB errors. These are soft errors and can be fixed
      by flushing the SLBs, so the kernel can continue to function instead
      of crashing. We do this in real mode before turning on the MMU;
      otherwise we would run into nested machine checks. This patch now
      fetches the rtas error log in real mode and flushes the SLBs on
      SLB/ERAT errors.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michal Suchanek <msuchanek@suse.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a43c1590
    • powerpc/pseries: Define MCE error event section. · 04fce21c
      Mahesh Salgaonkar committed
      On pseries, the machine check error details are part of RTAS extended
      event log passed under Machine check exception section. This patch adds
      the definition of rtas MCE event section and related helper
      functions.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      04fce21c
  5. 18 September 2018 (1 commit)
    • powerpc: Avoid code patching freed init sections · 51c3c62b
      Michael Neuling committed
      This stops us from doing code patching in init sections after they've
      been freed.
      
      In this chain:
        kvm_guest_init() ->
          kvm_use_magic_page() ->
            fault_in_pages_readable() ->
              __get_user() ->
                __get_user_nocheck() ->
                  barrier_nospec();
      
      We have a code patching location at barrier_nospec() and
      kvm_guest_init() is an init function. This whole chain gets inlined,
      so when we free the init section (hence kvm_guest_init()), this code
      goes away and hence should no longer be patched.
      
      We have seen this as userspace memory corruption when using a memory
      checker while doing partition migration testing on powervm (this
      starts the code patching post migration via
      /sys/kernel/mobility/migration). In theory, it could also happen when
      using /sys/kernel/debug/powerpc/barrier_nospec. (A sketch of the guard
      follows this entry.)
      
      Cc: stable@vger.kernel.org # 4.13+
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      51c3c62b
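      A simplified sketch of the guard; the flag name and helper are
      illustrative, and the real check lives in
      arch/powerpc/lib/code-patching.c:

      #include <linux/types.h>
      #include <asm/sections.h>

      static bool init_mem_freed;     /* illustrative: set once free_initmem() runs */

      static bool sketch_patch_target_valid(unsigned long addr)
      {
              if (!init_mem_freed)
                      return true;    /* init text still mapped, safe to patch */

              /* After free_initmem(), refuse to patch anything in the init range. */
              return addr < (unsigned long)__init_begin ||
                     addr >= (unsigned long)__init_end;
      }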
  6. 17 September 2018 (2 commits)