1. 20 6月, 2016 1 次提交
    • M
      KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt · fd7bacbc
      Mahesh Salgaonkar 提交于
      When a guest is assigned to a core it converts the host Timebase (TB)
      into guest TB by adding guest timebase offset before entering into
      guest. During guest exit it restores the guest TB to host TB. This means
      under certain conditions (Guest migration) host TB and guest TB can differ.
      
      When we get an HMI for TB related issues the opal HMI handler would
      try fixing errors and restore the correct host TB value. With no guest
      running, we don't have any issues. But with guest running on the core
      we run into TB corruption issues.
      
      If we get an HMI while in the guest, the current HMI handler invokes opal
      hmi handler before forcing guest to exit. The guest exit path subtracts
      the guest TB offset from the current TB value which may have already
      been restored with host value by opal hmi handler. This leads to incorrect
      host and guest TB values.
      
      With split-core, things become more complex. With split-core, TB also gets
      split and each subcore gets its own TB register. When a hmi handler fixes
      a TB error and restores the TB value, it affects all the TB values of
      sibling subcores on the same core. On TB errors all the thread in the core
      gets HMI. With existing code, the individual threads call opal hmi handle
      independently which can easily throw TB out of sync if we have guest
      running on subcores. Hence we will need to co-ordinate with all the
      threads before making opal hmi handler call followed by TB resync.
      
      This patch introduces a sibling subcore state structure (shared by all
      threads in the core) in paca which holds information about whether sibling
      subcores are in Guest mode or host mode. An array in_guest[] of size
      MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore.
      The subcore id is used as index into in_guest[] array. Only primary
      thread entering/exiting the guest is responsible to set/unset its
      designated array element.
      
      On TB error, we get HMI interrupt on every thread on the core. Upon HMI,
      this patch will now force guest to vacate the core/subcore. Primary
      thread from each subcore will then turn off its respective bit
      from the above bitmap during the guest exit path just after the
      guest->host partition switch is complete.
      
      All other threads that have just exited the guest OR were already in host
      will wait until all other subcores clears their respective bit.
      Once all the subcores turn off their respective bit, all threads will
      will make call to opal hmi handler.
      
      It is not necessary that opal hmi handler would resync the TB value for
      every HMI interrupts. It would do so only for the HMI caused due to
      TB errors. For rest, it would not touch TB value. Hence to make things
      simpler, primary thread would call TB resync explicitly once for each
      core immediately after opal hmi handler instead of subtracting guest
      offset from TB. TB resync call will restore the TB with host value.
      Thus we can be sure about the TB state.
      
      One of the primary threads exiting the guest will take up the
      responsibility of calling TB resync. It will use one of the top bits
      (bit 63) from subcore state flags bitmap to make the decision. The first
      primary thread (among the subcores) that is able to set the bit will
      have to call the TB resync. Rest all other threads will wait until TB
      resync is complete.  Once TB resync is complete all threads will then
      proceed.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      fd7bacbc
  2. 24 5月, 2016 1 次提交
  3. 21 5月, 2016 1 次提交
    • J
      exit_thread: remove empty bodies · 5f56a5df
      Jiri Slaby 提交于
      Define HAVE_EXIT_THREAD for archs which want to do something in
      exit_thread. For others, let's define exit_thread as an empty inline.
      
      This is a cleanup before we change the prototype of exit_thread to
      accept a task parameter.
      
      [akpm@linux-foundation.org: fix mips]
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5f56a5df
  4. 12 5月, 2016 5 次提交
  5. 11 5月, 2016 15 次提交
  6. 06 5月, 2016 1 次提交
  7. 01 5月, 2016 4 次提交
  8. 28 4月, 2016 1 次提交
  9. 27 4月, 2016 3 次提交
    • T
      ftrace: Match dot symbols when searching functions on ppc64 · 7132e2d6
      Thiago Jung Bauermann 提交于
      In the ppc64 big endian ABI, function symbols point to function
      descriptors. The symbols which point to the function entry points
      have a dot in front of the function name. Consequently, when the
      ftrace filter mechanism searches for the symbol corresponding to
      an entry point address, it gets the dot symbol.
      
      As a result, ftrace filter users have to be aware of this ABI detail on
      ppc64 and prepend a dot to the function name when setting the filter.
      
      The perf probe command insulates the user from this by ignoring the dot
      in front of the symbol name when matching function names to symbols,
      but the sysfs interface does not. This patch makes the ftrace filter
      mechanism do the same when searching symbols.
      
      Fixes the following failure in ftracetest's kprobe_ftrace.tc:
      
        .../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument
      
      That failure is on this line of kprobe_ftrace.tc:
      
        echo _do_fork > set_ftrace_filter
      
      This is because there's no _do_fork entry in the functions list:
      
        # cat available_filter_functions | grep _do_fork
        ._do_fork
      
      This change introduces no regressions on the perf and ftracetest
      testsuite results.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7132e2d6
    • C
      powerpc: Add support for userspace P9 copy paste · 8a649045
      Chris Smart 提交于
      The copy paste facility introduced in POWER9 provides an optimised
      mechanism for a userspace application to copy a cacheline. This is
      provided by a pair of instructions, copy and paste, while a third,
      cp_abort (copy paste abort), provides a clean up of the state in case of
      a failure.
      
      The copy instruction will read a 128 byte cacheline and store it in an
      internal buffer. The subsequent paste instruction will store this
      internal buffer to memory and set a CR field if the paste succeeds.
      
      Since the state of the copy paste buffer is internal (and not
      architecturally visible), in the unlikely event of a context switch, the
      state cannot be stored and the paste should therefore fail.
      
      The cp_abort instruction exists to fail and clean up any such
      interrupted copy paste sequence and is to be called by the kernel as
      part of the context switch. Doing so prevents data from a preceding copy
      in one process leaking into the paste of another.
      
      This code enables use of the cp_abort instruction if a supported
      processor is detected.
      
      NOTE: this is for userspace only, not in kernel, and does not deal
      with KVM guests.
      
      Patch created with much assistance from Michael Neuling
      <mikey@neuling.org>
      Signed-off-by: NChris Smart <chris@distroguy.com>
      Reviewed-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8a649045
    • A
      powerpc/eeh: fix misleading indentation · 2d521784
      Andrew Donnellan 提交于
      Found by smatch.
      Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2d521784
  10. 21 4月, 2016 2 次提交
    • H
      powerpc/book3s64: Remove __end_handlers marker · 057b6d7e
      Hari Bathini 提交于
      The __end_handlers marker was intended to mark down upto code that gets
      called from exception prologs. But that hasn't kept pace with code
      changes. Case in point, slb_miss_realmode being called from exception
      prolog code but isn't below __end_handlers marker. So, __end_handlers
      marker is as good as a comment but could be misleading at times if it
      isn't in sync with the code, as is the case now. So, let us avoid this
      confusion by having a better comment and removing __end_handlers marker
      altogether.
      Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      057b6d7e
    • H
      powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel · 8ed8ab40
      Hari Bathini 提交于
      Some of the interrupt vectors on 64-bit POWER server processors are only
      32 bytes long (8 instructions), which is not enough for the full
      first-level interrupt handler. For these we need to branch to an
      out-of-line (OOL) handler. But when we are running a relocatable kernel,
      interrupt vectors till __end_interrupts marker are copied down to real
      address 0x100. So, branching to labels (ie. OOL handlers) outside this
      section must be handled differently (see LOAD_HANDLER()), considering
      relocatable kernel, which would need at least 4 instructions.
      
      However, branching from interrupt vector means that we corrupt the
      CFAR (come-from address register) on POWER7 and later processors as
      mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions)
      that contains the part up to the point where the CFAR is saved in the
      PACA should be part of the short interrupt vectors before we branch out
      to OOL handlers.
      
      But as mentioned already, there are interrupt vectors on 64-bit POWER
      server processors that are only 32 bytes long (like vectors 0x4f00,
      0x4f20, etc.), which cannot accomodate the above two cases at the same
      time owing to space constraint. Currently, in these interrupt vectors,
      we simply branch out to OOL handlers, without using LOAD_HANDLER(),
      which leaves us vulnerable when running a relocatable kernel (eg. kdump
      case). While this has been the case for sometime now and kdump is used
      widely, we were fortunate not to see any problems so far, for three
      reasons:
      
        1. In almost all cases, production kernel (relocatable) is used for
           kdump as well, which would mean that crashed kernel's OOL handler
           would be at the same place where we end up branching to, from short
           interrupt vector of kdump kernel.
        2. Also, OOL handler was unlikely the reason for crash in almost all
           the kdump scenarios, which meant we had a sane OOL handler from
           crashed kernel that we branched to.
        3. On most 64-bit POWER server processors, page size is large enough
           that marking interrupt vector code as executable (see commit
           429d2e83) leads to marking OOL handler code from crashed kernel,
           that sits right below interrupt vector code from kdump kernel, as
           executable as well.
      
      Let us fix this by moving the __end_interrupts marker down past OOL
      handlers to make sure that we also copy OOL handlers to real address
      0x100 when running a relocatable kernel.
      
      This fix has been tested successfully in kdump scenario, on an LPAR with
      4K page size by using different default/production kernel and kdump
      kernel.
      
      Also tested by manually corrupting the OOL handlers in the first kernel
      and then kdump'ing, and then causing the OOL handlers to fire - mpe.
      
      Fixes: c1fb6816 ("powerpc: Add relocation on exception vector handlers")
      Cc: stable@vger.kernel.org
      Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8ed8ab40
  11. 18 4月, 2016 3 次提交
    • A
      powerpc: Update TM user feature bits in scan_features() · 4705e024
      Anton Blanchard 提交于
      We need to update the user TM feature bits (PPC_FEATURE2_HTM and
      PPC_FEATURE2_HTM) to mirror what we do with the kernel TM feature
      bit.
      
      At the moment, if firmware reports TM is not available we turn off
      the kernel TM feature bit but leave the userspace ones on. Userspace
      thinks it can execute TM instructions and it dies trying.
      
      This (together with a QEMU patch) fixes PR KVM, which doesn't currently
      support TM.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4705e024
    • A
      powerpc: Update cpu_user_features2 in scan_features() · beff8237
      Anton Blanchard 提交于
      scan_features() updates cpu_user_features but not cpu_user_features2.
      
      Amongst other things, cpu_user_features2 contains the user TM feature
      bits which we must keep in sync with the kernel TM feature bit.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      beff8237
    • A
      powerpc: scan_features() updates incorrect bits for REAL_LE · 6997e57d
      Anton Blanchard 提交于
      The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
      feature value, meaning all the remaining elements initialise the wrong
      values.
      
      This means instead of checking for byte 5, bit 0, we check for byte 0,
      bit 0, and then we incorrectly set the CPU feature bit as well as MMU
      feature bit 1 and CPU user feature bits 0 and 2 (5).
      
      Checking byte 0 bit 0 (IBM numbering), means we're looking at the
      "Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
      In practice that bit is set on all platforms which have the property.
      
      This means we set CPU_FTR_REAL_LE always. In practice that seems not to
      matter because all the modern cpus which have this property also
      implement REAL_LE, and we've never needed to disable it.
      
      We're also incorrectly setting MMU feature bit 1, which is:
      
        #define MMU_FTR_TYPE_8xx		0x00000002
      
      Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
      code, which can't run on the same cpus as scan_features(). So this also
      doesn't matter in practice.
      
      Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
      is not currently used, and bit 0 is:
      
        #define PPC_FEATURE_PPC_LE		0x00000001
      
      Which says the CPU supports the old style "PPC Little Endian" mode.
      Again this should be harmless in practice as no 64-bit CPUs implement
      that mode.
      
      Fix the code by adding the missing initialisation of the MMU feature.
      
      Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
      would be unsafe to start using it as old kernels incorrectly set it.
      
      Fixes: 44ae3ab3 ("powerpc: Free up some CPU feature bits by moving out MMU-related features")
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      [mpe: Flesh out changelog, add comment reserving 0x4]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6997e57d
  12. 14 4月, 2016 2 次提交
    • M
      powerpc/livepatch: Add live patching support on ppc64le · 85baa095
      Michael Ellerman 提交于
      Add the kconfig logic & assembly support for handling live patched
      functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn
      depends on the new -mprofile-kernel ftrace ABI, which is only supported
      currently on ppc64le.
      
      Live patching is handled by a special ftrace handler. This means it runs
      from ftrace_caller(). The live patch handler modifies the NIP so as to
      redirect the return from ftrace_caller() to the new patched function.
      
      However there is one particularly tricky case we need to handle.
      
      If a function A calls another function B, and it is known at link time
      that they share the same TOC, then A will not save or restore its TOC,
      and will call the local entry point of B.
      
      When we live patch B, we replace it with a new function C, which may
      not have the same TOC as A. At live patch time it's too late to modify A
      to do the TOC save/restore, so the live patching code must interpose
      itself between A and C, and do the TOC save/restore that A omitted.
      
      An additionaly complication is that the livepatch code can not create a
      stack frame in order to save the TOC. That is because if C takes > 8
      arguments, or is varargs, A will have written the arguments for C in
      A's stack frame.
      
      To solve this, we introduce a "livepatch stack" which grows upward from
      the base of the regular stack, and is used to store the TOC & LR when
      calling a live patched function.
      
      When the patched function returns, we retrieve the real LR & TOC from
      the livepatch stack, restore them, and pop the livepatch "stack frame".
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NTorsten Duwe <duwe@suse.de>
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      85baa095
    • M
      powerpc/livepatch: Add livepatch stack to struct thread_info · 5d31a96e
      Michael Ellerman 提交于
      In order to support live patching we need to maintain an alternate
      stack of TOC & LR values. We use the base of the stack for this, and
      store the "live patch stack pointer" in struct thread_info.
      
      Unlike the other fields of thread_info, we can not statically initialise
      that value, so it must be done at run time.
      
      This patch just adds the code to support that, it is not enabled until
      the next patch which actually adds live patch support.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      5d31a96e
  13. 12 4月, 2016 1 次提交