1. 02 March 2016 (5 commits)
    • powerpc: Add the ability to save Altivec without giving it up · 6f515d84
      Committed by Cyril Bur
      This patch adds the ability to save the VEC registers to the thread struct
      without giving up the facility, that is, without it being disabled the next
      time the process returns to userspace.
      
      This patch builds on a previous optimisation for the FPU registers in the
      thread copy path to avoid a possibly pointless reload of VEC state.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add the ability to save FPU without giving it up · 8792468d
      Committed by Cyril Bur
      This patch adds the ability to save the FPU registers to the thread struct
      without giving up the facility, that is, without it being disabled the next
      time the process returns to userspace.
      
      This patch optimises the thread copy path (taken as a result of a fork() or
      clone()) so that the parent thread can return to userspace with hot
      registers, avoiding a possibly pointless reload of FPU register state.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two · de2a20aa
      Committed by Cyril Bur
      This prepares for the decoupling of saving {fpu,altivec,vsx} registers and
      marking {fpu,altivec,vsx} as being unused by a thread.
      
      Currently giveup_{fpu,altivec,vsx}() does both; however, optimisations to
      task switching can be made if these two operations are decoupled.
      save_all() will permit saving the registers to thread structs while leaving
      the corresponding facility bits enabled in the thread's MSR.
      
      This patch introduces no functional change.
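
      To make the distinction concrete, here is a minimal sketch of the intended
      split. The names giveup_fpu() and save_all() come from the changelog above;
      the bodies and the save helpers are simplified assumptions, not the actual
      implementation.

          /* Old behaviour: saving the state also turns the facility off. */
          void giveup_fpu_sketch(struct task_struct *tsk)
          {
                  save_fpu(tsk);                          /* write FP regs to thread struct */
                  tsk->thread.regs->msr &= ~MSR_FP;       /* facility disabled for userspace */
          }

          /* New behaviour: save_all() writes the registers back to the thread
           * struct but deliberately leaves MSR_FP/MSR_VEC/MSR_VSX set, so the
           * task can return to userspace with its registers still hot. */
          void save_all_sketch(struct task_struct *tsk)
          {
                  unsigned long msr = tsk->thread.regs->msr;

                  if (msr & MSR_FP)
                          save_fpu(tsk);
                  if (msr & MSR_VEC)
                          save_altivec(tsk);
                  /* MSR bits intentionally left untouched. */
          }
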
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Restore FPU/VEC/VSX if previously used · 70fe3d98
      Committed by Cyril Bur
      Currently the FPU, VEC and VSX facilities are lazily loaded. This is not
      a problem unless a process is using these facilities.
      
      Modern versions of GCC are very good at automatically vectorising code,
      new and modernised workloads make use of floating point and vector
      facilities, and even the kernel makes use of vectorised memcpy.
      
      All this combined greatly increases the cost of a syscall, since the
      kernel sometimes uses these facilities even in the syscall fast path,
      making it increasingly common for a thread to take an *_unavailable
      exception soon after a syscall, not to mention potentially taking all three.
      
      The obvious overcompensation for this problem is to simply always load
      all the facilities on every exit to userspace. Loading up all the FPU, VEC
      and VSX registers every time can be expensive, however, and a workload
      that avoids using them should not be forced to incur this penalty.
      
      An 8-bit counter is used to detect whether the registers have been used in
      the past, and the registers are always loaded until the value wraps back
      to zero.
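
      A rough sketch of that counter scheme follows. load_vec is named in the
      [mpe] note below; load_fp and the exact placement here are assumptions for
      illustration only, not the precise kernel code.

          /* Illustrative sketch of the 8-bit usage counter on the
           * exit-to-userspace ("restore") path. */
          static void restore_fp_sketch(struct pt_regs *regs)
          {
                  /* Non-zero counter: the task used FP recently, so load the
                   * registers eagerly and avoid an fp_unavailable exception. */
                  if (current->thread.load_fp) {
                          load_fp_state(&current->thread.fp_state);
                          regs->msr |= MSR_FP;
                          /* u8 counter: after enough eager loads with no fresh
                           * use it wraps back to zero and loading goes lazy
                           * again until the facility is actually used. */
                          current->thread.load_fp++;
                  }
          }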
      
      Several versions of the assembly in entry_64.S were tested:
      
        1. Always calling C.
        2. Performing a common case check and then calling C.
        3. A complex check in asm.
      
      After some benchmarking it was determined that avoiding C in the common
      case is a performance benefit (option 2). The full check in asm (option
      3) greatly complicated that codepath for a negligible performance gain
      and the trade-off was deemed not worth it.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      [mpe: Move load_vec in the struct to fill an existing hole, reword change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      
    • powerpc: Explicitly disable math features when copying thread · d272f667
      Committed by Cyril Bur
      Currently, when threads get scheduled off they always give up the FPU,
      Altivec (VMX) and Vector (VSX) units if they were using them. When they are
      scheduled back on, a fault is then taken to enable each facility and load
      the registers. As a result, explicitly disabling FPU/VMX/VSX has not been
      necessary.
      
      Future changes and optimisations remove this mandatory giveup and fault,
      which could cause calls such as clone() and fork() to copy threads and
      later run them with FPU/VMX/VSX enabled but no registers loaded.
      
      This patch starts the process of having MSR_{FP,VEC,VSX} set mean that a
      thread's registers are hot, while having them clear means that the
      registers must be loaded. This allows for a smarter return to userspace.
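
      A minimal sketch of what this implies for the thread copy path (the exact
      function and hook point are assumptions based on the description above):

          /* Sketch only: with MSR_FP/MSR_VEC/MSR_VSX now meaning "this
           * thread's registers are hot", a newly copied thread must have them
           * cleared, because nothing has loaded its register state yet. */
          static void clear_math_bits_sketch(struct pt_regs *childregs)
          {
                  childregs->msr &= ~(MSR_FP | MSR_VEC | MSR_VSX);
          }
          /* Expected to run from the arch copy_thread() path, after the
           * parent's live state has been saved into the child's thread_struct. */
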
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 14 December 2015 (1 commit)
  3. 10 December 2015 (3 commits)
  4. 02 December 2015 (4 commits)
  5. 01 December 2015 (9 commits)
  6. 23 November 2015 (1 commit)
    • powerpc/tm: Check for already reclaimed tasks · 7f821fc9
      Committed by Michael Neuling
      Currently we can hit a scenario where we'll tm_reclaim() twice.  This
      results in a TM bad thing exception because the second reclaim occurs
      when not in suspend mode.
      
      The scenario in which this can happen is the following.  We attempt to
      deliver a signal to userspace.  To do this we need to obtain the stack
      pointer to write the signal context.  To get this stack pointer we
      must tm_reclaim() in case we need to use the checkpointed stack
      pointer (see get_tm_stackpointer()).  Normally we'd then return
      directly to userspace to deliver the signal without going through
      __switch_to().
      
      Unfortunately, if at this point we get an error (such as a bad
      userspace stack pointer), we need to exit the process.  The exit will
      result in a __switch_to().  __switch_to() will attempt to save the
      process state which results in another tm_reclaim().  This
      tm_reclaim() now causes a TM Bad Thing exception as this state has
      already been saved and the processor is no longer in TM suspend mode.
      Whee!
      
      This patch checks the state of the MSR to ensure we are TM suspended
      before we attempt the tm_reclaim().  If we've already saved the state
      away, we should no longer be in TM suspend mode.  This has the
      additional advantage of checking for a potential TM Bad Thing
      exception.
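
      Sketched in C, the guard looks roughly like this (the surrounding function
      is simplified; only the MSR check is the point being made):

          static void tm_reclaim_thread_sketch(struct thread_struct *thr)
          {
                  /* If the MSR no longer says we are transactionally suspended,
                   * the checkpointed state has already been reclaimed (e.g.
                   * during signal delivery) and a second reclaim would raise a
                   * TM Bad Thing exception, so just return. */
                  if (!MSR_TM_SUSPENDED(mfmsr()))
                          return;

                  /* ... otherwise perform the usual tm_reclaim() as before ... */
          }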
      
      Found using syscall fuzzer.
      
      Fixes: fb09692e ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes")
      Cc: stable@vger.kernel.org # v3.9+
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 16 July 2015 (1 commit)
  8. 14 July 2015 (1 commit)
  9. 07 June 2015 (1 commit)
  10. 20 March 2015 (1 commit)
  11. 17 November 2014 (1 commit)
    • powerpc: Use generic PIE randomization · 59994fb0
      Committed by Vineeth Vijayan
      Back in 2009 we merged 501cb16d "Randomise PIEs", which added support for
      randomizing PIE (Position Independent Executable) binaries.
      
      That commit added randomize_et_dyn(), which correctly randomized the addresses,
      but failed to honor PF_RANDOMIZE. That means it was not possible to disable PIE
      randomization via the personality flag, or /proc/sys/kernel/randomize_va_space.
      
      Since then there has been generic support for PIE randomization added to
      binfmt_elf.c, selectable via ARCH_BINFMT_ELF_RANDOMIZE_PIE.
      
      Enabling that allows us to drop randomize_et_dyn(), which means we start
      honoring PF_RANDOMIZE correctly.
      
      It also causes a fairly major change to how we lay out PIE binaries.
      
      Currently we place the binary at 512MB-520MB for 32-bit binaries, or
      512MB-1.5GB for 64-bit binaries, e.g.:
      
          $ cat /proc/$$/maps
          4e550000-4e580000 r-xp 00000000 08:02 129813       /bin/dash
          4e580000-4e590000 rw-p 00020000 08:02 129813       /bin/dash
          10014110000-10014140000 rw-p 00000000 00:00 0      [heap]
          3fffaa3f0000-3fffaa5a0000 r-xp 00000000 08:02 921  /lib/powerpc64le-linux-gnu/libc-2.19.so
          3fffaa5a0000-3fffaa5b0000 rw-p 001a0000 08:02 921  /lib/powerpc64le-linux-gnu/libc-2.19.so
          3fffaa5c0000-3fffaa5d0000 rw-p 00000000 00:00 0
          3fffaa5d0000-3fffaa5f0000 r-xp 00000000 00:00 0    [vdso]
          3fffaa5f0000-3fffaa620000 r-xp 00000000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
          3fffaa620000-3fffaa630000 rw-p 00020000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
          3ffffc340000-3ffffc370000 rw-p 00000000 00:00 0    [stack]
      
      With this commit applied we don't do any special randomisation for the binary,
      and instead rely on mmap randomisation. This means the binary ends up at high
      addresses, e.g.:
      
          $ cat /proc/$$/maps
          3fff99820000-3fff999d0000 r-xp 00000000 08:02 921    /lib/powerpc64le-linux-gnu/libc-2.19.so
          3fff999d0000-3fff999e0000 rw-p 001a0000 08:02 921    /lib/powerpc64le-linux-gnu/libc-2.19.so
          3fff999f0000-3fff99a00000 rw-p 00000000 00:00 0
          3fff99a00000-3fff99a20000 r-xp 00000000 00:00 0      [vdso]
          3fff99a20000-3fff99a50000 r-xp 00000000 08:02 1246   /lib/powerpc64le-linux-gnu/ld-2.19.so
          3fff99a50000-3fff99a60000 rw-p 00020000 08:02 1246   /lib/powerpc64le-linux-gnu/ld-2.19.so
          3fff99a60000-3fff99a90000 r-xp 00000000 08:02 129813 /bin/dash
          3fff99a90000-3fff99aa0000 rw-p 00020000 08:02 129813 /bin/dash
          3fffc3de0000-3fffc3e10000 rw-p 00000000 00:00 0      [stack]
          3fffc55e0000-3fffc5610000 rw-p 00000000 00:00 0      [heap]
      
      Although this should be OK, it's possible it might break badly written
      binaries that make assumptions about the address space layout.
      Signed-off-by: Vineeth Vijayan <vvijayan@mvista.com>
      [mpe: Rewrite changelog]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  12. 10 November 2014 (1 commit)
  13. 05 November 2014 (1 commit)
  14. 03 November 2014 (1 commit)
    • powerpc: Replace __get_cpu_var uses · 69111bac
      Committed by Christoph Lameter
      This still has not been merged and now powerpc is the only arch that does
      not have this change. Sorry about missing linuxppc-dev before.
      
      V2->V2
        - Fix up to work against 3.18-rc1
      
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and fewer registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout, specialized macros can be defined in non-x86
      arches as well in order to optimize per cpu access by, for example, using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      [mpe: Fix build errors caused by set/or_softirq_pending(), and rework
            assignment in __set_breakpoint() to use memcpy().]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 15 October 2014 (2 commits)
    • powerpc: Rename __get_SP() to current_stack_pointer() · acf620ec
      Committed by Anton Blanchard
      Michael points out that __get_SP() is a pretty horrible
      function name. Let's give it a better name.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Reimplement __get_SP() as a function not a define · bfe9a2cf
      Committed by Anton Blanchard
      Li Zhong points out an issue with our current __get_SP()
      implementation. If ftrace function tracing is enabled (i.e. -pg
      profiling using _mcount) we spill a stack frame on 64-bit all the
      time.
      
      If a function calls __get_SP() and later calls a function that is
      tail call optimised, we will pop the stack frame and the value
      returned by __get_SP() is no longer valid. An example from Li can
      be found in save_stack_trace -> save_context_stack:
      
      c0000000000432c0 <.save_stack_trace>:
      c0000000000432c0:       mflr    r0
      c0000000000432c4:       std     r0,16(r1)
      c0000000000432c8:       stdu    r1,-128(r1) <-- stack frame for _mcount
      c0000000000432cc:       std     r3,112(r1)
      c0000000000432d0:       bl      <._mcount>
      c0000000000432d4:       nop
      
      c0000000000432d8:       mr      r4,r1 <-- __get_SP()
      
      c0000000000432dc:       ld      r5,632(r13)
      c0000000000432e0:       ld      r3,112(r1)
      c0000000000432e4:       li      r6,1
      
      c0000000000432e8:       addi    r1,r1,128 <-- pop stack frame
      
      c0000000000432ec:       ld      r0,16(r1)
      c0000000000432f0:       mtlr    r0
      c0000000000432f4:       b       <.save_context_stack> <-- tail call optimized
      
      save_context_stack ends up with a stack pointer below the current
      one, and it is likely to be scribbled over.
      
      Fix this by making __get_SP() a function which returns the
      caller's stack frame. Also replace the inline assembly which grabs
      the stack pointer in save_stack_trace and show_stack with
      __get_SP().
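
      The idea, sketched in C (the real routine is a couple of assembly
      instructions; this version only illustrates the behaviour and is not the
      exact kernel code):

          /* Sketch: reading r1 out-of-line and following the back-chain word
           * yields a stack pointer above any -pg/_mcount frame the caller may
           * later pop, so the value stays valid across tail calls. */
          static unsigned long notrace __get_SP_sketch(void)
          {
                  unsigned long sp;

                  asm volatile("mr %0,1" : "=r" (sp));    /* r1: the stack pointer */
                  return *(unsigned long *)sp;            /* back chain to the frame above */
          }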
      
      This also fixes an issue with perf_arch_fetch_caller_regs().
      It currently unwinds the stack once, which will skip a
      valid stack frame on a leaf function. With the __get_SP() fixes
      in this patch, we never need to unwind the stack frame to get
      to the first interesting frame.
      
      We have to export __get_SP() because perf_arch_fetch_caller_regs()
      (which is used in modules) calls it from a header file.
      Reported-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 25 September 2014 (1 commit)
  17. 27 August 2014 (2 commits)
    • Revert "powerpc: Replace __get_cpu_var uses" · 23f66e2d
      Committed by Tejun Heo
      This reverts commit 5828f666 due to
      build failure after merging with pending powerpc changes.
      
      Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Replace __get_cpu_var uses · 5828f666
      Committed by Christoph Lameter
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and fewer registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout, specialized macros can be defined in non-x86
      arches as well in order to optimize per cpu access by, for example, using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      tj: Folded a fix patch.
          http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  18. 05 August 2014 (1 commit)
  19. 28 July 2014 (2 commits)
  20. 11 June 2014 (1 commit)
    • powerpc: Correct DSCR during TM context switch · 96d01610
      Committed by Sam Bobroff
      Correct the DSCR SPR becoming temporarily corrupted if a task is
      context switched during a transaction.
      
      The problem occurs while suspending the task and is caused by saving
      the DSCR to thread.dscr after it has already been set to the CPU's
      default value:
      
      __switch_to() calls __switch_to_tm()
      	which calls tm_reclaim_task()
      	which calls tm_reclaim_thread()
      	which calls tm_reclaim()
      		where the DSCR is set to the CPU's default
      __switch_to() calls _switch()
      		where thread.dscr is set to the DSCR
      
      When the task is resumed, its transaction will be doomed (as usual)
      and the DSCR SPR will be corrupted, although the checkpointed value
      will be correct. Therefore the DSCR will be immediately corrected by
      the transaction aborting, unless it has been suspended. In that case
      the incorrect value can be seen by the task until it resumes the
      transaction.
      
      The fix is to treat the DSCR similarly to the TAR and save it early
      in __switch_to().
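
      Sketching the shape of that fix (placement and helper name are assumptions
      based on the call chain above, not the exact patch):

          /* Sketch: capture the user DSCR before __switch_to_tm()/tm_reclaim()
           * resets the SPR to the CPU default, so thread.dscr keeps the value
           * the task was actually running with. */
          static void save_early_sprs_sketch(struct thread_struct *prev)
          {
                  if (cpu_has_feature(CPU_FTR_DSCR))
                          prev->dscr = mfspr(SPRN_DSCR);
          }
          /* Intended to run early in __switch_to(), before __switch_to_tm(),
           * mirroring how the TAR is already saved early. */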
      
      A program exposing the problem is added to the kernel self tests as:
      tools/testing/selftests/powerpc/tm/tm-resched-dscr.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      CC: <stable@vger.kernel.org> [v3.10+]
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>