1. 28 7月, 2014 1 次提交
  2. 11 6月, 2014 1 次提交
    • S
      powerpc: Correct DSCR during TM context switch · 96d01610
      Sam bobroff 提交于
      Correct the DSCR SPR becoming temporarily corrupted if a task is
      context switched during a transaction.
      
      The problem occurs while suspending the task and is caused by saving
      the DSCR to thread.dscr after it has already been set to the CPU's
      default value:
      
      __switch_to() calls __switch_to_tm()
      	which calls tm_reclaim_task()
      	which calls tm_reclaim_thread()
      	which calls tm_reclaim()
      		where the DSCR is set to the CPU's default
      __switch_to() calls _switch()
      		where thread.dscr is set to the DSCR
      
      When the task is resumed, it's transaction will be doomed (as usual)
      and the DSCR SPR will be corrupted, although the checkpointed value
      will be correct. Therefore the DSCR will be immediately corrected by
      the transaction aborting, unless it has been suspended. In that case
      the incorrect value can be seen by the task until it resumes the
      transaction.
      
      The fix is to treat the DSCR similarly to the TAR and save it early
      in __switch_to().
      
      A program exposing the problem is added to the kernel self tests as:
      tools/testing/selftests/powerpc/tm/tm-resched-dscr.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      CC: <stable@vger.kernel.org> [v3.10+]
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      96d01610
  3. 28 5月, 2014 1 次提交
    • S
      powerpc: Fix regression of per-CPU DSCR setting · 1739ea9e
      Sam bobroff 提交于
      Since commit "efcac658 powerpc: Per process DSCR + some fixes (try#4)"
      it is no longer possible to set the DSCR on a per-CPU basis.
      
      The old behaviour was to minipulate the DSCR SPR directly but this is no
      longer sufficient: the value is quickly overwritten by context switching.
      
      This patch stores the per-CPU DSCR value in a kernel variable rather than
      directly in the SPR and it is used whenever a process has not set the DSCR
      itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.
      
      Writes to the old global default (/sys/devices/system/cpu/dscr_default)
      now set all of the per-CPU values and reads return the last written value.
      
      The new per-CPU default is added to the paca_struct and is used everywhere
      outside of sysfs.c instead of the old global default.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1739ea9e
  4. 23 4月, 2014 6 次提交
  5. 15 1月, 2014 2 次提交
    • P
      powerpc: Don't corrupt transactional state when using FP/VMX in kernel · d31626f7
      Paul Mackerras 提交于
      Currently, when we have a process using the transactional memory
      facilities on POWER8 (that is, the processor is in transactional
      or suspended state), and the process enters the kernel and the
      kernel then uses the floating-point or vector (VMX/Altivec) facility,
      we end up corrupting the user-visible FP/VMX/VSX state.  This
      happens, for example, if a page fault causes a copy-on-write
      operation, because the copy_page function will use VMX to do the
      copy on POWER8.  The test program below demonstrates the bug.
      
      The bug happens because when FP/VMX state for a transactional process
      is stored in the thread_struct, we store the checkpointed state in
      .fp_state/.vr_state and the transactional (current) state in
      .transact_fp/.transact_vr.  However, when the kernel wants to use
      FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
      which saves the current state in .fp_state/.vr_state.  Furthermore,
      when we return to the user process we return with FP/VMX/VSX
      disabled.  The next time the process uses FP/VMX/VSX, we don't know
      which set of state (the current register values, .fp_state/.vr_state,
      or .transact_fp/.transact_vr) we should be using, since we have no
      way to tell if we are still in the same transaction, and if not,
      whether the previous transaction succeeded or failed.
      
      Thus it is necessary to strictly adhere to the rule that if FP has
      been enabled at any point in a transaction, we must keep FP enabled
      for the user process with the current transactional state in the
      FP registers, until we detect that it is no longer in a transaction.
      Similarly for VMX; once enabled it must stay enabled until the
      process is no longer transactional.
      
      In order to keep this rule, we add a new thread_info flag which we
      test when returning from the kernel to userspace, called TIF_RESTORE_TM.
      This flag indicates that there is FP/VMX/VSX state to be restored
      before entering userspace, and when it is set the .tm_orig_msr field
      in the thread_struct indicates what state needs to be restored.
      The restoration is done by restore_tm_state().  The TIF_RESTORE_TM
      bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
      which are called from enable_kernel_fp/altivec, giveup_vsx, and
      flush_fp/altivec_to_thread instead of giveup_fpu/altivec.
      
      The other thing to be done is to get the transactional FP/VMX/VSX
      state from .fp_state/.vr_state when doing reclaim, if that state
      has been saved there by giveup_fpu/altivec_maybe_transactional.
      Having done this, we set the FP/VMX bit in the thread's MSR after
      reclaim to indicate that that part of the state is now valid
      (having been reclaimed from the processor's checkpointed state).
      
      Finally, in the signal handling code, we move the clearing of the
      transactional state bits in the thread's MSR a bit earlier, before
      calling flush_fp_to_thread(), so that we don't unnecessarily set
      the TIF_RESTORE_TM bit.
      
      This is the test program:
      
      /* Michael Neuling 4/12/2013
       *
       * See if the altivec state is leaked out of an aborted transaction due to
       * kernel vmx copy loops.
       *
       *   gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
       *
       */
      
      /* We don't use all of these, but for reference: */
      
      int main(int argc, char *argv[])
      {
      	long double vecin = 1.3;
      	long double vecout;
      	unsigned long pgsize = getpagesize();
      	int i;
      	int fd;
      	int size = pgsize*16;
      	char tmpfile[] = "/tmp/page_faultXXXXXX";
      	char buf[pgsize];
      	char *a;
      	uint64_t aborted = 0;
      
      	fd = mkstemp(tmpfile);
      	assert(fd >= 0);
      
      	memset(buf, 0, pgsize);
      	for (i = 0; i < size; i += pgsize)
      		assert(write(fd, buf, pgsize) == pgsize);
      
      	unlink(tmpfile);
      
      	a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
      	assert(a != MAP_FAILED);
      
      	asm __volatile__(
      		"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
      		TBEGIN
      		"beq	3f ;"
      		TSUSPEND
      		"xxlxor 40,40,40 ; " // set 40 to 0
      		"std	5, 0(%[map]) ;" // cause kernel vmx copy page
      		TABORT
      		TRESUME
      		TEND
      		"li	%[res], 0 ;"
      		"b	5f ;"
      		"3: ;" // Abort handler
      		"li	%[res], 1 ;"
      		"5: ;"
      		"stxvd2x 40,0,%[vecoutptr] ; "
      		: [res]"=r"(aborted)
      		: [vecinptr]"r"(&vecin),
      		  [vecoutptr]"r"(&vecout),
      		  [map]"r"(a)
      		: "memory", "r0", "r3", "r4", "r5", "r6", "r7");
      
      	if (aborted && (vecin != vecout)){
      		printf("FAILED: vector state leaked on abort %f != %f\n",
      		       (double)vecin, (double)vecout);
      		exit(1);
      	}
      
      	munmap(a, size);
      
      	close(fd);
      
      	printf("PASSED!\n");
      	return 0;
      }
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d31626f7
    • M
      Move precessing of MCE queued event out from syscall exit path. · 30c82635
      Mahesh Salgaonkar 提交于
      Huge Dickins reported an issue that b5ff4211
      "powerpc/book3s: Queue up and process delayed MCE events" breaks the
      PowerMac G5 boot. This patch fixes it by moving the mce even processing
      away from syscall exit, which was wrong to do that in first place, and
      using irq work framework to delay processing of mce event.
      
      Reported-by: Hugh Dickins <hughd@google.com
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      30c82635
  6. 05 12月, 2013 1 次提交
  7. 06 11月, 2013 1 次提交
  8. 11 10月, 2013 2 次提交
  9. 27 8月, 2013 1 次提交
  10. 14 8月, 2013 2 次提交
  11. 09 8月, 2013 2 次提交
    • M
      powerpc: Save the TAR register earlier · c2d52644
      Michael Neuling 提交于
      This moves us to save the Target Address Register (TAR) a earlier in
      __switch_to.  It introduces a new function save_tar() to do this.
      
      We need to save the TAR earlier as we will overwrite it in the transactional
      memory reclaim/recheckpoint path.  We are going to do this in a subsequent
      patch which will fix saving the TAR register when it's modified inside a
      transaction.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Cc: <stable@vger.kernel.org> [v3.10]
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c2d52644
    • M
      powerpc: Fix context switch DSCR on POWER8 · 2517617e
      Michael Neuling 提交于
      POWER8 allows the DSCR to be accessed directly from userspace via a new SPR
      number 0x3 (Rather than 0x11.  DSCR SPR number 0x11 is still used on POWER8 but
      like POWER7, is only accessible in HV and OS modes).  Currently, we allow this
      by setting H/FSCR DSCR bit on boot.
      
      Unfortunately this doesn't work, as the kernel needs to see the DSCR change so
      that it knows to no longer restore the system wide version of DSCR on context
      switch (ie. to set thread.dscr_inherit).
      
      This clears the H/FSCR DSCR bit initially.  If a process then accesses the DSCR
      (via SPR 0x3), it'll trap into the kernel where we set thread.dscr_inherit in
      facility_unavailable_exception().
      
      We also change _switch() so that we set or clear the H/FSCR DSCR bit based on
      the thread.dscr_inherit.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Cc: <stable@vger.kernel.org> [v3.10]
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2517617e
  12. 20 6月, 2013 1 次提交
    • B
      powerpc: Restore dbcr0 on user space exit · 13d543cd
      Bharat Bhushan 提交于
      On BookE (Branch taken + Single Step) is as same as Branch Taken
      on BookS and in Linux we simulate BookS behavior for BookE as well.
      When doing so, in Branch taken handling we want to set DBCR0_IC but
      we update the current->thread->dbcr0 and not DBCR0.
      
      Now on 64bit the current->thread.dbcr0 (and other debug registers)
      is synchronized ONLY on context switch flow. But after handling
      Branch taken in debug exception if we return back to user space
      without context switch then single stepping change (DBCR0_ICMP)
      does not get written in h/w DBCR0 and Instruction Complete exception
      does not happen.
      
      This fixes using ptrace reliably on BookE-PowerPC
      
      lmbench latency test (lat_syscall) Results are (they varies a little
      on each run)
      
      1) ./lat_syscall <action> /dev/shm/uImage
      
      action:	Open	read	write	stat	fstat	null
      Before:	3.8618	0.2017	0.2851	1.6789	0.2256	0.0856
      After:	3.8580	0.2017	0.2851	1.6955	0.2255	0.0856
      
      1) ./lat_syscall -P 2 -N 10 <action> /dev/shm/uImage
      action:	Open	read	write	stat	fstat	null
      Before:	4.1388	0.2238	0.3066	1.7106	0.2256	0.0856
      After:	4.1413	0.2236	0.3062	1.7107	0.2256	0.0856
      
      [ Slightly modified to avoid extra branch in the fast path
        on Book3S and fix build on all non-BookE 64-bit -- BenH
      ]
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      13d543cd
  13. 10 6月, 2013 1 次提交
  14. 01 6月, 2013 1 次提交
  15. 24 5月, 2013 1 次提交
  16. 14 5月, 2013 2 次提交
  17. 02 5月, 2013 2 次提交
  18. 15 4月, 2013 2 次提交
    • K
      powerpc: add a missing label in resume_kernel · d8b92292
      Kevin Hao 提交于
      A label 0 was missed in the patch a9c4e541 (powerpc/kprobe: Complete
      kprobe and migrate exception frame). This will cause the kernel
      branch to an undetermined address if there really has a conflict when
      updating the thread flags.
      Signed-off-by: NKevin Hao <haokexin@gmail.com>
      Cc: stable@vger.kernel.org
      Acked-By: NTiejun Chen <tiejun.chen@windriver.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      d8b92292
    • A
      powerpc: Fix audit crash due to save/restore PPR changes · 05e38e5d
      Alistair Popple 提交于
      The current mainline crashes when hitting userspace with the following:
      
      kernel BUG at kernel/auditsc.c:1769!
      cpu 0x1: Vector: 700 (Program Check) at [c000000023883a60]
          pc: c0000000001047a8: .__audit_syscall_entry+0x38/0x130
          lr: c00000000000ed64: .do_syscall_trace_enter+0xc4/0x270
          sp: c000000023883ce0
         msr: 8000000000029032
        current = 0xc000000023800000
        paca    = 0xc00000000f080380   softe: 0        irq_happened: 0x01
          pid   = 1629, comm = start_udev
      kernel BUG at kernel/auditsc.c:1769!
      enter ? for help
      [c000000023883d80] c00000000000ed64 .do_syscall_trace_enter+0xc4/0x270
      [c000000023883e30] c000000000009b08 syscall_dotrace+0xc/0x38
       --- Exception: c00 (System Call) at 0000008010ec50dc
      
      Bisecting found the following patch caused it:
      
      commit 44e9309f
      Author: Haren Myneni <haren@linux.vnet.ibm.com>
      powerpc: Implement PPR save/restore
      
      It was found this patch corrupted r9 when calling
      SET_DEFAULT_THREAD_PPR()
      
      Using r10 as a scratch register instead of r9 solved the problem.
      Signed-off-by: NAlistair Popple <alistair@popple.id.au>
      Acked-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      05e38e5d
  19. 11 4月, 2013 1 次提交
  20. 15 2月, 2013 1 次提交
  21. 08 2月, 2013 1 次提交
  22. 29 1月, 2013 1 次提交
  23. 28 1月, 2013 1 次提交
    • F
      cputime: Generic on-demand virtual cputime accounting · abf917cd
      Frederic Weisbecker 提交于
      If we want to stop the tick further idle, we need to be
      able to account the cputime without using the tick.
      
      Virtual based cputime accounting solves that problem by
      hooking into kernel/user boundaries.
      
      However implementing CONFIG_VIRT_CPU_ACCOUNTING require
      low level hooks and involves more overhead. But we already
      have a generic context tracking subsystem that is required
      for RCU needs by archs which plan to shut down the tick
      outside idle.
      
      This patch implements a generic virtual based cputime
      accounting that relies on these generic kernel/user hooks.
      
      There are some upsides of doing this:
      
      - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
      if context tracking is already built (already necessary for RCU in full
      tickless mode).
      
      - We can rely on the generic context tracking subsystem to dynamically
      (de)activate the hooks, so that we can switch anytime between virtual
      and tick based accounting. This way we don't have the overhead
      of the virtual accounting when the tick is running periodically.
      
      And one downside:
      
      - There is probably more overhead than a native virtual based cputime
      accounting. But this relies on hooks that are already set anyway.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      abf917cd
  24. 10 1月, 2013 3 次提交
    • H
      powerpc: Implement PPR save/restore · 44e9309f
      Haren Myneni 提交于
      [PATCH 6/6] powerpc: Implement PPR save/restore
      
      When the task enters in to kernel space, the user defined priority (PPR)
      will be saved in to PACA at the beginning of first level exception
      vector and then copy from PACA to thread_info in second level vector.
      PPR will be restored from thread_info before exits the kernel space.
      
      P7/P8 temporarily raises the thread priority to higher level during
      exception until the program executes HMT_* calls. But it will not modify
      PPR register. So we save PPR value whenever some register is available
      to use and then calls HMT_MEDIUM to increase the priority. This feature
      supports on P7 or later processors.
      
      We save/ restore PPR for all exception vectors except system call entry.
      GLIBC will be saving / restore for system calls. So the default PPR
      value (3) will be set for the system call exit when the task returned
      to the user space.
      Signed-off-by: NHaren Myneni <haren@us.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      44e9309f
    • H
      powerpc: Move branch instruction from ACCOUNT_CPU_USER_ENTRY to caller · 5d75b264
      Haren Myneni 提交于
      [PATCH 1/6] powerpc: Move branch instruction from ACCOUNT_CPU_USER_ENTRY to caller
      
      The first instruction in ACCOUNT_CPU_USER_ENTRY is 'beq' which checks for
      exceptions coming from kernel mode. PPR value will be saved immediately after
      ACCOUNT_CPU_USER_ENTRY and is also for user level exceptions. So moved this
      branch instruction in the caller code.
      Signed-off-by: NHaren Myneni <haren@us.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5d75b264
    • I
      powerpc: Add code to handle soft-disabled doorbells on server · fe9e1d54
      Ian Munsie 提交于
      This patch adds the logic to properly handle doorbells that come in when
      interrupts have been soft disabled and to replay them when interrupts
      are re-enabled:
      
      - masked_##_H##interrupt is modified to leave interrupts enabled when a
        doorbell has come in since doorbells are edge sensitive and as such
        won't be automatically re-raised.
      
      - __check_irq_replay now tests if a doorbell happened on book3s, and
        returns either 0xe80 or 0xa00 depending on whether we are the
        hypervisor or not.
      
      - restore_check_irq_replay now tests for the two possible server
        doorbell vector numbers to replay.
      
      - __replay_interrupt also adds tests for the two server doorbell vector
        numbers, and is modified to use a compare instruction rather than an
        andi. on the single bit difference between 0x500 and 0x900.
      
      The last two use a CPU feature section to avoid needlessly testing
      against the hypervisor vector if it is not the hypervisor, and vice
      versa.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      fe9e1d54
  25. 15 11月, 2012 1 次提交
    • L
      powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning ! · 12660b17
      Li Zhong 提交于
      This patch tries to fix the following BUG report:
      
      [    0.012313] BUG: MAX_STACK_TRACE_ENTRIES too low!
      [    0.012318] turning off the locking correctness validator.
      [    0.012321] Call Trace:
      [    0.012330] [c00000017666f6d0] [c000000000012128] .show_stack+0x78/0x184 (unreliable)
      [    0.012339] [c00000017666f780] [c0000000000b6348] .save_trace+0x12c/0x14c
      [    0.012345] [c00000017666f800] [c0000000000b7448] .mark_lock+0x2bc/0x710
      [    0.012351] [c00000017666f8b0] [c0000000000bb198] .__lock_acquire+0x748/0xaec
      [    0.012357] [c00000017666f9b0] [c0000000000bb684] .lock_acquire+0x148/0x194
      [    0.012365] [c00000017666fa80] [c00000000069371c] .mutex_lock_nested+0x84/0x4ec
      [    0.012372] [c00000017666fb90] [c000000000096998] .smpboot_register_percpu_thread+0x3c/0x10c
      [    0.012380] [c00000017666fc30] [c0000000009ba910] .spawn_ksoftirqd+0x28/0x48
      [    0.012386] [c00000017666fcb0] [c00000000000a98c] .do_one_initcall+0xd8/0x1d0
      [    0.012392] [c00000017666fd60] [c00000000000b1f8] .kernel_init+0x120/0x398
      
      [    0.012398] [c00000017666fe30] [c000000000009ad4] .ret_from_kernel_thread+0x5c/0x64
      [    0.012404] [c00000017666fa00] [c00000017666fb20] 0xc00000017666fb20
      [    0.012410] [c00000017666fa80] [c00000000069371c] .mutex_lock_nested+0x84/0x4ec
      [    0.012416] [c00000017666fb90] [c000000000096998] .smpboot_register_percpu_thread+0x3c/0x10c
      [    0.012422] [c00000017666fc30] [c0000000009ba910] .spawn_ksoftirqd+0x28/0x48
      [    0.012427] [c00000017666fcb0] [c00000000000a98c] .do_one_initcall+0xd8/0x1d0
      [    0.012433] [c00000017666fd60] [c00000000000b1f8] .kernel_init+0x120/0x398
      
      [    0.012439] [c00000017666fe30] [c000000000009ad4] .ret_from_kernel_thread+0x5c/0x64
      .......
      
      The reason is that the back chain of c00000017666fe30
      (ret_from_kernel_thread) contains some invalid value, which might form a
      loop.
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      12660b17
  26. 22 10月, 2012 1 次提交