You need to sign in or sign up before continuing.
  1. 17 12月, 2015 2 次提交
  2. 01 12月, 2015 3 次提交
  3. 29 7月, 2015 2 次提交
    • M
      powerpc/kernel: Change the do_syscall_trace_enter() API · d3837414
      Michael Ellerman 提交于
      The API for calling do_syscall_trace_enter() is currently sensible
      enough, it just returns the (modified) syscall number.
      
      However once we enable seccomp filter it will get more complicated. When
      seccomp filter runs, the seccomp kernel code (via SECCOMP_RET_ERRNO), or
      a ptracer (via SECCOMP_RET_TRACE), may reject the syscall and *may* or may
      *not* set a return value in r3.
      
      That means the assembler that calls do_syscall_trace_enter() can not
      blindly return ENOSYS, it needs to only return ENOSYS if a return value
      has not already been set.
      
      There is no way to implement that logic with the current API. So change
      the do_syscall_trace_enter() API to make it deal with the return code
      juggling, and the assembler can then just return whatever return code it
      is given.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      d3837414
    • M
      powerpc/kernel: Switch to using MAX_ERRNO · c3525940
      Michael Ellerman 提交于
      Currently on powerpc we have our own #define for the highest (negative)
      errno value, called _LAST_ERRNO. This is defined to be 516, for reasons
      which are not clear.
      
      The generic code, and x86, use MAX_ERRNO, which is defined to be 4095.
      
      In particular seccomp uses MAX_ERRNO to restrict the value that a
      seccomp filter can return.
      
      Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp
      tracer wanting to return 600, expecting it to be seen as an error, would
      instead find on powerpc that userspace sees a successful syscall with a
      return value of 600.
      
      To avoid this inconsistency, switch powerpc to use MAX_ERRNO.
      
      We are somewhat confident that generic syscalls that can return a
      non-error value above negative MAX_ERRNO have already been updated to
      use force_successful_syscall_return().
      
      I have also checked all the powerpc specific syscalls, and believe that
      none of them expect to return a non-error value between -MAX_ERRNO and
      -516. So this change should be safe ...
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      c3525940
  4. 19 6月, 2015 1 次提交
    • S
      powerpc/tm: Abort syscalls in active transactions · b4b56f9e
      Sam bobroff 提交于
      This patch changes the syscall handler to doom (tabort) active
      transactions when a syscall is made and return very early without
      performing the syscall and keeping side effects to a minimum (no CPU
      accounting or system call tracing is performed). Also included is a
      new HWCAP2 bit, PPC_FEATURE2_HTM_NOSC, to indicate this
      behaviour to userspace.
      
      Currently, the system call instruction automatically suspends an
      active transaction which causes side effects to persist when an active
      transaction fails.
      
      This does change the kernel's behaviour, but in a way that was
      documented as unsupported.  It doesn't reduce functionality as
      syscalls will still be performed after tsuspend; it just requires that
      the transaction be explicitly suspended.  It also provides a
      consistent interface and makes the behaviour of user code
      substantially the same across powerpc and platforms that do not
      support suspended transactions (e.g. x86 and s390).
      
      Performance measurements using
      http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of
      a normal (non-aborted) system call increases by about 0.25%.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b4b56f9e
  5. 07 6月, 2015 1 次提交
  6. 30 4月, 2015 1 次提交
    • M
      Revert "powerpc/tm: Abort syscalls in active transactions" · 68fc378c
      Michael Ellerman 提交于
      This reverts commit feba4036.
      
      Although the principle of this change is good, the implementation has a
      few issues.
      
      Firstly we can sometimes fail to abort a syscall because r12 may have
      been clobbered by C code if we went down the virtual CPU accounting
      path, or if syscall tracing was enabled.
      
      Secondly we have decided that it is safer to abort the syscall even
      earlier in the syscall entry path, so that we avoid the syscall tracing
      path when we are transactional.
      
      So that we have time to thoroughly test those changes we have decided to
      revert this for this merge window and will merge the fixed version in
      the next window.
      
      NB. Rather than reverting the selftest we just drop tm-syscall from
      TEST_PROGS so that it's not run by default.
      
      Fixes: feba4036 ("powerpc/tm: Abort syscalls in active transactions")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      68fc378c
  7. 11 4月, 2015 1 次提交
    • S
      powerpc/tm: Abort syscalls in active transactions · feba4036
      Sam bobroff 提交于
      This patch changes the syscall handler to doom (tabort) active
      transactions when a syscall is made and return immediately without
      performing the syscall.
      
      Currently, the system call instruction automatically suspends an
      active transaction which causes side effects to persist when an active
      transaction fails.
      
      This does change the kernel's behaviour, but in a way that was
      documented as unsupported. It doesn't reduce functionality because
      syscalls will still be performed after tsuspend. It also provides a
      consistent interface and makes the behaviour of user code
      substantially the same across powerpc and platforms that do not
      support suspended transactions (e.g. x86 and s390).
      
      Performance measurements using
      http://ozlabs.org/~anton/junkcode/null_syscall.c
      indicate the cost of a system call increases by about 0.5%.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Acked-By: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      feba4036
  8. 28 3月, 2015 1 次提交
    • M
      powerpc: Add a proper syscall for switching endianness · 529d235a
      Michael Ellerman 提交于
      We currently have a "special" syscall for switching endianness. This is
      syscall number 0x1ebe, which is handled explicitly in the 64-bit syscall
      exception entry.
      
      That has a few problems, firstly the syscall number is outside of the
      usual range, which confuses various tools. For example strace doesn't
      recognise the syscall at all.
      
      Secondly it's handled explicitly as a special case in the syscall
      exception entry, which is complicated enough without it.
      
      As a first step toward removing the special syscall, we need to add a
      regular syscall that implements the same functionality.
      
      The logic is simple, it simply toggles the MSR_LE bit in the userspace
      MSR. This is the same as the special syscall, with the caveat that the
      special syscall clobbers fewer registers.
      
      This version clobbers r9-r12, XER, CTR, and CR0-1,5-7.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      529d235a
  9. 02 2月, 2015 2 次提交
    • M
      powerpc: Remove old compile time disabled syscall tracing code · a4bcbe6a
      Michael Ellerman 提交于
      We have code to do syscall tracing which is disabled at compile time by
      default. It's not been touched since the dawn of time (ie. v2.6.12).
      
      There are now better ways to do syscall tracing, ie. using the
      raw_syscall, or syscall tracepoints.
      
      For the specific case of tracing syscalls at boot on a system that
      doesn't get to userspace, you can boot with:
      
        trace_event=syscalls tp_printk=on
      
      Which will trace syscalls from boot, and echo all output to the console.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a4bcbe6a
    • M
      powerpc/kernel: Make syscall_exit a local label · 4c3b2168
      Michael Ellerman 提交于
      Currently when we back trace something that is in a syscall we see
      something like this:
      
      [c000000000000000] [c000000000000000] SyS_read+0x6c/0x110
      [c000000000000000] [c000000000000000] syscall_exit+0x0/0x98
      
      Although it's entirely correct, seeing syscall_exit at the bottom can be
      confusing - we were exiting from a syscall and then called SyS_read() ?
      
      If we instead change syscall_exit to be a local label we get something
      more intuitive:
      
      [c0000001fa46fde0] [c00000000026719c] SyS_read+0x6c/0x110
      [c0000001fa46fe30] [c000000000009264] system_call+0x38/0xd0
      
      ie. we were handling a system call, and it was SyS_read().
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4c3b2168
  10. 23 1月, 2015 1 次提交
  11. 10 11月, 2014 2 次提交
  12. 31 10月, 2014 1 次提交
    • A
      powerpc: do_notify_resume can be called with bad thread_info flags argument · 808be314
      Anton Blanchard 提交于
      Back in 7230c564 ("powerpc: Rework lazy-interrupt handling") we
      added a call out to restore_interrupts() (written in c) before calling
      do_notify_resume:
      
              bl      restore_interrupts
              addi    r3,r1,STACK_FRAME_OVERHEAD
              bl      do_notify_resume
      
      Unfortunately do_notify_resume takes two arguments, the second one
      being the thread_info flags:
      
      void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
      
      We do populate r4 (the second argument) earlier, but
      restore_interrupts() is free to muck it up all it wants. My guess is
      the gcc compiler gods shone down on us and its register allocator
      never used r4. Sometimes, rarely, luck is on our side.
      
      LLVM on the other hand did trample r4.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      808be314
  13. 05 8月, 2014 1 次提交
  14. 28 7月, 2014 1 次提交
  15. 11 6月, 2014 1 次提交
    • S
      powerpc: Correct DSCR during TM context switch · 96d01610
      Sam bobroff 提交于
      Correct the DSCR SPR becoming temporarily corrupted if a task is
      context switched during a transaction.
      
      The problem occurs while suspending the task and is caused by saving
      the DSCR to thread.dscr after it has already been set to the CPU's
      default value:
      
      __switch_to() calls __switch_to_tm()
      	which calls tm_reclaim_task()
      	which calls tm_reclaim_thread()
      	which calls tm_reclaim()
      		where the DSCR is set to the CPU's default
      __switch_to() calls _switch()
      		where thread.dscr is set to the DSCR
      
      When the task is resumed, it's transaction will be doomed (as usual)
      and the DSCR SPR will be corrupted, although the checkpointed value
      will be correct. Therefore the DSCR will be immediately corrected by
      the transaction aborting, unless it has been suspended. In that case
      the incorrect value can be seen by the task until it resumes the
      transaction.
      
      The fix is to treat the DSCR similarly to the TAR and save it early
      in __switch_to().
      
      A program exposing the problem is added to the kernel self tests as:
      tools/testing/selftests/powerpc/tm/tm-resched-dscr.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      CC: <stable@vger.kernel.org> [v3.10+]
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      96d01610
  16. 28 5月, 2014 1 次提交
    • S
      powerpc: Fix regression of per-CPU DSCR setting · 1739ea9e
      Sam bobroff 提交于
      Since commit "efcac658 powerpc: Per process DSCR + some fixes (try#4)"
      it is no longer possible to set the DSCR on a per-CPU basis.
      
      The old behaviour was to minipulate the DSCR SPR directly but this is no
      longer sufficient: the value is quickly overwritten by context switching.
      
      This patch stores the per-CPU DSCR value in a kernel variable rather than
      directly in the SPR and it is used whenever a process has not set the DSCR
      itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.
      
      Writes to the old global default (/sys/devices/system/cpu/dscr_default)
      now set all of the per-CPU values and reads return the last written value.
      
      The new per-CPU default is added to the paca_struct and is used everywhere
      outside of sysfs.c instead of the old global default.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1739ea9e
  17. 23 4月, 2014 6 次提交
  18. 15 1月, 2014 2 次提交
    • P
      powerpc: Don't corrupt transactional state when using FP/VMX in kernel · d31626f7
      Paul Mackerras 提交于
      Currently, when we have a process using the transactional memory
      facilities on POWER8 (that is, the processor is in transactional
      or suspended state), and the process enters the kernel and the
      kernel then uses the floating-point or vector (VMX/Altivec) facility,
      we end up corrupting the user-visible FP/VMX/VSX state.  This
      happens, for example, if a page fault causes a copy-on-write
      operation, because the copy_page function will use VMX to do the
      copy on POWER8.  The test program below demonstrates the bug.
      
      The bug happens because when FP/VMX state for a transactional process
      is stored in the thread_struct, we store the checkpointed state in
      .fp_state/.vr_state and the transactional (current) state in
      .transact_fp/.transact_vr.  However, when the kernel wants to use
      FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
      which saves the current state in .fp_state/.vr_state.  Furthermore,
      when we return to the user process we return with FP/VMX/VSX
      disabled.  The next time the process uses FP/VMX/VSX, we don't know
      which set of state (the current register values, .fp_state/.vr_state,
      or .transact_fp/.transact_vr) we should be using, since we have no
      way to tell if we are still in the same transaction, and if not,
      whether the previous transaction succeeded or failed.
      
      Thus it is necessary to strictly adhere to the rule that if FP has
      been enabled at any point in a transaction, we must keep FP enabled
      for the user process with the current transactional state in the
      FP registers, until we detect that it is no longer in a transaction.
      Similarly for VMX; once enabled it must stay enabled until the
      process is no longer transactional.
      
      In order to keep this rule, we add a new thread_info flag which we
      test when returning from the kernel to userspace, called TIF_RESTORE_TM.
      This flag indicates that there is FP/VMX/VSX state to be restored
      before entering userspace, and when it is set the .tm_orig_msr field
      in the thread_struct indicates what state needs to be restored.
      The restoration is done by restore_tm_state().  The TIF_RESTORE_TM
      bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
      which are called from enable_kernel_fp/altivec, giveup_vsx, and
      flush_fp/altivec_to_thread instead of giveup_fpu/altivec.
      
      The other thing to be done is to get the transactional FP/VMX/VSX
      state from .fp_state/.vr_state when doing reclaim, if that state
      has been saved there by giveup_fpu/altivec_maybe_transactional.
      Having done this, we set the FP/VMX bit in the thread's MSR after
      reclaim to indicate that that part of the state is now valid
      (having been reclaimed from the processor's checkpointed state).
      
      Finally, in the signal handling code, we move the clearing of the
      transactional state bits in the thread's MSR a bit earlier, before
      calling flush_fp_to_thread(), so that we don't unnecessarily set
      the TIF_RESTORE_TM bit.
      
      This is the test program:
      
      /* Michael Neuling 4/12/2013
       *
       * See if the altivec state is leaked out of an aborted transaction due to
       * kernel vmx copy loops.
       *
       *   gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
       *
       */
      
      /* We don't use all of these, but for reference: */
      
      int main(int argc, char *argv[])
      {
      	long double vecin = 1.3;
      	long double vecout;
      	unsigned long pgsize = getpagesize();
      	int i;
      	int fd;
      	int size = pgsize*16;
      	char tmpfile[] = "/tmp/page_faultXXXXXX";
      	char buf[pgsize];
      	char *a;
      	uint64_t aborted = 0;
      
      	fd = mkstemp(tmpfile);
      	assert(fd >= 0);
      
      	memset(buf, 0, pgsize);
      	for (i = 0; i < size; i += pgsize)
      		assert(write(fd, buf, pgsize) == pgsize);
      
      	unlink(tmpfile);
      
      	a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
      	assert(a != MAP_FAILED);
      
      	asm __volatile__(
      		"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
      		TBEGIN
      		"beq	3f ;"
      		TSUSPEND
      		"xxlxor 40,40,40 ; " // set 40 to 0
      		"std	5, 0(%[map]) ;" // cause kernel vmx copy page
      		TABORT
      		TRESUME
      		TEND
      		"li	%[res], 0 ;"
      		"b	5f ;"
      		"3: ;" // Abort handler
      		"li	%[res], 1 ;"
      		"5: ;"
      		"stxvd2x 40,0,%[vecoutptr] ; "
      		: [res]"=r"(aborted)
      		: [vecinptr]"r"(&vecin),
      		  [vecoutptr]"r"(&vecout),
      		  [map]"r"(a)
      		: "memory", "r0", "r3", "r4", "r5", "r6", "r7");
      
      	if (aborted && (vecin != vecout)){
      		printf("FAILED: vector state leaked on abort %f != %f\n",
      		       (double)vecin, (double)vecout);
      		exit(1);
      	}
      
      	munmap(a, size);
      
      	close(fd);
      
      	printf("PASSED!\n");
      	return 0;
      }
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d31626f7
    • M
      Move precessing of MCE queued event out from syscall exit path. · 30c82635
      Mahesh Salgaonkar 提交于
      Huge Dickins reported an issue that b5ff4211
      "powerpc/book3s: Queue up and process delayed MCE events" breaks the
      PowerMac G5 boot. This patch fixes it by moving the mce even processing
      away from syscall exit, which was wrong to do that in first place, and
      using irq work framework to delay processing of mce event.
      
      Reported-by: Hugh Dickins <hughd@google.com
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      30c82635
  19. 05 12月, 2013 1 次提交
  20. 06 11月, 2013 1 次提交
  21. 11 10月, 2013 2 次提交
  22. 27 8月, 2013 1 次提交
  23. 14 8月, 2013 2 次提交
  24. 09 8月, 2013 2 次提交
    • M
      powerpc: Save the TAR register earlier · c2d52644
      Michael Neuling 提交于
      This moves us to save the Target Address Register (TAR) a earlier in
      __switch_to.  It introduces a new function save_tar() to do this.
      
      We need to save the TAR earlier as we will overwrite it in the transactional
      memory reclaim/recheckpoint path.  We are going to do this in a subsequent
      patch which will fix saving the TAR register when it's modified inside a
      transaction.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Cc: <stable@vger.kernel.org> [v3.10]
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c2d52644
    • M
      powerpc: Fix context switch DSCR on POWER8 · 2517617e
      Michael Neuling 提交于
      POWER8 allows the DSCR to be accessed directly from userspace via a new SPR
      number 0x3 (Rather than 0x11.  DSCR SPR number 0x11 is still used on POWER8 but
      like POWER7, is only accessible in HV and OS modes).  Currently, we allow this
      by setting H/FSCR DSCR bit on boot.
      
      Unfortunately this doesn't work, as the kernel needs to see the DSCR change so
      that it knows to no longer restore the system wide version of DSCR on context
      switch (ie. to set thread.dscr_inherit).
      
      This clears the H/FSCR DSCR bit initially.  If a process then accesses the DSCR
      (via SPR 0x3), it'll trap into the kernel where we set thread.dscr_inherit in
      facility_unavailable_exception().
      
      We also change _switch() so that we set or clear the H/FSCR DSCR bit based on
      the thread.dscr_inherit.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Cc: <stable@vger.kernel.org> [v3.10]
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2517617e
  25. 20 6月, 2013 1 次提交
    • B
      powerpc: Restore dbcr0 on user space exit · 13d543cd
      Bharat Bhushan 提交于
      On BookE (Branch taken + Single Step) is as same as Branch Taken
      on BookS and in Linux we simulate BookS behavior for BookE as well.
      When doing so, in Branch taken handling we want to set DBCR0_IC but
      we update the current->thread->dbcr0 and not DBCR0.
      
      Now on 64bit the current->thread.dbcr0 (and other debug registers)
      is synchronized ONLY on context switch flow. But after handling
      Branch taken in debug exception if we return back to user space
      without context switch then single stepping change (DBCR0_ICMP)
      does not get written in h/w DBCR0 and Instruction Complete exception
      does not happen.
      
      This fixes using ptrace reliably on BookE-PowerPC
      
      lmbench latency test (lat_syscall) Results are (they varies a little
      on each run)
      
      1) ./lat_syscall <action> /dev/shm/uImage
      
      action:	Open	read	write	stat	fstat	null
      Before:	3.8618	0.2017	0.2851	1.6789	0.2256	0.0856
      After:	3.8580	0.2017	0.2851	1.6955	0.2255	0.0856
      
      1) ./lat_syscall -P 2 -N 10 <action> /dev/shm/uImage
      action:	Open	read	write	stat	fstat	null
      Before:	4.1388	0.2238	0.3066	1.7106	0.2256	0.0856
      After:	4.1413	0.2236	0.3062	1.7107	0.2256	0.0856
      
      [ Slightly modified to avoid extra branch in the fast path
        on Book3S and fix build on all non-BookE 64-bit -- BenH
      ]
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      13d543cd