1. 14 May 2014, 3 commits
    • x86/traps: Make math_error() static · 5e1b05be
      Committed by Oleg Nesterov
      Trivial, make math_error() static.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • uprobes/x86: Fix scratch register selection for rip-relative fixups · 1ea30fb6
      Committed by Denys Vlasenko
      Before this patch, instructions such as div, mul, shifts with the count
      in CL, and cmpxchg are mishandled.
      
      This patch adds VEX prefix handling. In particular, it avoids colliding
      with the register operand encoded in the vex.vvvv field.
      
      Since we need to avoid two possible register operands, the selection of
      scratch register needs to be from at least three registers.
      
      After looking through a lot of CPU docs, it looks like the safest choice
      is SI,DI,BX. Selecting BX needs care to not collide with implicit use of
      BX by cmpxchg8b.
      
      Test-case:
      
      	#include <stdio.h>
      
      	static const char *const pass[] = { "FAIL", "pass" };
      
      	long two = 2;
      	void test1(void)
      	{
      		long ax = 0, dx = 0;
      		asm volatile("\n"
      	"			xor	%%edx,%%edx\n"
      	"			lea	2(%%edx),%%eax\n"
      	// We divide 2 by 2. Result (in eax) should be 1:
      	"	probe1:		.globl	probe1\n"
      	"			divl	two(%%rip)\n"
      	// If we have a bug (eax mangled on entry) the result will be 2,
      	// because eax gets restored by probe machinery.
      		: "=a" (ax), "=d" (dx) /*out*/
      		: "0" (ax), "1" (dx) /*in*/
      		: "memory" /*clobber*/
      		);
      		dprintf(2, "%s: %s\n", __func__,
      			pass[ax == 1]
      		);
      	}
      
      	long val2 = 0;
      	void test2(void)
      	{
      		long old_val = val2;
      		long ax = 0, dx = 0;
      		asm volatile("\n"
      	"			mov	val2,%%eax\n"     // eax := val2
      	"			lea	1(%%eax),%%edx\n" // edx := eax+1
      	// eax is equal to val2. cmpxchg should store edx to val2:
      	"	probe2:		.globl  probe2\n"
      	"			cmpxchg %%edx,val2(%%rip)\n"
      	// If we have a bug (eax mangled on entry), val2 will stay unchanged
      		: "=a" (ax), "=d" (dx) /*out*/
      		: "0" (ax), "1" (dx) /*in*/
      		: "memory" /*clobber*/
      		);
      		dprintf(2, "%s: %s\n", __func__,
      			pass[val2 == old_val + 1]
      		);
      	}
      
      	long val3[2] = {0,0};
      	void test3(void)
      	{
      		long old_val = val3[0];
      		long ax = 0, dx = 0;
      		asm volatile("\n"
      	"			mov	val3,%%eax\n"  // edx:eax := val3
      	"			mov	val3+4,%%edx\n"
      	"			mov	%%eax,%%ebx\n" // ecx:ebx := edx:eax + 1
      	"			mov	%%edx,%%ecx\n"
      	"			add	$1,%%ebx\n"
      	"			adc	$0,%%ecx\n"
      	// edx:eax is equal to val3. cmpxchg8b should store ecx:ebx to val3:
      	"	probe3:		.globl  probe3\n"
      	"			cmpxchg8b val3(%%rip)\n"
      	// If we have a bug (edx:eax mangled on entry), val3 will stay unchanged.
      	// If ecx:ebx is mangled, val3 will get a wrong value.
      		: "=a" (ax), "=d" (dx) /*out*/
      		: "0" (ax), "1" (dx) /*in*/
      		: "cx", "bx", "memory" /*clobber*/
      		);
      		dprintf(2, "%s: %s\n", __func__,
      			pass[val3[0] == old_val + 1 && val3[1] == 0]
      		);
      	}
      
      	int main(int argc, char **argv)
      	{
      		test1();
      		test2();
      		test3();
      		return 0;
      	}
      
      Before this change all tests fail if probe{1,2,3} are probed.
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • uprobes/x86: Simplify rip-relative handling · 50204c6f
      Committed by Denys Vlasenko
      It is possible to replace the rip-relative addressing mode with an
      addressing mode of the same length: (reg+disp32). This eliminates the
      need to fix up the immediate and to correct for a changing instruction length.
      
      And we can kill arch_uprobe->def.riprel_target.
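
      As an illustration (a user-space sketch, not the kernel code; the scratch
      register and example encoding are chosen arbitrarily): the rewrite only
      touches the ModRM byte. mod=00,rm=101 (rip+disp32) becomes
      mod=10,rm=<scratch reg> (reg+disp32); both forms carry a 32-bit
      displacement, so the instruction length does not change.

      	#include <stdint.h>
      	#include <stdio.h>

      	/* Keep the reg field (bits 3..5), force mod=10, set rm to the scratch reg. */
      	static void riprel_to_reg_disp32(uint8_t *modrm, uint8_t scratch_reg)
      	{
      		*modrm = 0x80 | (*modrm & 0x38) | (scratch_reg & 0x07);
      	}

      	int main(void)
      	{
      		/* 8b 05 00 00 00 00 : mov 0x0(%rip),%eax */
      		uint8_t insn[] = { 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00 };

      		riprel_to_reg_disp32(&insn[1], 6 /* %rsi */);
      		/* Now 8b 86 00 00 00 00 : mov 0x0(%rsi),%eax -- same 6-byte length. */
      		printf("%02x %02x\n", insn[0], insn[1]);
      		return 0;
      	}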
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
  2. 01 May 2014, 18 commits
  3. 24 April 2014, 1 commit
  4. 21 April 2014, 3 commits
    • um: Memory corruption on startup · 0565103d
      Committed by Anton Ivanov
      The reverse case of this race (you must msync before read) is
      well known. This is the not-so-common one.

      It can be triggered only on systems which do a lot of task
      switching, and only at UML startup. If you are starting 200+ UMLs,
      ~0.5% will always die without this fix.
      Signed-off-by: Anton Ivanov <antivano@cisco.com>
      [rw: minor whitespace fixes]
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • um: Missing pipe handling · 9fcb663b
      Committed by Anton Ivanov
      UML does not handle SIGPIPE. As a result, when running it under
      expect or redirecting the I/O from the console to an external program,
      it will crash if that program stops or exits.
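
      A minimal host-side sketch of the kind of handling this implies
      (illustrative only, not the actual UML change): once SIGPIPE is ignored,
      a write() to a pipe whose reader has gone away fails with EPIPE instead
      of killing the process.

      	#include <signal.h>
      	#include <string.h>

      	static void ignore_sigpipe(void)
      	{
      		struct sigaction sa;

      		memset(&sa, 0, sizeof(sa));
      		sa.sa_handler = SIG_IGN;
      		sigaction(SIGPIPE, &sa, NULL);
      	}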
      Signed-off-by: Anton Ivanov <antivano@cisco.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • uml: Simplify tempdir logic. · 0d71832e
      Committed by Tristan Schmelcher
      Inferring the mount hierarchy correctly from /proc/mounts is hard when MS_MOVE
      may have been used, and the previous code did it wrongly. This change simplifies
      the logic to only require that /dev/shm be _on_ tmpfs (which can be checked
      trivially with statfs) rather than that it be a _mountpoint_ of tmpfs, since
      there isn't a compelling reason to be that strict. We also now check for tmpfs
      on whatever directory we ultimately use so that the user is better informed.
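
      A minimal sketch of the check described above (illustrative, not the UML
      code itself): statfs() the candidate directory and compare f_type against
      TMPFS_MAGIC.

      	#include <sys/vfs.h>
      	#include <linux/magic.h>

      	static int dir_is_on_tmpfs(const char *path)
      	{
      		struct statfs st;

      		if (statfs(path, &st) != 0)
      			return 0;
      		return st.f_type == TMPFS_MAGIC;
      	}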
      
      This change also moves the more standard TMPDIR environment variable check ahead
      of the others.
      
      Applies to 3.12.
      Signed-off-by: Tristan Schmelcher <tschmelcher@google.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
  5. 19 April 2014, 4 commits
    • mips: export flush_icache_range · 8229f1a0
      Committed by Kees Cook
      The lkdtm module performs tests against executable memory ranges, so it
      needs to flush the icache for proper behavior.  Other architectures
      already export this, so do the same for MIPS.
      
      [akpm@linux-foundation.org: relocate export sites]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Shiraz has moved · 9cc23682
      Committed by Viresh Kumar
      The shiraz.hashim@st.com email ID doesn't exist anymore, as he has left
      the company.  Replace ST's ID with shiraz.linux.kernel@gmail.com.

      This also updates the .mailmap file to fix the address for 'git shortlog'.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc/mm: fix ".__node_distance" undefined · 12c743eb
      Committed by Mike Qiu
        CHK     include/config/kernel.release
        CHK     include/generated/uapi/linux/version.h
        CHK     include/generated/utsrelease.h
        ...
        Building modules, stage 2.
      WARNING: 1 bad relocations
      c0000000013d6a30 R_PPC64_ADDR64    uprobes_fetch_type_table
        WRAP    arch/powerpc/boot/zImage.pseries
        WRAP    arch/powerpc/boot/zImage.epapr
        MODPOST 1849 modules
      ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
      make[1]: *** [__modpost] Error 1
      make: *** [modules] Error 2
      make: *** Waiting for unfinished jobs....
      
      The reason is that the symbol "__node_distance" is not exported on powerpc.
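
      A sketch of the kind of one-line fix this calls for (the exact placement
      and whether the plain or _GPL export macro is used are illustrative, not
      taken from the actual patch): export the symbol next to its definition in
      the powerpc NUMA code so that modules such as nvme.ko can resolve it.

      	/* arch/powerpc/mm/numa.c (sketch): make the symbol visible to modules */
      	EXPORT_SYMBOL(__node_distance);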
      Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Alistair Popple <alistair@popple.id.au>
      Cc: Mike Qiu <qiudayu@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ARC: Delete stale barrier.h · 64ee9f32
      Committed by Vineet Gupta
      Commit 93ea02bb ("arch: Clean up asm/barrier.h implementations")
      wired up the generic barrier.h for ARC, but failed to delete the existing file.
      
      In 3.15, due to rcupdate.h updates, this causes a build breakage on ARC:
      
            CC      arch/arc/kernel/asm-offsets.s
          In file included from include/linux/sched.h:45:0,
                           from arch/arc/kernel/asm-offsets.c:9:
          include/linux/rculist.h: In function '__list_add_rcu':
          include/linux/rculist.h:54:2: error: implicit declaration of function 'smp_store_release' [-Werror=implicit-function-declaration]
            rcu_assign_pointer(list_next_rcu(prev), new);
            ^
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 18 April 2014, 11 commits
    • perf/x86: Export perf_assign_events() · 4a3dc121
      Committed by Yan, Zheng
      Export perf_assign_events() to allow building the perf Intel uncore
      driver as a module.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1395133004-23205-3-git-send-email-zheng.z.yan@intel.com
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: eranian@google.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU · 24223657
      Committed by Venkatesh Srinivas
      CPUs which should support the RAPL counters according to
      Family/Model/Stepping may still issue #GP when attempting to access
      the RAPL MSRs. This may happen when Linux is running under KVM and
      we are passing-through host F/M/S data, for example. Use rdmsrl_safe
      to first access the RAPL_POWER_UNIT MSR; if this fails, do not
      attempt to use this PMU.
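
      A minimal sketch of the probing pattern described above (names follow the
      commit message, not necessarily the exact driver code):

      	/* Read the unit MSR with the fault-tolerant accessor; if the read
      	 * would #GP, rdmsrl_safe() returns non-zero and we skip this PMU. */
      	static int rapl_check_hw_unit(void)
      	{
      		u64 msr_rapl_power_unit_bits;

      		if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &msr_rapl_power_unit_bits))
      			return -ENODEV;
      		return 0;
      	}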
      Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
      Cc: zheng.z.yan@intel.com
      Cc: eranian@google.com
      Cc: ak@linux.intel.com
      Cc: linux-kernel@vger.kernel.org
      [ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • uprobes/x86: Emulate relative conditional "near" jmp's · 6cc5e7ff
      Committed by Oleg Nesterov
      Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10
      if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same
      condition.
      
      Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are
      correct no matter if this jmp is "near" or "short".
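
      For illustration (the encoding facts are standard x86, the helper is made
      up): a "near" conditional jmp is 0x0f 0x80+cc with a rel32, the "short"
      form is 0x70+cc with a rel8, so subtracting 0x10 from the second opcode
      byte yields exactly the byte the existing short-jcc path expects.

      	#include <stdio.h>

      	/* Map the 2nd opcode byte of a near Jcc (0x80..0x8f) to the short form. */
      	static unsigned char near_to_short_jcc(unsigned char opcode2)
      	{
      		return opcode2 - 0x10;
      	}

      	int main(void)
      	{
      		printf("%02x\n", near_to_short_jcc(0x84)); /* 0f 84 (jz rel32) -> 74 (jz rel8) */
      		return 0;
      	}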
      Reported-by: Jonathan Lebon <jlebon@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
    • uprobes/x86: Emulate relative conditional "short" jmp's · 8f95505b
      Committed by Oleg Nesterov
      Teach branch_emulate_op() to emulate the conditional "short" jmp's which
      check regs->flags.
      
      Note: this doesn't support jcxz/jecxz, loope/loopz, and loopne/loopnz.
      They are all rel8 and thus they can't trigger the problem, but perhaps
      we will add support in the future just for completeness.
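
      For illustration only (a user-space sketch, not the kernel code; the flag
      masks are the architectural EFLAGS bits): evaluating a conditional jmp
      boils down to testing its condition code against the saved flags.

      	#include <stdbool.h>

      	#define FLAG_CF 0x0001UL
      	#define FLAG_PF 0x0004UL
      	#define FLAG_ZF 0x0040UL
      	#define FLAG_SF 0x0080UL
      	#define FLAG_OF 0x0800UL

      	/* "cc" is the low nibble of the Jcc opcode (0x70+cc, or 0x0f 0x80+cc). */
      	static bool jcc_taken(unsigned int cc, unsigned long flags)
      	{
      		bool t;

      		switch (cc >> 1) {
      		case 0: t = flags & FLAG_OF; break;              /* jo  / jno */
      		case 1: t = flags & FLAG_CF; break;              /* jb  / jnb */
      		case 2: t = flags & FLAG_ZF; break;              /* jz  / jnz */
      		case 3: t = flags & (FLAG_CF | FLAG_ZF); break;  /* jbe / ja  */
      		case 4: t = flags & FLAG_SF; break;              /* js  / jns */
      		case 5: t = flags & FLAG_PF; break;              /* jp  / jnp */
      		case 6: t = !(flags & FLAG_SF) != !(flags & FLAG_OF); break; /* jl / jge */
      		default: t = (flags & FLAG_ZF) ||
      			     (!(flags & FLAG_SF) != !(flags & FLAG_OF)); break; /* jle / jg */
      		}
      		return (cc & 1) ? !t : t;  /* odd condition codes are the negations */
      	}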
      Reported-by: Jonathan Lebon <jlebon@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
    • uprobes/x86: Emulate relative call's · 8e89c0be
      Committed by Oleg Nesterov
      See the previous "Emulate unconditional relative jmp's" patch, which explains
      why we can not execute a "jmp" out-of-line; the same applies to "call".
      
      Emulating a relative call is trivial, we only need to additionally push
      the return address. If this push fails, we execute the instruction out of
      line; this should trigger the trap, and the probed application either dies
      or restarts the same insn if a signal handler expands the stack. We do not
      even need ->post_xol() for this case.
      
      But there is a corner (and almost theoretical) case: another thread can
      expand the stack right before we execute this insn out of line. In this
      case it hits the same problem we are trying to solve. So we simply turn
      the probed insn into "call 1f; 1:" and add ->post_xol() which restores
      ->sp and restarts.
      
      Many thanks to Jonathan who finally found the standalone reproducer;
      otherwise I would never have resolved the "random SIGSEGV's under systemtap"
      bug report. Now that the problem is clear, we can write the simplified
      test-case:
      
      	void probe_func(void), callee(void);
      
      	int failed = 1;
      
      	asm (
      		".text\n"
      		".align 4096\n"
      		".globl probe_func\n"
      		"probe_func:\n"
      		"call callee\n"
      		"ret"
      	);
      
      	/*
      	 * This assumes that:
      	 *
      	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
      	 *
      	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
      	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
      	 *
      	 * so we can target the non-canonical address from xol_vma using
      	 * the simple math below, 100 * 4096 is just the random offset
      	 */
      	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");
      
      	void callee(void)
      	{
      		failed = 0;
      	}
      
      	int main(void)
      	{
      		probe_func();
      		return failed;
      	}
      
      It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
      randomize_va_space/etc can change the placement of xol area).
      
      Note: as Denys Vlasenko pointed out, AMD and Intel treat "callw" (0x66 0xe8)
      differently. This patch relies on lib/insn.c and thus implements Intel's
      behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
      this insn, so we postpone the fix until we decide what we should do; emulate
      or not, support or not, etc.
      Reported-by: Jonathan Lebon <jlebon@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
    • uprobes/x86: Emulate nop's using ops->emulate() · d2410063
      Committed by Oleg Nesterov
      Finally we can kill the ugly (and very limited) code in __skip_sstep().
      Just change branch_setup_xol_ops() to treat "nop" as a jmp to the next insn.
      
      Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
      "(rep;)+ nop;" at least, and (afaics) much more.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
    • uprobes/x86: Emulate unconditional relative jmp's · 7ba6db2d
      Committed by Oleg Nesterov
      Currently we always execute all insns out-of-line, including relative
      jmp's and call's. This assumes that even if regs->ip points to nowhere
      after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
      update it correctly.
      
      However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
      is not canonical. In this case the CPU generates #GP and general_protection()
      kills the task which tries to execute this insn out-of-line.
      
      Now that we have uprobe_xol_ops we can teach uprobes to emulate these
      insns and solve the problem. This patch adds branch_xol_ops which has
      a single branch_emulate_op() hook; so far it can only handle rel8/32
      relative jmp's.
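
      A minimal sketch of what the emulation amounts to (names are illustrative,
      not the kernel's): instead of single-stepping a copy of the jmp, compute
      the branch target directly and point the task there.

      	/* vaddr: address of the probed jmp, ilen: its length,
      	 * offs: sign-extended rel8/rel32 operand. */
      	static unsigned long rel_jmp_target(unsigned long vaddr,
      					    unsigned int ilen, long offs)
      	{
      		return vaddr + ilen + offs;
      	}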
      
      TODO: move ->fixup into the union along with rip_rela_target_address.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Jonathan Lebon <jlebon@redhat.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
    • uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and... · 8faaed1b
      Committed by Oleg Nesterov
      uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()
      
      1. Add the trivial sizeof_long() helper and change other callers of
         is_ia32_task() to use it (a minimal sketch of the helper follows below).
      
         TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
         not necessarily mean 32bit. Fortunately syscall-like insns can't be
         probed so it actually works, but it would be better to rename and
         use is_ia32_frame().
      
      2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
         and adjust_ret_addr() should be named "nleft". And in fact only the
         last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
         needs to inspect the non-zero error code.
      
      TODO: adjust_ret_addr() should die. We can always calculate the value
      we need to write into *regs->sp, just UPROBE_FIX_CALL should record
      insn->length.
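
      A minimal sketch of the helper described in point 1, assuming the
      is_ia32_task() test mentioned there (the in-tree code may differ in
      detail): it returns the width of a long, i.e. of a stack slot / return
      address, in the probed task.

      	static int sizeof_long(void)
      	{
      		return is_ia32_task() ? 4 : 8;
      	}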
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
    • uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible · 75f9ef0b
      Committed by Oleg Nesterov
      SIGILL after the failed arch_uprobe_post_xol() should only be used as
      a last resort; we should try to restart the probed insn if possible.
      
      Currently only adjust_ret_addr() can fail, and this can only happen if
      another thread unmapped our stack after we executed "call" out-of-line.
      Most probably the application is buggy, but even in this case it can
      have a handler for SIGSEGV/etc. And in theory it can even be correct
      and do something non-trivial with its memory.
      
      Of course we can't restart unconditionally, so arch_uprobe_post_xol()
      does this only if ->post_xol() returns -ERESTART, even though currently this
      is the only possible error.
      
      default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
      pointed out it should not forget to pop off the return address pushed
      by this insn executed out-of-line.
      
      Note: this is not "perfect", we do not want the extra handler_chain()
      after restart, but I think this is the best solution we can realistically
      do without too much uglification.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
    • uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails · 014940ba
      Committed by Oleg Nesterov
      Currently the error from arch_uprobe_post_xol() is silently ignored.
      This doesn't look good, and it can lead to hard-to-debug problems.
      
      1. Change handle_singlestep() to loudly complain and send SIGILL (a short
         sketch follows after this list).
      
         Note: this only affects x86, ppc/arm can't fail.
      
      2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
         avoid TF games if it is going to return an error.
      
         This can help to analyze the problem; if nothing else, we should
         not report ->ip = xol_slot in the core-file.
      
         Note: this means that handle_riprel_post_xol() can be called twice,
         but this is fine because it is idempotent.
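
      An illustrative sketch of point 1 above (the helper name is made up and
      this is not the exact kernel code): warn loudly and kill the task with
      SIGILL instead of silently continuing with half-fixed-up state.

      	#include <linux/printk.h>
      	#include <linux/sched.h>

      	/* Hypothetical helper, called when arch_uprobe_post_xol() returns an error. */
      	static void post_xol_failed(struct task_struct *t, int err)
      	{
      		pr_err("uprobe: arch_uprobe_post_xol() failed (%d), sending SIGILL\n", err);
      		send_sig(SIGILL, t, 0);
      	}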
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
    • uprobes/x86: Conditionalize the usage of handle_riprel_insn() · e55848a4
      Committed by Oleg Nesterov
      arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
      but only the "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
      Move the call site into the "default" case and change the "0xff" case to
      fall through.
      
      We are going to add various hooks to handle the rip-relative
      jmp/call instructions (and more); we need this change to enforce the
      fact that the new code can not conflict with the is_riprel_insn() logic
      which, after this change, can only be used by default_xol_ops.
      
      Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol()
      directly. This is fine unless another _xol_ops we may add later will
      need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup.
      In this case we can add uprobe_xol_ops->abort() hook, which (perhaps)
      we will need anyway in the long term.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>