1. 12 8月, 2016 1 次提交
    • D
      uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions · 68187872
      Denys Vlasenko 提交于
      Since instruction decoder now supports EVEX-encoded instructions, two fixes
      are needed to correctly handle them in uprobes.
      
      Extended bits for MODRM.rm field need to be sanitized just like we do it
      for VEX3, to avoid encoding wrong register for register-relative access.
      
      EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
      ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
      paranoid here: proper encoding for register-relative access
      should have EVEX.x = 1.
      
      Secondly, we should fetch vex.vvvv for EVEX too.
      This is now super easy because instruction decoder populates
      vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v4.1+
      Fixes: 8a764a87 ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
      Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      68187872
  2. 19 4月, 2016 1 次提交
  3. 13 4月, 2016 1 次提交
  4. 31 7月, 2015 3 次提交
  5. 09 6月, 2015 1 次提交
  6. 23 3月, 2015 1 次提交
  7. 19 2月, 2015 3 次提交
    • D
      uprobes/x86: Fix 2-byte opcode table · 5154d4f2
      Denys Vlasenko 提交于
      Enabled probing of lar, lsl, popcnt, lddqu, prefetch insns.
      They should be safe to probe, they throw no exceptions.
      
      Enabled probing of 3-byte opcodes 0f 38-3f xx - these are
      vector isns, so should be safe.
      
      Enabled probing of many currently undefined 0f xx insns.
      At the rate new vector instructions are getting added,
      we don't want to constantly enable more bits.
      We want to only occasionally *disable* ones which
      for some reason can't be probed.
      This includes 0f 24,26 opcodes, which are undefined
      since Pentium. On 486, they were "mov to/from test register".
      
      Explained more fully what 0f 78,79 opcodes are.
      
      Explained what 0f ae opcode is. (It's unclear why we don't allow
      probing it, but let's not change it for now).
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1423768732-32194-3-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5154d4f2
    • D
      uprobes/x86: Fix 1-byte opcode tables · 67fc8092
      Denys Vlasenko 提交于
      This change fixes 1-byte opcode tables so that only insns
      for which we have real reasons to disallow probing are marked
      with unset bits.
      
      To that end:
      
      Set bits for all prefix bytes. Their setting is ignored anyway -
      we check the bitmap against OPCODE1(insn), not against first
      byte. Keeping them set to 0 only confuses code reader with
      "why we don't support that opcode" question.
      
      Thus: enable bytes c4,c5 in 64-bit mode (VEX prefixes).
      Byte 62 (EVEX prefix) is not yet enabled since insn decoder
      does not support that yet.
      
      For 32-bit mode, enable probing of opcodes 63 (arpl) and d6
      (salc). They don't require any special handling.
      
      For 64-bit mode, disable 9a and ea - these undefined opcodes
      were mistakenly left enabled.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1423768732-32194-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      67fc8092
    • D
      uprobes/x86: Add comment with insn opcodes, mnemonics and why we dont support them · 097f4e5e
      Denys Vlasenko 提交于
      After adding these, it's clear we have some awkward choices
      there. Some valid instructions are prohibited from uprobing
      while several invalid ones are allowed.
      
      Hopefully future edits to the good-opcode tables will fix wrong
      bits or explain why those bits are not wrong.
      
      No actual code changes.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1423768732-32194-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      097f4e5e
  8. 18 11月, 2014 1 次提交
    • D
      x86: Remove arbitrary instruction size limit in instruction decoder · 6ba48ff4
      Dave Hansen 提交于
      The current x86 instruction decoder steps along through the
      instruction stream but always ensures that it never steps farther
      than the largest possible instruction size (MAX_INSN_SIZE).
      
      The MPX code is now going to be doing some decoding of userspace
      instructions.  We copy those from userspace in to the kernel and
      they're obviously completely untrusted coming from userspace.  In
      addition to the constraint that instructions can only be so long,
      we also have to be aware of how long the buffer is that came in
      from userspace.  This _looks_ to be similar to what the perf and
      kprobes is doing, but it's unclear to me whether they are
      affected.
      
      The whole reason we need this is that it is perfectly valid to be
      executing an instruction within MAX_INSN_SIZE bytes of an
      unreadable page. We should be able to gracefully handle short
      reads in those cases.
      
      This adds support to the decoder to record how long the buffer
      being decoded is and to refuse to "validate" the instruction if
      we would have gone over the end of the buffer to decode it.
      
      The kprobes code probably needs to be looked at here a bit more
      carefully.  This patch still respects the MAX_INSN_SIZE limit
      there but the kprobes code does look like it might be able to
      be a bit more strict than it currently is.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: NJim Keniston <jkenisto@us.ibm.com>
      Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: x86@kernel.org
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      6ba48ff4
  9. 05 6月, 2014 1 次提交
  10. 14 5月, 2014 2 次提交
    • D
      uprobes/x86: Fix scratch register selection for rip-relative fixups · 1ea30fb6
      Denys Vlasenko 提交于
      Before this patch, instructions such as div, mul, shifts with count
      in CL, cmpxchg are mishandled.
      
      This patch adds vex prefix handling. In particular, it avoids colliding
      with register operand encoded in vex.vvvv field.
      
      Since we need to avoid two possible register operands, the selection of
      scratch register needs to be from at least three registers.
      
      After looking through a lot of CPU docs, it looks like the safest choice
      is SI,DI,BX. Selecting BX needs care to not collide with implicit use of
      BX by cmpxchg8b.
      
      Test-case:
      
      	#include <stdio.h>
      
      	static const char *const pass[] = { "FAIL", "pass" };
      
      	long two = 2;
      	void test1(void)
      	{
      		long ax = 0, dx = 0;
      		asm volatile("\n"
      	"			xor	%%edx,%%edx\n"
      	"			lea	2(%%edx),%%eax\n"
      	// We divide 2 by 2. Result (in eax) should be 1:
      	"	probe1:		.globl	probe1\n"
      	"			divl	two(%%rip)\n"
      	// If we have a bug (eax mangled on entry) the result will be 2,
      	// because eax gets restored by probe machinery.
      		: "=a" (ax), "=d" (dx) /*out*/
      		: "0" (ax), "1" (dx) /*in*/
      		: "memory" /*clobber*/
      		);
      		dprintf(2, "%s: %s\n", __func__,
      			pass[ax == 1]
      		);
      	}
      
      	long val2 = 0;
      	void test2(void)
      	{
      		long old_val = val2;
      		long ax = 0, dx = 0;
      		asm volatile("\n"
      	"			mov	val2,%%eax\n"     // eax := val2
      	"			lea	1(%%eax),%%edx\n" // edx := eax+1
      	// eax is equal to val2. cmpxchg should store edx to val2:
      	"	probe2:		.globl  probe2\n"
      	"			cmpxchg %%edx,val2(%%rip)\n"
      	// If we have a bug (eax mangled on entry), val2 will stay unchanged
      		: "=a" (ax), "=d" (dx) /*out*/
      		: "0" (ax), "1" (dx) /*in*/
      		: "memory" /*clobber*/
      		);
      		dprintf(2, "%s: %s\n", __func__,
      			pass[val2 == old_val + 1]
      		);
      	}
      
      	long val3[2] = {0,0};
      	void test3(void)
      	{
      		long old_val = val3[0];
      		long ax = 0, dx = 0;
      		asm volatile("\n"
      	"			mov	val3,%%eax\n"  // edx:eax := val3
      	"			mov	val3+4,%%edx\n"
      	"			mov	%%eax,%%ebx\n" // ecx:ebx := edx:eax + 1
      	"			mov	%%edx,%%ecx\n"
      	"			add	$1,%%ebx\n"
      	"			adc	$0,%%ecx\n"
      	// edx:eax is equal to val3. cmpxchg8b should store ecx:ebx to val3:
      	"	probe3:		.globl  probe3\n"
      	"			cmpxchg8b val3(%%rip)\n"
      	// If we have a bug (edx:eax mangled on entry), val3 will stay unchanged.
      	// If ecx:edx in mangled, val3 will get wrong value.
      		: "=a" (ax), "=d" (dx) /*out*/
      		: "0" (ax), "1" (dx) /*in*/
      		: "cx", "bx", "memory" /*clobber*/
      		);
      		dprintf(2, "%s: %s\n", __func__,
      			pass[val3[0] == old_val + 1 && val3[1] == 0]
      		);
      	}
      
      	int main(int argc, char **argv)
      	{
      		test1();
      		test2();
      		test3();
      		return 0;
      	}
      
      Before this change all tests fail if probe{1,2,3} are probed.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Reviewed-by: NJim Keniston <jkenisto@us.ibm.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      1ea30fb6
    • D
      uprobes/x86: Simplify rip-relative handling · 50204c6f
      Denys Vlasenko 提交于
      It is possible to replace rip-relative addressing mode with addressing
      mode of the same length: (reg+disp32). This eliminates the need to fix
      up immediate and correct for changing instruction length.
      
      And we can kill arch_uprobe->def.riprel_target.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Reviewed-by: NJim Keniston <jkenisto@us.ibm.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      50204c6f
  11. 01 5月, 2014 18 次提交
  12. 18 4月, 2014 7 次提交
    • O
      uprobes/x86: Emulate relative conditional "near" jmp's · 6cc5e7ff
      Oleg Nesterov 提交于
      Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10
      if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same
      condition.
      
      Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are
      correct no matter if this jmp is "near" or "short".
      Reported-by: NJonathan Lebon <jlebon@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NJim Keniston <jkenisto@us.ibm.com>
      6cc5e7ff
    • O
      uprobes/x86: Emulate relative conditional "short" jmp's · 8f95505b
      Oleg Nesterov 提交于
      Teach branch_emulate_op() to emulate the conditional "short" jmp's which
      check regs->flags.
      
      Note: this doesn't support jcxz/jcexz, loope/loopz, and loopne/loopnz.
      They all are rel8 and thus they can't trigger the problem, but perhaps
      we will add the support in future just for completeness.
      Reported-by: NJonathan Lebon <jlebon@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NJim Keniston <jkenisto@us.ibm.com>
      8f95505b
    • O
      uprobes/x86: Emulate relative call's · 8e89c0be
      Oleg Nesterov 提交于
      See the previous "Emulate unconditional relative jmp's" which explains
      why we can not execute "jmp" out-of-line, the same applies to "call".
      
      Emulating of rip-relative call is trivial, we only need to additionally
      push the ret-address. If this fails, we execute this instruction out of
      line and this should trigger the trap, the probed application should die
      or the same insn will be restarted if a signal handler expands the stack.
      We do not even need ->post_xol() for this case.
      
      But there is a corner (and almost theoretical) case: another thread can
      expand the stack right before we execute this insn out of line. In this
      case it hit the same problem we are trying to solve. So we simply turn
      the probed insn into "call 1f; 1:" and add ->post_xol() which restores
      ->sp and restarts.
      
      Many thanks to Jonathan who finally found the standalone reproducer,
      otherwise I would never resolve the "random SIGSEGV's under systemtap"
      bug-report. Now that the problem is clear we can write the simplified
      test-case:
      
      	void probe_func(void), callee(void);
      
      	int failed = 1;
      
      	asm (
      		".text\n"
      		".align 4096\n"
      		".globl probe_func\n"
      		"probe_func:\n"
      		"call callee\n"
      		"ret"
      	);
      
      	/*
      	 * This assumes that:
      	 *
      	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
      	 *
      	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
      	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
      	 *
      	 * so we can target the non-canonical address from xol_vma using
      	 * the simple math below, 100 * 4096 is just the random offset
      	 */
      	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");
      
      	void callee(void)
      	{
      		failed = 0;
      	}
      
      	int main(void)
      	{
      		probe_func();
      		return failed;
      	}
      
      It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
      randomize_va_space/etc can change the placement of xol area).
      
      Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8)
      differently. This patch relies on lib/insn.c and thus implements the intel's
      behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
      this insn, so we postpone the fix until we decide what should we do; emulate
      or not, support or not, etc.
      Reported-by: NJonathan Lebon <jlebon@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NJim Keniston <jkenisto@us.ibm.com>
      8e89c0be
    • O
      uprobes/x86: Emulate nop's using ops->emulate() · d2410063
      Oleg Nesterov 提交于
      Finally we can kill the ugly (and very limited) code in __skip_sstep().
      Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn.
      
      Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
      "(rep;)+ nop;" at least, and (afaics) much more.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NJim Keniston <jkenisto@us.ibm.com>
      d2410063
    • O
      uprobes/x86: Emulate unconditional relative jmp's · 7ba6db2d
      Oleg Nesterov 提交于
      Currently we always execute all insns out-of-line, including relative
      jmp's and call's. This assumes that even if regs->ip points to nowhere
      after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
      update it correctly.
      
      However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
      is not canonical. In this case CPU generates #GP and general_protection()
      kills the task which tries to execute this insn out-of-line.
      
      Now that we have uprobe_xol_ops we can teach uprobes to emulate these
      insns and solve the problem. This patch adds branch_xol_ops which has
      a single branch_emulate_op() hook, so far it can only handle rel8/32
      relative jmp's.
      
      TODO: move ->fixup into the union along with rip_rela_target_address.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NJonathan Lebon <jlebon@redhat.com>
      Reviewed-by: NJim Keniston <jkenisto@us.ibm.com>
      7ba6db2d
    • O
      uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and... · 8faaed1b
      Oleg Nesterov 提交于
      uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()
      
      1. Add the trivial sizeof_long() helper and change other callers of
         is_ia32_task() to use it.
      
         TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
         not necessarily mean 32bit. Fortunately syscall-like insns can't be
         probed so it actually works, but it would be better to rename and
         use is_ia32_frame().
      
      2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
         and adjust_ret_addr() should be named "nleft". And in fact only the
         last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
         needs to inspect the non-zero error code.
      
      TODO: adjust_ret_addr() should die. We can always calculate the value
      we need to write into *regs->sp, just UPROBE_FIX_CALL should record
      insn->length.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NJim Keniston <jkenisto@us.ibm.com>
      8faaed1b
    • O
      uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible · 75f9ef0b
      Oleg Nesterov 提交于
      SIGILL after the failed arch_uprobe_post_xol() should only be used as
      a last resort, we should try to restart the probed insn if possible.
      
      Currently only adjust_ret_addr() can fail, and this can only happen if
      another thread unmapped our stack after we executed "call" out-of-line.
      Most probably the application if buggy, but even in this case it can
      have a handler for SIGSEGV/etc. And in theory it can be even correct
      and do something non-trivial with its memory.
      
      Of course we can't restart unconditionally, so arch_uprobe_post_xol()
      does this only if ->post_xol() returns -ERESTART even if currently this
      is the only possible error.
      
      default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
      pointed out it should not forget to pop off the return address pushed
      by this insn executed out-of-line.
      
      Note: this is not "perfect", we do not want the extra handler_chain()
      after restart, but I think this is the best solution we can realistically
      do without too much uglifications.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: NJim Keniston <jkenisto@us.ibm.com>
      75f9ef0b