Commits · 0eb14833d5b1ea1accfeffb71be5de5929f85da9 · openanolis / cloud-kernel

14 May, 2014 8 commits

x86/traps: Kill DO_ERROR_INFO() · 0eb14833

Committed by Oleg Nesterov on May 08, 2014

Now that DO_ERROR_INFO() doesn't differ from DO_ERROR() we can remove
it and use DO_ERROR() instead.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

0eb14833

x86/traps: Shift fill_trap_info() from DO_ERROR_INFO() to do_error_trap() · 1c326c4d

Committed by Oleg Nesterov on May 08, 2014

Move the callsite of fill_trap_info() into do_error_trap() and remove
the "siginfo_t *info" argument.

This obviously breaks DO_ERROR() which passed info == NULL, we simply
change fill_trap_info() to return "siginfo_t *" and add the "default"
case which returns SEND_SIG_PRIV.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

1c326c4d
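
The pattern described in 1c326c4d, a helper that either fills a caller-supplied info structure or returns a sentinel for traps that carry no details, might look roughly like the user-space sketch below. The type `struct trap_info`, the macro `PRIV_SENTINEL`, the `CODE_*` values, and the function name are hypothetical stand-ins for `siginfo_t`, `SEND_SIG_PRIV`, and the kernel's si_code constants; this is not the kernel code.

```c
#include <assert.h>

/* Hypothetical stand-ins; the kernel uses siginfo_t and SEND_SIG_PRIV. */
struct trap_info {
    int code;            /* plays the role of si_code */
    unsigned long addr;  /* plays the role of si_addr */
};
#define PRIV_SENTINEL ((struct trap_info *)1)

enum { VEC_DE = 0, VEC_UD = 6, VEC_AC = 17 };        /* x86 exception vectors */
enum { CODE_INTDIV = 1, CODE_ILLOPN, CODE_ADRALN };  /* illustrative codes */

/* Fill *info for traps that carry detail; return the sentinel otherwise. */
static struct trap_info *fill_trap_info_sketch(int trapnr, unsigned long ip,
                                               struct trap_info *info)
{
    switch (trapnr) {
    case VEC_DE: info->code = CODE_INTDIV; info->addr = ip; break;
    case VEC_UD: info->code = CODE_ILLOPN; info->addr = ip; break;
    case VEC_AC: info->code = CODE_ADRALN; info->addr = 0;  break;
    default:     return PRIV_SENTINEL;   /* no per-trap details */
    }
    return info;
}

/* Usage: the caller never has to pass NULL for "no info" traps. */
void demo_fill_trap_info(void)
{
    struct trap_info ti = { 0, 0 };

    assert(fill_trap_info_sketch(VEC_DE, 0x1000, &ti) == &ti);
    assert(ti.code == CODE_INTDIV && ti.addr == 0x1000);
    assert(fill_trap_info_sketch(13, 0x1000, &ti) == PRIV_SENTINEL);
}
```

Returning a pointer lets one call site serve both kinds of traps, which is what allows the separate "no info" path to go away.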

x86/traps: Introduce fill_trap_info(), simplify DO_ERROR_INFO() · 958d3d72

Committed by Oleg Nesterov on May 07, 2014

Extract the fill-siginfo code from DO_ERROR_INFO() into the new helper,
fill_trap_info().

It can calculate si_code and si_addr looking at trapnr, so we can remove
these arguments from DO_ERROR_INFO() and simplify the source code. The
generated code is the same, __builtin_constant_p(trapnr) == T.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

958d3d72

x86/traps: Introduce do_error_trap() · dff0796e

Committed by Oleg Nesterov on May 07, 2014

Move the common code from DO_ERROR() and DO_ERROR_INFO() into the new
helper, do_error_trap(). This simplifies define's and shaves 527 bytes
from traps.o.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

dff0796e

x86/traps: Use SEND_SIG_PRIV instead of force_sig() · 38cad57b

Committed by Oleg Nesterov on May 07, 2014

force_sig() is just force_sig_info(SEND_SIG_PRIV). Imho it should die,
we have too many ugly "send signal" helpers.

And do_trap() looks just ugly because it uses force_sig_info() or
force_sig() depending on info != NULL.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

38cad57b

x86/traps: Make math_error() static · 5e1b05be

Committed by Oleg Nesterov on May 08, 2014

Trivial, make math_error() static.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

5e1b05be

uprobes/x86: Fix scratch register selection for rip-relative fixups · 1ea30fb6

Committed by Denys Vlasenko on May 02, 2014

Before this patch, instructions such as div, mul, shifts with count
in CL, cmpxchg are mishandled.

This patch adds vex prefix handling. In particular, it avoids colliding
with register operand encoded in vex.vvvv field.

Since we need to avoid two possible register operands, the selection of
scratch register needs to be from at least three registers.

After looking through a lot of CPU docs, it looks like the safest choice
is SI,DI,BX. Selecting BX needs care to not collide with implicit use of
BX by cmpxchg8b.

Test-case:

	#include <stdio.h>

	static const char *const pass[] = { "FAIL", "pass" };

	long two = 2;
	void test1(void)
	{
		long ax = 0, dx = 0;
		asm volatile("\n"
	"			xor	%%edx,%%edx\n"
	"			lea	2(%%edx),%%eax\n"
	// We divide 2 by 2. Result (in eax) should be 1:
	"	probe1:		.globl	probe1\n"
	"			divl	two(%%rip)\n"
	// If we have a bug (eax mangled on entry) the result will be 2,
	// because eax gets restored by probe machinery.
		: "=a" (ax), "=d" (dx) /*out*/
		: "0" (ax), "1" (dx) /*in*/
		: "memory" /*clobber*/
		);
		dprintf(2, "%s: %s\n", __func__,
			pass[ax == 1]
		);
	}

	long val2 = 0;
	void test2(void)
	{
		long old_val = val2;
		long ax = 0, dx = 0;
		asm volatile("\n"
	"			mov	val2,%%eax\n"     // eax := val2
	"			lea	1(%%eax),%%edx\n" // edx := eax+1
	// eax is equal to val2. cmpxchg should store edx to val2:
	"	probe2:		.globl  probe2\n"
	"			cmpxchg %%edx,val2(%%rip)\n"
	// If we have a bug (eax mangled on entry), val2 will stay unchanged
		: "=a" (ax), "=d" (dx) /*out*/
		: "0" (ax), "1" (dx) /*in*/
		: "memory" /*clobber*/
		);
		dprintf(2, "%s: %s\n", __func__,
			pass[val2 == old_val + 1]
		);
	}

	long val3[2] = {0,0};
	void test3(void)
	{
		long old_val = val3[0];
		long ax = 0, dx = 0;
		asm volatile("\n"
	"			mov	val3,%%eax\n"  // edx:eax := val3
	"			mov	val3+4,%%edx\n"
	"			mov	%%eax,%%ebx\n" // ecx:ebx := edx:eax + 1
	"			mov	%%edx,%%ecx\n"
	"			add	$1,%%ebx\n"
	"			adc	$0,%%ecx\n"
	// edx:eax is equal to val3. cmpxchg8b should store ecx:ebx to val3:
	"	probe3:		.globl  probe3\n"
	"			cmpxchg8b val3(%%rip)\n"
	// If we have a bug (edx:eax mangled on entry), val3 will stay unchanged.
	// If ecx:edx is mangled, val3 will get wrong value.
		: "=a" (ax), "=d" (dx) /*out*/
		: "0" (ax), "1" (dx) /*in*/
		: "cx", "bx", "memory" /*clobber*/
		);
		dprintf(2, "%s: %s\n", __func__,
			pass[val3[0] == old_val + 1 && val3[1] == 0]
		);
	}

	int main(int argc, char **argv)
	{
		test1();
		test2();
		test3();
		return 0;
	}

Before this change all tests fail if probe{1,2,3} are probed.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

1ea30fb6
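
The selection rule in 1ea30fb6 (pick a scratch register from three candidates so that it avoids up to two register operands) can be illustrated with a small sketch. The function name, the `REG_NONE` convention for an absent operand, and the candidate order are assumptions made for illustration. The extra care the commit mentions for cmpxchg8b's implicit use of BX is not modelled here.

```c
#include <assert.h>

/* x86 register numbers as encoded in ModRM and VEX fields. */
enum { REG_BX = 3, REG_SI = 6, REG_DI = 7, REG_NONE = -1 };

/* Return the first candidate that collides with neither operand. */
static int pick_scratch_reg(int modrm_reg, int vex_vvvv_reg)
{
    static const int candidates[] = { REG_SI, REG_DI, REG_BX };
    unsigned i;

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
        if (candidates[i] != modrm_reg && candidates[i] != vex_vvvv_reg)
            return candidates[i];
    /* Not reached: two operands cannot rule out three candidates. */
    return REG_NONE;
}

void demo_pick_scratch_reg(void)
{
    assert(pick_scratch_reg(REG_NONE, REG_NONE) == REG_SI);
    assert(pick_scratch_reg(REG_SI, REG_NONE) == REG_DI);
    assert(pick_scratch_reg(REG_SI, REG_DI) == REG_BX);
}
```

With only two candidates, an instruction that names both of them would leave nothing free, which is why the commit asks for at least three.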

uprobes/x86: Simplify rip-relative handling · 50204c6f

Committed by Denys Vlasenko on May 01, 2014

It is possible to replace rip-relative addressing mode with addressing
mode of the same length: (reg+disp32). This eliminates the need to fix
up immediate and correct for changing instruction length.

And we can kill arch_uprobe->def.riprel_target.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

50204c6f
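
To make the "same length" point in 50204c6f concrete: in 64-bit mode a rip-relative operand is encoded with ModRM mod=00, rm=101 followed by a disp32, while (reg+disp32) uses mod=10, rm=reg followed by a disp32. Only the ModRM byte changes. The sketch below rewrites that byte; the function names are hypothetical, and it assumes the base register is one of the low eight registers other than SP (which would need a SIB byte).

```c
#include <assert.h>
#include <stdint.h>

/* True if ModRM selects rip-relative addressing (mod=00, rm=101). */
static int modrm_is_riprel(uint8_t modrm)
{
    return (modrm & 0xc7) == 0x05;
}

/*
 * Switch to mod=10 (base register + disp32), keep the reg field,
 * and put the base register number into rm.
 */
static uint8_t modrm_riprel_to_reg_disp32(uint8_t modrm, unsigned base_reg)
{
    return (uint8_t)(0x80 | (modrm & 0x38) | (base_reg & 7));
}

void demo_modrm_rewrite(void)
{
    /* "divl two(%rip)" is f7 35 <disp32>: mod=00, reg=6 (/6), rm=101. */
    uint8_t modrm = 0x35;

    assert(modrm_is_riprel(modrm));
    /* With SI (register 6) as base: mod=10, reg=6, rm=6. */
    assert(modrm_riprel_to_reg_disp32(modrm, 6) == 0xb6);
    assert(!modrm_is_riprel(0xb6));
}
```

Because the displacement bytes stay in place, the base register only has to hold the address the original instruction would have used as rip.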

01 May, 2014 18 commits

uprobes/x86: Simplify riprel_{pre,post}_xol() and make them similar · c90a6950

Committed by Oleg Nesterov on Apr 27, 2014

Ignoring the "correction" logic, riprel_pre_xol() and riprel_post_xol()
are very similar but look quite different.

1. Add the "UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX" check at the start
   of riprel_pre_xol(), like the same check in riprel_post_xol().

2. Add the trivial scratch_reg() helper which returns the address of
   scratch register pre_xol/post_xol need to change.

3. Change these functions to use the new helper and avoid copy-and-paste
   under if/else branches.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

c90a6950

uprobes/x86: Kill the "autask" arg of riprel_pre_xol() · 7f55e82b

Committed by Oleg Nesterov on Apr 27, 2014

default_pre_xol_op() passes &current->utask->autask to riprel_pre_xol()
and this is just ugly because it still needs to load current->utask to
read ->vaddr.

Remove this argument, change riprel_pre_xol() to use current->utask.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

7f55e82b

uprobes/x86: Rename *riprel* helpers to make the naming consistent · 1475ee7f

Committed by Oleg Nesterov on Apr 27, 2014

handle_riprel_insn(), pre_xol_rip_insn() and handle_riprel_post_xol()
look confusing and inconsistent. Rename them into riprel_analyze(),
riprel_pre_xol(), and riprel_post_xol() respectively.

No changes in compiled code.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

1475ee7f

uprobes/x86: Cleanup the usage of UPROBE_FIX_IP/UPROBE_FIX_CALL · 83cd5914

Committed by Oleg Nesterov on Apr 25, 2014

Now that UPROBE_FIX_IP/UPROBE_FIX_CALL are mutually exclusive we can
use a single "fix_ip_or_call" enum instead of 2 fix_* booleans. This
way the logic looks more understandable and clean to me.

While at it, join "case 0xea" with other "ip is correct" ret/lret cases.
Also change default_post_xol_op() to use "else if" for the same reason.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

83cd5914

uprobes/x86: Kill adjust_ret_addr(), simplify UPROBE_FIX_CALL logic · 1dc76e6e

Committed by Oleg Nesterov on Apr 25, 2014

The only insn which could have both UPROBE_FIX_IP and UPROBE_FIX_CALL
was 0xe8 "call relative", and now it is handled by branch_xol_ops.

So we can change default_post_xol_op(UPROBE_FIX_CALL) to simply push
the address of next insn == utask->vaddr + insn.length, just we need
to record insn.length into the new auprobe->def.ilen member.

Note: if/when we teach branch_xol_ops to support jcxz/loopz we can
remove the "correction" logic, UPROBE_FIX_IP can use the same address.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

1dc76e6e
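
The arithmetic in 1dc76e6e is simply "return address = probed address + instruction length". A user-space sketch of that computation against a toy stack follows. `struct fake_task`, its byte-offset stack pointer, and the function name are hypothetical; the real code writes to user memory and can fail for other reasons.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a user stack: sp is a byte offset into mem. */
struct fake_task {
    uint8_t  mem[64];
    uint64_t sp;
};

/* Push vaddr + ilen as the return address; -1 if the stack has no room. */
static int push_ret_address_sketch(struct fake_task *t, uint64_t vaddr,
                                   uint8_t ilen)
{
    uint64_t ret = vaddr + ilen;   /* address of the next instruction */

    if (t->sp < sizeof(ret))
        return -1;
    t->sp -= sizeof(ret);
    memcpy(&t->mem[t->sp], &ret, sizeof(ret));
    return 0;
}

void demo_push_ret_address(void)
{
    struct fake_task t;
    uint64_t top;

    memset(&t, 0, sizeof(t));
    t.sp = sizeof(t.mem);
    /* A 5-byte "call rel32" probed at 0x401000 returns to 0x401005. */
    assert(push_ret_address_sketch(&t, 0x401000, 5) == 0);
    memcpy(&top, &t.mem[t.sp], sizeof(top));
    assert(t.sp == 56 && top == 0x401005);
}
```

Recording only the instruction length is enough because the probed address is already known when the fixup runs.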

uprobes/x86: Introduce push_ret_address() · 2b82cadf

Committed by Oleg Nesterov on Apr 24, 2014

Extract the "push return address" code from branch_emulate_op() into
the new simple helper, push_ret_address(). It will have more users.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

2b82cadf

uprobes/x86: Cleanup the usage of arch_uprobe->def.fixups, make it u8 · 78d9af4c

Committed by Oleg Nesterov on Apr 24, 2014

handle_riprel_insn() assumes that nobody else could modify ->fixups
before. This is correct but fragile, change it to use "|=".

Also make ->fixups u8, we are going to add the new members into the
union. It is not clear why UPROBE_FIX_RIP_.X lived in the upper byte,
redefine them so that they can fit into u8.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

78d9af4c

uprobes/x86: Move default_xol_ops's data into arch_uprobe->def · 97aa5cdd

Committed by Oleg Nesterov on Apr 22, 2014

Finally we can move arch_uprobe->fixups/rip_rela_target_address
into the new "def" struct and place this struct in the union, they
are only used by default_xol_ops paths.

The patch also renames rip_rela_target_address to riprel_target just
to make this name shorter.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

97aa5cdd

uprobes/x86: Move UPROBE_FIX_SETF logic from arch_uprobe_post_xol() to default_post_xol_op() · 220ef8dc

Committed by Oleg Nesterov on Apr 21, 2014

UPROBE_FIX_SETF is only needed to handle "popf" correctly but it is
processed by the generic arch_uprobe_post_xol() code. This doesn't
allow us to make ->fixups private for default_xol_ops.

1. Change default_post_xol_op(UPROBE_FIX_SETF) to set ->saved_tf = T.

   "popf" always reads the flags from stack, it doesn't matter if TF
   was set or not before single-step. Ignoring the naming, this is
   even more logical, "saved_tf" means "owned by application" and we
   do not own this flag after "popf".

2. Change arch_uprobe_post_xol() to save ->saved_tf into the local
   "bool send_sigtrap" before ->post_xol().

3. Change arch_uprobe_post_xol() to ignore UPROBE_FIX_SETF and just
   check ->saved_tf after ->post_xol().

With this patch ->fixups and ->rip_rela_target_address are only used
by default_xol_ops hooks, we are ready to remove them from the common
part of arch_uprobe.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

220ef8dc

uprobes/x86: Don't use arch_uprobe_abort_xol() in arch_uprobe_post_xol() · 6ded5f38

Committed by Oleg Nesterov on Apr 21, 2014

014940ba "uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails"
changed arch_uprobe_post_xol() to use arch_uprobe_abort_xol() if ->post_xol
fails. This was correct and helped to avoid the additional complications,
we need to clear X86_EFLAGS_TF in this case.

However, now that we have uprobe_xol_ops->abort() hook it would be better
to avoid arch_uprobe_abort_xol() here. ->post_xol() should likely do what
->abort() does anyway, we should not do the same work twice. Currently only
handle_riprel_post_xol() can be called twice, this is unnecessary but safe.
Still this is not clean and can lead to the problems in future.

Change arch_uprobe_post_xol() to clear X86_EFLAGS_TF and restore ->ip by
hand and avoid arch_uprobe_abort_xol(). This temporarily uglifies the usage
of autask.saved_tf, we will clean this up later.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

6ded5f38

uprobes/x86: Introduce uprobe_xol_ops->abort() and default_abort_op() · 588fbd61

Committed by Oleg Nesterov on Apr 21, 2014

arch_uprobe_abort_xol() calls handle_riprel_post_xol() even if
auprobe->ops != default_xol_ops. This is fine correctness wise, only
default_pre_xol_op() can set UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX and
otherwise handle_riprel_post_xol() is nop.

But this doesn't look clean and this doesn't allow us to move ->fixups
into the union in arch_uprobe. Move this handle_riprel_post_xol() call
into the new default_abort_op() hook and change arch_uprobe_abort_xol()
accordingly.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

588fbd61

uprobes/x86: Don't change the task's state if ->pre_xol() fails · dd91016d

Committed by Oleg Nesterov on Apr 22, 2014

Currently this doesn't matter, the only ->pre_xol() hook can't fail,
but we need to fix arch_uprobe_pre_xol() anyway. If ->pre_xol() fails
we should not change regs->ip/flags, we should just return the error
to make restart actually possible.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

dd91016d

uprobes/x86: Fix is_64bit_mm() with CONFIG_X86_X32 · b24dc8da

Committed by Oleg Nesterov on Apr 19, 2014

is_64bit_mm() assumes that mm->context.ia32_compat means the 32-bit
instruction set, this is not true if the task is TIF_X32.

Change set_personality_ia32() to initialize mm->context.ia32_compat
by TIF_X32 or TIF_IA32 instead of 1. This allows us to fix is_64bit_mm()
without affecting other users, they all treat ia32_compat as "bool".

TIF_ in ->ia32_compat looks a bit strange, but this is grep-friendly
and avoids the new define's.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

b24dc8da
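
The trick in b24dc8da is to store which compat flavour was requested instead of a bare 1, so that boolean users keep working while one helper can tell IA32 from X32. A sketch under stated assumptions follows: `struct mm_sketch`, the `SKETCH_TIF_*` values, and all function names are hypothetical, and the actual kernel helper may also depend on configuration options.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real TIF_* numbers live in kernel headers. */
enum { SKETCH_TIF_IA32 = 17, SKETCH_TIF_X32 = 30 };

struct mm_sketch {
    int ia32_compat;   /* 0 for a native 64-bit mm */
};

/* Record which compat flavour was requested instead of a bare 1. */
static void set_personality_sketch(struct mm_sketch *mm, bool x32)
{
    mm->ia32_compat = x32 ? SKETCH_TIF_X32 : SKETCH_TIF_IA32;
}

/* X32 tasks run the 64-bit instruction set; only IA32 is 32-bit. */
static bool is_64bit_mm_sketch(const struct mm_sketch *mm)
{
    return mm->ia32_compat != SKETCH_TIF_IA32;
}

/* Existing users that treat the field as a boolean keep working. */
static bool is_compat_sketch(const struct mm_sketch *mm)
{
    return mm->ia32_compat != 0;
}

void demo_is_64bit_mm(void)
{
    struct mm_sketch native = { 0 }, ia32 = { 0 }, x32 = { 0 };

    set_personality_sketch(&ia32, false);
    set_personality_sketch(&x32, true);
    assert(is_64bit_mm_sketch(&native) && !is_compat_sketch(&native));
    assert(!is_64bit_mm_sketch(&ia32) && is_compat_sketch(&ia32));
    assert(is_64bit_mm_sketch(&x32) && is_compat_sketch(&x32));
}
```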

uprobes/x86: Make good_insns_* depend on CONFIG_X86_* · 8dbacad9

Committed by Oleg Nesterov on Apr 19, 2014

Add the suitable ifdef's around good_insns_* arrays. We do not want
to add the ugly ifdef's into their only user, uprobe_init_insn(), so
the "#else" branch simply defines them as NULL. This doesn't generate
the extra code, gcc is smart enough, although the code is fine even if
it could not detect that (without CONFIG_IA32_EMULATION) is_64bit_mm()
is __builtin_constant_p().

The patch looks more complicated because it also moves good_insns_64
up close to good_insns_32.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

8dbacad9

uprobes/x86: Shift "insn_complete" from branch_setup_xol_ops() to uprobe_init_insn() · ff261964

Committed by Oleg Nesterov on Apr 19, 2014

Change uprobe_init_insn() to make insn_complete() == T, this makes
other insn_get_*() calls unnecessary.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

ff261964

uprobes/x86: Add is_64bit_mm(), kill validate_insn_bits() · 2ae1f49a

Committed by Oleg Nesterov on Apr 19, 2014

1. Extract the ->ia32_compat check from 64bit validate_insn_bits()
   into the new helper, is_64bit_mm(), it will have more users.

   TODO: this check is actually wrong if the mm owner is an X32 task,
   we need another fix which changes set_personality_ia32().

   TODO: even worse, the whole 64-or-32-bit logic is very broken
   and the fix is not simple, we need the nontrivial changes in
   the core uprobes code.

2. Kill validate_insn_bits() and change its single caller to use
   uprobe_init_insn(is_64bit_mm(mm)).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

2ae1f49a

uprobes/x86: Add uprobe_init_insn(), kill validate_insn_{32,64}bits() · 73175d0d

Committed by Oleg Nesterov on Apr 19, 2014

validate_insn_32bits() and validate_insn_64bits() are very similar,
turn them into the single uprobe_init_insn() which has the additional
"bool x86_64" argument which can be passed to insn_init() and used to
choose between good_insns_64/good_insns_32.

Also kill UPROBE_FIX_NONE, it has no users.

Note: the current code doesn't use ifdef's consistently, good_insns_64
depends on CONFIG_X86_64 but good_insns_32 is unconditional. This patch
removes ifdef around good_insns_64, we will add it back later along with
the similar one for good_insns_32.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

73175d0d

uprobes/x86: Refuse to attach uprobe to "word-sized" branch insns · 250bbd12

Committed by Denys Vlasenko on Apr 24, 2014

All branch insns on x86 can be prefixed with the operand-size
override prefix, 0x66. It was only ever useful for performing
jumps to 32-bit offsets in 16-bit code segments.

In 32-bit code, such instructions are useless since
they cause IP truncation to 16 bits, and in case of call insns,
they save only 16 bits of return address and misalign
the stack pointer as a "bonus".

In 64-bit code, such instructions are treated differently by Intel
and AMD CPUs: Intel ignores the prefix altogether,
AMD treats them the same as in 32-bit mode.

Before this patch, the emulation code would execute
the instructions as if they have no 0x66 prefix.

With this patch, we refuse to attach uprobes to such insns.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

250bbd12
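
The check introduced by 250bbd12 amounts to "if a branch instruction carries the 0x66 operand-size prefix, do not accept it". A minimal sketch, assuming the prefixes have already been decoded into a byte array; the function names and the plain -1 error value are hypothetical, and the kernel relies on its own instruction decoder and error codes instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PREFIX_OPSIZE 0x66   /* operand-size override prefix */

/* Scan the already-decoded prefix bytes for the 0x66 prefix. */
static bool has_opsize_prefix(const uint8_t *prefixes, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (prefixes[i] == PREFIX_OPSIZE)
            return true;
    return false;
}

/* 0 if a probe may be attached to this branch insn, -1 to refuse. */
static int check_branch_prefixes(const uint8_t *prefixes, size_t n)
{
    return has_opsize_prefix(prefixes, n) ? -1 : 0;
}

void demo_check_branch_prefixes(void)
{
    const uint8_t callw_prefixes[] = { 0x66 };   /* "callw": 66 e8 ... */

    assert(check_branch_prefixes(callw_prefixes, 1) == -1);
    assert(check_branch_prefixes(NULL, 0) == 0); /* plain "call": e8 ... */
}
```

Refusing the probe sidesteps the vendor difference entirely: nothing has to decide whether to emulate the Intel or the AMD behaviour.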

24 Apr, 2014 1 commit

perf/x86: Fix RAPL rdmsrl_safe() usage · 9f7ff893

Committed by Stephane Eranian on Apr 23, 2014

This patch fixes a bug introduced by:

  24223657 ("perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU")

The rdmsrl_safe() function returns 0 on success.
The current code was failing to detect the RAPL PMU
on real hardware (missing /sys/devices/power) because
the return value of rdmsrl_safe() was misinterpreted.
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20140423170418.GA12767@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>

9f7ff893
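
The convention at issue in 9f7ff893 is that a "safe" MSR read returns 0 on success. The sketch below uses a user-space stand-in to show how inverting that test skips a working PMU. `rdmsrl_safe_sketch()`, the constant, the returned value, and both detect functions are hypothetical, and the "misread" variant is only one way such a misinterpretation could look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSR_POWER_UNIT 0x606u   /* stand-in MSR number for this sketch */

/* Same convention as described above: 0 on success, nonzero on failure. */
static int rdmsrl_safe_sketch(unsigned msr, uint64_t *val)
{
    if (msr != SKETCH_MSR_POWER_UNIT)
        return -1;       /* models a faulting MSR read */
    *val = 0xa0e03;      /* arbitrary illustrative contents */
    return 0;
}

/* Misreading: treats a zero return as failure, so a working PMU is skipped. */
static bool rapl_detect_misread(unsigned msr)
{
    uint64_t bits;

    if (!rdmsrl_safe_sketch(msr, &bits))
        return false;
    return true;
}

/* Correct reading: only a nonzero return means the MSR is unavailable. */
static bool rapl_detect(unsigned msr)
{
    uint64_t bits;

    return rdmsrl_safe_sketch(msr, &bits) == 0;
}

void demo_rapl_detect(void)
{
    assert(rapl_detect(SKETCH_MSR_POWER_UNIT));
    assert(!rapl_detect(0x123));
    assert(!rapl_detect_misread(SKETCH_MSR_POWER_UNIT)); /* the bug */
}
```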

18 Apr, 2014 13 commits

perf/x86: Export perf_assign_events() · 4a3dc121

Committed by Yan, Zheng on Mar 18, 2014

Export perf_assign_events() to allow building the perf Intel uncore
driver as a module.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1395133004-23205-3-git-send-email-zheng.z.yan@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

4a3dc121

perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU · 24223657

Committed by Venkatesh Srinivas on Mar 13, 2014

CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.
Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

24223657

uprobes/x86: Emulate relative conditional "near" jmp's · 6cc5e7ff

Committed by Oleg Nesterov on Apr 07, 2014

Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10
if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same
condition.

Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are
correct no matter if this jmp is "near" or "short".
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

6cc5e7ff
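
The opcode mapping in 6cc5e7ff rests on the x86 encoding: "near" conditional jumps are 0x0f 0x80..0x8f, "short" ones are 0x70..0x7f, and the low nibble selects the same condition in both. A sketch of that normalisation, working on raw opcode bytes; the function name is hypothetical, and the kernel works on a decoded instruction, not a byte pointer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a conditional jump to its "short" opcode (0x70..0x7f), or -1.
 * For a two-byte opcode the caller must supply both bytes.
 */
static int jcc_short_opcode(const uint8_t *op)
{
    if (op[0] >= 0x70 && op[0] <= 0x7f)
        return op[0];                  /* already a short jcc */
    if (op[0] == 0x0f && op[1] >= 0x80 && op[1] <= 0x8f)
        return op[1] - 0x10;           /* near jcc, same condition */
    return -1;
}

void demo_jcc_short_opcode(void)
{
    const uint8_t jz_short[] = { 0x74, 0x05 };
    const uint8_t jz_near[]  = { 0x0f, 0x84 };
    const uint8_t syscall_[] = { 0x0f, 0x05 };

    assert(jcc_short_opcode(jz_short) == 0x74);
    assert(jcc_short_opcode(jz_near) == 0x74);
    assert(jcc_short_opcode(syscall_) == -1);
}
```

After this step the condition check no longer needs to know which encoding was used; only the instruction length and offset differ.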

uprobes/x86: Emulate relative conditional "short" jmp's · 8f95505b

Committed by Oleg Nesterov on Apr 06, 2014

Teach branch_emulate_op() to emulate the conditional "short" jmp's which
check regs->flags.

Note: this doesn't support jcxz/jecxz, loope/loopz, and loopne/loopnz.
They all are rel8 and thus they can't trigger the problem, but perhaps
we will add the support in future just for completeness.
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

8f95505b
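
Emulating a conditional jump, as 8f95505b describes, means evaluating its condition against the saved flags register. One way to do that is sketched below, using the architectural EFLAGS bit positions and the standard pairing of condition codes (an even opcode tests a condition, the following odd opcode tests its negation). The macro and function names are hypothetical, and the kernel's implementation is structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bit masks. */
#define FL_CF 0x001ul
#define FL_PF 0x004ul
#define FL_ZF 0x040ul
#define FL_SF 0x080ul
#define FL_OF 0x800ul

/* Decide whether a short jcc (opcode 0x70..0x7f) is taken. */
static bool jcc_taken(uint8_t opc1, unsigned long flags)
{
    bool cf = (flags & FL_CF) != 0, pf = (flags & FL_PF) != 0;
    bool zf = (flags & FL_ZF) != 0, sf = (flags & FL_SF) != 0;
    bool of = (flags & FL_OF) != 0;
    bool r;

    switch ((opc1 >> 1) & 7) {
    case 0:  r = of;                break;  /* jo  / jno */
    case 1:  r = cf;                break;  /* jb  / jae */
    case 2:  r = zf;                break;  /* jz  / jnz */
    case 3:  r = cf || zf;          break;  /* jbe / ja  */
    case 4:  r = sf;                break;  /* js  / jns */
    case 5:  r = pf;                break;  /* jp  / jnp */
    case 6:  r = sf != of;          break;  /* jl  / jge */
    default: r = zf || (sf != of);  break;  /* jle / jg  */
    }
    return (opc1 & 1) ? !r : r;   /* odd opcodes negate the condition */
}

void demo_jcc_taken(void)
{
    assert(jcc_taken(0x74, FL_ZF));            /* jz with ZF set */
    assert(!jcc_taken(0x75, FL_ZF));           /* jnz with ZF set */
    assert(jcc_taken(0x7c, FL_SF));            /* jl: SF != OF */
    assert(!jcc_taken(0x7c, FL_SF | FL_OF));   /* jl: SF == OF */
    assert(jcc_taken(0x7f, 0));                /* jg: ZF=0, SF == OF */
}
```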

uprobes/x86: Emulate relative call's · 8e89c0be

Committed by Oleg Nesterov on Apr 06, 2014

See the previous "Emulate unconditional relative jmp's" which explains
why we can not execute "jmp" out-of-line, the same applies to "call".

Emulating of rip-relative call is trivial, we only need to additionally
push the ret-address. If this fails, we execute this instruction out of
line and this should trigger the trap, the probed application should die
or the same insn will be restarted if a signal handler expands the stack.
We do not even need ->post_xol() for this case.

But there is a corner (and almost theoretical) case: another thread can
expand the stack right before we execute this insn out of line. In this
case it hit the same problem we are trying to solve. So we simply turn
the probed insn into "call 1f; 1:" and add ->post_xol() which restores
->sp and restarts.

Many thanks to Jonathan who finally found the standalone reproducer,
otherwise I would never resolve the "random SIGSEGV's under systemtap"
bug-report. Now that the problem is clear we can write the simplified
test-case:

	void probe_func(void), callee(void);

	int failed = 1;

	asm (
		".text\n"
		".align 4096\n"
		".globl probe_func\n"
		"probe_func:\n"
		"call callee\n"
		"ret"
	);

	/*
	 * This assumes that:
	 *
	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
	 *
	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
	 *
	 * so we can target the non-canonical address from xol_vma using
	 * the simple math below, 100 * 4096 is just the random offset
	 */
	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");

	void callee(void)
	{
		failed = 0;
	}

	int main(void)
	{
		probe_func();
		return failed;
	}

It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
randomize_va_space/etc can change the placement of xol area).

Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8)
differently. This patch relies on lib/insn.c and thus implements the intel's
behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
this insn, so we postpone the fix until we decide what should we do; emulate
or not, support or not, etc.
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

8e89c0be

uprobes/x86: Emulate nop's using ops->emulate() · d2410063

Committed by Oleg Nesterov on Apr 05, 2014

Finally we can kill the ugly (and very limited) code in __skip_sstep().
Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn.

Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
"(rep;)+ nop;" at least, and (afaics) much more.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

d2410063

uprobes/x86: Emulate unconditional relative jmp's · 7ba6db2d

Committed by Oleg Nesterov on Apr 05, 2014

Currently we always execute all insns out-of-line, including relative
jmp's and call's. This assumes that even if regs->ip points to nowhere
after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
update it correctly.

However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
is not canonical. In this case CPU generates #GP and general_protection()
kills the task which tries to execute this insn out-of-line.

Now that we have uprobe_xol_ops we can teach uprobes to emulate these
insns and solve the problem. This patch adds branch_xol_ops which has
a single branch_emulate_op() hook, so far it can only handle rel8/32
relative jmp's.

TODO: move ->fixup into the union along with rip_rela_target_address.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

7ba6db2d
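
The failure mode in 7ba6db2d can be reproduced on paper: a relative branch copied to the XOL slot lands at slot + length + offset, and that sum may fall into the non-canonical hole. The sketch below assumes 48-bit virtual addresses (4-level paging), where an address is canonical only if bits 63..47 are all equal. The function names, the slot address, and the offset are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Canonical under 48-bit addressing: bits 63..47 all zeros or all ones. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 uppermost bits */

    return top == 0 || top == 0x1ffff;
}

/* Where a relative branch lands when executed from the XOL slot. */
static uint64_t xol_branch_target(uint64_t slot, unsigned ilen, int32_t rel)
{
    return slot + ilen + (uint64_t)(int64_t)rel;
}

void demo_xol_branch_target(void)
{
    uint64_t slot = 0x7fffffffe080ull;  /* illustrative XOL slot address */
    int32_t rel = 417659;               /* illustrative rel32 of a 5-byte call */
    uint64_t target = xol_branch_target(slot, 5, rel);

    assert(is_canonical_48(slot));
    assert(target == 0x800000064000ull);
    assert(!is_canonical_48(target));   /* the CPU would raise #GP here */
}
```

Emulating the branch avoids the problem because the target is then computed from the original instruction address, never from the slot.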

uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and... · 8faaed1b

Committed by Oleg Nesterov on Apr 06, 2014

uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()

1. Add the trivial sizeof_long() helper and change other callers of
   is_ia32_task() to use it.

   TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
   not necessarily mean 32bit. Fortunately syscall-like insns can't be
   probed so it actually works, but it would be better to rename and
   use is_ia32_frame().

2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
   and adjust_ret_addr() should be named "nleft". And in fact only the
   last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
   needs to inspect the non-zero error code.

TODO: adjust_ret_addr() should die. We can always calculate the value
we need to write into *regs->sp, just UPROBE_FIX_CALL should record
insn->length.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

8faaed1b

uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible · 75f9ef0b

Committed by Oleg Nesterov on Apr 03, 2014

SIGILL after the failed arch_uprobe_post_xol() should only be used as
a last resort, we should try to restart the probed insn if possible.

Currently only adjust_ret_addr() can fail, and this can only happen if
another thread unmapped our stack after we executed "call" out-of-line.
Most probably the application is buggy, but even in this case it can
have a handler for SIGSEGV/etc. And in theory it can be even correct
and do something non-trivial with its memory.

Of course we can't restart unconditionally, so arch_uprobe_post_xol()
does this only if ->post_xol() returns -ERESTART even if currently this
is the only possible error.

default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
pointed out it should not forget to pop off the return address pushed
by this insn executed out-of-line.

Note: this is not "perfect", we do not want the extra handler_chain()
after restart, but I think this is the best solution we can realistically
do without too much uglifications.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

75f9ef0b

uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails · 014940ba

Committed by Oleg Nesterov on Apr 03, 2014

Currently the error from arch_uprobe_post_xol() is silently ignored.
This doesn't look good and this can lead to the hard-to-debug problems.

1. Change handle_singlestep() to loudly complain and send SIGILL.

   Note: this only affects x86, ppc/arm can't fail.

2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
   avoid TF games if it is going to return an error.

   This can help to analyze the problem, if nothing else we should
   not report ->ip = xol_slot in the core-file.

   Note: this means that handle_riprel_post_xol() can be called twice,
   but this is fine because it is idempotent.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>

014940ba

uprobes/x86: Conditionalize the usage of handle_riprel_insn() · e55848a4

Committed by Oleg Nesterov on Mar 31, 2014

arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
Move the callsite into "default" case and change the "0xff" case to
fall-through.

We are going to add the various hooks to handle the rip-relative
jmp/call instructions (and more), we need this change to enforce the
fact that the new code can not conflict with is_riprel_insn() logic
which, after this change, can only be used by default_xol_ops.

Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol()
directly. This is fine unless another _xol_ops we may add later will
need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup.
In this case we can add uprobe_xol_ops->abort() hook, which (perhaps)
we will need anyway in the long term.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

e55848a4

uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops · 8ad8e9d3

Committed by Oleg Nesterov on Mar 31, 2014

Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
set of methods and change arch_uprobe_pre/post_xol() accordingly.

This way we can add the new uprobe_xol_ops's to handle the insns
which need the special processing (rip-relative jmp/call at least).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

8ad8e9d3
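
The design in 8ad8e9d3 is the usual "ops table" pattern: each probe points at a struct of function pointers, and the generic pre/post code dispatches through it. A minimal sketch follows; all struct and function names are hypothetical, and the real `struct uprobe_xol_ops` has different members and signatures.

```c
#include <assert.h>
#include <stddef.h>

struct probe_sketch;

/* Per-instruction-class hooks, all optional. */
struct xol_ops_sketch {
    int (*pre_xol)(struct probe_sketch *p);
    int (*post_xol)(struct probe_sketch *p);
};

struct probe_sketch {
    const struct xol_ops_sketch *ops;
    unsigned trace;   /* records which hooks ran, for the demo only */
};

static int default_pre(struct probe_sketch *p)  { p->trace |= 1; return 0; }
static int default_post(struct probe_sketch *p) { p->trace |= 2; return 0; }

static const struct xol_ops_sketch default_ops_sketch = {
    default_pre, default_post
};

/* Generic code dispatches through ->ops instead of testing fixup bits. */
static int arch_pre_xol_sketch(struct probe_sketch *p)
{
    return (p->ops && p->ops->pre_xol) ? p->ops->pre_xol(p) : 0;
}

static int arch_post_xol_sketch(struct probe_sketch *p)
{
    return (p->ops && p->ops->post_xol) ? p->ops->post_xol(p) : 0;
}

void demo_xol_ops(void)
{
    struct probe_sketch p = { &default_ops_sketch, 0 };

    assert(arch_pre_xol_sketch(&p) == 0 && p.trace == 1);
    assert(arch_post_xol_sketch(&p) == 0 && p.trace == 3);
}
```

Adding special handling for another class of instructions then means supplying another ops table, with no changes to the generic dispatch code.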

uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks · 34e7317d

Committed by Oleg Nesterov on Mar 31, 2014

No functional changes. Preparation to simplify the review of the next
change. Just reorder the code in arch_uprobe_pre/post_xol() functions
so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.

Also change arch_uprobe_pre_xol() to use utask instead of autask, to
make the code more symmetrical with arch_uprobe_post_xol().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

34e7317d

openanolis / cloud-kernel synced successfully about 1 year ago