1. 11 Feb, 2013 1 commit
    • I
      Merge branch 'uprobes/core' of... · a3d4fd7a
      Ingo Molnar committed
      Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
      
      Improve uprobes performance by adding 'pre-filtering' support,
      by Oleg Nesterov:
      
      	# time perl -e 'syscall -1 for 1..100_000'
      	real    0m0.040s
      	user    0m0.027s
      	sys     0m0.010s
      
      	# perf probe -x /lib/libc.so.6 syscall
      	# perf record -e probe_libc:syscall sleep 100 &
      
      Before this series:
      
      	# time perl -e 'syscall -1 for 1..100_000'
      	real    0m1.714s
      	user    0m0.103s
      	sys     0m1.607s
      
      After:
      
      	# time perl -e 'syscall -1 for 1..100_000'
      	real    0m0.037s
      	user    0m0.013s
      	sys     0m0.023s
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 09 Feb, 2013 38 commits
    • O
      uprobes/perf: Avoid uprobe_apply() whenever possible · b2fe8ba6
      Oleg Nesterov committed
      uprobe_perf_open/close call the costly uprobe_apply() every time,
      we can avoid it if:
      
      	- "nr_systemwide != 0" is not changed.
      
      	- There is another process/thread with the same ->mm.
      
      	- copy_process() does inherit_event(). dup_mmap() preserves the
      	  inserted breakpoints.
      
      	- event->attr.enable_on_exec == T, we can rely on uprobe_mmap()
      	  called by exec/mmap paths.
      
      	- tp_target is exiting. Only _close() checks PF_EXITING, I don't
      	  think TRACE_REG_PERF_OPEN can hit the dying task too often.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • O
      uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE · f42d24a1
      Oleg Nesterov committed
      Change uprobe_trace_func() and uprobe_perf_func() to return "int". Change
      uprobe_dispatcher() to return "trace_ret | perf_ret" although this is not
      needed, currently TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive.
      
      The only functional change is that uprobe_perf_func() checks the filtering
      too and returns UPROBE_HANDLER_REMOVE if nobody wants to trace current.
      
      Testing:
      
      	# perf probe -x /lib/libc.so.6 syscall
      
      	# perf record -e probe_libc:syscall -i perl -e 'fork; syscall -1 for 1..10; wait'
      
      	# perf report --show-total-period
      		100.00%            10     perl  libc-2.8.so    [.] syscall
      
      Before this patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				20
      
      A child process doesn't have a counter, but it still hits this breakpoint
      "copied" by dup_mmap().
      
      After the patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				11
      
      The child process hits this int3 only once and does unapply_uprobe().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • O
      uprobes/perf: Teach trace_uprobe/perf code to pre-filter · 31ba3348
      Oleg Nesterov committed
      Finally implement uprobe_perf_filter() which checks ->nr_systemwide or
      ->perf_events to figure out whether we need to insert the breakpoint.
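
      The check described above can be sketched in userspace C. This is
      only a simplified model, not the kernel code: the *_model types, the
      singly linked list and the target_mm field are assumptions made for
      illustration; only ->nr_systemwide and ->perf_events are names taken
      from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_model { int id; };			/* stand-in for mm_struct */

struct perf_event_model {
	const struct mm_model *target_mm;	/* assumed: mm of the traced task */
	struct perf_event_model *next;
};

struct filter_model {
	int nr_systemwide;			/* events without a target task */
	struct perf_event_model *perf_events;	/* per-task events */
};

/* Return true if a breakpoint is wanted in @mm. */
static bool perf_filter_model(const struct filter_model *f,
			      const struct mm_model *mm)
{
	assert(f);

	/* A system-wide event wants every mm. */
	if (f->nr_systemwide)
		return true;

	/* Otherwise some per-task event must target this mm. */
	for (const struct perf_event_model *e = f->perf_events; e; e = e->next)
		if (e->target_mm == mm)
			return true;

	return false;
}
```

      With one per-task event the model accepts only the traced mm; setting
      nr_systemwide to a non-zero value makes it accept any mm.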
      
      uprobe_perf_open/close are changed to do uprobe_apply(true/false) when
      the new perf event comes or goes away.
      
      Note that currently this is very suboptimal:
      
      	- uprobe_register() called by TRACE_REG_PERF_REGISTER becomes a
      	  heavy nop, consumer->filter() always returns F at this stage.
      
      	  As it was already discussed we need uprobe_register_only() to
      	  avoid the costly register_for_each_vma() when possible.
      
      	- uprobe_apply() is often overkill. Unless "nr_systemwide != 0"
      	  changes we need uprobe_apply_mm(), unapply_uprobe() is almost
      	  what we need.
      
      	- uprobe_apply() can be simply avoided sometimes, see the next
      	  changes.
      
      Testing:
      
      	# perf probe -x /lib/libc.so.6 syscall
      
      	# perl -e 'syscall -1 while 1' &
      	[1] 530
      
      	# perf record -e probe_libc:syscall perl -e 'syscall -1 for 1..10; sleep 1'
      
      	# perf report --show-total-period
      		100.00%            10     perl  libc-2.8.so    [.] syscall
      
      Before this patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				79291
      
      A huge ->nhit == 79291 reflects the fact that the background process
      530 constantly hits this breakpoint too, even if it doesn't contribute
      to the output.
      
      After the patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				10
      
      This shows that only the target process was punished by int3.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • O
      uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's · 736288ba
      Oleg Nesterov committed
      Introduce "struct trace_uprobe_filter" which records the "active"
      perf_event's attached to ftrace_event_call. For the start we simply
      use list_head, we can optimize this later if needed. For example, we
      do not really need to record an event with ->parent != NULL, we can
      rely on parent->child_list. And we can certainly do some optimizations
      for the case when 2 events have the same ->tp_target or tp_target->mm.
      
      Change trace_uprobe_register() to process TRACE_REG_PERF_OPEN/CLOSE
      and add/del this perf_event to the list.
      
      We can probably avoid any locking, but let's start with the "obviously
      correct" trace_uprobe_filter->rwlock which protects everything.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • O
      uprobes: Introduce uprobe_apply() · bdf8647c
      Oleg Nesterov committed
      Currently it is not possible to change the filtering constraints after
      uprobe_register(), so a consumer can not, say, start to trace a task/mm
      which was previously filtered out, or remove the no longer needed bp's.
      
      Introduce uprobe_apply() which simply does register_for_each_vma() again
      to consult uprobe_consumer->filter() and install/remove the breakpoints.
      The only complication is that register_for_each_vma() can no longer
      assume that uprobe->consumers should be consulted if is_register == T,
      so we change it to accept "struct uprobe_consumer *new" instead.
      
      Unlike uprobe_register(), uprobe_apply(true) doesn't do "unregister" if
      register_for_each_vma() fails, it is up to caller to handle the error.
      
      Note: we probably need to cleanup the current interface, it is strange
      that uprobe_apply/unregister need inode/offset. We should either change
      uprobe_register() to return "struct uprobe *", or add a private ->uprobe
      member in uprobe_consumer. And in the long term uprobe_apply() should
      take a single argument, uprobe or consumer, even "bool add" should go
      away.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • O
      perf: Introduce hw_perf_event->tp_target and ->tp_list · f22c1bb6
      Oleg Nesterov committed
      sys_perf_event_open()->perf_init_event(event) is called before
      find_get_context(event), this means that event->ctx == NULL when
      class->reg(TRACE_REG_PERF_REGISTER/OPEN) is called and thus it
      can't know if this event is per-task or system-wide.
      
      This patch adds hw_perf_event->tp_target for PERF_TYPE_TRACEPOINT,
      this is analogous to PERF_TYPE_BREAKPOINT/bp_target we already have.
      The patch also moves ->bp_target up so that it can overlap with the
      new member, this can help the compiler to generate the better code.
      
      trace_uprobe_register() will use it for prefiltering to avoid the
      unnecessary breakpoints in mm's we do not want to trace.
      
      ->tp_target doesn't have its own reference, but we can rely on the
      fact that either sys_perf_event_open() holds a reference, or it is
      equal to event->ctx->task. So this pointer is always valid until
      free_event().
      
      Also add the "struct list_head tp_list" into this union. It is not
      strictly necessary, but it can simplify the next changes and we can
      add it for free.
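
      The layout idea can be sketched as follows. This is a reduced model
      under stated assumptions: the *_model names are hypothetical, the
      real hw_perf_event has more members, and ->bp_list is included here
      only to mirror ->tp_list. The point is that ->tp_target and
      ->bp_target sit at the same offset inside one anonymous union.

```c
#include <assert.h>
#include <stddef.h>

struct task_model;				/* stand-in for task_struct */
struct list_head_model { struct list_head_model *next, *prev; };

struct hw_perf_event_model {
	union {
		struct {	/* tracepoint */
			struct task_model *tp_target;
			struct list_head_model tp_list;
		};
		struct {	/* breakpoint */
			struct task_model *bp_target;
			struct list_head_model bp_list;
		};
	};
};

/* Both target pointers share storage, so either name reads the same word. */
static_assert(offsetof(struct hw_perf_event_model, tp_target) ==
	      offsetof(struct hw_perf_event_model, bp_target),
	      "target pointers must overlap");
```

      Because the two pointers overlap, code that only needs "the target
      task" does not have to branch on the event type first.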
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • O
      uprobes/perf: Always increment trace_uprobe->nhit · 1b47aefd
      Oleg Nesterov committed
      Move tu->nhit++ from uprobe_trace_func() to uprobe_dispatcher().
      
      ->nhit counts how many times we hit the breakpoint inserted by this
      uprobe, we do not want to lose this info if uprobe was enabled by
      sys_perf_event_open().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe · a932b738
      Oleg Nesterov committed
      trace_uprobe->consumer and "struct uprobe_trace_consumer" add the
      unnecessary indirection and complicate the code for no reason.
      
      This patch simply embeds uprobe_consumer into "struct trace_uprobe",
      all other changes only fix the compilation errors.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • O
      uprobes/tracing: Introduce is_trace_uprobe_enabled() · b64b0077
      Oleg Nesterov committed
      probe_event_enable/disable() check tu->consumer != NULL to avoid the
      wrong uprobe_register/unregister().
      
      We are going to kill this pointer and "struct uprobe_trace_consumer",
      so we add the new helper, is_trace_uprobe_enabled(), which can rely
      on TP_FLAG_TRACE/TP_FLAG_PROFILE instead.
      
      Note: the current logic doesn't look optimal, it is not clear why
      TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive, we will probably
      change this later.
      
      Also kill the unused TP_FLAG_UPROBE.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes/tracing: Ensure inode != NULL in create_trace_uprobe() · 7e4e28c5
      Oleg Nesterov committed
      probe_event_enable/disable() check tu->inode != NULL at the start.
      This is ugly, if igrab() can fail create_trace_uprobe() should not
      succeed and "postpone" the failure.
      
      And S_ISREG(inode->i_mode) check added by d24d7dbf is not safe.
      
      Note: alloc_uprobe() should probably check igrab() != NULL as well.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes/tracing: Fully initialize uprobe_trace_consumer before uprobe_register() · 4161824f
      Oleg Nesterov committed
      probe_event_enable() does uprobe_register() and only after that sets
      utc->tu and tu->consumer/flags. This can race with uprobe_dispatcher()
      which can miss these assignments or see them out of order. Nothing
      really bad can happen, but this doesn't look clean/safe.
      
      And this does not allow us to use the uprobe_consumer->filter() we are going
      to add, it is called by uprobe_register() and it needs utc->tu.
      
      Change this code to initialize everything before uprobe_register(), and
      reset tu->consumer/flags if it fails. We can't race with event_disable(),
      the caller holds event_mutex, and if we could the code would be wrong
      anyway.
      
      In fact I think uprobe_trace_consumer should die, it buys nothing but
      complicates the code. We can simply add uprobe_consumer into trace_uprobe.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes/tracing: Fix dentry/mount leak in create_trace_uprobe() · 84d7ed79
      Oleg Nesterov committed
      create_trace_uprobe() does kern_path() to find ->d_inode, but forgets
      to do path_put(). We can do this right after igrab().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • J
      uprobes: Add exports for module use · e8440c14
      Josh Stone committed
      The original pull message for uprobes (commit 654443e2) noted:
      
        This tree includes uprobes support in 'perf probe' - but SystemTap
        (and other tools) can take advantage of user probe points as well.
      
      In order to actually be usable in module-based tools like SystemTap, the
      interface needs to be exported.  This patch first adds the obvious
      exports for uprobe_register and uprobe_unregister.  Then it also adds
      one for task_user_regset_view, which is necessary to get the correct
      state of userspace registers.
      Signed-off-by: Josh Stone <jistone@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • O
      uprobes: Kill the bogus IS_ERR_VALUE(xol_vaddr) check · af4355e9
      Oleg Nesterov committed
      utask->xol_vaddr is either zero or valid, remove the bogus
      IS_ERR_VALUE() check in xol_free_insn_slot().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Do not allocate current->utask unnecessary · 608e7427
      Oleg Nesterov committed
      handle_swbp() does get_utask() before can_skip_sstep() for no reason,
      we do not need ->utask if can_skip_sstep() succeeds.
      
      Move get_utask() to pre_ssout() which actually starts to use it. Move
      the initialization of utask->active_uprobe/state as well. This way
      the whole initialization is consolidated in pre_ssout().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
    • O
      uprobes: Fix utask->xol_vaddr leak in pre_ssout() · aba51024
      Oleg Nesterov committed
      pre_ssout() should do xol_free_insn_slot() if arch_uprobe_pre_xol()
      fails, otherwise nobody will free the allocated slot.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Do not play with utask in xol_get_insn_slot() · a6cb3f6d
      Oleg Nesterov committed
      pre_ssout()->xol_get_insn_slot() path is confusing and buggy. This patch
      cleans up the code; the next one fixes the bug.
      
      Change xol_get_insn_slot() to only allocate the slot and do nothing more,
      move the initialization of utask->xol_vaddr/vaddr into pre_ssout().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Turn add_utask() into get_utask() · 5a2df662
      Oleg Nesterov committed
      Rename add_utask() into get_utask() and change it to allocate on
      demand to simplify the caller. Like get_xol_area() it will have
      more users.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Fold xol_alloc_area() into get_xol_area() · 9b545df8
      Oleg Nesterov committed
      Currently only xol_get_insn_slot() does get_xol_area() + xol_alloc_area(),
      but this will have more users and we do not want to copy-and-paste this
      code. This patch simply moves xol_alloc_area() into get_xol_area() to
      simplify the current and future code.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Move alloc_page() from xol_add_vma() to xol_alloc_area() · c8a82538
      Oleg Nesterov committed
      Move alloc_page() from xol_add_vma() to xol_alloc_area() to clean up
      the code. This separates the memory allocations and consolidates the
      -EALREADY cleanups and the error handling.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain() · 74e59dfc
      Oleg Nesterov committed
      Change handle_swbp() to set regs->ip = bp_vaddr in advance, this is
      what consumer->handler() needs but uprobe_get_swbp_addr() is not
      exported.
      
      This also simplifies the code and makes it more consistent across
      the supported architectures. handle_swbp() becomes the only caller
      of uprobe_get_swbp_addr().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
    • O
      uprobes/x86: Change __skip_sstep() to actually skip the whole insn · cf31ec3f
      Oleg Nesterov committed
      __skip_sstep() doesn't update regs->ip. Currently this is correct
      but only "by accident" and it doesn't skip the whole insn. Change
      it to advance ->ip by the length of the detected 0x66*0x90 sequence.
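
      A userspace sketch of the scan described above, assuming the
      "0x66*0x90" pattern means zero or more 0x66 prefix bytes followed
      by the 0x90 nop opcode. The names regs_model and skip_nop_model are
      hypothetical and the byte buffer stands in for the saved copy of the
      probed instruction; this is not the kernel function itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the saved user registers; only ->ip matters here. */
struct regs_model {
	uint64_t ip;
};

/*
 * If @insn starts with 0x66* 0x90, advance ->ip past the whole
 * instruction and return true. Otherwise leave ->ip untouched and
 * return false, meaning the instruction still has to be single-stepped.
 */
static bool skip_nop_model(const uint8_t *insn, size_t len,
			   struct regs_model *regs)
{
	assert(insn && regs);

	for (size_t i = 0; i < len; i++) {
		if (insn[i] == 0x66)
			continue;		/* prefix byte, keep scanning */

		if (insn[i] == 0x90) {
			regs->ip += i + 1;	/* prefixes plus the nop opcode */
			return true;
		}

		break;				/* any other byte: not a nop */
	}

	return false;
}
```

      For the bytes 0x66 0x66 0x90 the model advances ->ip by 3, while a
      buffer starting with any other opcode is rejected without changing
      ->ip.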
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Teach handler_chain() to filter out the probed task · da1816b1
      Oleg Nesterov committed
      Currently there are 2 problems with pre-filtering:
      
      1. It is not possible to add/remove a task (mm) after uprobe_register()
      
      2. A forked child inherits all breakpoints and uprobe_consumer can not
         control this.
      
      This patch does the first step to improve the filtering. handler_chain()
      removes the breakpoints installed by this uprobe from current->mm if all
      handlers return UPROBE_HANDLER_REMOVE.
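
      The "all handlers agree" rule can be modelled like this. It is a
      sketch under assumptions: the chain_consumer type and the helper
      names are hypothetical, the numeric value of UPROBE_HANDLER_REMOVE
      is assumed to be a single bit, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UPROBE_HANDLER_REMOVE	1	/* name from this series; value assumed */

struct chain_consumer {
	int (*handler)(struct chain_consumer *self);
	struct chain_consumer *next;
};

/*
 * Run every handler. Return true if the breakpoint should be removed
 * from the current mm: there is at least one consumer and every one of
 * them returned UPROBE_HANDLER_REMOVE.
 */
static bool handler_chain_model(struct chain_consumer *consumers)
{
	int remove = UPROBE_HANDLER_REMOVE;

	for (struct chain_consumer *uc = consumers; uc; uc = uc->next) {
		assert(uc->handler);
		remove &= uc->handler(uc);	/* one "keep" vote clears the bit */
	}

	return remove && consumers != NULL;
}

/* Two trivial handlers used to exercise the model. */
static int wants_remove(struct chain_consumer *self)
{
	(void)self;
	return UPROBE_HANDLER_REMOVE;
}

static int wants_keep(struct chain_consumer *self)
{
	(void)self;
	return 0;
}
```

      A chain of two wants_remove consumers yields true; replacing either
      handler with wants_keep yields false, so a single interested consumer
      is enough to keep the breakpoint in place.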
      
      Note that handler_chain() relies on ->register_rwsem to avoid the race
      with uprobe_register/unregister which can add/del a consumer, or even
      remove and then insert the new uprobe at the same address.
      
      Perhaps we will add uprobe_apply_mm(uprobe, mm, is_register) and teach
      copy_mm() to do filter(UPROBE_FILTER_FORK), but I think this change makes
      sense anyway.
      
      Note: instead of checking the retcode from uc->handler, we could add
      uc->filter(UPROBE_FILTER_BPHIT). But I think this is not optimal to
      call 2 hooks in a row. This buys nothing, and if handler/filter do
      something nontrivial they will probably do the same work twice.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Reintroduce uprobe_consumer->filter() · 8a7f2fa0
      Oleg Nesterov committed
      Finally add uprobe_consumer->filter() and change consumer_filter()
      to actually call this method.
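
      A rough userspace model of how consumer_filter() and filter_chain()
      could fit together. Assumptions, none of them confirmed by this
      message: a consumer without a ->filter() callback is treated as
      wanting every mm, one accepting consumer is enough to keep the
      breakpoint, and the "enum uprobe_filter_ctx" argument and all locking
      are omitted. The *_model and *_stub names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_stub { int id; };			/* stand-in for mm_struct */

struct filtering_consumer {
	/* Optional callback; NULL is assumed to mean "trace everything". */
	bool (*filter)(struct filtering_consumer *self, const struct mm_stub *mm);
	struct filtering_consumer *next;
};

static bool consumer_filter_model(struct filtering_consumer *uc,
				  const struct mm_stub *mm)
{
	return !uc->filter || uc->filter(uc, mm);
}

/* True if at least one consumer wants a breakpoint in @mm. */
static bool filter_chain_model(struct filtering_consumer *consumers,
			       const struct mm_stub *mm)
{
	for (struct filtering_consumer *uc = consumers; uc; uc = uc->next)
		if (consumer_filter_model(uc, mm))
			return true;

	return false;
}

/* Example filter: accept only the mm with id 1. */
static bool only_mm_1(struct filtering_consumer *self, const struct mm_stub *mm)
{
	(void)self;
	return mm->id == 1;
}
```

      With only the only_mm_1 consumer registered, mm 2 is "nacked";
      adding a consumer without a filter makes every mm acceptable again.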
      
      Note that ->filter() accepts mm_struct, not task_struct. Because:
      
      	1. We do not have for_each_mm_user(mm, task).
      
      	2. Even if we implement for_each_mm_user(), ->filter() can
      	   use it itself.
      
      	3. It is not clear who will actually need this interface to
      	   do the "nontrivial" filtering.
      
      Another argument is "enum uprobe_filter_ctx", consumer->filter() can
      use it to figure out why/where it was called. For example, perhaps
      we can add UPROBE_FILTER_PRE_REGISTER used by build_map_info() to
      quickly "nack" the unwanted mm's. In this case consumer should know
      that it is called under ->i_mmap_mutex.
      
      See the previous discussion at http://marc.info/?t=135214229700002
      Perhaps we should pass more arguments, vma/vaddr?
      
      Note: this patch obviously can't help to filter out the child created
      by fork(), this will be addressed later.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Rationalize the usage of filter_chain() · 806a98bd
      Oleg Nesterov committed
      filter_chain() was added into install_breakpoint/remove_breakpoint to
      simplify the initial changes but this is sub-optimal.
      
      This patch shifts the callsite to the callers, register_for_each_vma()
      and uprobe_mmap(). This way:
      
      - It will be easier to add the new arguments. This is the main reason,
        we can do more optimizations later.
      
      - register_for_each_vma(is_register => true) can be optimized, we only
        need to consult the new consumer. The previous consumers were already
        asked when they called uprobe_register().
      
      This patch also moves the MMF_HAS_UPROBES check from remove_breakpoint(),
      this allows us to avoid the potentially costly filter_chain(). Note that
      register_for_each_vma(is_register => false) doesn't really need to take
      ->consumer_rwsem, but I don't think it makes sense to optimize this and
      introduce filter_chain_lockless().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Kill uprobes_mutex[], separate alloc_uprobe() and __uprobe_register() · 66d06dff
      Oleg Nesterov committed
      uprobe_register() and uprobe_unregister() are the only users of
      mutex_lock(uprobes_hash(inode)), and the only reason why we can't
      simply remove it is that we need to ensure that delete_uprobe() is
      not possible after alloc_uprobe() and before consumer_add().
      
      IOW, we need to ensure that when we take uprobe->register_rwsem
      this uprobe is still valid and we didn't race with _unregister()
      which called delete_uprobe() in between.
      
      With this patch uprobe_register() simply checks uprobe_is_active()
      and retries if it hits this very unlikely race. uprobes_mutex[] is
      no longer needed and can be removed.
      
      There is another reason for this change, prepare_uprobe() should be
      folded into alloc_uprobe() and we do not want to hold the extra locks
      around read_mapping_page/etc.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Introduce uprobe_is_active() · 06b7bcd8
      Oleg Nesterov committed
      The lifetime of uprobe->rb_node and uprobe->inode is not refcounted,
      delete_uprobe() is called when we detect that uprobe has no consumers,
      and it would be deadly wrong to do this twice.
      
      Change delete_uprobe() to WARN() if it was already called. We use
      RB_CLEAR_NODE() to mark uprobe "inactive", then RB_EMPTY_NODE() can
      be used to detect this case.
      
      RB_EMPTY_NODE() is not used directly, we add the trivial helper for
      the next change.
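
      The marking trick can be sketched in userspace. This is a reduced
      model, not the kernel rbtree: rb_node_model keeps only a parent
      word, and "cleared" is assumed to mean that this word points back at
      the node itself. All *_model names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for struct rb_node: only the parent word is kept. */
struct rb_node_model {
	uintptr_t parent;
};

/* A node whose parent word points at itself is "not in any tree". */
static void rb_clear_node_model(struct rb_node_model *n)
{
	n->parent = (uintptr_t)n;
}

static bool rb_empty_node_model(const struct rb_node_model *n)
{
	return n->parent == (uintptr_t)n;
}

struct uprobe_model {
	struct rb_node_model rb_node;
};

static bool uprobe_is_active_model(const struct uprobe_model *u)
{
	return !rb_empty_node_model(&u->rb_node);
}

/* Returns false, where the kernel would WARN(), on a second delete. */
static bool delete_uprobe_model(struct uprobe_model *u)
{
	assert(u);

	if (!uprobe_is_active_model(u))
		return false;

	/* ...erase from the tree and drop the inode reference here... */
	rb_clear_node_model(&u->rb_node);
	return true;
}
```

      The first delete_uprobe_model() call succeeds and marks the uprobe
      inactive; a second call on the same object is detected and refused.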
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Kill uprobe_events, use RB_EMPTY_ROOT() instead · 441f1eb7
      Oleg Nesterov committed
      uprobe_events counts the number of uprobes in uprobes_tree but
      it is used as a boolean. We can use RB_EMPTY_ROOT() instead.
      
      Probably no_uprobe_events() added by this patch can have more
      callers, say, mmf_recalc_uprobes().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Kill uprobe->copy_mutex · d4d3ccc6
      Oleg Nesterov committed
      Now that ->register_rwsem is safe under ->mmap_sem we can kill
      ->copy_mutex and abuse down_write(&uprobe->consumer_rwsem).
      
      This makes prepare_uprobe() even more ugly, but we should kill
      it anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Kill UPROBE_RUN_HANDLER flag · bb929284
      Oleg Nesterov committed
      Simply remove UPROBE_RUN_HANDLER and the corresponding code.
      
      It can only help if uprobe has a single consumer, and in fact
      it is no longer needed after handler_chain() was changed to use
      ->register_rwsem, we simply can not race with uprobe_register().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Change filter_chain() to iterate ->consumers list · 1ff6fee5
      Oleg Nesterov committed
      Now that it is safe to use ->consumer_rwsem under ->mmap_sem we can
      almost finish the implementation of filter_chain(). It still lacks
      the actual uc->filter(...) call but otherwise it is ready, it just
      pretends that ->filter() always returns true.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Introduce uprobe->register_rwsem · e591c8d7
      Oleg Nesterov committed
      Introduce uprobe->register_rwsem. It is taken for writing around
      __uprobe_register/unregister.
      
      Change handler_chain() to use this sem rather than consumer_rwsem.
      
      The main reason for this change is that we have the nasty problem
      with mmap_sem/consumer_rwsem dependency. filter_chain() needs to
      protect uprobe->consumers like handler_chain(), but they can not
      use the same lock. filter_chain() can be called under ->mmap_sem
      (currently this is always true), but we want to allow ->handler()
      to play with the probed task's memory, and this needs ->mmap_sem.
      
      Alternatively we could use srcu, but synchronize_srcu() is very
      slow and ->register_rwsem allows us to do more. In particular, we
      can teach handler_chain() to do remove_breakpoint() if this bp is
      "nacked" by all consumers, we know that we can't race with the
      new consumer which does uprobe_register().
      
      See also the next patches. uprobes_mutex[] is almost ready to die.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: _register() should always do register_for_each_vma(true) · 9a98e03c
      Oleg Nesterov committed
      To support the filtering uprobe_register() should do
      register_for_each_vma(true) every time the new consumer comes,
      we need to install the previously nacked breakpoints.
      
      Note:
      	- uprobes_mutex[] should die, what it actually protects is
      	  alloc_uprobe().
      
      	- UPROBE_RUN_HANDLER should die too, obviously it can't work
      	  unless uprobe has a single consumer. The consumer should
      	  serialize with _register/_unregister itself. Or this flag
      	  should live in uprobe_consumer->state.
      
      	- Perhaps we can do some optimizations later. For example, if
      	  filter_chain() never returns false uprobe can record this
      	  fact and avoid the unnecessary register_for_each_vma().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: _unregister() should always do register_for_each_vma(false) · 04aab9b2
      Oleg Nesterov committed
      uprobe_unregister() removes the breakpoints only if the last consumer
      goes away. To support the filtering it should do this every time, we
      want to remove the breakpoints which nobody else wants to keep.
      
      Note: given that filter_chain() is not actually implemented, this patch
      itself doesn't change the behaviour yet, register_for_each_vma(false)
      is a heavy "nop" unless there are no more consumers.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Introduce filter_chain() · 63633cbf
      Oleg Nesterov committed
      Add the new helper filter_chain(). Currently it is only a placeholder,
      the comment explains what it should do. We will change it later to
      consult every consumer to decide whether we need to install the swbp.
      Until then it works as if any consumer returns true, this matches the
      current behavior.
      
      Change install_breakpoint() to call filter_chain() instead of checking
      uprobe->consumers != NULL. We obviously need this, and this equally
      closes the race with _unregister().
      
      Change remove_breakpoint() to call this helper too. Currently this is
      pointless because remove_breakpoint() is only called when the last
      consumer goes away, but we will change this.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Kill uprobe_consumer->filter() · fe20d71f
      Oleg Nesterov committed
      uprobe_consumer->filter() is pointless in its current form, kill it.
      
      We will add it back, but with the different signature/semantics. Perhaps
      we will even re-introduce the callsite in handler_chain(), but not to
      just skip uc->handler().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Kill the pointless inode/uc checks in register/unregister · f0744af7
      Oleg Nesterov committed
      register/unregister verifies that inode/uc != NULL. For what?
      This really looks like "hide the potential problem", the caller
      should pass the valid data.
      
      register() also checks uc->next == NULL, probably to prevent the
      double-register but the caller can do other stupid/wrong things.
      If we do this check, then we should document that uc->next should
      be cleared before register() and add BUG_ON().
      
      Also add the small comment about the i_size_read() check.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • O
      uprobes: Move __set_bit(UPROBE_SKIP_SSTEP) into alloc_uprobe() · bbc33d05
      Oleg Nesterov committed
      Cosmetic. __set_bit(UPROBE_SKIP_SSTEP) is the part of initialization,
      it is not clear why it is set in insert_uprobe().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
  3. 07 Feb, 2013 1 commit
    • I
      Merge tag 'perf-core-for-mingo' of... · 661e5915
      Ingo Molnar committed
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      . Check for flex and bison before continuing building, from Borislav Petkov.
      
      . Make event_copy local to mmaps, fixing buffer wrap around problems, from
        David Ahern.
      
      . Add option for runtime switching perf data file in perf report, just press
        's' and a menu with the valid files found in the current directory will be
        presented, from Feng Tang.
      
      . Add support to display whole group data for raw columns, from Jiri Olsa.
      
      . Fix SIGALRM and pipe read race for the rwtop perl script, from Jiri Olsa.
      
      . Fix perf_evsel::exclude_GH handling and add a test to catch regressions, from
        Jiri Olsa.
      
      . Error checking fixes, from Namhyung Kim.
      
      . Fix calloc argument ordering, from Paul Gortmaker.
      
      . Fix set event list leader, from Stephane Eranian.
      
      . Add per processor socket count aggregation in perf stat, from Stephane Eranian.
      
      . Fix perf python binding breakage.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>