1. 15 Feb, 2013 6 commits
  2. 11 Feb, 2013 1 commit
    • Merge branch 'uprobes/core' of... · a3d4fd7a
      Committed by Ingo Molnar
      Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
      
      Improve uprobes performance by adding 'pre-filtering' support,
      by Oleg Nesterov:
      
      	# time perl -e 'syscall -1 for 1..100_000'
      	real    0m0.040s
      	user    0m0.027s
      	sys     0m0.010s
      
      	# perf probe -x /lib/libc.so.6 syscall
      	# perf record -e probe_libc:syscall sleep 100 &
      
      Before this series:
      
      	# time perl -e 'syscall -1 for 1..100_000'
      	real    0m1.714s
      	user    0m0.103s
      	sys     0m1.607s
      
      After:
      
      	# time perl -e 'syscall -1 for 1..100_000'
      	real    0m0.037s
      	user    0m0.013s
      	sys     0m0.023s
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
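The timings quoted in the merge message can be turned into a slowdown factor and a per-call cost. The short calculation below only restates the numbers from the message; the helper names are made up for this illustration.

```python
# Wall-clock ("real") times in seconds, copied from the merge message.
baseline = 0.040   # no probe installed
before = 1.714     # probe recorded for another task, before the series
after = 0.037      # same setup, after the series

iterations = 100_000  # number of syscall() calls in the perl one-liner


def slowdown(t, ref=baseline):
    """Ratio of a measured time to the unprobed baseline."""
    return t / ref


def per_call_overhead_us(t, ref=baseline, n=iterations):
    """Extra microseconds per syscall() call relative to the baseline."""
    return (t - ref) / n * 1e6


print(f"before: {slowdown(before):.1f}x slower, "
      f"{per_call_overhead_us(before):.2f} us extra per call")
print(f"after:  {slowdown(after):.2f}x of the baseline")
```

Before the series the untraced process ran roughly 43 times slower, about 17 microseconds of extra work per call; afterwards it is back within noise of the baseline.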
  3. 09 Feb, 2013 33 commits
    • uprobes/perf: Avoid uprobe_apply() whenever possible · b2fe8ba6
      Committed by Oleg Nesterov
      uprobe_perf_open/close call the costly uprobe_apply() every time,
      we can avoid it if:
      
      	- "nr_systemwide != 0" is not changed.
      
      	- There is another process/thread with the same ->mm.
      
      	- copy_process() does inherit_event(). dup_mmap() preserves the
      	  inserted breakpoints.
      
      	- event->attr.enable_on_exec == T, we can rely on uprobe_mmap()
      	  called by exec/mmap paths.
      
      	- tp_target is exiting. Only _close() checks PF_EXITING, I don't
      	  think TRACE_REG_PERF_OPEN can hit the dying task too often.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE · f42d24a1
      Committed by Oleg Nesterov
      Change uprobe_trace_func() and uprobe_perf_func() to return "int". Change
      uprobe_dispatcher() to return "trace_ret | perf_ret" although this is not
      needed, currently TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive.
      
      The only functional change is that uprobe_perf_func() checks the filtering
      too and returns UPROBE_HANDLER_REMOVE if nobody wants to trace current.
      
      Testing:
      
      	# perf probe -x /lib/libc.so.6 syscall
      
      	# perf record -e probe_libc:syscall -i perl -e 'fork; syscall -1 for 1..10; wait'
      
      	# perf report --show-total-period
      		100.00%            10     perl  libc-2.8.so    [.] syscall
      
      Before this patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				20
      
      A child process doesn't have a counter, but still it hits this breakpoint
      "copied" by dup_mmap().
      
      After the patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				11
      
      The child process hits this int3 only once and does unapply_uprobe().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
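The 20 versus 11 hit counts in the test above can be reproduced with a toy simulation. This is a user-space sketch, not kernel code: the `run` function, the `traced` mapping and the value chosen for `UPROBE_HANDLER_REMOVE` are assumptions made for illustration.

```python
UPROBE_HANDLER_REMOVE = 1  # stand-in value, not taken from the kernel headers


def run(calls_per_task, traced, remove_on_unwanted_hit):
    """Return the simulated hit count for a parent and its forked child.

    traced maps a task name to whether a perf event wants that task.
    """
    nhit = 0
    for task in ("parent", "child"):
        has_breakpoint = True            # dup_mmap() copied it into the child
        for _ in range(calls_per_task):
            if not has_breakpoint:
                continue                 # no int3, so no trap and no count
            nhit += 1
            ret = 0 if traced[task] else UPROBE_HANDLER_REMOVE
            if remove_on_unwanted_hit and ret == UPROBE_HANDLER_REMOVE:
                has_breakpoint = False   # models unapply_uprobe() on this mm
    return nhit


traced = {"parent": True, "child": False}   # the test above records with -i
print(run(10, traced, remove_on_unwanted_hit=False))  # 20
print(run(10, traced, remove_on_unwanted_hit=True))   # 11
```

The child still pays for one trap, because the decision is made in the handler, after the breakpoint has already fired.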
    • uprobes/perf: Teach trace_uprobe/perf code to pre-filter · 31ba3348
      Committed by Oleg Nesterov
      Finally implement uprobe_perf_filter() which checks ->nr_systemwide or
      ->perf_events to figure out whether we need to insert the breakpoint.
      
      uprobe_perf_open/close are changed to do uprobe_apply(true/false) when
      the new perf event comes or goes away.
      
      Note that currently this is very suboptimal:
      
      	- uprobe_register() called by TRACE_REG_PERF_REGISTER becomes a
      	  heavy nop, consumer->filter() always returns F at this stage.
      
      	  As it was already discussed we need uprobe_register_only() to
      	  avoid the costly register_for_each_vma() when possible.
      
      	- uprobe_apply() is often overkill. Unless "nr_systemwide != 0"
      	  changes we need uprobe_apply_mm(), unapply_uprobe() is almost
      	  what we need.
      
      	- uprobe_apply() can be simply avoided sometimes, see the next
      	  changes.
      
      Testing:
      
      	# perf probe -x /lib/libc.so.6 syscall
      
      	# perl -e 'syscall -1 while 1' &
      	[1] 530
      
      	# perf record -e probe_libc:syscall perl -e 'syscall -1 for 1..10; sleep 1'
      
      	# perf report --show-total-period
      		100.00%            10     perl  libc-2.8.so    [.] syscall
      
      Before this patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				79291
      
      A huge ->nhit == 79291 reflects the fact that the background process
      530 constantly hits this breakpoint too, even if it doesn't contribute to
      the output.
      
      After the patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				10
      
      This shows that only the target process was punished by int3.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
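The filtering rule described in this commit (insert the breakpoint if there is a system-wide event, or if some per-task event targets this mm) can be sketched as a small Python model. The `Filter` class and its method names are hypothetical; only `nr_systemwide` and the idea of a list of active perf events come from the commit messages.

```python
from dataclasses import dataclass, field


@dataclass
class Filter:
    """Toy stand-in for the per-probe filter state."""
    nr_systemwide: int = 0                           # events without a target task
    target_mms: list = field(default_factory=list)   # mm of each per-task event

    def open(self, mm=None):
        """A perf event arrives; mm=None means a system-wide event."""
        if mm is None:
            self.nr_systemwide += 1
        else:
            self.target_mms.append(mm)

    def close(self, mm=None):
        """A perf event goes away."""
        if mm is None:
            self.nr_systemwide -= 1
        else:
            self.target_mms.remove(mm)

    def wants(self, mm):
        """Should the breakpoint be present in this mm?"""
        return self.nr_systemwide > 0 or mm in self.target_mms


flt = Filter()
flt.open("perl-target")            # perf record ... perl -e '...'
print(flt.wants("perl-target"))    # True
print(flt.wants("background-530")) # False
```

With this rule the background process from the test never receives the breakpoint, which matches the hit count dropping from 79291 to 10.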
    • uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's · 736288ba
      Committed by Oleg Nesterov
      Introduce "struct trace_uprobe_filter" which records the "active"
      perf_event's attached to ftrace_event_call. For the start we simply
      use list_head, we can optimize this later if needed. For example, we
      do not really need to record an event with ->parent != NULL, we can
      rely on parent->child_list. And we can certainly do some optimizations
      for the case when 2 events have the same ->tp_target or tp_target->mm.
      
      Change trace_uprobe_register() to process TRACE_REG_PERF_OPEN/CLOSE
      and add/del this perf_event to the list.
      
      We can probably avoid any locking, but let's start with the "obviously
      correct" trace_uprobe_filter->rwlock which protects everything.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • uprobes: Introduce uprobe_apply() · bdf8647c
      Committed by Oleg Nesterov
      Currently it is not possible to change the filtering constraints after
      uprobe_register(), so a consumer can not, say, start to trace a task/mm
      which was previously filtered out, or remove the no longer needed bp's.
      
      Introduce uprobe_apply() which simply does register_for_each_vma() again
      to consult uprobe_consumer->filter() and install/remove the breakpoints.
      The only complication is that register_for_each_vma() can no longer
      assume that uprobe->consumers should be consulted if is_register == T,
      so we change it to accept "struct uprobe_consumer *new" instead.
      
      Unlike uprobe_register(), uprobe_apply(true) doesn't do "unregister" if
      register_for_each_vma() fails, it is up to the caller to handle the error.
      
      Note: we probably need to clean up the current interface, it is strange
      that uprobe_apply/unregister need inode/offset. We should either change
      uprobe_register() to return "struct uprobe *", or add a private ->uprobe
      member in uprobe_consumer. And in the long term uprobe_apply() should
      take a single argument, uprobe or consumer, even "bool add" should go
      away.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
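The idea behind uprobe_apply(), re-asking the filters after registration and then installing or removing breakpoints, can be illustrated with a simplified model. The `Probe` class below is an assumption for this sketch: it re-evaluates every consumer for every mm, whereas the real function takes a specific consumer and an add/remove flag.

```python
class Probe:
    """Toy model of one uprobe and the mms that map the probed file."""

    def __init__(self, mapped_mms):
        self.mapped_mms = list(mapped_mms)  # every mm that maps the file
        self.consumers = []                 # callables: mm -> bool
        self.installed = set()              # mms currently holding the breakpoint

    def _walk(self):
        """Model of walking all mappings and consulting the filters again."""
        for mm in self.mapped_mms:
            if any(wants(mm) for wants in self.consumers):
                self.installed.add(mm)
            else:
                self.installed.discard(mm)

    def register(self, consumer):
        self.consumers.append(consumer)
        self._walk()

    def apply(self):
        """Re-evaluate the filters without adding or removing a consumer."""
        self._walk()


wanted = {"a"}
probe = Probe(["a", "b"])
probe.register(lambda mm: mm in wanted)
wanted.add("b")            # the constraint changes after registration
before = set(probe.installed)
probe.apply()
print(sorted(before), sorted(probe.installed))
```

Without the `apply()` call the model, like the pre-patch code, has no way to notice that "b" became interesting after registration.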
    • perf: Introduce hw_perf_event->tp_target and ->tp_list · f22c1bb6
      Committed by Oleg Nesterov
      sys_perf_event_open()->perf_init_event(event) is called before
      find_get_context(event), this means that event->ctx == NULL when
      class->reg(TRACE_REG_PERF_REGISTER/OPEN) is called and thus it
      can't know if this event is per-task or system-wide.
      
      This patch adds hw_perf_event->tp_target for PERF_TYPE_TRACEPOINT,
      this is analogous to PERF_TYPE_BREAKPOINT/bp_target we already have.
      The patch also moves ->bp_target up so that it can overlap with the
      new member, this can help the compiler to generate better code.
      
      trace_uprobe_register() will use it for prefiltering to avoid the
      unnecessary breakpoints in mm's we do not want to trace.
      
      ->tp_target doesn't have its own reference, but we can rely on the
      fact that either sys_perf_event_open() holds a reference, or it is
      equal to event->ctx->task. So this pointer is always valid until
      free_event().
      
      Also add the "struct list_head tp_list" into this union. It is not
      strictly necessary, but it can simplify the next changes and we can
      add it for free.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
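What "overlap" means for the two target pointers can be shown with Python's ctypes, which lays out structures and unions like a C compiler. The layout below is a hypothetical reduction, not the real hw_perf_event definition: only the names tp_target, tp_list and bp_target come from the commit message, and the `info` placeholder is invented.

```python
import ctypes


class TracepointPart(ctypes.Structure):
    _fields_ = [
        ("tp_target", ctypes.c_void_p),
        ("tp_list", ctypes.c_void_p * 2),   # a list_head is two pointers
    ]


class BreakpointPart(ctypes.Structure):
    _fields_ = [
        ("bp_target", ctypes.c_void_p),     # placed first so it overlaps tp_target
        ("info", ctypes.c_ubyte * 16),      # opaque placeholder for arch data
    ]


class HwEvent(ctypes.Union):
    _fields_ = [("tp", TracepointPart), ("bp", BreakpointPart)]


ev = HwEvent()
ev.tp.tp_target = 0x1234
# Both members start at offset 0 of their struct, and the structs share storage.
print(TracepointPart.tp_target.offset, BreakpointPart.bp_target.offset)
print(hex(ev.bp.bp_target))
```

Because both pointers live at the same offset, code that only needs "the target task" could read either member without checking the event type, which is one plausible reading of the commit's remark about better generated code.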
    • uprobes/perf: Always increment trace_uprobe->nhit · 1b47aefd
      Committed by Oleg Nesterov
      Move tu->nhit++ from uprobe_trace_func() to uprobe_dispatcher().
      
      ->nhit counts how many times we hit the breakpoint inserted by this
      uprobe, we do not want to lose this info if uprobe was enabled by
      sys_perf_event_open().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe · a932b738
      Committed by Oleg Nesterov
      trace_uprobe->consumer and "struct uprobe_trace_consumer" add the
      unnecessary indirection and complicate the code for no reason.
      
      This patch simply embeds uprobe_consumer into "struct trace_uprobe",
      all other changes only fix the compilation errors.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • uprobes/tracing: Introduce is_trace_uprobe_enabled() · b64b0077
      Committed by Oleg Nesterov
      probe_event_enable/disable() check tu->consumer != NULL to avoid the
      wrong uprobe_register/unregister().
      
      We are going to kill this pointer and "struct uprobe_trace_consumer",
      so we add the new helper, is_trace_uprobe_enabled(), which can rely
      on TP_FLAG_TRACE/TP_FLAG_PROFILE instead.
      
      Note: the current logic doesn't look optimal, it is not clear why
      TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive, we will probably
      change this later.
      
      Also kill the unused TP_FLAG_UPROBE.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes/tracing: Ensure inode != NULL in create_trace_uprobe() · 7e4e28c5
      Committed by Oleg Nesterov
      probe_event_enable/disable() check tu->inode != NULL at the start.
      This is ugly, if igrab() can fail create_trace_uprobe() should not
      succeed and "postpone" the failure.
      
      And S_ISREG(inode->i_mode) check added by d24d7dbf is not safe.
      
      Note: alloc_uprobe() should probably check igrab() != NULL as well.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes/tracing: Fully initialize uprobe_trace_consumer before uprobe_register() · 4161824f
      Committed by Oleg Nesterov
      probe_event_enable() does uprobe_register() and only after that sets
      utc->tu and tu->consumer/flags. This can race with uprobe_dispatcher()
      which can miss these assignments or see them out of order. Nothing
      really bad can happen, but this doesn't look clean/safe.
      
      And this does not allow us to use uprobe_consumer->filter() we are going
      to add, it is called by uprobe_register() and it needs utc->tu.
      
      Change this code to initialize everything before uprobe_register(), and
      reset tu->consumer/flags if it fails. We can't race with event_disable(),
      the caller holds event_mutex, and if we could the code would be wrong
      anyway.
      
      In fact I think uprobe_trace_consumer should die, it buys nothing but
      complicates the code. We can simply add uprobe_consumer into trace_uprobe.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes/tracing: Fix dentry/mount leak in create_trace_uprobe() · 84d7ed79
      Committed by Oleg Nesterov
      create_trace_uprobe() does kern_path() to find ->d_inode, but forgets
      to do path_put(). We can do this right after igrab().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Add exports for module use · e8440c14
      Committed by Josh Stone
      The original pull message for uprobes (commit 654443e2) noted:
      
        This tree includes uprobes support in 'perf probe' - but SystemTap
        (and other tools) can take advantage of user probe points as well.
      
      In order to actually be usable in module-based tools like SystemTap, the
      interface needs to be exported.  This patch first adds the obvious
      exports for uprobe_register and uprobe_unregister.  Then it also adds
      one for task_user_regset_view, which is necessary to get the correct
      state of userspace registers.
      Signed-off-by: Josh Stone <jistone@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • uprobes: Kill the bogus IS_ERR_VALUE(xol_vaddr) check · af4355e9
      Committed by Oleg Nesterov
      utask->xol_vaddr is either zero or valid, remove the bogus
      IS_ERR_VALUE() check in xol_free_insn_slot().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Do not allocate current->utask unnecessary · 608e7427
      Committed by Oleg Nesterov
      handle_swbp() does get_utask() before can_skip_sstep() for no reason,
      we do not need ->utask if can_skip_sstep() succeeds.
      
      Move get_utask() to pre_ssout() which actually starts to use it. Move
      the initialization of utask->active_uprobe/state as well. This way
      the whole initialization is consolidated in pre_ssout().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
    • uprobes: Fix utask->xol_vaddr leak in pre_ssout() · aba51024
      Committed by Oleg Nesterov
      pre_ssout() should do xol_free_insn_slot() if arch_uprobe_pre_xol()
      fails, otherwise nobody will free the allocated slot.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
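The leak fixed here follows a common pattern: a resource is taken, a later step fails, and the error path forgets to give the resource back. A minimal Python sketch of the corrected shape, with a hypothetical `XolArea` slot allocator and an `arch_pre_xol` callable standing in for the architecture hook:

```python
class XolArea:
    """Toy slot allocator; the real area hands out instruction slots."""

    def __init__(self, nslots):
        self.used = [False] * nslots

    def get_slot(self):
        for i, busy in enumerate(self.used):
            if not busy:
                self.used[i] = True
                return i
        return None                 # every slot is taken

    def free_slot(self, i):
        self.used[i] = False


def pre_ssout(area, arch_pre_xol):
    """Return the slot index on success, None on failure.

    arch_pre_xol is a callable returning 0 on success, nonzero on error.
    """
    slot = area.get_slot()
    if slot is None:
        return None
    if arch_pre_xol() != 0:
        area.free_slot(slot)        # the fix: release the slot on the error path
        return None
    return slot


area = XolArea(1)
print(pre_ssout(area, lambda: -1))  # None, and the only slot is free again
print(pre_ssout(area, lambda: 0))   # 0
```

Without the `free_slot()` call on the error path, the second attempt in this example would find no free slot.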
    • uprobes: Do not play with utask in xol_get_insn_slot() · a6cb3f6d
      Committed by Oleg Nesterov
      pre_ssout()->xol_get_insn_slot() path is confusing and buggy. This patch
      cleans up the code, the next one fixes the bug.
      
      Change xol_get_insn_slot() to only allocate the slot and do nothing more,
      move the initialization of utask->xol_vaddr/vaddr into pre_ssout().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Turn add_utask() into get_utask() · 5a2df662
      Committed by Oleg Nesterov
      Rename add_utask() into get_utask() and change it to allocate on
      demand to simplify the caller. Like get_xol_area() it will have
      more users.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Fold xol_alloc_area() into get_xol_area() · 9b545df8
      Committed by Oleg Nesterov
      Currently only xol_get_insn_slot() does get_xol_area() + xol_alloc_area(),
      but this will have more users and we do not want to copy-and-paste this
      code. This patch simply moves xol_alloc_area() into get_xol_area() to
      simplify the current and future code.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Move alloc_page() from xol_add_vma() to xol_alloc_area() · c8a82538
      Committed by Oleg Nesterov
      Move alloc_page() from xol_add_vma() to xol_alloc_area() to clean up
      the code. This separates the memory allocations and consolidates the
      -EALREADY cleanups and the error handling.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain() · 74e59dfc
      Committed by Oleg Nesterov
      Change handle_swbp() to set regs->ip = bp_vaddr in advance, this is
      what consumer->handler() needs but uprobe_get_swbp_addr() is not
      exported.
      
      This also simplifies the code and makes it more consistent across
      the supported architectures. handle_swbp() becomes the only caller
      of uprobe_get_swbp_addr().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
    • uprobes/x86: Change __skip_sstep() to actually skip the whole insn · cf31ec3f
      Committed by Oleg Nesterov
      __skip_sstep() doesn't update regs->ip. Currently this is correct
      but only "by accident" and it doesn't skip the whole insn. Change
      it to advance ->ip by the length of the detected 0x66*0x90 sequence.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
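The "0x66*0x90" pattern is any number of 0x66 prefix bytes followed by the 0x90 nop opcode. A small Python sketch of measuring that sequence and advancing the instruction pointer by its length; the function names are illustrative and the kernel code operates on its own instruction buffer and register structure.

```python
def nop_length(insn: bytes) -> int:
    """Return the length of a leading 0x66* 0x90 sequence, or 0 if absent."""
    for i, byte in enumerate(insn):
        if byte == 0x66:      # prefix byte, keep scanning
            continue
        if byte == 0x90:      # the nop opcode ends the sequence
            return i + 1
        break                 # any other byte: not a (prefixed) nop
    return 0


def skip_sstep(ip: int, insn: bytes):
    """Return (new_ip, skipped); ip moves past the nop when one is found."""
    n = nop_length(insn)
    return (ip + n, True) if n else (ip, False)


print(skip_sstep(0x1000, bytes([0x66, 0x66, 0x90, 0xC3])))  # (4099, True)
print(skip_sstep(0x1000, bytes([0x66, 0x89, 0xC8])))        # (4096, False)
```

Advancing by the full length matters once the prefixes are present: moving past a single byte would leave the instruction pointer in the middle of the instruction.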
    • uprobes: Teach handler_chain() to filter out the probed task · da1816b1
      Committed by Oleg Nesterov
      Currently there are 2 problems with pre-filtering:
      
      1. It is not possible to add/remove a task (mm) after uprobe_register()
      
      2. A forked child inherits all breakpoints and uprobe_consumer can not
         control this.
      
      This patch does the first step to improve the filtering. handler_chain()
      removes the breakpoints installed by this uprobe from current->mm if all
      handlers return UPROBE_HANDLER_REMOVE.
      
      Note that handler_chain() relies on ->register_rwsem to avoid the race
      with uprobe_register/unregister which can add/del a consumer, or even
      remove and then insert the new uprobe at the same address.
      
      Perhaps we will add uprobe_apply_mm(uprobe, mm, is_register) and teach
      copy_mm() to do filter(UPROBE_FILTER_FORK), but I think this change makes
      sense anyway.
      
      Note: instead of checking the retcode from uc->handler, we could add
      uc->filter(UPROBE_FILTER_BPHIT). But I think it is not optimal to
      call 2 hooks in a row. This buys nothing, and if handler/filter do
      something nontrivial they will probably do the same work twice.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
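The rule "remove the breakpoint only if all handlers return UPROBE_HANDLER_REMOVE" amounts to AND-ing the return codes. A toy Python version, where the constant's value, the handler callables and the `installed` set are assumptions for this sketch:

```python
UPROBE_HANDLER_REMOVE = 1   # stand-in value for the sketch


def handler_chain(handlers, mm, installed):
    """Run every handler; drop mm's breakpoint only if all of them ask for it."""
    remove = UPROBE_HANDLER_REMOVE
    for handler in handlers:
        remove &= handler(mm)       # a single 0 clears the flag for good
    if remove and handlers:
        installed.discard(mm)       # models removing the breakpoint from this mm


installed = {"child"}
handler_chain([lambda mm: UPROBE_HANDLER_REMOVE, lambda mm: 0], "child", installed)
print(installed)   # {'child'}: one consumer still wants this mm
handler_chain([lambda mm: UPROBE_HANDLER_REMOVE] * 2, "child", installed)
print(installed)   # set()
```

Every handler still runs on each hit, so a consumer that wants the task is never starved by one that does not.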
    • uprobes: Reintroduce uprobe_consumer->filter() · 8a7f2fa0
      Committed by Oleg Nesterov
      Finally add uprobe_consumer->filter() and change consumer_filter()
      to actually call this method.
      
      Note that ->filter() accepts mm_struct, not task_struct. Because:
      
      	1. We do not have for_each_mm_user(mm, task).
      
      	2. Even if we implement for_each_mm_user(), ->filter() can
      	   use it itself.
      
      	3. It is not clear who will actually need this interface to
      	   do the "nontrivial" filtering.
      
      Another argument is "enum uprobe_filter_ctx", consumer->filter() can
      use it to figure out why/where it was called. For example, perhaps
      we can add UPROBE_FILTER_PRE_REGISTER used by build_map_info() to
      quickly "nack" the unwanted mm's. In this case consumer should know
      that it is called under ->i_mmap_mutex.
      
      See the previous discussion at http://marc.info/?t=135214229700002
      Perhaps we should pass more arguments, vma/vaddr?
      
      Note: this patch obviously can't help to filter out the child created
      by fork(), this will be addressed later.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
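A filter that receives a context value and an mm, and a chain that installs the breakpoint when any consumer wants it, might look like the sketch below. The enum member names, the `Consumer` class and the "no filter means wants everything" default are assumptions for illustration and are not quoted from the commit.

```python
from enum import Enum


class FilterCtx(Enum):
    """Why the filter is being asked; member names are illustrative."""
    REGISTER = "register"
    UNREGISTER = "unregister"
    MMAP = "mmap"


class Consumer:
    def __init__(self, filter=None):
        self.filter = filter    # optional callable(ctx, mm) -> bool


def consumer_filter(uc, ctx, mm):
    # Assumption: a consumer without a filter wants every mm.
    return True if uc.filter is None else uc.filter(ctx, mm)


def filter_chain(consumers, ctx, mm):
    """The breakpoint is needed in mm if at least one consumer wants it."""
    return any(consumer_filter(uc, ctx, mm) for uc in consumers)


only_a = Consumer(lambda ctx, mm: mm == "mm-a")
print(filter_chain([only_a], FilterCtx.REGISTER, "mm-a"))          # True
print(filter_chain([only_a], FilterCtx.MMAP, "mm-b"))              # False
print(filter_chain([only_a, Consumer()], FilterCtx.MMAP, "mm-b"))  # True
```

Passing the mm rather than a task keeps the filter usable from paths that only know the address space, which is the first argument the commit message gives for the signature.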
    • uprobes: Rationalize the usage of filter_chain() · 806a98bd
      Committed by Oleg Nesterov
      filter_chain() was added into install_breakpoint/remove_breakpoint to
      simplify the initial changes but this is sub-optimal.
      
      This patch shifts the callsite to the callers, register_for_each_vma()
      and uprobe_mmap(). This way:
      
      - It will be easier to add the new arguments. This is the main reason,
        we can do more optimizations later.
      
      - register_for_each_vma(is_register => true) can be optimized, we only
        need to consult the new consumer. The previous consumers were already
        asked when they called uprobe_register().
      
      This patch also moves the MMF_HAS_UPROBES check from remove_breakpoint(),
      this allows us to avoid the potentially costly filter_chain(). Note that
      register_for_each_vma(is_register => false) doesn't really need to take
      ->consumer_rwsem, but I don't think it makes sense to optimize this and
      introduce filter_chain_lockless().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Kill uprobes_mutex[], separate alloc_uprobe() and __uprobe_register() · 66d06dff
      Committed by Oleg Nesterov
      uprobe_register() and uprobe_unregister() are the only users of
      mutex_lock(uprobes_hash(inode)), and the only reason why we can't
      simply remove it is that we need to ensure that delete_uprobe() is
      not possible after alloc_uprobe() and before consumer_add().
      
      IOW, we need to ensure that when we take uprobe->register_rwsem
      this uprobe is still valid and we didn't race with _unregister()
      which called delete_uprobe() in between.
      
      With this patch uprobe_register() simply checks uprobe_is_active()
      and retries if it hits this very unlikely race. uprobes_mutex[] is
      no longer needed and can be removed.
      
      There is another reason for this change, prepare_uprobe() should be
      folded into alloc_uprobe() and we do not want to hold the extra locks
      around read_mapping_page/etc.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
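The "check that the uprobe is still active, otherwise retry" approach can be modelled in a few lines. This single-threaded Python sketch fakes the race with a hook; the `Uprobe` class, the dict used as the tree and the `before_lock` parameter are all hypothetical.

```python
class Uprobe:
    def __init__(self):
        self.active = True
        self.consumers = []


def register(tree, key, consumer, before_lock=lambda u: None):
    """Attach consumer, retrying if the uprobe was deleted under us."""
    attempts = 0
    while True:
        attempts += 1
        uprobe = tree.setdefault(key, Uprobe())  # find or allocate
        before_lock(uprobe)       # test hook: window for a racing unregister
        # The real code would take the per-uprobe register lock here.
        if not uprobe.active:
            continue              # lost the race: look the uprobe up again
        uprobe.consumers.append(consumer)
        return uprobe, attempts


def racing_unregister_once(tree, key):
    """Build a hook that deletes the uprobe the first time it is called."""
    fired = []

    def hook(uprobe):
        if not fired:
            fired.append(True)
            uprobe.active = False
            del tree[key]

    return hook


tree = {}
uprobe, attempts = register(tree, "inode:offset", "consumer",
                            racing_unregister_once(tree, "inode:offset"))
print(attempts, uprobe.consumers)   # 2 ['consumer']
```

The retry replaces a lock held across allocation and registration with a cheap validity check, at the cost of an occasional second lookup.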
    • uprobes: Introduce uprobe_is_active() · 06b7bcd8
      Committed by Oleg Nesterov
      The lifetime of uprobe->rb_node and uprobe->inode is not refcounted,
      delete_uprobe() is called when we detect that uprobe has no consumers,
      and it would be deadly wrong to do this twice.
      
      Change delete_uprobe() to WARN() if it was already called. We use
      RB_CLEAR_NODE() to mark uprobe "inactive", then RB_EMPTY_NODE() can
      be used to detect this case.
      
      RB_EMPTY_NODE() is not used directly, we add the trivial helper for
      the next change.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
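Marking an object "inactive" when it leaves the tree, and warning on a second delete, can be sketched as follows. The boolean `in_tree` plays the role of the cleared rb_node, and a Python warning stands in for WARN(); both are assumptions of this toy model.

```python
import warnings


class Uprobe:
    def __init__(self, key):
        self.key = key
        self.in_tree = False      # plays the role of a non-empty rb_node


def uprobe_is_active(uprobe):
    return uprobe.in_tree


def insert_uprobe(tree, uprobe):
    tree[uprobe.key] = uprobe
    uprobe.in_tree = True


def delete_uprobe(tree, uprobe):
    if not uprobe_is_active(uprobe):
        warnings.warn("delete_uprobe() called on an inactive uprobe")
        return
    del tree[uprobe.key]
    uprobe.in_tree = False        # models clearing the node to mark it inactive


tree = {}
probe = Uprobe("inode:offset")
insert_uprobe(tree, probe)
delete_uprobe(tree, probe)
print(uprobe_is_active(probe))    # False
```

A second `delete_uprobe()` call now produces a warning instead of silently corrupting the container.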
    • uprobes: Kill uprobe_events, use RB_EMPTY_ROOT() instead · 441f1eb7
      Committed by Oleg Nesterov
      uprobe_events counts the number of uprobes in uprobes_tree but
      it is used as a boolean. We can use RB_EMPTY_ROOT() instead.
      
      Probably no_uprobe_events() added by this patch can have more
      callers, say, mmf_recalc_uprobes().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Anton Arapov <anton@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Kill uprobe->copy_mutex · d4d3ccc6
      Committed by Oleg Nesterov
      Now that ->register_rwsem is safe under ->mmap_sem we can kill
      ->copy_mutex and abuse down_write(&uprobe->consumer_rwsem).
      
      This makes prepare_uprobe() even more ugly, but we should kill
      it anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Kill UPROBE_RUN_HANDLER flag · bb929284
      Committed by Oleg Nesterov
      Simply remove UPROBE_RUN_HANDLER and the corresponding code.
      
      It can only help if uprobe has a single consumer, and in fact
      it is no longer needed after handler_chain() was changed to use
      ->register_rwsem, we simply can not race with uprobe_register().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Change filter_chain() to iterate ->consumers list · 1ff6fee5
      Committed by Oleg Nesterov
      Now that it is safe to use ->consumer_rwsem under ->mmap_sem we can
      almost finish the implementation of filter_chain(). It still lacks
      the actual uc->filter(...) call but otherwise it is ready, it just
      pretends that ->filter() always returns true.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: Introduce uprobe->register_rwsem · e591c8d7
      Committed by Oleg Nesterov
      Introduce uprobe->register_rwsem. It is taken for writing around
      __uprobe_register/unregister.
      
      Change handler_chain() to use this sem rather than consumer_rwsem.
      
      The main reason for this change is that we have the nasty problem
      with mmap_sem/consumer_rwsem dependency. filter_chain() needs to
      protect uprobe->consumers like handler_chain(), but they can not
      use the same lock. filter_chain() can be called under ->mmap_sem
      (currently this is always true), but we want to allow ->handler()
      to play with the probed task's memory, and this needs ->mmap_sem.
      
      Alternatively we could use srcu, but synchronize_srcu() is very
      slow and ->register_rwsem allows us to do more. In particular, we
      can teach handler_chain() to do remove_breakpoint() if this bp is
      "nacked" by all consumers, we know that we can't race with the
      new consumer which does uprobe_register().
      
      See also the next patches. uprobes_mutex[] is almost ready to die.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    • uprobes: _register() should always do register_for_each_vma(true) · 9a98e03c
      Committed by Oleg Nesterov
      To support the filtering uprobe_register() should do
      register_for_each_vma(true) every time the new consumer comes,
      we need to install the previously nacked breakpoints.
      
      Note:
      	- uprobes_mutex[] should die, what it actually protects is
      	  alloc_uprobe().
      
      	- UPROBE_RUN_HANDLER should die too, obviously it can't work
      	  unless uprobe has a single consumer. The consumer should
      	  serialize with _register/_unregister itself. Or this flag
      	  should live in uprobe_consumer->state.
      
      	- Perhaps we can do some optimizations later. For example, if
      	  filter_chain() never returns false uprobe can record this
      	  fact and avoid the unnecessary register_for_each_vma().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>