1. 09 Feb, 2013 7 commits
    •
      uprobes: Introduce uprobe->register_rwsem · e591c8d7
      Oleg Nesterov committed
      Introduce uprobe->register_rwsem. It is taken for writing around
      __uprobe_register/unregister.
      
      Change handler_chain() to use this sem rather than consumer_rwsem.
      
      The main reason for this change is that we have the nasty problem
      with mmap_sem/consumer_rwsem dependency. filter_chain() needs to
      protect uprobe->consumers like handler_chain(), but they can not
      use the same lock. filter_chain() can be called under ->mmap_sem
      (currently this is always true), but we want to allow ->handler()
      to play with the probed task's memory, and this needs ->mmap_sem.
      
      Alternatively we could use srcu, but synchronize_srcu() is very
      slow and ->register_rwsem allows us to do more. In particular, we
      can teach handler_chain() to do remove_breakpoint() if this bp is
      "nacked" by all consumers, we know that we can't race with the
      new consumer which does uprobe_register().
      
      See also the next patches. uprobes_mutex[] is almost ready to die.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    •
      uprobes: _register() should always do register_for_each_vma(true) · 9a98e03c
      Oleg Nesterov committed
      To support the filtering uprobe_register() should do
      register_for_each_vma(true) every time the new consumer comes,
      we need to install the previously nacked breakpoints.
      
      Note:
      	- uprobes_mutex[] should die, what it actually protects is
      	  alloc_uprobe().
      
      	- UPROBE_RUN_HANDLER should die too, obviously it can't work
      	  unless uprobe has a single consumer. The consumer should
      	  serialize with _register/_unregister itself. Or this flag
      	  should live in uprobe_consumer->state.
      
      	- Perhaps we can do some optimizations later. For example, if
      	  filter_chain() never returns false uprobe can record this
      	  fact and avoid the unnecessary register_for_each_vma().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    •
      uprobes: _unregister() should always do register_for_each_vma(false) · 04aab9b2
      Oleg Nesterov committed
      uprobe_unregister() removes the breakpoints only if the last consumer
      goes away. To support the filtering it should do this every time, we
      want to remove the breakpoints which nobody else wants to keep.
      
      Note: given that filter_chain() is not actually implemented, this patch
      itself doesn't change the behaviour yet, register_for_each_vma(false)
      is a heavy "nop" unless there are no more consumers.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    •
      uprobes: Introduce filter_chain() · 63633cbf
      Oleg Nesterov committed
      Add the new helper filter_chain(). Currently it is only a placeholder,
      the comment explains what it should do. We will change it later to
      consult every consumer to decide whether we need to install the swbp.
      Until then it works as if any consumer returns true, this matches the
      current behavior.
      
      Change install_breakpoint() to call filter_chain() instead of checking
      uprobe->consumers != NULL. We obviously need this, and this equally
      closes the race with _unregister().
      
      Change remove_breakpoint() to call this helper too. Currently this is
      pointless because remove_breakpoint() is only called when the last
      consumer goes away, but we will change this.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
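
To make the filter_chain() idea concrete, here is a minimal userspace sketch in C. It is not the kernel code: `struct consumer`, `struct probe`, the `pid` argument and the `only_pid_42` callback are hypothetical stand-ins, and the per-consumer `filter` signature is an assumption, since the commit only says the real one will be added later. The first helper mirrors the placeholder behaviour (any consumer counts as "yes"); the second shows one way the later "consult every consumer" behaviour could look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct consumer {
    /* Assumed signature: true if this consumer wants the breakpoint for pid. */
    bool (*filter)(const struct consumer *self, int pid);
    struct consumer *next;
};

struct probe {
    struct consumer *consumers; /* singly linked list, NULL when empty */
};

/* Placeholder behaviour: act as if every registered consumer says "yes". */
static bool filter_chain_placeholder(const struct probe *p)
{
    return p->consumers != NULL;
}

/* Possible later behaviour: ask each consumer, stop at the first "yes". */
static bool filter_chain(const struct probe *p, int pid)
{
    for (const struct consumer *c = p->consumers; c; c = c->next) {
        /* Assumption: a consumer without a filter wants every task. */
        if (!c->filter || c->filter(c, pid))
            return true;
    }
    return false;
}

/* Demo filter that is only interested in pid 42. */
static bool only_pid_42(const struct consumer *self, int pid)
{
    (void)self;
    return pid == 42;
}
```

With a single `only_pid_42` consumer, the placeholder reports that a breakpoint is needed for every task, while the filtering variant reports it only for pid 42. That difference is what later lets install and remove paths skip tasks nobody cares about.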
    •
      uprobes: Kill uprobe_consumer->filter() · fe20d71f
      Oleg Nesterov committed
      uprobe_consumer->filter() is pointless in its current form, kill it.
      
      We will add it back, but with a different signature/semantics. Perhaps
      we will even re-introduce the callsite in handler_chain(), but not to
      just skip uc->handler().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    •
      uprobes: Kill the pointless inode/uc checks in register/unregister · f0744af7
      Oleg Nesterov committed
      register/unregister verifies that inode/uc != NULL. For what?
      This really looks like "hide the potential problem", the caller
      should pass the valid data.
      
      register() also checks uc->next == NULL, probably to prevent the
      double-register but the caller can do other stupid/wrong things.
      If we do this check, then we should document that uc->next should
      be cleared before register() and add BUG_ON().
      
      Also add the small comment about the i_size_read() check.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    •
      uprobes: Move __set_bit(UPROBE_SKIP_SSTEP) into alloc_uprobe() · bbc33d05
      Oleg Nesterov committed
      Cosmetic. __set_bit(UPROBE_SKIP_SSTEP) is part of the initialization,
      it is not clear why it is set in insert_uprobe().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
  2. 25 Jan, 2013 1 commit
  3. 16 Nov, 2012 1 commit
    •
      uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race · 32cdba1e
      Oleg Nesterov committed
      This was always racy, but 26872090
      "uprobes: Rework register_for_each_vma() to make it O(n)" should be
      blamed anyway, it made everything worse and I didn't notice.
      
      register/unregister call build_map_info() and then do install/remove
      breakpoint for every mm which mmaps inode/offset. This can obviously
      race with fork()->dup_mmap() in between and we can miss the child.
      
      uprobe_register() could be easily fixed but unregister is much worse,
      the new mm inherits "int3" from parent and there is no way to detect
      this if uprobe goes away.
      
      So this patch simply adds percpu_down_read/up_read around dup_mmap(),
      and percpu_down_write/up_write into register_for_each_vma().
      
      This adds 2 new hooks into dup_mmap() but we can kill uprobe_dup_mmap()
      and fold it into uprobe_end_dup_mmap().
      Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
  4. 15 Nov, 2012 1 commit
  5. 04 Nov, 2012 2 commits
  6. 09 Oct, 2012 2 commits
    •
      mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end · 6bdb913f
      Haggai Eran committed
      In order to allow sleeping during invalidate_page mmu notifier calls, we
      need to avoid calling when holding the PT lock.  In addition to its direct
      calls, invalidate_page can also be called as a substitute for a change_pte
      call, in case the notifier client hasn't implemented change_pte.
      
      This patch drops the invalidate_page call from change_pte, and instead
      wraps all calls to change_pte with invalidate_range_start and
      invalidate_range_end calls.
      
      Note that change_pte still cannot sleep after this patch, and that clients
      implementing change_pte should not take action on it in case the number of
      outstanding invalidate_range_start calls is larger than one, otherwise
      they might miss a later invalidation.
      Signed-off-by: Haggai Eran <haggaie@mellanox.com>
      Cc: Andrea Arcangeli <andrea@qumranet.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Haggai Eran <haggaie@mellanox.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Liran Liss <liranl@mellanox.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      mm: replace vma prio_tree with an interval tree · 6b2dbba8
      Michel Lespinasse committed
      Implement an interval tree as a replacement for the VMA prio_tree.  The
      algorithms are similar to lib/interval_tree.c; however that code can't be
      directly reused as the interval endpoints are not explicitly stored in the
      VMA.  So instead, the common algorithm is moved into a template and the
      details (node type, how to get interval endpoints from the node, etc) are
      filled in using the C preprocessor.
      
      Once the interval tree functions are available, using them as a
      replacement to the VMA prio tree is a relatively simple, mechanical job.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
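
The "template filled in by the C preprocessor" technique from the interval tree commit can be illustrated with a small sketch. This is not the kernel's template: `DEFINE_OVERLAP_COUNT`, `struct mapping`, `MAP_START` and `MAP_LAST` are hypothetical names, and the generated function is a plain linear scan rather than a balanced tree. It only shows how a generic algorithm can be instantiated for a node type whose interval endpoints are computed rather than stored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Generate an overlap counter for any node type. START and LAST are
 * macros (or functions) that compute the closed interval [start, last]
 * from a node, so the endpoints need not exist as struct fields.
 * Note: this is a linear scan used for illustration, not a tree.
 */
#define DEFINE_OVERLAP_COUNT(name, type, START, LAST)                 \
static size_t name(const type *nodes, size_t n,                       \
                   unsigned long start, unsigned long last)           \
{                                                                     \
    size_t hits = 0;                                                  \
    for (size_t i = 0; i < n; i++)                                    \
        if (START(&nodes[i]) <= last && start <= LAST(&nodes[i]))     \
            hits++;                                                   \
    return hits;                                                      \
}

/* Hypothetical mapping: first file page plus length in pages. */
struct mapping {
    unsigned long pgoff;
    unsigned long npages;
};

/* The last page is derived from the fields, not stored. */
#define MAP_START(m) ((m)->pgoff)
#define MAP_LAST(m)  ((m)->pgoff + (m)->npages - 1)

/* Instantiate the template for struct mapping. */
DEFINE_OVERLAP_COUNT(mapping_overlaps, struct mapping, MAP_START, MAP_LAST)
```

A second node type would only need its own pair of accessor macros and one more `DEFINE_OVERLAP_COUNT` line, which is the property the commit relies on to share one algorithm between different structures.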
  7. 08 Oct, 2012 6 commits
    •
      uprobes: Fix the racy uprobe->flags manipulation · 71434f2f
      Oleg Nesterov committed
      Multiple threads can manipulate uprobe->flags, this is obviously
      unsafe. For example mmap can set UPROBE_COPY_INSN while register
      tries to set UPROBE_RUN_HANDLER, the latter can also race with
      can_skip_sstep() which clears UPROBE_SKIP_SSTEP.
      
      Change this code to use bitops.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
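
The fix above replaces plain read-modify-write updates of a shared flags word with atomic bit operations. A rough userspace analogue, using C11 atomics instead of the kernel's bitops, might look like the sketch below. The `struct probe_flags` type, the helper names and the bit numbers are illustrative assumptions; only the flag names come from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers are illustrative; the names follow the commit message. */
enum { RUN_HANDLER = 0, COPY_INSN = 1, SKIP_SSTEP = 2 };

/* Hypothetical container for the shared flags word. */
struct probe_flags {
    atomic_ulong bits;
};

/* Atomically set one bit without disturbing concurrent updates to others. */
static void flag_set(struct probe_flags *f, int nr)
{
    atomic_fetch_or(&f->bits, 1UL << nr);
}

/* Atomically clear one bit. */
static void flag_clear(struct probe_flags *f, int nr)
{
    atomic_fetch_and(&f->bits, ~(1UL << nr));
}

/* Read one bit from the current value of the flags word. */
static bool flag_test(struct probe_flags *f, int nr)
{
    return (atomic_load(&f->bits) >> nr) & 1UL;
}
```

With a non-atomic `flags |= bit`, two threads can each load the old word and one update is lost. The atomic fetch-or and fetch-and make each single-bit update indivisible, so setting one flag cannot undo a concurrent change to another.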
    •
      uprobes: Fix prepare_uprobe() race with itself · 4710f05f
      Oleg Nesterov committed
      install_breakpoint() is called under mm->mmap_sem, this protects
      set_swbp() but not prepare_uprobe(). Two or more different tasks
      can call install_breakpoint()->prepare_uprobe() at the same time,
      this leads to numerous problems if UPROBE_COPY_INSN is not set.
      
      Just for example, the second copy_insn() can corrupt the already
      analyzed/fixuped uprobe->arch.insn and race with handle_swbp().
      
      This patch simply adds uprobe->copy_mutex to serialize this code.
      We could probably reuse ->consumer_rwsem, but this would mean that
      consumer->handler() can not use mm->mmap_sem, not good.
      
      Note: this is another temporary ugly hack until we move this logic
      into uprobe_register().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    •
      uprobes: Introduce prepare_uprobe() · cb9a19fe
      Oleg Nesterov committed
      Preparation. Extract the copy_insn/arch_uprobe_analyze_insn code
      from install_breakpoint() into the new helper, prepare_uprobe().
      
      And move uprobe->flags defines from uprobes.h to uprobes.c, nobody
      else can use them anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    •
      uprobes: Fix handle_swbp() vs unregister() + register() race · 142b18dd
      Oleg Nesterov committed
      Strictly speaking this race was added by me in 56bb4cf6. However
      I think that this bug is just another indication that we should
      move copy_insn/uprobe_analyze_insn code from install_breakpoint()
      to uprobe_register(), there are a lot of other reasons for that.
      Until then, add a hack to close the race.
      
      A task can hit uprobe U1, but before it calls find_uprobe() this
      uprobe can be unregistered *AND* another uprobe U2 can be added to
      uprobes_tree at the same inode/offset. In this case handle_swbp()
      will use the not-fully-initialized U2, in particular its arch.insn
      for xol.
      
      Add the additional !UPROBE_COPY_INSN check into handle_swbp(),
      if this flag is not set we simply restart as if the new uprobe was
      not inserted yet. This is not very nice, we need barriers, but we
      will remove this hack when we change uprobe_register().
      
      Note: with or without this patch install_breakpoint() can race with
      itself, yet another reason to kill UPROBE_COPY_INSN altogether. And
      even the usage of uprobe->flags is not safe. See the next patches.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    •
      uprobes: Do not delete uprobe if uprobe_unregister() fails · 076a365b
      Oleg Nesterov committed
      delete_uprobe() must not be called if register_for_each_vma(false)
      fails to remove all breakpoints, __uprobe_unregister() is correct.
      The problem is that register_for_each_vma(false) always returns 0
      and thus this logic does not work.
      
      1. Change verify_opcode() to return 0 rather than -EINVAL when
         unregister detects the !is_swbp insn, we can treat this case
         as success and currently unregister paths ignore the error
         code anyway.
      
      2. Change remove_breakpoint() to propagate the error code from
         write_opcode().
      
      3. Change register_for_each_vma(is_register => false) to remove
         as many breakpoints as possible but return non-zero if
         remove_breakpoint() fails at least once.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
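
The third change in the list above (remove as many breakpoints as possible, yet report failure if any removal fails) is a common best-effort loop. The sketch below shows that pattern in isolation. `remove_from_all`, `remove_fn` and `fail_on_two` are hypothetical names, mappings are reduced to integer ids, and remembering the first error is a choice made here; the commit only requires a non-zero result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-mapping removal: 0 on success, negative errno on failure. */
typedef int (*remove_fn)(int mapping_id);

/*
 * Try to remove the breakpoint from every mapping. Keep going after a
 * failure, but remember the first error so the caller knows that the
 * probe must not be deleted while a breakpoint may still be installed.
 */
static int remove_from_all(const int *mappings, size_t n,
                           remove_fn remove_one, size_t *removed)
{
    int err = 0;

    *removed = 0;
    for (size_t i = 0; i < n; i++) {
        int ret = remove_one(mappings[i]);

        if (ret == 0)
            (*removed)++;
        else if (err == 0)
            err = ret; /* record the failure, do not stop the loop */
    }
    return err;
}

/* Demo callback in which the mapping with id 2 cannot be cleaned up. */
static int fail_on_two(int mapping_id)
{
    return mapping_id == 2 ? -EFAULT : 0;
}
```

A caller in the spirit of __uprobe_unregister() would delete the probe only when this function returns 0, which is the condition that was never false before the fix because the error was always swallowed.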
    •
      uprobes: Don't return success if alloc_uprobe() fails · a5f658b7
      Oleg Nesterov committed
      If alloc_uprobe() fails uprobe_register() should return -ENOMEM, not 0.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
  8. 30 Sep, 2012 12 commits
  9. 15 Sep, 2012 5 commits
  10. 29 Aug, 2012 3 commits