1. 09 2月, 2013 2 次提交
  2. 25 1月, 2013 1 次提交
  3. 16 11月, 2012 1 次提交
    • O
      uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race · 32cdba1e
      Oleg Nesterov 提交于
      This was always racy, but 26872090
      "uprobes: Rework register_for_each_vma() to make it O(n)" should be
      blamed anyway, it made everything worse and I didn't notice.
      
      register/unregister call build_map_info() and then do install/remove
      breakpoint for every mm which mmaps inode/offset. This can obviously
      race with fork()->dup_mmap() in between and we can miss the child.
      
      uprobe_register() could be easily fixed but unregister is much worse,
      the new mm inherits "int3" from parent and there is no way to detect
      this if uprobe goes away.
      
      So this patch simply adds percpu_down_read/up_read around dup_mmap(),
      and percpu_down_write/up_write into register_for_each_vma().
      
      This adds 2 new hooks into dup_mmap() but we can kill uprobe_dup_mmap()
      and fold it into uprobe_end_dup_mmap().
      Reported-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      32cdba1e
  4. 15 11月, 2012 1 次提交
  5. 04 11月, 2012 2 次提交
  6. 09 10月, 2012 2 次提交
    • H
      mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end · 6bdb913f
      Haggai Eran 提交于
      In order to allow sleeping during invalidate_page mmu notifier calls, we
      need to avoid calling when holding the PT lock.  In addition to its direct
      calls, invalidate_page can also be called as a substitute for a change_pte
      call, in case the notifier client hasn't implemented change_pte.
      
      This patch drops the invalidate_page call from change_pte, and instead
      wraps all calls to change_pte with invalidate_range_start and
      invalidate_range_end calls.
      
      Note that change_pte still cannot sleep after this patch, and that clients
      implementing change_pte should not take action on it in case the number of
      outstanding invalidate_range_start calls is larger than one, otherwise
      they might miss a later invalidation.
      Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
      Cc: Andrea Arcangeli <andrea@qumranet.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Haggai Eran <haggaie@mellanox.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Liran Liss <liranl@mellanox.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6bdb913f
    • M
      mm: replace vma prio_tree with an interval tree · 6b2dbba8
      Michel Lespinasse 提交于
      Implement an interval tree as a replacement for the VMA prio_tree.  The
      algorithms are similar to lib/interval_tree.c; however that code can't be
      directly reused as the interval endpoints are not explicitly stored in the
      VMA.  So instead, the common algorithm is moved into a template and the
      details (node type, how to get interval endpoints from the node, etc) are
      filled in using the C preprocessor.
      
      Once the interval tree functions are available, using them as a
      replacement to the VMA prio tree is a relatively simple, mechanical job.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b2dbba8
  7. 08 10月, 2012 6 次提交
    • O
      uprobes: Fix the racy uprobe->flags manipulation · 71434f2f
      Oleg Nesterov 提交于
      Multiple threads can manipulate uprobe->flags, this is obviously
      unsafe. For example mmap can set UPROBE_COPY_INSN while register
      tries to set UPROBE_RUN_HANDLER, the latter can also race with
      can_skip_sstep() which clears UPROBE_SKIP_SSTEP.
      
      Change this code to use bitops.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      71434f2f
    • O
      uprobes: Fix prepare_uprobe() race with itself · 4710f05f
      Oleg Nesterov 提交于
      install_breakpoint() is called under mm->mmap_sem, this protects
      set_swbp() but not prepare_uprobe(). Two or more different tasks
      can call install_breakpoint()->prepare_uprobe() at the same time,
      this leads to numerous problems if UPROBE_COPY_INSN is not set.
      
      Just for example, the second copy_insn() can corrupt the already
      analyzed/fixuped uprobe->arch.insn and race with handle_swbp().
      
      This patch simply adds uprobe->copy_mutex to serialize this code.
      We could probably reuse ->consumer_rwsem, but this would mean that
      consumer->handler() can not use mm->mmap_sem, not good.
      
      Note: this is another temporary ugly hack until we move this logic
      into uprobe_register().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      4710f05f
    • O
      uprobes: Introduce prepare_uprobe() · cb9a19fe
      Oleg Nesterov 提交于
      Preparation. Extract the copy_insn/arch_uprobe_analyze_insn code
      from install_breakpoint() into the new helper, prepare_uprobe().
      
      And move uprobe->flags defines from uprobes.h to uprobes.c, nobody
      else can use them anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      cb9a19fe
    • O
      uprobes: Fix handle_swbp() vs unregister() + register() race · 142b18dd
      Oleg Nesterov 提交于
      Strictly speaking this race was added by me in 56bb4cf6. However
      I think that this bug is just another indication that we should
      move copy_insn/uprobe_analyze_insn code from install_breakpoint()
      to uprobe_register(), there are a lot of other reasons for that.
      Until then, add a hack to close the race.
      
      A task can hit uprobe U1, but before it calls find_uprobe() this
      uprobe can be unregistered *AND* another uprobe U2 can be added to
      uprobes_tree at the same inode/offset. In this case handle_swbp()
      will use the not-fully-initialized U2, in particular its arch.insn
      for xol.
      
      Add the additional !UPROBE_COPY_INSN check into handle_swbp(),
      if this flag is not set we simply restart as if the new uprobe was
      not inserted yet. This is not very nice, we need barriers, but we
      will remove this hack when we change uprobe_register().
      
      Note: with or without this patch install_breakpoint() can race with
      itself, yet another reson to kill UPROBE_COPY_INSN altogether. And
      even the usage of uprobe->flags is not safe. See the next patches.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      142b18dd
    • O
      uprobes: Do not delete uprobe if uprobe_unregister() fails · 076a365b
      Oleg Nesterov 提交于
      delete_uprobe() must not be called if register_for_each_vma(false)
      fails to remove all breakpoints, __uprobe_unregister() is correct.
      The problem is that register_for_each_vma(false) always returns 0
      and thus this logic does not work.
      
      1. Change verify_opcode() to return 0 rather than -EINVAL when
         unregister detects the !is_swbp insn, we can treat this case
         as success and currently unregister paths ignore the error
         code anyway.
      
      2. Change remove_breakpoint() to propagate the error code from
         write_opcode().
      
      3. Change register_for_each_vma(is_register => false) to remove
         as much breakpoints as possible but return non-zero if
         remove_breakpoint() fails at least once.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      076a365b
    • O
      uprobes: Don't return success if alloc_uprobe() fails · a5f658b7
      Oleg Nesterov 提交于
      If alloc_uprobe() fails uprobe_register() should return ENOMEM, not 0.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      a5f658b7
  8. 30 9月, 2012 12 次提交
  9. 15 9月, 2012 5 次提交
  10. 29 8月, 2012 8 次提交
    • O
      uprobes: Remove "verify" argument from set_orig_insn() · ded86e7c
      Oleg Nesterov 提交于
      Nobody does set_orig_insn(verify => false), and I think nobody will.
      Remove this argument. IIUC set_orig_insn(verify => false) was needed
      to single-step without xol area.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      ded86e7c
    • O
      uprobes: Fold uprobe_reset_state() into uprobe_dup_mmap() · 61559a81
      Oleg Nesterov 提交于
      Now that we have uprobe_dup_mmap() we can fold uprobe_reset_state()
      into the new hook and remove it. mmput()->uprobe_clear_state() can't
      be called before dup_mmap().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      61559a81
    • O
      uprobes: Introduce MMF_HAS_UPROBES · f8ac4ec9
      Oleg Nesterov 提交于
      Add the new MMF_HAS_UPROBES flag. It is set by install_breakpoint()
      and it is copied by dup_mmap(), uprobe_pre_sstep_notifier() checks
      it to avoid the slow path if the task was never probed. Perhaps it
      makes sense to check it in valid_vma(is_register => false) as well.
      
      This needs the new dup_mmap()->uprobe_dup_mmap() hook. We can't use
      uprobe_reset_state() or put MMF_HAS_UPROBES into MMF_INIT_MASK, we
      need oldmm->mmap_sem to avoid the race with uprobe_register() or
      mmap() from another thread.
      
      Currently we never clear this bit, it can be false-positive after
      uprobe_unregister() or uprobe_munmap() or if dup_mmap() hits the
      probed VM_DONTCOPY vma. But this is fine correctness-wise and has
      no effect unless the task hits the non-uprobe breakpoint.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      f8ac4ec9
    • O
      uprobes: Do not use -EEXIST in install_breakpoint() paths · 78f74116
      Oleg Nesterov 提交于
      -EEXIST from install_breakpoint() no longer makes sense, all
      callers should simply treat it as "success". Change the code
      to return zero and simplify register_for_each_vma().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      78f74116
    • O
      uprobes: Change uprobe_mmap() to ignore the errors but check fatal_signal_pending() · 5e5be71a
      Oleg Nesterov 提交于
      Once install_breakpoint() fails uprobe_mmap() "ignores" all other
      uprobes and returns the error.
      
      It was never really needed to to stop after the first error, and
      in fact it was always wrong at least in -ENOTSUPP case.
      
      Change uprobe_mmap() to ignore the errors and always return 0.
      This is not what we want in the long term, but until we teach
      the callers to handle the failure it would be better to remove
      the pointless complications. And this doesn't look too bad, the
      only "reasonable" error is ENOMEM but in this case the caller
      should be oom-killed in the likely case or the system has more
      serious problems.
      
      However it makes sense to stop if fatal_signal_pending() == T.
      In particular this helps if the task was oom-killed.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      5e5be71a
    • O
      uprobes: Kill dup_mmap()->uprobe_mmap(), simplify uprobe_mmap/munmap · f1a45d02
      Oleg Nesterov 提交于
      1. Kill dup_mmap()->uprobe_mmap(), it was only needed to calculate
         new_mm->uprobes_state.count removed by the previous patch.
      
         If the forking process has a pending uprobe (int3) in vma, it will
         be copied by copy_page_range(), note that it checks vma->anon_vma
         so "Don't copy ptes" is not possible after install_breakpoint()
         which does anon_vma_prepare().
      
      2. Remove is_swbp_at_addr() and "int count" in uprobe_mmap(). Again,
         this was needed for uprobes_state.count.
      
         As a side effect this fixes the bug pointed out by Srikar,
         this code lacked the necessary put_uprobe().
      
      3. uprobe_munmap() becomes a nop after the previous patch. Remove the
         meaningless code but do not remove the helper, we will need it.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      f1a45d02
    • O
      uprobes: Kill uprobes_state->count · 647c42df
      Oleg Nesterov 提交于
      uprobes_state->count is only needed to avoid the slow path in
      uprobe_pre_sstep_notifier(). It is also checked in uprobe_munmap()
      but ironically its only goal to decrement this counter. However,
      it is very broken. Just some examples:
      
      - uprobe_mmap() can race with uprobe_unregister() and wrongly
        increment the counter if it hits the non-uprobe "int3". Note
        that install_breakpoint() checks ->consumers first and returns
        -EEXIST if it is NULL.
      
        "atomic_sub() if error" in uprobe_mmap() looks obviously wrong
        too.
      
      - uprobe_munmap() can race with uprobe_register() and wrongly
        decrement the counter by the same reason.
      
      - Suppose an appication tries to increase the mmapped area via
        sys_mremap(). vma_adjust() does uprobe_munmap(whole_vma) first,
        this can nullify the counter temporarily and race with another
        thread which can hit the bp, the application will be killed by
        SIGTRAP.
      
      - Suppose an application mmaps 2 consecutive areas in the same file
        and one (or both) of these areas has uprobes. In the likely case
        mmap_region()->vma_merge() suceeds. Like above, this leads to
        uprobe_munmap/uprobe_mmap from vma_merge()->vma_adjust() but then
        mmap_region() does another uprobe_mmap(resulting_vma) and doubles
        the counter.
      
      This patch only removes this counter and fixes the compile errors,
      then we will try to cleanup the changed code and add something else
      instead.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      647c42df
    • S
      uprobes: Remove check for uprobe variable in handle_swbp() · 8bd87445
      Sebastian Andrzej Siewior 提交于
      by the time we get here (after we pass cleanup_ret) uprobe is always is
      set. If it is NULL we leave very early in the code.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      8bd87445