1. 06 Feb 2018, 1 commit
  2. 24 Jan 2018, 2 commits
    • tracing: Update stack trace skipping for ORC unwinder · 2ee5b92a
      Committed by Steven Rostedt (VMware)
      With the addition of ORC unwinder and FRAME POINTER unwinder, the stack
      trace skipping requirements have changed.
      
      I went through the tracing stack trace dumps with ORC and with frame
      pointers and recalculated the proper values.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ftrace, orc, x86: Handle ftrace dynamically allocated trampolines · 6be7fa3c
      Committed by Steven Rostedt (VMware)
      The function tracer can create a dynamically allocated trampoline that is
      called by the function mcount or fentry hook that is used to call the
      function callback that is registered. The problem is that the ORC unwinder
      will bail if it encounters one of these trampolines. This breaks the stack
      trace of function callbacks, which include the stack tracer and setting the
      stack trace for individual functions.
      
      Since these dynamic trampolines are basically copies of the static ftrace
      trampolines defined in ftrace_*.S, we do not need to create new orc entries
      for the dynamic trampolines. Finding the return address on the stack works
      exactly as it does for the functions that were copied to create the dynamic
      trampolines. When encountering an ftrace dynamic trampoline, we can just use
      the orc entry of the ftrace static function that was copied for that
      trampoline.
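      
      A hedged sketch of that lookup (the helper and symbol names are assumptions
      drawn from this changelog, not guaranteed to match the patch exactly): when
      the address lies inside a dynamic trampoline, translate it back to the static
      trampoline it was copied from and reuse that trampoline's ORC data.
      
      static struct orc_entry *orc_ftrace_find(unsigned long ip)
      {
              struct ftrace_ops *ops;
              unsigned long tramp_addr, offset;
      
              ops = ftrace_ops_trampoline(ip);        /* which dynamic trampoline, if any? */
              if (!ops)
                      return NULL;                    /* not in a dynamic trampoline */
      
              /* the dynamic copy has the same layout as the static trampoline */
              if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
                      tramp_addr = (unsigned long)ftrace_regs_caller;
              else
                      tramp_addr = (unsigned long)ftrace_caller;
      
              offset = ip - ops->trampoline;
              return orc_find(tramp_addr + offset);   /* reuse the static ORC entry */
      }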
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  3. 19 Jan 2018, 2 commits
    • tracing: Fix converting enum's from the map in trace_event_eval_update() · 1ebe1eaf
      Committed by Steven Rostedt (VMware)
      Since enums do not get converted by the TRACE_EVENT macro into their values,
      the event format displays the enum name and not the value. This breaks
      tools like perf and trace-cmd that need to interpret the raw binary data. To
      solve this, an enum map was created to convert these enums into their actual
      numbers on boot up. This is done by TRACE_EVENTS() adding a
      TRACE_DEFINE_ENUM() macro.
      
      Some enums were not being converted. This was caused by an optimization that
      had a bug in it.
      
      All calls get checked against this enum map to see if they should be converted
      or not, by comparing the call's system to the system that the enum map
      was created under. If they match, the call is processed.
      
      To cut down on the number of iterations needed to find the maps with a
      matching system, since calls and maps are grouped by system, when a match is
      made, the index into the map array is saved, so that the next call, if it
      belongs to the same system as the previous call, could start right at that
      array index and not have to scan all the previous arrays.
      
      The problem was, the saved index was used as the variable to know if this is
      a call in a new system or not. If the index was zero, it was assumed that
      the call is in a new system and would keep incrementing the saved index
      until it found a matching system. The issue arises when the first matching
      system was at index zero. The next map, if it belonged to the same system,
      would then think it was the first match and increment the index to one. If
      the next call belonged to the same system, it would begin its search of the
      maps off by one, and miss the first enum that should be converted. This left
      a single enum not converted properly.
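      
      A small sketch of that logic (types and names here are illustrative, not the
      kernel's): the fix is to track "first match" with an explicit flag instead of
      overloading the saved index, which breaks when the first match is at index 0.
      
      struct ev_call { const char *system; };
      struct ev_map  { const char *system; const char *enum_name; long enum_value; };
      
      static int last_i;      /* where the previous call's system started in the map array */
      
      static void eval_update_one(const struct ev_call *call, struct ev_map **map, int len)
      {
              bool first = true;      /* the fix: a real flag instead of testing last_i == 0 */
              int i;
      
              for (i = last_i; i < len; i++) {
                      if (strcmp(call->system, map[i]->system) == 0) {
                              if (first) {
                                      first = false;
                                      last_i = i;     /* correct even when the match is at i == 0 */
                              }
                              /* ... rewrite this enum name to its value in the format string ... */
                      }
              }
      }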
      
      Also add a comment to describe exactly what that index was for. It took me a
      bit too long to figure out what I was thinking when debugging this issue.
      
      Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com
      
      Cc: stable@vger.kernel.org
      Fixes: 0c564a53 ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values")
      Reported-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Fix duplicate results in mapping context to bits in recursive lock · 0164e0d7
      Committed by Steven Rostedt (VMware)
      In bringing back the context checks, the code first checks whether it is in
      normal (non-interrupt) context, and then for NMI, then IRQ, then softirq. The
      final check is redundant: since that if branch is only hit when the context is
      one of NMI, IRQ, or SOFTIRQ, and it is not NMI or IRQ, there is no reason to
      check whether it is SOFTIRQ. The current code returns the same result even
      when it is not a SOFTIRQ, which is confusing.
      
        pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ
      
      is redundant, as RB_CTX_SOFTIRQ *is* 2!
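      
      Reconstructed as a sketch from the description above (not the verbatim diff):
      
        /* before: the final test can never change the result, since 2 == RB_CTX_SOFTIRQ */
        bit = pc & NMI_MASK     ? RB_CTX_NMI :
              pc & HARDIRQ_MASK ? RB_CTX_IRQ :
              pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ;
      
        /* after: anything in this branch that is neither NMI nor hard IRQ is softirq */
        bit = pc & NMI_MASK     ? RB_CTX_NMI :
              pc & HARDIRQ_MASK ? RB_CTX_IRQ : RB_CTX_SOFTIRQ;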
      
      Fixes: a0e3a18f ("ring-buffer: Bring back context level recursive checks")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  4. 16 Jan 2018, 2 commits
    • tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y · 68e76e03
      Committed by Randy Dunlap
      I regularly get 50 MB - 60 MB files during kernel randconfig builds.
      These large files mostly contain many repeats (124,594 in one case) of:
      
      In file included from ../include/linux/string.h:6:0,
                       from ../include/linux/uuid.h:20,
                       from ../include/linux/mod_devicetable.h:13,
                       from ../scripts/mod/devicetable-offsets.c:3:
      ../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default]
          ______f = {     \
          ^
      ../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
                             ^
      ../include/linux/string.h:425:2: note: in expansion of macro 'if'
        if (p_size == (size_t)-1 && q_size == (size_t)-1)
        ^
      
      This only happens when CONFIG_FORTIFY_SOURCE=y and
      CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if
      FORTIFY_SOURCE=y.
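      
      A standalone sketch of why the combination is so noisy (the macro shapes are
      simplified assumptions, not the kernel's exact definitions): the branch
      profiler's if() wrapper declares a static counter struct, and doing that
      inside a non-static inline function, which is what the fortified string
      helpers are, triggers the warning at every use.
      
      /* simplified stand-in for the CONFIG_PROFILE_ALL_BRANCHES if() wrapper */
      struct branch_counter { unsigned long hit, miss; };
      
      #define if(cond)                                                \
              if (({ static struct branch_counter ______f;            \
                     int ______r = !!(cond);                          \
                     ______r ? ______f.hit++ : ______f.miss++;        \
                     ______r; }))
      
      /* stand-in for a fortified helper such as the inline strcpy() wrapper */
      inline char *checked_copy(char *p, const char *q,
                                unsigned long p_size, unsigned long q_size)
      {
              if (p_size == (unsigned long)-1 && q_size == (unsigned long)-1)
                      return __builtin_strcpy(p, q);  /* expands the profiled if() above */
              return p;
      }
      /* gcc: "'______f' is static but declared in inline function 'checked_copy'
       *       which is not static" -- once per use, hence the huge build logs */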
      
      Link: http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2750@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Bring back context level recursive checks · a0e3a18f
      Committed by Steven Rostedt (VMware)
      Commit 1a149d7d ("ring-buffer: Rewrite trace_recursive_(un)lock() to be
      simpler") replaced the context level recursion checks with a simple counter.
      This would prevent the ring buffer code from recursively calling itself more
      than the max number of contexts that exist (Normal, softirq, irq, nmi). But
      this change caused a lockup in a specific case, which was during suspend and
      resume using a global clock. Adding a stack dump to see where this occurred,
      the issue was in the trace global clock itself:
      
        trace_buffer_lock_reserve+0x1c/0x50
        __trace_graph_entry+0x2d/0x90
        trace_graph_entry+0xe8/0x200
        prepare_ftrace_return+0x69/0xc0
        ftrace_graph_caller+0x78/0xa8
        queued_spin_lock_slowpath+0x5/0x1d0
        trace_clock_global+0xb0/0xc0
        ring_buffer_lock_reserve+0xf9/0x390
      
      The function graph tracer traced queued_spin_lock_slowpath that was called
      by trace_clock_global. This pointed out that trace_clock_global() is not
      reentrant, as it takes a spin lock. It depended on the ring buffer's recursive
      lock to keep that recursion from happening.
      
      By removing the context detection and adding just a max number of allowable
      recursions, it allowed trace_clock_global() to be entered again and try
      to retake the spinlock it already held, causing a deadlock.
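      
      A hedged reconstruction of the restored scheme (field and macro names are
      taken from the changelog and from memory, so treat them as assumptions): one
      bit per context rather than a bare depth counter, so the same context can
      never re-enter the ring buffer and re-take a lock such as the one inside
      trace_clock_global().
      
      enum { RB_CTX_NMI, RB_CTX_IRQ, RB_CTX_SOFTIRQ, RB_CTX_NORMAL, RB_CTX_MAX };
      
      static __always_inline bool trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
      {
              unsigned int val = cpu_buffer->current_context;
              unsigned long pc = preempt_count();
              int bit;
      
              if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
                      bit = RB_CTX_NORMAL;
              else
                      bit = pc & NMI_MASK ? RB_CTX_NMI :
                            pc & HARDIRQ_MASK ? RB_CTX_IRQ : RB_CTX_SOFTIRQ;
      
              if (unlikely(val & (1 << bit)))
                      return true;            /* same context again: reject the event */
      
              cpu_buffer->current_context = val | (1 << bit);
              return false;
      }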
      
      Fixes: 1a149d7d ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler")
      Reported-by: David Weinehall <david.weinehall@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  5. 28 Dec 2017, 5 commits
    • tracing: Fix possible double free on failure of allocating trace buffer · 4397f045
      Committed by Steven Rostedt (VMware)
      Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
      tracing buffer, memory is freed, but the pointers that point to it are not
      reset to NULL, so later code paths may try to free the already-freed memory
      again. Jing and Chunyan fixed one of the locations that does this, but
      missed a spot.
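      
      The pattern of the fix, as a hedged fragment (the field names follow the call
      chain quoted in the next changelog, so treat the exact spot as an assumption):
      after freeing on the error path, also clear the pointers that a later cleanup
      pass will test.
      
        /* error path of allocate_trace_buffers(), sketched */
        ring_buffer_free(tr->trace_buffer.buffer);
        tr->trace_buffer.buffer = NULL;         /* the missing reset */
        free_percpu(tr->trace_buffer.data);
        tr->trace_buffer.data = NULL;           /* ditto for the per-cpu data */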
      
      Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
      
      Cc: stable@vger.kernel.org
      Fixes: 737223fb ("tracing: Consolidate buffer allocation code")
      Reported-by: Jing Xia <jing.xia@spreadtrum.com>
      Reported-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Fix crash when it fails to alloc ring buffer · 24f2aaf9
      Committed by Jing Xia
      A double free of the ring buffer happens when allocating a new ring
      buffer instance for max_buffer fails and TRACER_MAX_TRACE is configured.
      The root cause is that the pointer is not set to NULL after the buffer
      is freed in allocate_trace_buffers(), so the ring buffer is freed again
      later because the pointer is not NULL, as shown here:
      
      instance_mkdir()
          |-allocate_trace_buffers()
              |-allocate_trace_buffer(tr, &tr->trace_buffer...)
              |-allocate_trace_buffer(tr, &tr->max_buffer...)
              |       // allocation fails (-ENOMEM): first free,
              |       // but the buffer pointer is not set to NULL
              |-ring_buffer_free(tr->trace_buffer.buffer)
      
          // out_free_tr
          |-free_trace_buffers()
              |-free_trace_buffer(&tr->trace_buffer);
                      // trace_buffer is not NULL, so free again
                  |-ring_buffer_free(buf->buffer)
                      |-rb_free_cpu_buffer(buffer->buffers[cpu])
                          // ring_buffer_per_cpu is NULL, and we
                          // crash dereferencing ring_buffer_per_cpu->pages
      
      Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
      
      Cc: stable@vger.kernel.org
      Fixes: 737223fb ("tracing: Consolidate buffer allocation code")
      Signed-off-by: Jing Xia <jing.xia@spreadtrum.com>
      Signed-off-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Do no reuse reader page if still in use · ae415fa4
      Committed by Steven Rostedt (VMware)
      To free the reader page that is allocated with ring_buffer_alloc_read_page(),
      ring_buffer_free_read_page() must be called. For faster performance, this
      page can be reused by the ring buffer to avoid having to free and allocate
      new pages.
      
      The issue arises when the page is used with a splice pipe into the
      networking code. The networking code may take a reference on the page and
      keep it active while it is queued to go out to the network. Incrementing
      the page ref does not prevent the ring buffer from reusing the page, so the
      page that is being sent out to the network can be overwritten with newly
      read data before it is actually sent.
      
      Add a check to the page ref counter, and only reuse the page if it is not
      being used anywhere else.
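      
      A hedged sketch of the added check (types and field names are illustrative;
      the real code lives in ring_buffer_free_read_page()): the page is put back
      into the ring buffer's spare slot only if the ring buffer holds the sole
      reference; otherwise it is freed normally and whoever still holds a reference
      keeps it alive.
      
      struct rb_cpu { void *spare_page; /* ... */ };
      
      static void free_or_recycle_read_page(struct rb_cpu *cpu_buffer, void *data)
      {
              struct page *page = virt_to_page(data);
      
              /* recycle only if nobody else (e.g. a splice to the network) still holds a ref */
              if (page_ref_count(page) == 1 && !cpu_buffer->spare_page) {
                      cpu_buffer->spare_page = data;
                      return;
              }
              free_page((unsigned long)data);
      }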
      
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Remove extra zeroing out of the ring buffer page · 6b7e633f
      Committed by Steven Rostedt (VMware)
      The ring_buffer_read_page() takes care of zeroing out any extra data in the
      page that it returns. There's no need to zero it out again from the
      consumer. It was removed from one consumer of this function, but
      read_buffers_splice_read() did not remove it, and worse, it contained a
      nasty bug because of it.
      
      Cc: stable@vger.kernel.org
      Fixes: 2711ca23 ("ring-buffer: Move zeroing out excess in page to ring buffer code")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Mask out the info bits when returning buffer page length · 45d8b80c
      Committed by Steven Rostedt (VMware)
      Two info bits were added to the "commit" part of the ring buffer data page
      when returned to be consumed. This was to inform the user space readers that
      events have been missed, and that the count may be stored at the end of the
      page.
      
      What wasn't handled was the splice code, which called a function to
      return the length of the data in order to zero out the rest of the page
      before sending it up to user space. These info bits were returned along with
      the length, making the value negative, and that negative value was not checked.
      It was compared to PAGE_SIZE, and only used if the size was less than
      PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
      unsigned compare, meaning the negative size value did not end up causing a
      large portion of memory to be randomly zeroed out.
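      
      A hedged sketch of the masking (the bit positions are assumptions, chosen to
      be consistent with a length that goes "negative" when treated as signed):
      strip the two info bits before using the value as a length.
      
      #define RB_MISSED_EVENTS  (1UL << 31)   /* events were dropped before this page */
      #define RB_MISSED_STORED  (1UL << 30)   /* the dropped-event count is stored at page end */
      #define RB_MISSED_FLAGS   (RB_MISSED_EVENTS | RB_MISSED_STORED)
      
      static unsigned long rb_page_data_length(unsigned long commit)
      {
              return commit & ~RB_MISSED_FLAGS;       /* length only, info bits stripped */
      }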
      
      Cc: stable@vger.kernel.org
      Fixes: 66a8cb95 ("ring-buffer: Add place holder recording of dropped events")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  6. 15 Dec 2017, 1 commit
  7. 13 Dec 2017, 1 commit
    • bpf: fix corruption on concurrent perf_event_output calls · 283ca526
      Committed by Daniel Borkmann
      When tracing and networking programs are both attached in the
      system and both use event-output helpers that eventually call
      into perf_event_output(), then we could end up in a situation
      where the tracing attached program runs in user context while
      a cls_bpf program is triggered on that same CPU out of softirq
      context.
      
      Since both rely on the same per-cpu perf_sample_data, we could
      potentially corrupt it. This can only ever happen in a combination
      of the two types; all tracing programs use a bpf_prog_active
      counter to bail out in case a program is already running on
      that CPU out of a different context. XDP and cls_bpf programs
      by themselves don't have this issue as they run in the same
      context only. Therefore, split the shared perf_sample_data into two,
      so the two paths cannot clobber each other's data.
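      
      A hedged sketch of the split (the variable names are assumptions): each
      helper family gets its own per-CPU scratch area, so a softirq cls_bpf program
      and a task-context tracing program on the same CPU no longer share one buffer.
      
      /* used by bpf_perf_event_output() from tracing (kprobe/tracepoint) programs */
      static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd);
      
      /* used by bpf_event_output() from networking (cls_bpf/XDP) programs */
      static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd);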
      
      Fixes: 20b9d7ac ("bpf: avoid excessive stack usage for perf_sample_data")
      Reported-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Song Liu <songliubraving@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  8. 05 Dec 2017, 1 commit
  9. 04 Dec 2017, 5 commits
  10. 01 Dec 2017, 1 commit
  11. 28 Nov 2017, 2 commits
    • blktrace: fix trace mutex deadlock · 2967acbb
      Committed by Jens Axboe
      A previous commit changed the locking around registration/cleanup,
      but direct callers of blk_trace_remove() were missed. This means
      that if we hit the error path in setup, we will deadlock on
      attempting to re-acquire the queue trace mutex.
      
      Fixes: 1f2cac10 ("blktrace: fix unlocked access to init/start-stop/teardown")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rcu: Eliminate rcu_irq_enter_disabled() · 844ccdd7
      Committed by Paul E. McKenney
      Now that the irq path uses the rcu_nmi_{enter,exit}() algorithm,
      rcu_irq_enter() and rcu_irq_exit() may be used from any context.  There is
      thus no need for rcu_irq_enter_disabled() and for the checks using it.
      This commit therefore eliminates rcu_irq_enter_disabled().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  12. 23 Nov 2017, 3 commits
    • bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO · a60dd35d
      Committed by Gianluca Borello
      Commit 9fd29c08 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
      semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
      the compiler generates optimized BPF code when checking boundaries of an
      argument from C code. A typical example of this optimized code can be
      generated using the bpf_perf_event_output helper when operating on variable
      memory:
      
      /* len is a generic scalar */
      if (len > 0 && len <= 0x7fff)
              bpf_perf_event_output(ctx, &perf_map, 0, buf, len);
      
      110: (79) r5 = *(u64 *)(r10 -40)
      111: (bf) r1 = r5
      112: (07) r1 += -1
      113: (25) if r1 > 0x7ffe goto pc+6
      114: (bf) r1 = r6
      115: (18) r2 = 0xffff94e5f166c200
      117: (b7) r3 = 0
      118: (bf) r4 = r7
      119: (85) call bpf_perf_event_output#25
      R5 min value is negative, either use unsigned or 'var &= const'
      
      With this code, the verifier loses track of the variable.
      
      Replacing arg5 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
      avoids this quite common case which leads to usability issues, and the
      compiler generates code that the verifier can more easily test:
      
      if (len <= 0x7fff)
              bpf_perf_event_output(ctx, &perf_map, 0, buf, len);
      
      or
      
      bpf_perf_event_output(ctx, &perf_map, 0, buf, len & 0x7fff);
      
      No changes to the bpf_perf_event_output helper are necessary since it can
      handle a case where size is 0, and an empty frame is pushed.
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO · 5c4e1201
      Committed by Gianluca Borello
      Commit 9fd29c08 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
      semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
      the compiler generates optimized BPF code when checking boundaries of an
      argument from C code. A typical example of this optimized code can be
      generated using the bpf_probe_read_str helper when operating on variable
      memory:
      
      /* len is a generic scalar */
      if (len > 0 && len <= 0x7fff)
              bpf_probe_read_str(p, len, s);
      
      251: (79) r1 = *(u64 *)(r10 -88)
      252: (07) r1 += -1
      253: (25) if r1 > 0x7ffe goto pc-42
      254: (bf) r1 = r7
      255: (79) r2 = *(u64 *)(r10 -88)
      256: (bf) r8 = r4
      257: (85) call bpf_probe_read_str#45
      R2 min value is negative, either use unsigned or 'var &= const'
      
      With this code, the verifier loses track of the variable.
      
      Replacing arg2 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
      avoids this quite common case which leads to usability issues, and the
      compiler generates code that the verifier can more easily test:
      
      if (len <= 0x7fff)
              bpf_probe_read_str(p, len, s);
      
      or
      
      bpf_probe_read_str(p, len & 0x7fff, s);
      
      No changes to the bpf_probe_read_str helper are necessary since
      strncpy_from_unsafe itself immediately returns if the size passed is 0.
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: remove explicit handling of 0 for arg2 in bpf_probe_read · eb33f2cc
      Committed by Gianluca Borello
      Commit 9c019e2b ("bpf: change helper bpf_probe_read arg2 type to
      ARG_CONST_SIZE_OR_ZERO") changed arg2 type to ARG_CONST_SIZE_OR_ZERO to
      simplify writing bpf programs by taking advantage of the new semantics
      introduced for ARG_CONST_SIZE_OR_ZERO which allows <!NULL, 0> arguments.
      
      In order to prevent the helper from actually passing a NULL pointer to
      probe_kernel_read, which can happen when <NULL, 0> is passed to the helper,
      the commit also introduced an explicit check against size == 0.
      
      After the recent introduction of the ARG_PTR_TO_MEM_OR_NULL type,
      bpf_probe_read can no longer receive a pair of <NULL, 0> arguments, thus
      the check is not needed and can be removed, since probe_kernel_read
      can correctly handle a <!NULL, 0> call. This also fixes the semantics of
      the helper before it gets officially released and bpf programs start
      relying on this check.
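      
      A hedged sketch of the helper with the explicit check gone (close to, but not
      claimed to be, the exact post-patch code): a zero size with a valid pointer is
      handled by probe_kernel_read() itself.
      
      BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
      {
              int ret = probe_kernel_read(dst, unsafe_ptr, size);
      
              if (unlikely(ret < 0))
                      memset(dst, 0, size);   /* don't leak uninitialized memory to the program */
      
              return ret;
      }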
      
      Fixes: 9c019e2b ("bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO")
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  13. 20 Nov 2017, 1 commit
  14. 16 Nov 2017, 1 commit
  15. 14 Nov 2017, 1 commit
  16. 11 Nov 2017, 4 commits
    • bpf: Revert bpf_overrid_function() helper changes. · f3edacbd
      Committed by David S. Miller
      NACK'd by x86 maintainer.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add a bpf_override_function helper · dd0bb688
      Committed by Josef Bacik
      Error injection is sloppy and very ad-hoc.  BPF could fill this niche
      perfectly with its kprobe functionality.  We could make sure errors are
      only triggered in the specific call chains and situations that we care
      about.  Accomplish this with the bpf_override_function helper.  This will
      modify the probed function's return value to the specified value and set
      the PC to an override function that simply returns, bypassing the
      originally probed function.  This gives us a nice clean way to implement
      systematic error injection for all of our code paths.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • blktrace: fix unlocked registration of tracepoints · a6da0024
      Committed by Jens Axboe
      We need to ensure that tracepoints are registered and unregistered
      with the users of them. The existing atomic count isn't enough for
      that. Add a lock around the tracepoints, so we serialize access
      to them.
      
      This fixes cases where we have multiple users setting up and
      tearing down tracepoints, like this:
      
      CPU: 0 PID: 2995 Comm: syzkaller857118 Not tainted
      4.14.0-rc5-next-20171018+ #36
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:16 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:52
        panic+0x1e4/0x41c kernel/panic.c:183
        __warn+0x1c4/0x1e0 kernel/panic.c:546
        report_bug+0x211/0x2d0 lib/bug.c:183
        fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
        do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
        do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
        do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
        do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
        invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
      RIP: 0010:tracepoint_add_func kernel/tracepoint.c:210 [inline]
      RIP: 0010:tracepoint_probe_register_prio+0x397/0x9a0 kernel/tracepoint.c:283
      RSP: 0018:ffff8801d1d1f6c0 EFLAGS: 00010293
      RAX: ffff8801d22e8540 RBX: 00000000ffffffef RCX: ffffffff81710f07
      RDX: 0000000000000000 RSI: ffffffff85b679c0 RDI: ffff8801d5f19818
      RBP: ffff8801d1d1f7c8 R08: ffffffff81710c10 R09: 0000000000000004
      R10: ffff8801d1d1f6b0 R11: 0000000000000003 R12: ffffffff817597f0
      R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8801d1d1f7a0
        tracepoint_probe_register+0x2a/0x40 kernel/tracepoint.c:304
        register_trace_block_rq_insert include/trace/events/block.h:191 [inline]
        blk_register_tracepoints+0x1e/0x2f0 kernel/trace/blktrace.c:1043
        do_blk_trace_setup+0xa10/0xcf0 kernel/trace/blktrace.c:542
        blk_trace_setup+0xbd/0x180 kernel/trace/blktrace.c:564
        sg_ioctl+0xc71/0x2d90 drivers/scsi/sg.c:1089
        vfs_ioctl fs/ioctl.c:45 [inline]
        do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
        SYSC_ioctl fs/ioctl.c:700 [inline]
        SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x444339
      RSP: 002b:00007ffe05bb5b18 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000006d66c0 RCX: 0000000000444339
      RDX: 000000002084cf90 RSI: 00000000c0481273 RDI: 0000000000000009
      RBP: 0000000000000082 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
      R13: 00000000c0481273 R14: 0000000000000000 R15: 0000000000000000
      
      since we can now run these in parallel. Ensure that the exported helpers
      for doing this are grabbing the queue trace mutex.
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blktrace: fix unlocked access to init/start-stop/teardown · 1f2cac10
      Committed by Jens Axboe
      sg.c calls into the blktrace functions without holding the proper queue
      mutex for doing setup, start/stop, or teardown.
      
      Add internal unlocked variants, and export the ones that do the proper
      locking.
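      
      A hedged sketch of the split (the mutex name follows these changelogs; the
      body is an assumption, not the actual diff): internal callers that already
      hold the queue's trace mutex use the double-underscore variant, while the
      exported entry point takes the mutex itself.
      
      static int __blk_trace_remove(struct request_queue *q)
      {
              /* actual teardown; caller holds q->blk_trace_mutex */
              return 0;
      }
      
      int blk_trace_remove(struct request_queue *q)
      {
              int ret;
      
              mutex_lock(&q->blk_trace_mutex);
              ret = __blk_trace_remove(q);
              mutex_unlock(&q->blk_trace_mutex);
      
              return ret;
      }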
      
      Fixes: 6da127ad ("blktrace: Add blktrace ioctls to SCSI generic devices")
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  17. 02 Nov 2017, 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Committed by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier should be applied to
      a file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  18. 01 Nov 2017, 1 commit
  19. 27 Oct 2017, 2 commits
    • bpf: remove tail_call and get_stackid helper declarations from bpf.h · 035226b9
      Committed by Gianluca Borello
      commit afdb09c7 ("security: bpf: Add LSM hooks for bpf object related
      syscall") included linux/bpf.h in linux/security.h. As a result, bpf
      programs that include bpf_helpers.h and some other header that ends up
      also pulling in security.h, such as several examples under samples/bpf,
      fail to compile because bpf_tail_call and bpf_get_stackid are now
      "redefined as different kind of symbol".
      
      From bpf.h:
      
      u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
      u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
      
      Whereas in bpf_helpers.h they are:
      
      static void (*bpf_tail_call)(void *ctx, void *map, int index);
      static int (*bpf_get_stackid)(void *ctx, void *map, int flags);
      
      Fix this by removing the unused declaration of bpf_tail_call and moving
      the declaration of bpf_get_stackid into bpf_trace.c, which is the only
      place where it's needed.
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • perf/bpf: Extend the perf_event_read_local() interface, a.k.a. "bpf: perf event change needed for subsequent bpf helpers" · 7d9285e8
      Committed by Yonghong Song
      
      eBPF programs would like access to the (perf) event enabled and
      running times along with the event value, such that they can deal with
      event multiplexing (among other things).
      
      This patch extends the interface; a future eBPF patch will utilize
      the new functionality.
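      
      A hedged sketch of the extended prototype (argument names assumed): callers
      that do not need the timing data can pass NULL for the new arguments.
      
      /* before: int perf_event_read_local(struct perf_event *event, u64 *value); */
      int perf_event_read_local(struct perf_event *event, u64 *value,
                                u64 *enabled, u64 *running);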
      
      [ Note, there's a same-content commit with a poor changelog and a meaningless
        title in the networking tree as well - but we need this change for subsequent
        perf work, so apply it here as well, with a proper changelog. Hopefully Git
        will be able to sort out this somewhat messy workflow, if there are no other,
        conflicting changes to these files. ]
      Signed-off-by: Yonghong Song <yhs@fb.com>
      [ Rewrote the changelog. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <ast@fb.com>
      Cc: <daniel@iogearbox.net>
      Cc: <rostedt@goodmis.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David S. Miller <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/20171005161923.332790-2-yhs@fb.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  20. 25 Oct 2017, 2 commits
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() · 6aa7de05
      Committed by Mark Rutland
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • bpf: permit multiple bpf attachments for a single perf event · e87c6bc3
      Committed by Yonghong Song
      This patch enables multiple bpf attachments for a single
      kprobe/uprobe/tracepoint trace event.
      Each trace_event keeps a list of attached perf events.
      When an event happens, all attached bpf programs will
      be executed based on the order of attachment.
      
      A global bpf_event_mutex lock is introduced to protect
      prog_array attaching and detaching. An alternative would
      be to introduce a mutex lock in every trace_event_call
      structure, but that takes a lot of extra memory,
      so a global bpf_event_mutex lock is a good compromise.
      
      The bpf prog detachment involves allocation of memory.
      If the allocation fails, a dummy do-nothing program
      will replace the to-be-detached program in place.
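      
      A hedged, simplified sketch of the dispatch (the structures are illustrative,
      not the kernel's): the event keeps an RCU-protected array of attached
      programs and runs them in attachment order.
      
      struct attached_progs {
              int cnt;
              struct bpf_prog *progs[];       /* cnt entries, swapped under bpf_event_mutex */
      };
      
      static unsigned int run_attached(struct attached_progs __rcu *list, void *ctx)
      {
              struct attached_progs *a;
              unsigned int ret = 1;
              int i;
      
              rcu_read_lock();
              a = rcu_dereference(list);
              for (i = 0; a && i < a->cnt; i++)
                      ret &= BPF_PROG_RUN(a->progs[i], ctx);  /* attachment order */
              rcu_read_unlock();
      
              return ret;
      }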
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 18 Oct 2017, 1 commit