1. 26 Nov 2022, 3 commits
  2. 26 Oct 2022, 4 commits
  3. 05 Sep 2022, 1 commit
    • ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead · ddffe882
      Committed by Yang Jihong
      commit c3b0f72e upstream.
      
      ftrace_startup does not remove ops from ftrace_ops_list when
      ftrace_startup_enable fails:
      
      register_ftrace_function
        ftrace_startup
          __register_ftrace_function
            ...
            add_ftrace_ops(&ftrace_ops_list, ops)
            ...
          ...
          ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1
          ...
        return 0 // ops is in the ftrace_ops_list.
      
      When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything:
      unregister_ftrace_function
        ftrace_shutdown
          if (unlikely(ftrace_disabled))
                  return -ENODEV;  // return here, __unregister_ftrace_function is not executed,
                                   // as a result, ops is still in the ftrace_ops_list
          __unregister_ftrace_function
          ...
      
      If ops is dynamically allocated, it will be freed later; in this case,
      is_ftrace_trampoline dereferences a NULL pointer:
      
      is_ftrace_trampoline
        ftrace_ops_trampoline
          do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!
      
      Syzkaller reports as follows:
      [ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b
      [ 1203.508039] #PF: supervisor read access in kernel mode
      [ 1203.508798] #PF: error_code(0x0000) - not-present page
      [ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0
      [ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI
      [ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G    B   W         5.10.0 #8
      [ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0
      [ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00
      [ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246
      [ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866
      [ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b
      [ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07
      [ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399
      [ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008
      [ 1203.525634] FS:  00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000
      [ 1203.526801] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0
      [ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Therefore, when ftrace_startup_enable fails, we need to roll back the
      registration process and remove ops from ftrace_ops_list.
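The rollback described above can be sketched in user space with a singly linked list standing in for ftrace_ops_list. This is a hypothetical model, not the kernel code; names such as startup() and enable_ok are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for struct ftrace_ops and ftrace_ops_list. */
struct ops { struct ops *next; };
static struct ops *ops_list = NULL;

static void add_ops(struct ops *op)          /* ~__register_ftrace_function */
{
    op->next = ops_list;
    ops_list = op;
}

static int remove_ops(struct ops *op)        /* ~__unregister_ftrace_function */
{
    struct ops **p;
    for (p = &ops_list; *p; p = &(*p)->next) {
        if (*p == op) {
            *p = op->next;
            return 0;
        }
    }
    return -1;
}

/* Model of the fixed ftrace_startup(): on enable failure, roll back. */
static int startup(struct ops *op, int enable_ok)
{
    add_ops(op);
    if (!enable_ok) {            /* ftrace_startup_enable() "failed" */
        remove_ops(op);          /* the fix: take op back off the list */
        return -1;
    }
    return 0;
}

static int list_contains(const struct ops *op)
{
    const struct ops *p;
    for (p = ops_list; p; p = p->next)
        if (p == op)
            return 1;
    return 0;
}
```

After a failed startup the op is no longer reachable from the list, so a later walk (the model of is_ftrace_trampoline) can never touch it once it is freed.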
      
      Link: https://lkml.kernel.org/r/20220818032659.56209-1-yangjihong1@huawei.com
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 25 Aug 2022, 2 commits
  5. 22 Jul 2022, 1 commit
    • tracing/histograms: Fix memory leak problem · eb622d55
      Committed by Zheng Yejian
      commit 7edc3945 upstream.
      
      This reverts commit 46bbe5c6.
      
      As commit 46bbe5c6 ("tracing: fix double free") said, the
      "double free" problem reported by clang static analyzer is:
        > In parse_var_defs() if there is a problem allocating
        > var_defs.expr, the earlier var_defs.name is freed.
        > This free is duplicated by free_var_defs() which frees
        > the rest of the list.
      
      However, if there is a problem allocating the N-th var_defs.expr:
        + in parse_var_defs(), the freed 'earlier var_defs.name' is
          actually the N-th var_defs.name;
        + then in free_var_defs(), the names from the 0th to the (N-1)-th are freed;
      
                              IF ALLOCATING PROBLEM HAPPENED HERE!!! -+
                                                                       \
                                                                        |
                0th           1st                 (N-1)-th      N-th    V
                +-------------+-------------+-----+-------------+-----------
      var_defs: | name | expr | name | expr | ... | name | expr | name | ///
                +-------------+-------------+-----+-------------+-----------
      
      These two frees don't act on the same name, so there was no "double
      free" problem before. On the contrary, after that commit we get a
      "memory leak" problem, because the above "N-th var_defs.name" is
      never freed.
      
      With CONFIG_DEBUG_KMEMLEAK enabled, inject an allocation fault where
      the N-th var_defs.expr is allocated, then run in a shell:
        $ echo 'hist:key=call_site:val=$v1,$v2:v1=bytes_req,v2=bytes_alloc' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
      
      Then kmemleak reports:
        unreferenced object 0xffff8fb100ef3518 (size 8):
          comm "bash", pid 196, jiffies 4295681690 (age 28.538s)
          hex dump (first 8 bytes):
            76 31 00 00 b1 8f ff ff                          v1......
          backtrace:
            [<0000000038fe4895>] kstrdup+0x2d/0x60
            [<00000000c99c049a>] event_hist_trigger_parse+0x206f/0x20e0
            [<00000000ae70d2cc>] trigger_process_regex+0xc0/0x110
            [<0000000066737a4c>] event_trigger_write+0x75/0xd0
            [<000000007341e40c>] vfs_write+0xbb/0x2a0
            [<0000000087fde4c2>] ksys_write+0x59/0xd0
            [<00000000581e9cdf>] do_syscall_64+0x3a/0x80
            [<00000000cf3b065c>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Link: https://lkml.kernel.org/r/20220711014731.69520-1-zhengyejian1@huawei.com
      
      Cc: stable@vger.kernel.org
      Fixes: 46bbe5c6 ("tracing: fix double free")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 14 Jun 2022, 3 commits
  7. 27 Apr 2022, 1 commit
  8. 16 Mar 2022, 1 commit
  9. 09 Mar 2022, 1 commit
    • tracing/histogram: Fix sorting on old "cpu" value · 44a99561
      Committed by Steven Rostedt (Google)
      commit 1d1898f6 upstream.
      
      When trying to add a histogram against an event with the "cpu" field, it
      was impossible due to "cpu" being a keyword to key off of the running CPU.
      So to fix this, it was changed to "common_cpu" to match the other generic
      fields (like "common_pid"). But since some scripts used "cpu" for keying
      off of the CPU (for events that did not have "cpu" as a field, which is
      most of them), a backward compatibility trick was added such that if "cpu"
      was used as a key, and the event did not have "cpu" as a field name, then
      it would fall back and switch over to "common_cpu".
      
      This fix has a couple of subtle bugs. One was that when switching over
      to "common_cpu", the code did not change the field name; it only set a
      flag, so the lookup still found a "cpu" field. That "cpu" field is the
      special one used for filtering, returned when the event does not have
      a real "cpu" field.
      
      This was found by:
      
        # cd /sys/kernel/tracing
        # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
        # cat events/sched/sched_wakeup/hist
      
      Which showed the histogram unsorted:
      
      { cpu:         19, pid:       1175 } hitcount:          1
      { cpu:          6, pid:        239 } hitcount:          2
      { cpu:         23, pid:       1186 } hitcount:         14
      { cpu:         12, pid:        249 } hitcount:          2
      { cpu:          3, pid:        994 } hitcount:          5
      
      Instead of hard coding the "cpu" checks, take advantage of the fact that
      trace_find_event_field() returns a special field for "cpu" and "CPU" if
      the event does not have "cpu" as a field. This special field has a
      "filter_type" of FILTER_CPU. Check that to test whether the returned
      field is of the CPU type, instead of doing the string compare.
      
      Also, fix the sorting bug by testing for the hist_field flag of
      HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
      the special CPU field to know what compare routine to use, and since that
      special field does not have a size, it returns tracing_map_cmp_none.
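Both checks can be sketched with simplified stand-in types (the kernel's real structures and compare functions differ; the enum and names below are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's field and filter types. */
enum filter_type { FILTER_OTHER, FILTER_CPU };
enum cmp_fn { CMP_NONE, CMP_SIZED, CMP_CPU };

struct event_field {
    enum filter_type filter_type;
    int size;                    /* the special CPU field has size 0 */
};

#define HIST_FIELD_FL_CPU (1u << 0)

struct hist_field {
    unsigned long flags;
    struct event_field *field;
};

/* Replaces the string compare on the field name. */
static int field_is_cpu(const struct event_field *field)
{
    return field && field->filter_type == FILTER_CPU;
}

/* Sort setup: keying off the special CPU field's size (0) selects
 * CMP_NONE and leaves the histogram unsorted; keying off the
 * HIST_FIELD_FL_CPU flag selects a proper integer compare instead. */
static enum cmp_fn pick_cmp_fn(const struct hist_field *hf)
{
    if (hf->flags & HIST_FIELD_FL_CPU)
        return CMP_CPU;          /* compare CPU numbers */
    if (!hf->field || hf->field->size == 0)
        return CMP_NONE;         /* the old, buggy outcome for "cpu" keys */
    return CMP_SIZED;
}
```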
      
      Cc: stable@vger.kernel.org
      Fixes: 1e3bac71 ("tracing/histogram: Rename "cpu" to "common_cpu"")
      Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  10. 02 Mar 2022, 1 commit
  11. 23 Feb 2022, 1 commit
  12. 11 Jan 2022, 2 commits
  13. 22 Dec 2021, 1 commit
    • tracing: Fix a kmemleak false positive in tracing_map · a501f250
      Committed by Chen Jun
      [ Upstream commit f25667e5 ]
      
      Doing the command:
        echo 'hist:key=common_pid.execname,common_timestamp' > /sys/kernel/debug/tracing/events/xxx/trigger
      
      Triggers many kmemleak reports:
      
      unreferenced object 0xffff0000c7ea4980 (size 128):
        comm "bash", pid 338, jiffies 4294912626 (age 9339.324s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f3469921>] kmem_cache_alloc_trace+0x4c0/0x6f0
          [<0000000054ca40c3>] hist_trigger_elt_data_alloc+0x140/0x178
          [<00000000633bd154>] tracing_map_init+0x1f8/0x268
          [<000000007e814ab9>] event_hist_trigger_func+0xca0/0x1ad0
          [<00000000bf8520ed>] trigger_process_regex+0xd4/0x128
          [<00000000f549355a>] event_trigger_write+0x7c/0x120
          [<00000000b80f898d>] vfs_write+0xc4/0x380
          [<00000000823e1055>] ksys_write+0x74/0xf8
          [<000000008a9374aa>] __arm64_sys_write+0x24/0x30
          [<0000000087124017>] do_el0_svc+0x88/0x1c0
          [<00000000efd0dcd1>] el0_svc+0x1c/0x28
          [<00000000dbfba9b3>] el0_sync_handler+0x88/0xc0
          [<00000000e7399680>] el0_sync+0x148/0x180
      
      The reason is that elts->pages[i] is allocated by get_zeroed_page(),
      and kmemleak does not scan areas allocated that way, so the addresses
      stored in elts->pages are regarded as leaked.
      
      That is, elts->pages[i] has pointers loaded onto it as well, and
      without telling kmemleak about the page, those pointers look like
      memory without a reference.
      
      To fix this, call kmemleak_alloc() to tell kmemleak to scan
      elts->pages[i].
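Why registering the page helps can be modeled with a toy scanner that, like kmemleak, only walks regions it has been told about. All names here are invented for illustration; scanner_register() plays the role of kmemleak_alloc():

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 8

/* A scannable region, like an object the leak scanner knows about. */
struct region { void **words; int nwords; };

static struct region scanned[MAX_REGIONS];
static int nscanned;

static void scanner_register(void **words, int nwords)  /* ~kmemleak_alloc */
{
    scanned[nscanned].words = words;
    scanned[nscanned].nwords = nwords;
    nscanned++;
}

/* Is `obj` referenced from any region the scanner knows about?
 * If not, the scanner would report `obj` as leaked. */
static int is_referenced(const void *obj)
{
    for (int i = 0; i < nscanned; i++)
        for (int j = 0; j < scanned[i].nwords; j++)
            if (scanned[i].words[j] == obj)
                return 1;
    return 0;
}
```

A pointer stored only inside an unregistered page is invisible to the scan, so its target looks unreferenced; registering the page makes the reference visible, which is exactly what the fix achieves for elts->pages[i].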
      
      Link: https://lkml.kernel.org/r/20211124140801.87121-1-chenjun102@huawei.com
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  14. 01 Dec 2021, 2 commits
  15. 26 Nov 2021, 1 commit
  16. 27 Oct 2021, 1 commit
    • tracing: Have all levels of checks prevent recursion · 3de1ed12
      Committed by Steven Rostedt (VMware)
      commit ed65df63 upstream.
      
      While writing an email explaining the "bit = 0" logic for a discussion on
      making ftrace_test_recursion_trylock() disable preemption, I discovered a
      path that makes the "not do the logic if bit is zero" unsafe.
      
      The recursion logic is done in hot paths like the function tracer. Thus,
      any code executed causes noticeable overhead. Thus, tricks are done to try
      to limit the amount of code executed. This included the recursion testing
      logic.
      
      Having recursion testing is important, as there are many paths that can
      end up in an infinite recursion cycle when tracing every function in the
      kernel. Thus protection is needed to prevent that from happening.
      
      Because it is OK to recurse due to different running context levels (e.g.
      an interrupt preempts a trace, and then a trace occurs in the interrupt
      handler), a set of bits are used to know which context one is in (normal,
      softirq, irq and NMI). If a recursion occurs in the same level, it is
      prevented*.
      
      Then there are infrastructure levels of recursion as well. When more than
      one callback is attached to the same function to trace, it calls a loop
      function to iterate over all the callbacks. Both the callbacks and the
      loop function have recursion protection. The callbacks use the
      "ftrace_test_recursion_trylock()" which has a "function" set of context
      bits to test, and the loop function calls the internal
      trace_test_and_set_recursion() directly, with an "internal" set of bits.
      
      If an architecture does not implement all the features supported by ftrace
      then the callbacks are never called directly, and the loop function is
      called instead, which will implement the features of ftrace.
      
      Since both the loop function and the callbacks do recursion protection, it
      seemed unnecessary to do it in both locations. Thus, a trick was made
      to have the internal set of recursion bits at a more significant bit
      location than the function bits. Then, if any of the higher bits were set,
      the logic of the function bits could be skipped, as any new recursion
      would first have to go through the loop function.
      
      This is true for architectures that do not support all the ftrace
      features, because all functions being traced must first go through the
      loop function before going to the callbacks. But it is not true for
      architectures that support all the ftrace features: there, the loop
      function is only used when more than one callback is attached to the
      same function, and a function called from within a callback that is
      itself traced by only that one callback will have the callback called
      directly.
      
      i.e.
      
       traced_function_1: [ more than one callback tracing it ]
         call loop_func
      
       loop_func:
         trace_recursion set internal bit
         call callback
      
       callback:
         trace_recursion [ skipped because internal bit is set, return 0 ]
         call traced_function_2
      
       traced_function_2: [ only traced by above callback ]
         call callback
      
       callback:
         trace_recursion [ skipped because internal bit is set, return 0 ]
         call traced_function_2
      
       [ wash, rinse, repeat, BOOM! out of shampoo! ]
      
      Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
      called for all functions.
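A toy model of the recursion word shows the hazard. The bit layout and function names are simplified stand-ins for trace_test_and_set_recursion(): with the internal (loop-function) bit already set, the buggy trylock keeps reporting success and never detects function-level recursion:

```c
#include <assert.h>

/* Context bit layout, low to high: function bits, then internal bits. */
#define FUNC_BIT      (1u << 0)
#define INTERNAL_BIT  (1u << 8)

static unsigned int recursion;   /* per-task trace_recursion word */

/* Buggy trylock: skip the function-bit logic when any higher bit is
 * set (the "bit == 0 skip" trick). */
static int trylock_buggy(void)
{
    if (recursion & ~FUNC_BIT)
        return 1;                /* claims success, sets nothing */
    if (recursion & FUNC_BIT)
        return 0;                /* recursion detected */
    recursion |= FUNC_BIT;
    return 1;
}

/* Fixed trylock: always test-and-set the function bit. */
static int trylock_fixed(void)
{
    if (recursion & FUNC_BIT)
        return 0;
    recursion |= FUNC_BIT;
    return 1;
}
```

In the buggy variant a callback re-entered via a directly-traced function keeps "succeeding" forever, which is the infinite recursion in the diagram above.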
      
      Since we want to encourage architectures to implement all ftrace features,
      having them slow down due to this extra logic may encourage the
      maintainers to update to the latest ftrace features. And because this
      logic is only safe for them, remove it completely.
      
       [*] There is one layer of recursion that is allowed, and that is to
           allow for the transition between interrupt contexts (normal ->
           softirq -> irq -> NMI), because a trace may occur before the
           context update is visible to the trace recursion logic.
      
      Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
      Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Jisheng Zhang <jszhang@kernel.org>
      Cc: 王贇 <yun.wang@linux.alibaba.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: edc15caf ("tracing: Avoid unnecessary multiple recursion checks")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  17. 06 Oct 2021, 1 commit
    • blktrace: Fix uaf in blk_trace access after removing by sysfs · 677e362b
      Committed by Zhihao Cheng
      [ Upstream commit 5afedf67 ]
      
      There is a use-after-free problem triggered by the following process:
      
            P1(sda)				P2(sdb)
      			echo 0 > /sys/block/sdb/trace/enable
      			  blk_trace_remove_queue
      			    synchronize_rcu
      			    blk_trace_free
      			      relay_close
      rcu_read_lock
      __blk_add_trace
        trace_note_tsk
        (Iterate running_trace_list)
      			        relay_close_buf
      				  relay_destroy_buf
      				    kfree(buf)
          trace_note(sdb's bt)
            relay_reserve
              buf->offset <- NULL pointer dereference (use-after-free) !!!
      rcu_read_unlock
      
      [  502.714379] BUG: kernel NULL pointer dereference, address: 0000000000000010
      [  502.715260] #PF: supervisor read access in kernel mode
      [  502.715903] #PF: error_code(0x0000) - not-present page
      [  502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0
      [  502.717252] Oops: 0000 [#1] SMP
      [  502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360
      [  502.732872] Call Trace:
      [  502.733193]  __blk_add_trace.cold+0x137/0x1a3
      [  502.733734]  blk_add_trace_rq+0x7b/0xd0
      [  502.734207]  blk_add_trace_rq_issue+0x54/0xa0
      [  502.734755]  blk_mq_start_request+0xde/0x1b0
      [  502.735287]  scsi_queue_rq+0x528/0x1140
      ...
      [  502.742704]  sg_new_write.isra.0+0x16e/0x3e0
      [  502.747501]  sg_ioctl+0x466/0x1100
      
      Reproduce method:
        ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
        ioctl(/dev/sda, BLKTRACESTART)
        ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
        ioctl(/dev/sdb, BLKTRACESTART)
      
        echo 0 > /sys/block/sdb/trace/enable &
        // Add delay(mdelay/msleep) before kernel enters blk_trace_free()
      
        ioctl$SG_IO(/dev/sda, SG_IO, ...)
        // Enters trace_note_tsk() after blk_trace_free() returned
        // Use mdelay in rcu region rather than msleep(which may schedule out)
      
      Fix it by removing blk_trace from running_list before calling
      blk_trace_free() from sysfs, if blk_trace is in the Blktrace_running
      state.
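The ordering requirement (unlink before free) can be modeled with a freed-flag instead of a real kfree(); a walker that can still reach a "freed" entry corresponds to the use-after-free in trace_note_tsk(). All names below are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Toy blk_trace with an intrusive running-list link. */
struct blk_trace {
    struct blk_trace *next;
    int freed;                   /* stands in for actually kfree()ing */
};

static struct blk_trace *running_list;

static void list_add(struct blk_trace *bt)
{
    bt->next = running_list;
    running_list = bt;
}

static void list_del(struct blk_trace *bt)
{
    struct blk_trace **p;
    for (p = &running_list; *p; p = &(*p)->next) {
        if (*p == bt) {
            *p = bt->next;
            return;
        }
    }
}

/* ~trace_note_tsk(): count how many freed entries the walk touches. */
static int walk_running_list(void)
{
    int touched_freed = 0;
    for (struct blk_trace *bt = running_list; bt; bt = bt->next)
        if (bt->freed)
            touched_freed++;     /* use-after-free in the real kernel */
    return touched_freed;
}

/* Buggy teardown frees while still on the list; fixed unlinks first. */
static void remove_buggy(struct blk_trace *bt) { bt->freed = 1; }
static void remove_fixed(struct blk_trace *bt) { list_del(bt); bt->freed = 1; }
```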
      
      Fixes: c71a8961 ("blktrace: add ftrace plugin")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  18. 26 Sep 2021, 1 commit
  19. 26 Aug 2021, 1 commit
    • tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name · 2d349b0a
      Committed by Steven Rostedt (VMware)
      [ Upstream commit 5acce0bf ]
      
      The following commands:
      
       # echo 'read_max u64 size;' > synthetic_events
       # echo 'hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)' > events/syscalls/sys_enter_read/trigger
      
      Causes:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP
       CPU: 4 PID: 1763 Comm: bash Not tainted 5.14.0-rc2-test+ #155
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
      v03.03 07/14/2016
       RIP: 0010:strcmp+0xc/0x20
       Code: 75 f7 31 c0 0f b6 0c 06 88 0c 02 48 83 c0 01 84 c9 75 f1 4c 89 c0
      c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07
      3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
       RSP: 0018:ffffb5fdc0963ca8 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffffffffb3a4e040 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffff9714c0d0b640 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 00000022986b7cde R09: ffffffffb3a4dff8
       R10: 0000000000000000 R11: 0000000000000000 R12: ffff9714c50603c8
       R13: 0000000000000000 R14: ffff97143fdf9e48 R15: ffff9714c01a2210
       FS:  00007f1fa6785740(0000) GS:ffff9714da400000(0000)
      knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 000000002d863004 CR4: 00000000001706e0
       Call Trace:
        __find_event_file+0x4e/0x80
        action_create+0x6b7/0xeb0
        ? kstrdup+0x44/0x60
        event_hist_trigger_func+0x1a07/0x2130
        trigger_process_regex+0xbd/0x110
        event_trigger_write+0x71/0xd0
        vfs_write+0xe9/0x310
        ksys_write+0x68/0xe0
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f1fa6879e87
      
      The problem was the "trace(read_max,count)" where the "count" should be
      "$count" as "onmax()" only handles variables (although it really should be
      able to figure out that "count" is a field of sys_enter_read). But there's
      a path that does not find the variable and ends up passing a NULL for the
      event, which ends up getting passed to "strcmp()".
      
      Add a check for NULL, returning an error on the command:
      
       # cat error_log
        hist:syscalls:sys_enter_read: error: Couldn't create or find variable
        Command: hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)
                                      ^
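The guard amounts to refusing to hand a NULL name to strcmp(). A minimal sketch with invented helper names (ERR_INVAL stands in for the kernel's EINVAL):

```c
#include <assert.h>
#include <string.h>

enum { ERR_INVAL = 22 };   /* stand-in for the kernel's EINVAL */

/*
 * Hypothetical sketch: the action-create path resolves a variable to an
 * event name; if resolution failed the name is NULL and must not reach
 * the strcmp() inside the event-file lookup.
 */
static int find_event_checked(const char *event, const char *candidate)
{
    if (!event)                       /* the added NULL check */
        return -ERR_INVAL;
    return strcmp(event, candidate) == 0 ? 0 : -1;
}
```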
      Link: https://lkml.kernel.org/r/20210808003011.4037f8d0@oasis.local.home
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 50450603 ("tracing: Add 'onmax' hist trigger action support")
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  20. 15 Aug 2021, 1 commit
  21. 12 Aug 2021, 2 commits
    • tracing/histogram: Rename "cpu" to "common_cpu" · c0add455
      Committed by Steven Rostedt (VMware)
      commit 1e3bac71 upstream.
      
      Currently the histogram logic allows the user to write "cpu" in as an
      event field, and it will record the CPU that the event happened on.
      
      The problem with this is that there are a lot of events that have "cpu"
      as a real field, and using "cpu" for the CPU the event ran on makes it
      impossible to run histograms on the "cpu" field of those events.
      
      For example, if I want to have a histogram on the count of the
      workqueue_queue_work event on its cpu field, running:
      
       ># echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger
      
      Gives a misleading and wrong result.
      
      Change the command to "common_cpu" as no event should have "common_*"
      fields as that's a reserved name for fields used by all events. And
      this makes sense here as common_cpu would be a field used by all events.
      
      Now we can even do:
      
       ># echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger
       ># cat events/workqueue/workqueue_queue_work/hist
       # event histogram
       #
       # trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active]
       #
      
       { common_cpu:          0, cpu:          2 } hitcount:          1
       { common_cpu:          0, cpu:          4 } hitcount:          1
       { common_cpu:          7, cpu:          7 } hitcount:          1
       { common_cpu:          0, cpu:          7 } hitcount:          1
       { common_cpu:          0, cpu:          1 } hitcount:          1
       { common_cpu:          0, cpu:          6 } hitcount:          2
       { common_cpu:          0, cpu:          5 } hitcount:          2
       { common_cpu:          1, cpu:          1 } hitcount:          4
       { common_cpu:          6, cpu:          6 } hitcount:          4
       { common_cpu:          5, cpu:          5 } hitcount:         14
       { common_cpu:          4, cpu:          4 } hitcount:         26
       { common_cpu:          0, cpu:          0 } hitcount:         39
       { common_cpu:          2, cpu:          2 } hitcount:        184
      
      Now for backward compatibility, I added a trick. If "cpu" is used, and
      the field is not found, it will fall back to "common_cpu" and work as
      it did before. This way, it will still work for old programs that use
      "cpu" to get the actual CPU, but if the event has a "cpu" as a field, it
      will get that event's "cpu" field, which is probably what it wants
      anyway.
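The fallback can be sketched as a small key-resolution step (the field lists in the usage below are illustrative, not any event's real layout):

```c
#include <assert.h>
#include <string.h>

/* Does the event define `name` as one of its own fields? */
static int event_has_field(const char **fields, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(fields[i], name) == 0)
            return 1;
    return 0;
}

/* Backward-compat key resolution: "cpu" means the event's own field
 * when it has one, otherwise the common_cpu pseudo-field (the CPU the
 * event executed on). */
static const char *resolve_key(const char **fields, int n, const char *key)
{
    if (strcmp(key, "cpu") == 0 && !event_has_field(fields, n, "cpu"))
        return "common_cpu";
    return key;
}
```

An event with a real "cpu" field keeps it; an event without one transparently gets "common_cpu", matching the trick described above.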
      
      I updated the tracefs/README to include documentation about both the
      common_timestamp and the common_cpu. This way, if that text is present in
      the README, then an application can know that common_cpu is supported over
      just plain "cpu".
      
      Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home
      
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Fixes: 8b7622bf ("tracing: Add cpu field for hist triggers")
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing / histogram: Give calculation hist_fields a size · 43cba13f
      Committed by Steven Rostedt (VMware)
      commit 2c05caa7 upstream.
      
      When working on my user space applications, I found a bug in the synthetic
      event code where the automated synthetic event field was not matching the
      event field calculation it was attached to. Looking deeper into it, it was
      because the calculation hist_field was not given a size.
      
      The synthetic event fields are matched to their hist_fields either by
      having the field have an identical string type, or if that does not match,
      then the size and signed values are used to match the fields.
      
      The problem arose when I tried to match a calculation where the fields
      were "unsigned int". My tool created a synthetic event of type "u32". But
      it failed to match. The string was:
      
        diff=field1-field2:onmatch(event).trace(synth,$diff)
      
      Adding debugging into the kernel, I found that the size of "diff" was 0.
      And since it was given "unsigned int" as a type, the histogram fallback
      code used size and signed. The signed matched, but the size of u32 (4) did
      not match zero, and the event failed to be created.
      
      This can be worse if the field you want to match is not one of the
      acceptable fields for a synthetic event. As event fields can have any type
      that is supported in Linux, this can cause an issue. For example, if a
      type is an enum. Then there's no way to use that with any calculations.
      
      Have the calculation field simply take on the size of what it is
      calculating.
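The size/signedness matching and the fix can be modeled with a stripped-down hist_field (the real kernel structure and matching logic are more involved; this is a sketch of the mechanism described above):

```c
#include <assert.h>

/* Simplified hist_field: only the properties used for matching. */
struct hist_field {
    int size;
    int is_signed;
};

/* Fallback matching when type strings differ: size and signedness
 * must both agree. */
static int fields_match(const struct hist_field *a, const struct hist_field *b)
{
    return a->size == b->size && a->is_signed == b->is_signed;
}

/* The fix: an expression field inherits its operand's size instead of
 * being left at 0. */
static struct hist_field make_expr_field(const struct hist_field *operand)
{
    struct hist_field f;
    f.size = operand->size;       /* previously left as 0 */
    f.is_signed = operand->is_signed;
    return f;
}
```

With the inherited size, a "diff" of two unsigned int fields matches a synthetic u32; with the old zero size it never could.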
      
      Link: https://lkml.kernel.org/r/20210730171951.59c7743f@oasis.local.home
      
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Fixes: 100719dc ("tracing: Add simple expression support to hist triggers")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  22. 28 Jul 2021, 1 commit
    • tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop. · 6a99bfee
      Committed by Haoran Luo
      commit 67f0d6d9 upstream.
      
      "rb_per_cpu_empty()" misinterprets the condition (as not-empty) when the
      "head_page" and "commit_page" of "struct ring_buffer_per_cpu" point to
      the same buffer page, whose "buffer_data_page" is empty and whose "read"
      field is non-zero.
      
      An error scenario can be constructed as follows (kernel perspective):
      
      1. All pages in the buffer have been accessed by reader(s), so that all
      of them have a non-zero "read" field.
      
      2. Read and clear all buffer pages, so that "rb_num_of_entries()"
      returns 0, indicating there is no more data to read. It is also
      required that the "read_page", "commit_page" and "tail_page" point to
      the same page, while "head_page" is the next page after them.
      
      3. Invoke "ring_buffer_lock_reserve()" with a "length" large enough that
      it shoots past the end of the current tail buffer page. Now the
      "head_page", "commit_page" and "tail_page" all point to the same page.
      
      4. Discard the current event with "ring_buffer_discard_commit()", so
      that "head_page", "commit_page" and "tail_page" point to a page whose
      buffer data page is now empty.
      
      When the error scenario has been constructed, "tracing_read_pipe" will
      be trapped inside a deadloop: "trace_empty()" returns 0 since
      "rb_per_cpu_empty()" returns 0 when it hits the CPU containing the
      constructed ring buffer. "trace_find_next_entry_inc()" then always
      returns NULL since "rb_num_of_entries()" reports there is no more entry
      to read. Finally "trace_seq_to_user()" returns "-EBUSY", sending
      "tracing_read_pipe" back to the start of the "waitagain" loop.
      
      I've also written a proof-of-concept script that constructs the scenario
      and triggers the bug automatically; you can use it to trace and validate
      my reasoning above:
      
        https://github.com/aegistudio/RingBufferDetonator.git
      
      Tests have been carried out on Linux kernel 5.14-rc2
      (2734d6c1), my fixed version
      of the kernel (to verify that my update fixes the bug) and some older
      kernels (to determine the range of affected kernels). Test results are
      also attached to the proof-of-concept repository.
      
      Link: https://lore.kernel.org/linux-trace-devel/YPaNxsIlb2yjSi5Y@aegistudio/
      Link: https://lore.kernel.org/linux-trace-devel/YPgrN85WL9VyrZ55@aegistudio
      
      Cc: stable@vger.kernel.org
      Fixes: bf41a158 ("ring-buffer: make reentrant")
      Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
      Signed-off-by: Haoran Luo <www@aegistudio.net>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a99bfee
  23. 20 Jul, 2021 5 commits
    • S
      tracing: Do not reference char * as a string in histograms · 1834fbbd
      Steven Rostedt (VMware) committed
      commit 704adfb5 upstream.
      
      The histogram logic was allowing events with char * pointers to be used as
      normal strings. But it was easy to crash the kernel with:
      
       # echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger
      
      And open some files, and boom!
      
       BUG: unable to handle page fault for address: 00007f2ced0c3280
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0
       Oops: 0000 [#1] PREEMPT SMP
       CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
      v03.03 07/14/2016
       RIP: 0010:strlen+0x0/0x20
       Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b
      a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74
      10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3
      
       RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246
       RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0
       RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280
       RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98
       R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074
       R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280
       FS:  00007f2ced0f8580(0000) GS:ffff93825a800000(0000)
      knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0
       Call Trace:
        event_hist_trigger+0x463/0x5f0
        ? find_held_lock+0x32/0x90
        ? sched_clock_cpu+0xe/0xd0
        ? lock_release+0x155/0x440
        ? kernel_init_free_pages+0x6d/0x90
        ? preempt_count_sub+0x9b/0xd0
        ? kernel_init_free_pages+0x6d/0x90
        ? get_page_from_freelist+0x12c4/0x1680
        ? __rb_reserve_next+0xe5/0x460
        ? ring_buffer_lock_reserve+0x12a/0x3f0
        event_triggers_call+0x52/0xe0
        ftrace_syscall_enter+0x264/0x2c0
        syscall_trace_enter.constprop.0+0x1ee/0x210
        do_syscall_64+0x1c/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Where it triggered a fault on strlen(key) where key was the filename.
      
      The reason is that filename is a char * to user space, and the histogram
      code just blindly dereferenced it, with obvious bad results.
      
      I originally tried to use strncpy_from_user/kernel_nofault() but found
      that there are other places where it's dereferenced, and it's not worth
      the effort.
      
      Just do not allow "char *" to act like strings.
      
      Link: https://lkml.kernel.org/r/20210715000206.025df9d2@rorschach.local.home
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
      Cc: stable@vger.kernel.org
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Tom Zanussi <zanussi@kernel.org>
      Fixes: 79e577cb ("tracing: Support string type key properly")
      Fixes: 5967bd5c ("tracing: Let filter_assign_type() detect FILTER_PTR_STRING")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1834fbbd
    • P
      tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT · e1105c64
      Paul Burton committed
      commit 4030a6e6 upstream.
      
      Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that
      on systems where pid_max is configured higher than PID_MAX_DEFAULT the
      ftrace record-tgid option doesn't work so well. Any tasks with PIDs
      higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and
      don't show up in the saved_tgids file.
      
      In particular, since systemd v243 & above configures pid_max to its
      highest possible 1<<22 value by default on 64 bit systems, this renders
      the record-tgid option of little use.
      
      Increase the size of tgid_map to the configured pid_max instead,
      allowing it to cover the full range of PIDs up to the maximum value of
      PID_MAX_LIMIT if the system is configured that way.
      
      On 64 bit systems with pid_max == PID_MAX_LIMIT this will increase the
      size of tgid_map from 256KiB to 16MiB. Whilst this 64x increase in
      memory overhead sounds significant, 64 bit systems are presumably best
      placed to accommodate it, and since tgid_map is only allocated when the
      record-tgid option is actually used, presumably the user would rather it
      spend sufficient memory to actually record the tgids they expect.
      
      The size of tgid_map could also increase for CONFIG_BASE_SMALL=y
      configurations, but these seem unlikely to be systems upon which people
      are both configuring a large pid_max and running ftrace with record-tgid
      anyway.
      
      Of note is that we only allocate tgid_map once, the first time that the
      record-tgid option is enabled. Therefore its size is only set once, to
      the value of pid_max at the time the record-tgid option is first
      enabled. If a user increases pid_max after that point, the saved_tgids
      file will not contain entries for any tasks with pids beyond the earlier
      value of pid_max.
      
      Link: https://lkml.kernel.org/r/20210701172407.889626-2-paulburton@google.com
      
      Fixes: d914ba37 ("tracing: Add support for recording tgid of tasks")
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Paul Burton <paulburton@google.com>
      [ Fixed comment coding style ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1105c64
    • P
      tracing: Simplify & fix saved_tgids logic · 44896b31
      Paul Burton committed
      commit b81b3e95 upstream.
      
      The tgid_map array records a mapping from pid to tgid, where the index
      of an entry within the array is the pid & the value stored at that index
      is the tgid.
      
      The saved_tgids_next() function iterates over pointers into the tgid_map
      array & dereferences the pointers which results in the tgid, but then it
      passes that dereferenced value to trace_find_tgid() which treats it as a
      pid & does a further lookup within the tgid_map array. It seems likely
      that the intent here was to skip over entries in tgid_map for which the
      recorded tgid is zero, but instead we end up skipping over entries for
      which the thread group leader hasn't yet had its own tgid recorded in
      tgid_map.
      
      A minimal fix would be to remove the call to trace_find_tgid, turning:
      
        if (trace_find_tgid(*ptr))
      
      into:
      
        if (*ptr)
      
      ..but it seems like this logic can be much simpler if we simply let
      seq_read() iterate over the whole tgid_map array & filter out empty
      entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
      approach, removing the incorrect logic here entirely.
      
      Link: https://lkml.kernel.org/r/20210630003406.4013668-1-paulburton@google.com
      
      Fixes: d914ba37 ("tracing: Add support for recording tgid of tasks")
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Paul Burton <paulburton@google.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      44896b31
    • S
      tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing · 47ab2c74
      Steven Rostedt (VMware) committed
      commit 9913d574 upstream.
      
      All internal use cases for tracepoint_probe_register() are set to never
      be called with the same function and data. If they are, it is considered
      a bug, as that means the accounting of handling tracepoints is corrupted.
      If the function and data for a tracepoint are already registered when
      tracepoint_probe_register() is called, it will call WARN_ON_ONCE() and
      return with -EEXIST.
      
      The BPF system call can end up calling tracepoint_probe_register() with
      the same data, which now means that this can trigger the warning because
      of a user space process. As WARN_ON_ONCE() should not be called because
      user space called a system call with bad data, there needs to be a way to
      register a tracepoint without triggering a warning.
      
      Enter tracepoint_probe_register_may_exist(), which can be called but will
      not cause a WARN_ON() if the probe already exists. It will still error out
      with -EEXIST, which will then be returned to the user space process that
      performed the BPF system call.
      
      This keeps the previous testing for issues with other users of the
      tracepoint code, while letting BPF call it with duplicated data and not
      warn about it.
      
      Link: https://lore.kernel.org/lkml/20210626135845.4080-1-penguin-kernel@I-love.SAKURA.ne.jp/
      Link: https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153
      
      Cc: stable@vger.kernel.org
      Fixes: c4f6699d ("bpf: introduce BPF_RAW_TRACEPOINT")
      Reported-by: syzbot <syzbot+721aa903751db87aa244@syzkaller.appspotmail.com>
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Tested-by: syzbot+721aa903751db87aa244@syzkaller.appspotmail.com
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      47ab2c74
    • S
      tracing/histograms: Fix parsing of "sym-offset" modifier · b0519528
      Steven Rostedt (VMware) committed
      commit 26c56373 upstream.
      
      With the addition of simple mathematical operations (plus and minus), the
      parsing of the "sym-offset" modifier broke, as it took the '-' part of the
      "sym-offset" as a minus, and tried to break it up into a mathematical
      operation of "field.sym - offset", in which case it failed to parse
      (unless the event had a field called "offset").
      
      Both .sym and .sym-offset modifiers should not be entered into
      mathematical calculations anyway. If ".sym-offset" is found in the
      modifier, then simply make it not an operation that can be calculated on.
      
      Link: https://lkml.kernel.org/r/20210707110821.188ae255@oasis.local.home
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 100719dc ("tracing: Add simple expression support to hist triggers")
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0519528
  24. 30 Jun, 2021 2 commits
    • S
      tracing: Do no increment trace_clock_global() by one · 79197fb7
      Steven Rostedt (VMware) committed
      commit 89529d8b upstream.
      
      The trace_clock_global() function tries to make sure the events between
      CPUs are somewhat in order. A global value is used and updated by the
      latest read of a clock. If one CPU is ahead by a little and is read by
      another CPU, a lock is taken, and if the timestamp of the other CPU is
      behind, it will simply use the other CPU's timestamp.
      
      The lock is also only taken with a "trylock", because tracing can cause
      strange recursions. The lock is not taken at all in NMI context.
      
      In the case where the lock cannot be taken, the non-synced timestamp is
      returned, but it will never be less than the saved global timestamp.
      
      The problem arises because when the time goes "backwards", the time
      returned is the saved timestamp plus 1. If the lock is not taken and
      this plus-one timestamp is returned, there is a small race that can
      cause the time to go backwards!
      
      	CPU0				CPU1
      	----				----
      				trace_clock_global() {
      				    ts = clock() [ 1000 ]
      				    trylock(clock_lock) [ success ]
      				    global_ts = ts; [ 1000 ]
      
      				    <interrupted by NMI>
       trace_clock_global() {
          ts = clock() [ 999 ]
          if (ts < global_ts)
      	ts = global_ts + 1 [ 1001 ]
      
          trylock(clock_lock) [ fail ]
      
          return ts [ 1001]
       }
      				    unlock(clock_lock);
      				    return ts; [ 1000 ]
      				}
      
       trace_clock_global() {
          ts = clock() [ 1000 ]
          if (ts < global_ts) [ false 1000 == 1000 ]
      
          trylock(clock_lock) [ success ]
          global_ts = ts; [ 1000 ]
          unlock(clock_lock)
      
          return ts; [ 1000 ]
       }
      
      The above case shows two reads of trace_clock_global() on the same CPU
      where the second read returns one less than the first read. That is,
      time went backwards, which is not allowed by trace_clock_global().
      
      This was triggered by heavy tracing and the ring buffer checker that tests
      for the clock going backwards:
      
       Ring buffer clock went backwards: 20613921464 -> 20613921463
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 0 at kernel/trace/ring_buffer.c:3412 check_buffer+0x1b9/0x1c0
       Modules linked in:
       [..]
       [CPU: 2]TIME DOES NOT MATCH expected:20620711698 actual:20620711697 delta:6790234 before:20613921463 after:20613921463
         [20613915818] PAGE TIME STAMP
         [20613915818] delta:0
         [20613915819] delta:1
         [20613916035] delta:216
         [20613916465] delta:430
         [20613916575] delta:110
         [20613916749] delta:174
         [20613917248] delta:499
         [20613917333] delta:85
         [20613917775] delta:442
         [20613917921] delta:146
         [20613918321] delta:400
         [20613918568] delta:247
         [20613918768] delta:200
         [20613919306] delta:538
         [20613919353] delta:47
         [20613919980] delta:627
         [20613920296] delta:316
         [20613920571] delta:275
         [20613920862] delta:291
         [20613921152] delta:290
         [20613921464] delta:312
         [20613921464] delta:0 TIME EXTEND
         [20613921464] delta:0
      
      This happened more than once, always with an off-by-one result. It also
      started happening after commit aafe104a was added.
      
      Cc: stable@vger.kernel.org
      Fixes: aafe104a ("tracing: Restructure trace_clock_global() to never block")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      79197fb7
    • S
      tracing: Do not stop recording comms if the trace file is being read · 04e7a7c9
      Steven Rostedt (VMware) committed
      commit 4fdd595e upstream.
      
      A while ago, when the "trace" file was opened, tracing was stopped, and
      code was added to stop recording the comms to saved_cmdlines, used for
      mapping pids to task names.
      
      Code has since been added that only records the comm if a trace event
      occurred, so there's no reason not to record comms while the trace file
      is open.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ffbd48d ("tracing: Cache comms only after an event occurred")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      04e7a7c9