1. 10 November 2019 (1 commit)
    • tracing: Fix "gfp_t" format for synthetic events · fa18f803
      Zhengjun Xing authored
      [ Upstream commit 9fa8c9c647be624e91b09ecffa7cd97ee0600b40 ]
      
      In the format of synthetic events, the "gfp_t" type is shown as "signed:1",
      but "gfp_t" is in fact unsigned and should be shown as "signed:0".
      
      The issue can be reproduced by the following commands:
      
      echo 'memlatency u64 lat; unsigned int order; gfp_t gfp_flags; int migratetype' > /sys/kernel/debug/tracing/synthetic_events
      cat  /sys/kernel/debug/tracing/events/synthetic/memlatency/format
      
      name: memlatency
      ID: 2233
      format:
              field:unsigned short common_type;       offset:0;       size:2; signed:0;
              field:unsigned char common_flags;       offset:2;       size:1; signed:0;
              field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
              field:int common_pid;   offset:4;       size:4; signed:1;
      
              field:u64 lat;  offset:8;       size:8; signed:0;
              field:unsigned int order;       offset:16;      size:4; signed:0;
              field:gfp_t gfp_flags;  offset:24;      size:4; signed:1;
              field:int migratetype;  offset:32;      size:4; signed:1;
      
      print fmt: "lat=%llu, order=%u, gfp_flags=%x, migratetype=%d", REC->lat, REC->order, REC->gfp_flags, REC->migratetype
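
      A sketch of the kind of signedness check the fix introduces (hedged; the
      helper name synth_field_signed() follows the upstream patch, but details
      may differ):

        static int synth_field_signed(char *type)
        {
                /* "unsigned ..." and the u8/u16/u32/u64 aliases are not signed */
                if (str_has_prefix(type, "u"))
                        return 0;
                /* gfp_t is an unsigned bitmask even though it lacks a "u" prefix */
                if (strcmp(type, "gfp_t") == 0)
                        return 0;

                return 1;
        }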
      
      Link: http://lkml.kernel.org/r/20191018012034.6404-1-zhengjun.xing@linux.intel.com
      Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  2. 06 November 2019 (1 commit)
    • tracing: Initialize iter->seq after zeroing in tracing_read_pipe() · 394c90d9
      Petr Mladek authored
      [ Upstream commit d303de1fcf344ff7c15ed64c3f48a991c9958775 ]
      
      A customer reported the following softlockup:
      
      [899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
      [899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
      [899688.160002] RIP: 0010:up_write+0x1a/0x30
      [899688.160002] Kernel panic - not syncing: softlockup: hung tasks
      [899688.160002] RIP: 0010:up_write+0x1a/0x30
      [899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
      [899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
      [899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
      [899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
      [899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
      [899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
      [899688.160002]  tracing_read_pipe+0x336/0x3c0
      [899688.160002]  __vfs_read+0x26/0x140
      [899688.160002]  vfs_read+0x87/0x130
      [899688.160002]  SyS_read+0x42/0x90
      [899688.160002]  do_syscall_64+0x74/0x160
      
      It caught the process in the middle of trace_access_unlock(). There is
      no loop. So, it must be looping in the caller tracing_read_pipe()
      via the "waitagain" label.
      
      Crash dump analysis uncovered that iter->seq was completely zeroed
      at this point, including iter->seq.seq.size. It means that
      print_trace_line() was never able to print anything and
      there was no forward progress.
      
      The culprit seems to be in the code:
      
      	/* reset all but tr, trace, and overruns */
      	memset(&iter->seq, 0,
      	       sizeof(struct trace_iterator) -
      	       offsetof(struct trace_iterator, seq));
      
      It was added by commit 53d0aa77 ("ftrace:
      add logic to record overruns") back in v2.6.27-rc1,
      when iter->seq looked like:
      
           struct trace_seq {
      	unsigned char		buffer[PAGE_SIZE];
      	unsigned int		len;
           };
      
      There was no "size" variable and zeroing was perfectly fine.
      
      The solution is to reinitialize the structure after (or instead of)
      the zeroing.
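
      A minimal sketch of such a reset helper, assuming the shape of the
      upstream fix (which restores the seq state after the memset):

        static inline void trace_iterator_reset(struct trace_iterator *iter)
        {
                const size_t offset = offsetof(struct trace_iterator, seq);

                memset((char *)iter + offset, 0,
                       sizeof(struct trace_iterator) - offset);

                iter->pos = -1;
                trace_seq_init(&iter->seq);     /* restores iter->seq.seq.size */
        }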
      
      Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  3. 29 October 2019 (1 commit)
    • tracing: Fix race in perf_trace_buf initialization · 5ce7528c
      Prateek Sood authored
      commit 6b1340cc00edeadd52ebd8a45171f38c8de2a387 upstream.
      
      A race condition exists while initializing perf_trace_buf from
      perf_trace_init() and perf_kprobe_init().
      
            CPU0                                        CPU1
      perf_trace_init()
        mutex_lock(&event_mutex)
          perf_trace_event_init()
            perf_trace_event_reg()
              total_ref_count == 0
      	buf = alloc_percpu()
              perf_trace_buf[i] = buf
              tp_event->class->reg() //fails       perf_kprobe_init()
      	goto fail                              perf_trace_event_init()
                                                       perf_trace_event_reg()
              fail:
      	  total_ref_count == 0
      
                                                         total_ref_count == 0
                                                         buf = alloc_percpu()
                                                         perf_trace_buf[i] = buf
                                                         tp_event->class->reg()
                                                         total_ref_count++
      
                free_percpu(perf_trace_buf[i])
                perf_trace_buf[i] = NULL
      
      Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0
      and skip the allocation, leaving perf_trace_buf NULL. perf_trace_buf can then
      be accessed from perf_trace_buf_alloc() without ever having been initialized.
      Acquiring event_mutex in perf_kprobe_init() before calling
      perf_trace_event_init() fixes this race.
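
      A sketch of the locking change in perf_kprobe_init(), hedged (setup and
      teardown around this fragment are simplified):

        /* perf_kprobe_init() now serializes against perf_trace_init(),
         * which already runs perf_trace_event_init() under event_mutex */
        mutex_lock(&event_mutex);
        ret = perf_trace_event_init(tp_event, p_event);
        if (ret)
                destroy_local_trace_kprobe(tp_event);
        mutex_unlock(&event_mutex);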
      
      The race caused the following bug:
      
       Unable to handle kernel paging request at virtual address 0000003106f2003c
       Mem abort info:
         ESR = 0x96000045
         Exception class = DABT (current EL), IL = 32 bits
         SET = 0, FnV = 0
         EA = 0, S1PTW = 0
       Data abort info:
         ISV = 0, ISS = 0x00000045
         CM = 0, WnR = 1
       user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
       [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
       Internal error: Oops: 96000045 [#1] PREEMPT SMP
       Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
       pstate: 80400005 (Nzcv daif +PAN -UAO)
       pc : __memset+0x20/0x1ac
       lr : memset+0x3c/0x50
       sp : ffffffc09319fc50
      
        __memset+0x20/0x1ac
        perf_trace_buf_alloc+0x140/0x1a0
        perf_trace_sys_enter+0x158/0x310
        syscall_trace_enter+0x348/0x7c0
        el0_svc_common+0x11c/0x368
        el0_svc_handler+0x12c/0x198
        el0_svc+0x8/0xc
      
      Ramdumps showed the following:
        total_ref_count = 3
        perf_trace_buf = (
            0x0 -> NULL,
            0x0 -> NULL,
            0x0 -> NULL,
            0x0 -> NULL)
      
      Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org
      
      Cc: stable@vger.kernel.org
      Fixes: e12f03d7 ("perf/core: Implement the 'perf_kprobe' PMU")
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Prateek Sood <prsood@codeaurora.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 18 October 2019 (4 commits)
  5. 12 October 2019 (1 commit)
    • tracing: Make sure variable reference alias has correct var_ref_idx · e010c983
      Tom Zanussi authored
      commit 17f8607a1658a8e70415eef67909f990d13017b5 upstream.
      
      Original changelog from Steve Rostedt (except last sentence which
      explains the problem, and the Fixes: tag):
      
      I performed a three way histogram with the following commands:
      
      echo 'irq_lat u64 lat pid_t pid' > synthetic_events
      echo 'wake_lat u64 lat u64 irqlat pid_t pid' >> synthetic_events
      echo 'hist:keys=common_pid:irqts=common_timestamp.usecs if function == 0xffffffff81200580' > events/timer/hrtimer_start/trigger
      echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$irqts:onmatch(timer.hrtimer_start).irq_lat($lat,pid) if common_flags & 1' > events/sched/sched_waking/trigger
      echo 'hist:keys=pid:wakets=common_timestamp.usecs,irqlat=lat' > events/synthetic/irq_lat/trigger
      echo 'hist:keys=next_pid:lat=common_timestamp.usecs-$wakets,irqlat=$irqlat:onmatch(synthetic.irq_lat).wake_lat($lat,$irqlat,next_pid)' > events/sched/sched_switch/trigger
      echo 1 > events/synthetic/wake_lat/enable
      
      Basically I wanted to see:
      
       hrtimer_start (calling function tick_sched_timer)
      
      Note:
      
        # grep tick_sched_timer /proc/kallsyms
      ffffffff81200580 t tick_sched_timer
      
      And save the time of that. Then, if sched_waking is called in interrupt
      context with the same pid as the hrtimer_start, record the latency
      between that and the waking event.
      
      I then look at when the task that is woken is scheduled in, and record
      the latency between the wakeup and the task running.
      
      At the end, the wake_lat synthetic event will show the wakeup to
      scheduled latency, as well as the irq latency from hrtimer_start to
      the wakeup. The problem is that I found this:
      
                <idle>-0     [007] d...   190.485261: wake_lat: lat=27 irqlat=190485230 pid=698
                <idle>-0     [005] d...   190.485283: wake_lat: lat=40 irqlat=190485239 pid=10
                <idle>-0     [002] d...   190.488327: wake_lat: lat=56 irqlat=190488266 pid=335
                <idle>-0     [005] d...   190.489330: wake_lat: lat=64 irqlat=190489262 pid=10
                <idle>-0     [003] d...   190.490312: wake_lat: lat=43 irqlat=190490265 pid=77
                <idle>-0     [005] d...   190.493322: wake_lat: lat=54 irqlat=190493262 pid=10
                <idle>-0     [005] d...   190.497305: wake_lat: lat=35 irqlat=190497267 pid=10
                <idle>-0     [005] d...   190.501319: wake_lat: lat=50 irqlat=190501264 pid=10
      
      The irqlat seemed quite large! Investigating this further, if I had
      enabled the irq_lat synthetic event, I noticed this:
      
                <idle>-0     [002] d.s.   249.429308: irq_lat: lat=164968 pid=335
                <idle>-0     [002] d...   249.429369: wake_lat: lat=55 irqlat=249429308 pid=335
      
      Notice that the timestamp of the irq_lat "249.429308" is awfully
      similar to the reported irqlat variable. In fact, all instances were
      like this. It appeared that:
      
        irqlat=$irqlat
      
      Wasn't assigning the old $irqlat to the new irqlat variable, but
      instead was assigning the $irqts to it.
      
      The issue is that assigning the old $irqlat to the new irqlat variable
      creates a variable reference alias, but the alias creation code
      forgets to make sure the alias uses the same var_ref_idx to access the
      reference.
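
      The fix is a one-liner in the alias-creation path; a sketch, assuming
      the create_alias() helper used upstream:

        /* in create_alias(): the alias must resolve through the same slot
         * in the var_ref array as the reference it aliases */
        alias->var_ref_idx = var_ref->var_ref_idx;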
      
      Link: http://lkml.kernel.org/r/1567375321.5282.12.camel@kernel.org
      
      Cc: Linux Trace Devel <linux-trace-devel@vger.kernel.org>
      Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 7e8b88a3 ("tracing: Add hist trigger support for variable reference aliases")
      Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 06 September 2019 (3 commits)
  7. 07 August 2019 (1 commit)
    • ftrace: Enable trampoline when rec count returns back to one · f486088d
      Cheng Jian authored
      [ Upstream commit a124692b698b00026a58d89831ceda2331b2e1d0 ]
      
      A custom trampoline can only be enabled if there is a single ops
      attached to the function. If there's only a single callback registered to a function,
      and the ops has a trampoline registered for it, then we can call the
      trampoline directly. This is very useful for improving the performance of
      ftrace and livepatch.
      
      If more than one callback is registered to a function, the general
      trampoline is used, and the custom trampoline is not restored back to the
      direct call even if all the other callbacks were unregistered and we are
      back to one callback for the function.
      
      To fix this, set the FTRACE_FL_TRAMP flag when the rec count is decremented
      back to one and the remaining ops has a trampoline, as sketched below.
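
      A hedged sketch of that condition (ftrace_rec_count() and the trampoline
      lookup helper exist in ftrace.c, but the surrounding logic is simplified
      here):

        /* when a callback is removed and a single ops remains attached,
         * re-enable the direct trampoline call if that ops provides one */
        if (ftrace_rec_count(rec) == 1 &&
            ftrace_find_tramp_ops_any(rec))
                rec->flags |= FTRACE_FL_TRAMP;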
      
      Testing after this patch:
      
      insmod livepatch_unshare_files.ko
      cat /sys/kernel/debug/tracing/enabled_functions
      
      	unshare_files (1) R I	tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0
      
      echo unshare_files > /sys/kernel/debug/tracing/set_ftrace_filter
      echo function > /sys/kernel/debug/tracing/current_tracer
      cat /sys/kernel/debug/tracing/enabled_functions
      
      	unshare_files (2) R I ->ftrace_ops_list_func+0x0/0x150
      
      echo nop > /sys/kernel/debug/tracing/current_tracer
      cat /sys/kernel/debug/tracing/enabled_functions
      
      	unshare_files (1) R I	tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0
      
      Link: http://lkml.kernel.org/r/1556969979-111047-1-git-send-email-cj.chengjian@huawei.com
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  8. 10 July 2019 (5 commits)
    • ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code() · c854d9b6
      Petr Mladek authored
      commit d5b844a2cf507fc7642c9ae80a9d585db3065c28 upstream.
      
      The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text
      permissions race") causes a possible deadlock between register_kprobe()
      and ftrace_run_update_code() when ftrace is using stop_machine().
      
      The existing dependency chain (in reverse order) is:
      
      -> #1 (text_mutex){+.+.}:
             validate_chain.isra.21+0xb32/0xd70
             __lock_acquire+0x4b8/0x928
             lock_acquire+0x102/0x230
             __mutex_lock+0x88/0x908
             mutex_lock_nested+0x32/0x40
             register_kprobe+0x254/0x658
             init_kprobes+0x11a/0x168
             do_one_initcall+0x70/0x318
             kernel_init_freeable+0x456/0x508
             kernel_init+0x22/0x150
             ret_from_fork+0x30/0x34
             kernel_thread_starter+0x0/0xc
      
      -> #0 (cpu_hotplug_lock.rw_sem){++++}:
             check_prev_add+0x90c/0xde0
             validate_chain.isra.21+0xb32/0xd70
             __lock_acquire+0x4b8/0x928
             lock_acquire+0x102/0x230
             cpus_read_lock+0x62/0xd0
             stop_machine+0x2e/0x60
             arch_ftrace_update_code+0x2e/0x40
             ftrace_run_update_code+0x40/0xa0
             ftrace_startup+0xb2/0x168
             register_ftrace_function+0x64/0x88
             klp_patch_object+0x1a2/0x290
             klp_enable_patch+0x554/0x980
             do_one_initcall+0x70/0x318
             do_init_module+0x6e/0x250
             load_module+0x1782/0x1990
             __s390x_sys_finit_module+0xaa/0xf0
             system_call+0xd8/0x2d0
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(text_mutex);
                                     lock(cpu_hotplug_lock.rw_sem);
                                     lock(text_mutex);
        lock(cpu_hotplug_lock.rw_sem);
      
      It is a similar problem to the one solved by commit 2d1e38f5
      ("kprobes: Cure hotplug lock ordering issues"). Many locks are involved.
      To be on the safe side, text_mutex must become a low-level lock taken
      after cpu_hotplug_lock.rw_sem.
      
      This can't be achieved easily with the current ftrace design.
      For example, arm calls set_all_modules_text_rw() already in
      ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c.
      This function is called:
      
        + outside stop_machine() from ftrace_run_update_code()
        + without stop_machine() from ftrace_module_enable()
      
      Fortunately, the problematic fix is needed only on x86_64. It is
      the only architecture that calls set_all_modules_text_rw()
      in the ftrace path and supports livepatching at the same time.
      
      Therefore it is enough to move text_mutex handling from the generic
      kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c:
      
         ftrace_arch_code_modify_prepare()
         ftrace_arch_code_modify_post_process()
      
      This patch basically reverts the ftrace part of the problematic
      commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module
      text permissions race") and provides an x86_64-specific fix.
      
      Some refactoring of the ftrace code will be needed when livepatching
      is implemented for arm or nds32. These architectures call
      set_all_modules_text_rw() and use stop_machine() at the same time.
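
      A sketch of the resulting x86 arch hooks, assuming the pre-5.3
      set_kernel_text_rw()/set_all_modules_text_rw() interfaces:

        /* arch/x86/kernel/ftrace.c (sketch) */
        int ftrace_arch_code_modify_prepare(void)
        {
                mutex_lock(&text_mutex);
                set_kernel_text_rw();
                set_all_modules_text_rw();
                return 0;
        }

        int ftrace_arch_code_modify_post_process(void)
        {
                set_all_modules_text_ro();
                set_kernel_text_ro();
                mutex_unlock(&text_mutex);
                return 0;
        }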
      
      Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com
      
      Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race")
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Miroslav Benes <mbenes@suse.cz>
      Reviewed-by: Miroslav Benes <mbenes@suse.cz>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      [
        As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of
        ftrace_run_update_code() as it is a void function.
      ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing/snapshot: Resize spare buffer if size changed · c8790d7f
      Eiichi Tsukata authored
      commit 46cc0b44428d0f0e81f11ea98217fc0edfbeab07 upstream.
      
      The current snapshot implementation swaps the two ring buffers even when
      their sizes differ, which can cause an inconsistency between the contents
      of the buffer_size_kb file and the current buffer size.
      
      For example:
      
        # cat buffer_size_kb
        7 (expanded: 1408)
        # echo 1 > events/enable
        # grep bytes per_cpu/cpu0/stats
        bytes: 1441020
        # echo 1 > snapshot             // current:1408, spare:1408
        # echo 123 > buffer_size_kb     // current:123,  spare:1408
        # echo 1 > snapshot             // current:1408, spare:123
        # grep bytes per_cpu/cpu0/stats
        bytes: 1443700
        # cat buffer_size_kb
        123                             // != current:1408
      
      And also, a similar per-cpu case hits the following WARNING:
      
      Reproducer:
      
        # echo 1 > per_cpu/cpu0/snapshot
        # echo 123 > buffer_size_kb
        # echo 1 > per_cpu/cpu0/snapshot
      
      WARNING:
      
        WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380
        Modules linked in:
        CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 #20
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
        RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380
        Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48
        RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093
        RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8
        RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005
        RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b
        R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000
        R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060
        FS:  00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0
        Call Trace:
         ? trace_array_printk_buf+0x140/0x140
         ? __mutex_lock_slowpath+0x10/0x10
         tracing_snapshot_write+0x4c8/0x7f0
         ? trace_printk_init_buffers+0x60/0x60
         ? selinux_file_permission+0x3b/0x540
         ? tracer_preempt_off+0x38/0x506
         ? trace_printk_init_buffers+0x60/0x60
         __vfs_write+0x81/0x100
         vfs_write+0x1e1/0x560
         ksys_write+0x126/0x250
         ? __ia32_sys_read+0xb0/0xb0
         ? do_syscall_64+0x1f/0x390
         do_syscall_64+0xc1/0x390
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This patch adds resize_buffer_duplicate_size() to check if there is a
      difference between the current and spare buffer sizes and resize the spare buffer
      if necessary.
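
      A sketch of the new check on the snapshot-write path (hedged; variable
      names follow the surrounding trace.c code):

        /* make sure the spare buffer matches the current size before swapping */
        if (tr->allocated_snapshot)
                ret = resize_buffer_duplicate_size(&tr->max_buffer,
                                                   &tr->trace_buffer, iter->cpu_file);
        else
                ret = tracing_alloc_snapshot_instance(tr);
        if (ret < 0)
                break;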
      
      Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com
      
      Cc: stable@vger.kernel.org
      Fixes: ad909e21 ("tracing: Add internal tracing_snapshot() functions")
      Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper() · 2b39351e
      Wei Li authored
      [ Upstream commit 04e03d9a616c19a47178eaca835358610e63a1dd ]
      
      The mapper may be NULL when called from register_ftrace_function_probe()
      with probe->data == NULL.
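
      The fix is an early return at the top of free_ftrace_func_mapper();
      a sketch:

        void free_ftrace_func_mapper(struct ftrace_func_mapper *mapper,
                                     ftrace_mapper_func free_func)
        {
                /* probes registered with NULL data never allocated a mapper */
                if (!mapper)
                        return;
                /* existing teardown of the hash entries follows here */
        }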
      
      This issue can be reproduced as follows (it may sometimes be masked by
      compiler optimization):
      
      / # cat /sys/kernel/debug/tracing/set_ftrace_filter
      #### all functions enabled ####
      / # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
      [  206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  206.952402] Mem abort info:
      [  206.952819]   ESR = 0x96000006
      [  206.955326]   Exception class = DABT (current EL), IL = 32 bits
      [  206.955844]   SET = 0, FnV = 0
      [  206.956272]   EA = 0, S1PTW = 0
      [  206.956652] Data abort info:
      [  206.957320]   ISV = 0, ISS = 0x00000006
      [  206.959271]   CM = 0, WnR = 0
      [  206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
      [  206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
      [  206.964953] Internal error: Oops: 96000006 [#1] SMP
      [  206.971122] Dumping ftrace buffer:
      [  206.973677]    (ftrace buffer empty)
      [  206.975258] Modules linked in:
      [  206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
      [  206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
      [  206.978955] Hardware name: linux,dummy-virt (DT)
      [  206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
      [  206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
      [  206.980874] lr : ftrace_count_free+0x68/0x80
      [  206.982539] sp : ffff0000182f3ab0
      [  206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
      [  206.983632] x27: ffff000013054b40 x26: 0000000000000001
      [  206.984000] x25: ffff00001385f000 x24: 0000000000000000
      [  206.984394] x23: ffff000013453000 x22: ffff000013054000
      [  206.984775] x21: 0000000000000000 x20: ffff00001385fe28
      [  206.986575] x19: ffff000013872c30 x18: 0000000000000000
      [  206.987111] x17: 0000000000000000 x16: 0000000000000000
      [  206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
      [  206.987850] x13: 000000000017430e x12: 0000000000000580
      [  206.988251] x11: 0000000000000000 x10: cccccccccccccccc
      [  206.988740] x9 : 0000000000000000 x8 : ffff000013917550
      [  206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
      [  206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
      [  206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
      [  206.991771] x1 : 0000000000000000 x0 : 0000000000000000
      [  206.992557] Call trace:
      [  206.993101]  free_ftrace_func_mapper+0x2c/0x118
      [  206.994827]  ftrace_count_free+0x68/0x80
      [  206.995238]  release_probe+0xfc/0x1d0
      [  206.995555]  register_ftrace_function_probe+0x4a8/0x868
      [  206.995923]  ftrace_trace_probe_callback.isra.4+0xb8/0x180
      [  206.996330]  ftrace_dump_callback+0x50/0x70
      [  206.996663]  ftrace_regex_write.isra.29+0x290/0x3a8
      [  206.997157]  ftrace_filter_write+0x44/0x60
      [  206.998971]  __vfs_write+0x64/0xf0
      [  206.999285]  vfs_write+0x14c/0x2f0
      [  206.999591]  ksys_write+0xbc/0x1b0
      [  206.999888]  __arm64_sys_write+0x3c/0x58
      [  207.000246]  el0_svc_common.constprop.0+0x408/0x5f0
      [  207.000607]  el0_svc_handler+0x144/0x1c8
      [  207.000916]  el0_svc+0x8/0xc
      [  207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
      [  207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
      [  207.010126] Kernel panic - not syncing: Fatal exception
      [  207.011322] SMP: stopping secondary CPUs
      [  207.013956] Dumping ftrace buffer:
      [  207.014595]    (ftrace buffer empty)
      [  207.015632] Kernel Offset: disabled
      [  207.017187] CPU features: 0x002,20006008
      [  207.017985] Memory Limit: none
      [  207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.com
      Signed-off-by: Wei Li <liwei391@huawei.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • module: Fix livepatch/ftrace module text permissions race · 93804417
      Josh Poimboeuf authored
      [ Upstream commit 9f255b632bf12c4dd7fc31caee89aa991ef75176 ]
      
      It's possible for livepatch and ftrace to be toggling a module's text
      permissions at the same time, resulting in the following panic:
      
        BUG: unable to handle page fault for address: ffffffffc005b1d9
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0003) - permissions violation
        PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
        Oops: 0003 [#1] PREEMPT SMP PTI
        CPU: 1 PID: 453 Comm: insmod Tainted: G           O  K   5.2.0-rc1-a188339ca5 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
        RIP: 0010:apply_relocate_add+0xbe/0x14c
        Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
        RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
        RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
        RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
        RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
        R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
        R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
        FS:  00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         klp_init_object_loaded+0x10f/0x219
         ? preempt_latency_start+0x21/0x57
         klp_enable_patch+0x662/0x809
         ? virt_to_head_page+0x3a/0x3c
         ? kfree+0x8c/0x126
         patch_init+0x2ed/0x1000 [livepatch_test02]
         ? 0xffffffffc0060000
         do_one_initcall+0x9f/0x1c5
         ? kmem_cache_alloc_trace+0xc4/0xd4
         ? do_init_module+0x27/0x210
         do_init_module+0x5f/0x210
         load_module+0x1c41/0x2290
         ? fsnotify_path+0x3b/0x42
         ? strstarts+0x2b/0x2b
         ? kernel_read+0x58/0x65
         __do_sys_finit_module+0x9f/0xc3
         ? __do_sys_finit_module+0x9f/0xc3
         __x64_sys_finit_module+0x1a/0x1c
         do_syscall_64+0x52/0x61
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The above panic occurs when loading two modules at the same time with
      ftrace enabled, where at least one of the modules is a livepatch module:
      
      CPU0					CPU1
      klp_enable_patch()
        klp_init_object_loaded()
          module_disable_ro()
          					ftrace_module_enable()
      					  ftrace_arch_code_modify_post_process()
      				    	    set_all_modules_text_ro()
            klp_write_object_relocations()
              apply_relocate_add()
      	  *patches read-only code* - BOOM
      
      A similar race exists when toggling ftrace while loading a livepatch
      module.
      
      Fix it by ensuring that the livepatch and ftrace code patching
      operations -- and their respective permissions changes -- are protected
      by the text_mutex.
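
      A sketch of the livepatch side of that change (hedged; the matching
      ftrace side takes text_mutex in ftrace_arch_code_modify_prepare() and
      ftrace_arch_code_modify_post_process()):

        /* kernel/livepatch/core.c, klp_init_object_loaded() (sketch) */
        mutex_lock(&text_mutex);

        module_disable_ro(patch->mod);
        ret = klp_write_object_relocations(patch->mod, obj);
        module_enable_ro(patch->mod, true);

        mutex_unlock(&text_mutex);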
      
      Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.com
      Reported-by: Johannes Erdfelt <johannes@erdfelt.com>
      Fixes: 444d13ff ("modules: add ro_after_init support")
      Acked-by: Jessica Yu <jeyu@kernel.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Reviewed-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tracing: avoid build warning with HAVE_NOP_MCOUNT · 220adcc0
      Vasily Gorbik authored
      [ Upstream commit cbdaeaf050b730ea02e9ab4ff844ce54d85dbe1d ]
      
      Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it)
      and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for
      testing CC_USING_* defines) to avoid conditional compilation and fix
      the following gcc 9 warning on s390:
      
      kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined
      but not used [-Wunused-function]
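
      A sketch of the change, assuming the call site that was guarded by the
      old #ifndef CC_USING_NOP_MCOUNT block:

        /* compile the call unconditionally so gcc sees ftrace_code_disable()
         * being used; __is_defined() folds to 0/1 at compile time */
        if (!__is_defined(CC_USING_NOP_MCOUNT) &&
            !ftrace_code_disable(mod, p))
                break;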
      
      Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours
      
      Fixes: 2f4df001 ("tracing: Add -mcount-nop option support")
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  9. 03 July 2019 (2 commits)
  10. 25 June 2019 (1 commit)
    • tracing: Silence GCC 9 array bounds warning · c493ead3
      Miguel Ojeda authored
      commit 0c97bf863efce63d6ab7971dad811601e6171d2f upstream.
      
      Starting with GCC 9, -Warray-bounds detects cases when memset is called
      starting on a member of a struct but the size to be cleared ends up
      writing over further members.
      
      Such a call happens in the trace code to clear, at once, all members
      after and including `seq` on struct trace_iterator:
      
          In function 'memset',
              inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3:
          ./include/linux/string.h:344:9: warning: '__builtin_memset' offset
          [8505, 8560] from the object at 'iter' is out of the bounds of
          referenced subobject 'seq' with type 'struct trace_seq' at offset
          4368 [-Warray-bounds]
            344 |  return __builtin_memset(p, c, size);
                |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      In order to avoid GCC complaining about it, we compute the address
      ourselves by adding the offsetof distance instead of referring
      directly to the member.
      
      Since there are two places doing this clear (trace.c and trace_kdb.c),
      take the chance to move the workaround into a single place in
      the internal header.
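
      A sketch of the shared helper, hedged (this is the same
      trace_iterator_reset() that the tracing_read_pipe() fix listed earlier
      in this log later extends):

        /* kernel/trace/trace.h (sketch): clear everything from 'seq' onward
         * without naming the member, so -Warray-bounds stays quiet */
        static __always_inline void trace_iterator_reset(struct trace_iterator *iter)
        {
                const size_t offset = offsetof(struct trace_iterator, seq);

                memset((char *)iter + offset, 0,
                       sizeof(struct trace_iterator) - offset);
        }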
      
      Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.com
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      [ Removed unnecessary parenthesis around "iter" ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  11. 19 June 2019 (1 commit)
  12. 09 June 2019 (1 commit)
  13. 31 May 2019 (1 commit)
  14. 26 May 2019 (1 commit)
    • tracing: Fix partial reading of trace event's id file · fb8c9c90
      Elazar Leibovich authored
      commit cbe08bcbbe787315c425dde284dcb715cfbf3f39 upstream.
      
      When reading only part of the id file, ppos isn't tracked correctly.
      This is taken care of by switching to simple_read_from_buffer().

      Reading a single byte, and then the next byte, would result in EOF.
      
      While this seems like not a big deal, it breaks abstractions that
      read information from files unbuffered. See for example
      https://github.com/golang/go/issues/29399
      
      This code was mentioned as problematic in
      commit cd458ba9
      ("tracing: Do not (ab)use trace_seq in event_id_read()")
      
      An example C program that shows this bug:
      
        #include <stdio.h>
        #include <stdint.h>
      
        #include <sys/types.h>
        #include <sys/stat.h>
        #include <fcntl.h>
        #include <unistd.h>
      
        int main(int argc, char **argv) {
          if (argc < 2)
            return 1;
          int fd = open(argv[1], O_RDONLY);
          char c;
          read(fd, &c, 1);
          printf("First  %c\n", c);
          read(fd, &c, 1);
          printf("Second %c\n", c);
        }
      
      Then run with, e.g.
      
        sudo ./a.out /sys/kernel/debug/tracing/events/tcp/tcp_set_state/id
      
      You'll notice you're getting the first character twice, instead of the
      first two characters in the id file.
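
      A sketch of the fixed read handler, hedged (it follows the shape of the
      upstream event_id_read()):

        static ssize_t
        event_id_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
        {
                int id = (long)event_file_data(filp);
                char buf[32];
                int len;

                if (unlikely(!id))
                        return -ENODEV;

                len = sprintf(buf, "%d\n", id);

                /* simple_read_from_buffer() advances *ppos correctly */
                return simple_read_from_buffer(ubuf, cnt, ppos, buf, len);
        }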
      
      Link: http://lkml.kernel.org/r/20181231115837.4932-1-elazar@lightbitslabs.com
      
      Cc: Orit Wasserman <orit.was@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 23725aee ("ftrace: provide an id file for each event")
      Signed-off-by: Elazar Leibovich <elazar@lightbitslabs.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  15. 04 May 2019 (1 commit)
  16. 02 May 2019 (3 commits)
    • trace: Fix preempt_enable_no_resched() abuse · d0264d10
      Peter Zijlstra authored
      commit d6097c9e4454adf1f8f2c9547c2fa6060d55d952 upstream.
      
      Unless the very next line is schedule(), or implies it, one must not use
      preempt_enable_no_resched(). It can cause a preemption to go missing and
      thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
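
      The offending site is ring_buffer_time_stamp(); a sketch of the
      corrected function (hedged, following the upstream shape):

        u64 ring_buffer_time_stamp(struct ring_buffer *buffer, int cpu)
        {
                u64 time;

                preempt_disable_notrace();
                time = rb_time_stamp(buffer);
                /* no schedule() follows, so a plain enable is required */
                preempt_enable_notrace();

                return time;
        }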
      
      Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass.net
      
      Cc: Waiman Long <longman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: the arch/x86 maintainers <x86@kernel.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: huang ying <huang.ying.caritas@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: stable@vger.kernel.org
      Fixes: 2c2d7329 ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Fix buffer_ref pipe ops · cffeb9c8
      Jann Horn authored
      commit b987222654f84f7b4ca95b3a55eca784cb30235b upstream.
      
      This fixes multiple issues in buffer_pipe_buf_ops:
      
       - The ->steal() handler must not return zero unless the pipe buffer has
         the only reference to the page. But generic_pipe_buf_steal() assumes
         that every reference to the pipe is tracked by the page's refcount,
         which isn't true for these buffers - buffer_pipe_buf_get(), which
         duplicates a buffer, doesn't touch the page's refcount.
         Fix it by using generic_pipe_buf_nosteal(), which refuses every
         attempted theft. It should be easy to actually support ->steal, but the
         only current users of pipe_buf_steal() are the virtio console and FUSE,
         and they also only use it as an optimization. So it's probably not worth
         the effort.
       - The ->get() and ->release() handlers can be invoked concurrently on pipe
         buffers backed by the same struct buffer_ref. Make them safe against
         concurrency by using refcount_t (see the sketch after this list).
       - The pointers stored in ->private were only zeroed out when the last
         reference to the buffer_ref was dropped. As far as I know, this
         shouldn't be necessary anyway, but if we do it, let's always do it.
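
      A sketch of the refcount_t conversion for the second point (hedged):

        struct buffer_ref {
                struct ring_buffer      *buffer;
                void                    *page;
                int                     cpu;
                refcount_t              refcount;       /* was a plain int */
        };

        static void buffer_pipe_buf_get(struct pipe_inode_info *pipe,
                                        struct pipe_buffer *buf)
        {
                struct buffer_ref *ref = (struct buffer_ref *)buf->private;

                /* safe against a concurrent ->release() on another pipe */
                refcount_inc(&ref->refcount);
        }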
      
      Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Fix a memory leak by early error exit in trace_pid_write() · 0e78e92d
      Wenwen Wang authored
      commit 91862cc7867bba4ee5c8fcf0ca2f1d30427b6129 upstream.
      
      In trace_pid_write(), the buffer for the trace parser is allocated through
      kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
      it is freed through kfree() in trace_parser_put(). However, it is
      possible that trace_pid_write() is terminated due to unexpected errors,
      e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
      is a memory leak bug.
      
      To fix this issue, free the allocated buffer when an error is encountered.
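
      A sketch of the error path after the fix, hedged (the same release is
      needed on the later vzalloc() failure path too):

        pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
        if (!pid_list) {
                trace_parser_put(&parser);      /* previously leaked here */
                return -ENOMEM;
        }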
      
      Link: http://lkml.kernel.org/r/1555726979-15633-1-git-send-email-wang6495@umn.edu
      
      Fixes: f4d34a87 ("tracing: Use pid bitmap instead of a pid array for set_event_pid")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wenwen Wang <wang6495@umn.edu>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  17. 27 April 2019 (1 commit)
  18. 06 April 2019 (1 commit)
    • tracing: kdb: Fix ftdump to not sleep · b73c7d02
      Douglas Anderson authored
      [ Upstream commit 31b265b3baaf55f209229888b7ffea523ddab366 ]
      
      As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
      BUG for "sleeping function called from invalid context".
      
      kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
      atomic context.  A very simple solution for this is to add allocation
      flags to ring_buffer_read_prepare() so kdb can call it without
      triggering the allocation error.  This patch does that.
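
      A sketch of the new signature and the kdb call site, hedged:

        /* ring_buffer_read_prepare() gains a gfp_t parameter */
        struct ring_buffer_iter *
        ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu, gfp_t flags);

        /* trace_kdb.c runs in atomic context, so it passes GFP_ATOMIC;
         * the ordinary trace.c callers keep using GFP_KERNEL */
        iter.buffer_iter[cpu] =
                ring_buffer_read_prepare(iter.trace_buffer->buffer, cpu, GFP_ATOMIC);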
      
      Note that in the original email thread about this, it was suggested
      that perhaps the solution for kdb was to either preallocate the buffer
      ahead of time or create our own iterator.  I'm hoping that this
      alternative of adding allocation flags to ring_buffer_read_prepare()
      can be considered since it means I don't need to duplicate more of the
      core trace code into "trace_kdb.c" (for either creating my own
      iterator or re-preparing a ring allocator whose memory was already
      allocated).
      
      NOTE: another option for kdb is to actually figure out how to make it
      reuse the existing ftrace_dump() function and totally eliminate the
      duplication.  This sounds very appealing and actually works (the "sr
      z" command can be seen to properly dump the ftrace buffer).  The
      downside here is that ftrace_dump() fully consumes the trace buffer.
      Unless that is changed I'd rather not use it because it means "ftdump
      | grep xyz" won't be very useful to search the ftrace buffer since it
      will throw away the whole trace on the first grep.  A future patch to
      dump only the last few lines of the buffer will also be hard to
      implement.
      
      [1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com
      
      Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org
      Reported-by: Brian Norris <briannorris@chromium.org>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  19. 24 March 2019 (3 commits)
  20. 14 March 2019 (1 commit)
    • bpf: fix potential deadlock in bpf_prog_register · 3bbe6a42
      Alexei Starovoitov authored
      [ Upstream commit e16ec34039c701594d55d08a5aa49ee3e1abc821 ]
      
      Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
      [   13.007000] WARNING: possible circular locking dependency detected
      [   13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
      [   13.008124] ------------------------------------------------------
      [   13.008624] test_progs/246 is trying to acquire lock:
      [   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
      [   13.009770]
      [   13.009770] but task is already holding lock:
      [   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.010877]
      [   13.010877] which lock already depends on the new lock.
      [   13.010877]
      [   13.011532]
      [   13.011532] the existing dependency chain (in reverse order) is:
      [   13.012129]
      [   13.012129] -> #4 (bpf_event_mutex){+.+.}:
      [   13.012582]        perf_event_query_prog_array+0x9b/0x130
      [   13.013016]        _perf_ioctl+0x3aa/0x830
      [   13.013354]        perf_ioctl+0x2e/0x50
      [   13.013668]        do_vfs_ioctl+0x8f/0x6a0
      [   13.014003]        ksys_ioctl+0x70/0x80
      [   13.014320]        __x64_sys_ioctl+0x16/0x20
      [   13.014668]        do_syscall_64+0x4a/0x180
      [   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.015469]
      [   13.015469] -> #3 (&cpuctx_mutex){+.+.}:
      [   13.015910]        perf_event_init_cpu+0x5a/0x90
      [   13.016291]        perf_event_init+0x1b2/0x1de
      [   13.016654]        start_kernel+0x2b8/0x42a
      [   13.016995]        secondary_startup_64+0xa4/0xb0
      [   13.017382]
      [   13.017382] -> #2 (pmus_lock){+.+.}:
      [   13.017794]        perf_event_init_cpu+0x21/0x90
      [   13.018172]        cpuhp_invoke_callback+0xb3/0x960
      [   13.018573]        _cpu_up+0xa7/0x140
      [   13.018871]        do_cpu_up+0xa4/0xc0
      [   13.019178]        smp_init+0xcd/0xd2
      [   13.019483]        kernel_init_freeable+0x123/0x24f
      [   13.019878]        kernel_init+0xa/0x110
      [   13.020201]        ret_from_fork+0x24/0x30
      [   13.020541]
      [   13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
      [   13.021051]        static_key_slow_inc+0xe/0x20
      [   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
      [   13.021891]        perf_trace_event_init+0x11f/0x250
      [   13.022297]        perf_trace_init+0x6b/0xa0
      [   13.022644]        perf_tp_event_init+0x25/0x40
      [   13.023011]        perf_try_init_event+0x6b/0x90
      [   13.023386]        perf_event_alloc+0x9a8/0xc40
      [   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
      [   13.024173]        do_syscall_64+0x4a/0x180
      [   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.024968]
      [   13.024968] -> #0 (tracepoints_mutex){+.+.}:
      [   13.025434]        __mutex_lock+0x86/0x970
      [   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
      [   13.026215]        bpf_probe_register+0x40/0x60
      [   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.027042]        __do_sys_bpf+0x94f/0x1a90
      [   13.027389]        do_syscall_64+0x4a/0x180
      [   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.028171]
      [   13.028171] other info that might help us debug this:
      [   13.028171]
      [   13.028807] Chain exists of:
      [   13.028807]   tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
      [   13.028807]
      [   13.029666]  Possible unsafe locking scenario:
      [   13.029666]
      [   13.030140]        CPU0                    CPU1
      [   13.030510]        ----                    ----
      [   13.030875]   lock(bpf_event_mutex);
      [   13.031166]                                lock(&cpuctx_mutex);
      [   13.031645]                                lock(bpf_event_mutex);
      [   13.032135]   lock(tracepoints_mutex);
      [   13.032441]
      [   13.032441]  *** DEADLOCK ***
      [   13.032441]
      [   13.032911] 1 lock held by test_progs/246:
      [   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.033909]
      [   13.033909] stack backtrace:
      [   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
      [   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   13.035657] Call Trace:
      [   13.035859]  dump_stack+0x5f/0x8b
      [   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
      [   13.036526]  __lock_acquire+0x1158/0x1350
      [   13.036852]  ? lock_acquire+0x98/0x190
      [   13.037154]  lock_acquire+0x98/0x190
      [   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.037876]  __mutex_lock+0x86/0x970
      [   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.039028]  ? __mutex_lock+0x86/0x970
      [   13.039337]  ? __mutex_lock+0x24a/0x970
      [   13.039649]  ? bpf_probe_register+0x1d/0x60
      [   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
      [   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
      [   13.041325]  bpf_probe_register+0x40/0x60
      [   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.042068]  ? __might_fault+0x3e/0x90
      [   13.042374]  __do_sys_bpf+0x94f/0x1a90
      [   13.042678]  do_syscall_64+0x4a/0x180
      [   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.043382] RIP: 0033:0x7f23b10a07f9
      [   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
      [   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
      [   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
      [   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
      [   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
      [   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000
      
      Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister(),
      there is no need to take bpf_event_mutex too.
      bpf_event_mutex protects modifications to the prog array used in kprobe/perf
      bpf progs. bpf_raw_tracepoints don't need to take this mutex.
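
      A sketch of the simplified registration path (hedged):

        int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
        {
                /* tracepoint_probe_register() takes tracepoints_mutex itself;
                 * raw tracepoints never touch the prog array that
                 * bpf_event_mutex protects, so don't take it here */
                return __bpf_probe_register(btp, prog);
        }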
      
      Fixes: c4f6699d ("bpf: introduce BPF_RAW_TRACEPOINT")
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  21. 10 March 2019 (1 commit)
  22. 27 February 2019 (1 commit)
  23. 20 February 2019 (1 commit)
    • tracing/uprobes: Fix output for multiple string arguments · 45649b99
      Andreas Ziegler authored
      commit 0722069a5374b904ec1a67f91249f90e1cfae259 upstream.
      
      When printing multiple uprobe arguments as strings the output for the
      earlier arguments would also include all later string arguments.
      
      This is best explained in an example:
      
      Consider adding a uprobe to a function receiving two strings as
      parameters which is at offset 0xa0 in strlib.so and we want to print
      both parameters when the uprobe is hit (on x86_64):
      
      $ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \
          /sys/kernel/debug/tracing/uprobe_events
      
      When the function is called as func("foo", "bar") and we hit the probe,
      the trace file shows a line like the following:
      
        [...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar"
      
      Note the extra "bar" printed as part of arg1. This behaviour stacks up
      for additional string arguments.
      
      The strings are stored in a dynamically growing part of the uprobe
      buffer by fetch_store_string() after copying them from userspace via
      strncpy_from_user(). The return value of strncpy_from_user() is then
      directly used as the required size for the string. However, this does
      not take the terminating null byte into account as the documentation
      for strncpy_from_user() clearly states that it "[...] returns the
      length of the string (not including the trailing NUL)" even though the
      null byte will be copied to the destination.
      
      Therefore, subsequent calls to fetch_store_string() will overwrite
      the terminating null byte of the most recently fetched string with
      the first character of the current string, leading to the
      "accumulation" of strings in earlier arguments in the output.
      
      Fix this by incrementing the return value of strncpy_from_user() by
      one if we did not hit the maximum buffer size.
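
      A sketch of the adjusted length accounting in fetch_store_string(),
      hedged:

        ret = strncpy_from_user(dst, src, maxlen);
        if (ret >= 0) {
                if (ret == maxlen)
                        dst[ret - 1] = '\0';
                else
                        /* count the terminating NUL that strncpy_from_user()
                         * copied but did not include in its return value */
                        ret++;
        }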
      
      Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 5baaa59e ("tracing/probes: Implement 'memory' fetch method for uprobes")
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  24. 15 February 2019 (1 commit)
  25. 20 December 2018 (2 commits)