1. 21 4月, 2017 2 次提交
  2. 19 4月, 2017 5 次提交
    • S
      ftrace: Move the probe function into the tracing directory · ec19b859
      Steven Rostedt (VMware) 提交于
      As nothing outside the tracing directory uses the function probes mechanism,
      I'm moving the prototypes out of the include/linux/ftrace.h and into the
      local kernel/trace/trace.h header. I plan on making them hook to the
      trace_array structure which is local to kernel/trace, and I do not want to
      expose it to the rest of the kernel. This requires that the probe functions
      must also be local to tracing. But luckily nothing else uses them.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      ec19b859
    • S
      selftests: ftrace: Add test to test reading of set_ftrace_file · a9064f67
      Steven Rostedt (VMware) 提交于
      The set_ftrace_file lists both functions that are filtered, as well as
      function probes (triggers) that are attached to a function, like traceon or
      stacktrace, etc. The reading of this file is not as trivial as most pseudo
      files are, and there's been various bugs that have appeared in the past
      when there's a mix of probes and functions listed. There's also a difference
      when reading the file using dd with a block size of 1.
      
      This test performs the following:
      
       o Resets set_ftrace_filter
      
       o Makes sure only "#### all functions enabled ####" is listed
      
          (All checks uses cat, and dd with bs=1 and bs=100)
      
       o Adds a traceon trigger to schedule
      
       o Checks if only "#### all function enabled ####" and the trigger is there.
      
       o Adds tracing of schedule
      
       o Checks if only schedule and the trigger is there
      
       o Adds tracing of do_IRQ as well
      
       o Checks if only schedule, do_IRQ and the trigger is there
      
       o Adds a traceon trigger to do_IRQ
      
       o Checks if only schedule, do_IRQ and both triggers are there
      
       o Removes tracing of do_IRQ
      
       o Checks if only schedule and both triggers are there
      
       o Removes tracing of schedule
      
       o Checks if only  "#### all functions enabled ####" and both triggers are there
      
       o Removes the triggers
      
       o Checks if only "#### all functions enabled ####" is there
      
       o Adds tracing of schedule
      
       o Checks if only schedule is there
      
       o Adds tracing of do_IRQ
      
       o Checks if only schedule and do_IRQ are there
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      a9064f67
    • S
      selftests: ftrace: Add a test to test function triggers to start and stop tracing · d8b39e1d
      Steven Rostedt (VMware) 提交于
      This adds a test to test the function tiggers traceon and traceoff to make
      sure that it starts and stops tracing when a function is hit.
      
      The test performs the following:
      
       o Enables all events
      
       o Writes schedule:traceoff into set_ftrace_filter
      
       o Makes sure the tigger exists in the file
      
       o Makes sure the trace file no longer grows
      
       o Makes sure that tracing_on is now zero
      
       o Clears the trace file
      
       o Makes sure it's still empty
      
       o Removes the trigger
      
       o Makes sure tracing is still off (tracing_on is zero)
      
       o Writes schedule:traceon into set_ftrace_filter
      
       o Makes sure the trace file is no longer empty
      
       o Makes sure that tracing_on file is set to one
      
       o Removes the trigger
      
       o Makes sure the trigger is no longer there
      
       o Writes schedule:traceoff:3 into set_ftrace_filter
      
       o Makes sure that tracing_on turns off
      
         . Writes 1 into tracing_on
      
         . Makes sure that it turns off 2 more times
      
       o Writes 1 into tracing_on
      
       o Makes sure that tracing_on is still a one
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      d8b39e1d
    • S
      selftests: ftrace: Add a selftest to test event enable/disable func trigger · 43bb45da
      Steven Rostedt (VMware) 提交于
      This adds a test to enable and disable trace events via the function
      triggers. It tests enabling and disabling the sched:sched_switch event via
      the the event_enable and event_disable function triggers attached to the
      schedule() kernel function.
      
      The test does the following:
      
       o disable all events
      
       o disables or enables the sched_switch event
      
       o writes schedule:event_enable/disable:sched:sched_switch into set_ftrace_filter
      
       o 5 times it checks to make sure:
      
          . Writes 0/1 into the sched_switch/enable
      
          . Checks that the sched_switch/enable goes back to 1/0
      
       o Resets the events
      
       o writes schedule:event_enable/disable:sched:sched_switch:3 into set_ftrace_filter
      
       o Does a loop of 3 to see that sched_switch/enable file gets updated
      
       o Makes sure the sched_switch/enable stops getting updated
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      43bb45da
    • S
      selftests: ftrace: Add a way to reset triggers in the set_ftrace_filter file · 8e5e19c1
      Steven Rostedt (VMware) 提交于
      Just writing into the set_ftrace_filter file does not reset triggers, even
      though it can reset the function list. Triggers require writing the trigger
      name with a "!" prepended. It's worse that it requires the number if the
      trigger has a count associated to it.
      
      Add a reset_ftrace_filter function to the ftrace self tests to allow for the
      tests to have a generic way to clear them. It also resets any functions that
      are listed in that file as well.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      8e5e19c1
  3. 18 4月, 2017 3 次提交
  4. 17 4月, 2017 1 次提交
    • S
      ftrace: Fix indexing of t_hash_start() from t_next() · fcdc7125
      Steven Rostedt (VMware) 提交于
      t_hash_start() does not increment *pos, where as t_next() must. But when
      t_next() does increment *pos, it must still pass in the original *pos to
      t_hash_start() otherwise it will skip the first instance:
      
       # cd /sys/kernel/debug/tracing
       # echo schedule:traceoff > set_ftrace_filter
       # echo do_IRQ:traceoff > set_ftrace_filter
       # echo call_rcu > set_ftrace_filter
       # cat set_ftrace_filter
      call_rcu
      schedule:traceoff:unlimited
      do_IRQ:traceoff:unlimited
      
      The above called t_hash_start() from t_start() as there was only one
      function (call_rcu), but if we add another function:
      
       # echo xfrm_policy_destroy_rcu >> set_ftrace_filter
       # cat set_ftrace_filter
      call_rcu
      xfrm_policy_destroy_rcu
      do_IRQ:traceoff:unlimited
      
      The "schedule:traceoff" disappears.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      fcdc7125
  5. 16 4月, 2017 1 次提交
    • S
      ftrace: Fix removing of second function probe · acceb72e
      Steven Rostedt (VMware) 提交于
      When two function probes are added to set_ftrace_filter, and then one of
      them is removed, the update to the function locations is not performed, and
      the record keeping of the function states are corrupted, and causes an
      ftrace_bug() to occur.
      
      This is easily reproducable by adding two probes, removing one, and then
      adding it back again.
      
       # cd /sys/kernel/debug/tracing
       # echo schedule:traceoff > set_ftrace_filter
       # echo do_IRQ:traceoff > set_ftrace_filter
       # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
       # echo do_IRQ:traceoff > set_ftrace_filter
      
      Causes:
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
       Modules linked in: [...]
       CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
       Call Trace:
        dump_stack+0x68/0x9f
        __warn+0x111/0x130
        ? trace_irq_work_interrupt+0xa0/0xa0
        warn_slowpath_null+0x1d/0x20
        ftrace_get_addr_curr+0x143/0x220
        ? __fentry__+0x10/0x10
        ftrace_replace_code+0xe3/0x4f0
        ? ftrace_int3_handler+0x90/0x90
        ? printk+0x99/0xb5
        ? 0xffffffff81000000
        ftrace_modify_all_code+0x97/0x110
        arch_ftrace_update_code+0x10/0x20
        ftrace_run_update_code+0x1c/0x60
        ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
        register_ftrace_function_probe+0x4b6/0x590
        ? ftrace_startup+0x310/0x310
        ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
        ? update_stack_state+0x88/0x110
        ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
        ? preempt_count_sub+0x18/0xd0
        ? mutex_lock_nested+0x104/0x800
        ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
        ? __unwind_start+0x1c0/0x1c0
        ? _mutex_lock_nest_lock+0x800/0x800
        ftrace_trace_probe_callback.isra.3+0xc0/0x130
        ? func_set_flag+0xe0/0xe0
        ? __lock_acquire+0x642/0x1790
        ? __might_fault+0x1e/0x20
        ? trace_get_user+0x398/0x470
        ? strcmp+0x35/0x60
        ftrace_trace_onoff_callback+0x48/0x70
        ftrace_regex_write.isra.43.part.44+0x251/0x320
        ? match_records+0x420/0x420
        ftrace_filter_write+0x2b/0x30
        __vfs_write+0xd7/0x330
        ? do_loop_readv_writev+0x120/0x120
        ? locks_remove_posix+0x90/0x2f0
        ? do_lock_file_wait+0x160/0x160
        ? __lock_is_held+0x93/0x100
        ? rcu_read_lock_sched_held+0x5c/0xb0
        ? preempt_count_sub+0x18/0xd0
        ? __sb_start_write+0x10a/0x230
        ? vfs_write+0x222/0x240
        vfs_write+0xef/0x240
        SyS_write+0xab/0x130
        ? SyS_read+0x130/0x130
        ? trace_hardirqs_on_caller+0x182/0x280
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL_64_fastpath+0x18/0xad
       RIP: 0033:0x7fe61c157c30
       RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
       RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
       RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
       RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
       R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
       R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
        ? trace_hardirqs_off_caller+0xc0/0x110
       ---[ end trace 99fa09b3d9869c2c ]---
       Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)
      
      Cc: stable@vger.kernel.org
      Fixes: 59df055f ("ftrace: trace different functions with a different tracer")
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      acceb72e
  6. 11 4月, 2017 6 次提交
  7. 07 4月, 2017 1 次提交
    • S
      ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines · 0598e4f0
      Steven Rostedt (VMware) 提交于
      The function tracer needs to be more careful than other subsystems when it
      comes to freeing data. Especially if that data is actually executable code.
      When a single function is traced, a trampoline can be dynamically allocated
      which is called to jump to the function trace callback. When the callback is
      no longer needed, the dynamic allocated trampoline needs to be freed. This
      is where the issues arise. The dynamically allocated trampoline must not be
      used again. As function tracing can trace all subsystems, including
      subsystems that are used to serialize aspects of freeing (namely RCU), it
      must take extra care when doing the freeing.
      
      Before synchronize_rcu_tasks() was around, there was no way for the function
      tracer to know that nothing was using the dynamically allocated trampoline
      when CONFIG_PREEMPT was enabled. That's because a task could be indefinitely
      preempted while sitting on the trampoline. Now with synchronize_rcu_tasks(),
      it will wait till all tasks have either voluntarily scheduled (not on the
      trampoline) or goes into userspace (not on the trampoline). Then it is safe
      to free the trampoline even with CONFIG_PREEMPT set.
      Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      0598e4f0
  8. 04 4月, 2017 2 次提交
    • A
      tracing/kprobes: expose maxactive for kretprobe in kprobe_events · 696ced4f
      Alban Crequy 提交于
      When a kretprobe is installed on a kernel function, there is a maximum
      limit of how many calls in parallel it can catch (aka "maxactive"). A
      kernel module could call register_kretprobe() and initialize maxactive
      (see example in samples/kprobes/kretprobe_example.c).
      
      But that is not exposed to userspace and it is currently not possible to
      choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events
      
      The default maxactive can be as low as 1 on single-core with a
      non-preemptive kernel. This is too low and we need to increase it not
      only for recursive functions, but for functions that sleep or resched.
      
      This patch updates the format of the command that can be written to
      kprobe_events so that maxactive can be optionally specified.
      
      I need this for a bpf program attached to the kretprobe of
      inet_csk_accept, which can sleep for a long time.
      
      This patch includes a basic selftest:
      
      > # ./ftracetest -v  test.d/kprobe/
      > === Ftrace unit tests ===
      > [1] Kprobe dynamic event - adding and removing	[PASS]
      > [2] Kprobe dynamic event - busy event check	[PASS]
      > [3] Kprobe dynamic event with arguments	[PASS]
      > [4] Kprobes event arguments with types	[PASS]
      > [5] Kprobe dynamic event with function tracer	[PASS]
      > [6] Kretprobe dynamic event with arguments	[PASS]
      > [7] Kretprobe dynamic event with maxactive	[PASS]
      >
      > # of passed:  7
      > # of failed:  0
      > # of unresolved:  0
      > # of untested:  0
      > # of unsupported:  0
      > # of xfailed:  0
      > # of undefined(test bug):  0
      
      BugLink: https://github.com/iovisor/bcc/issues/1072
      Link: http://lkml.kernel.org/r/1491215782-15490-1-git-send-email-alban@kinvolk.ioAcked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NAlban Crequy <alban@kinvolk.io>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      696ced4f
    • S
      ftrace: Have init/main.c call ftrace directly to free init memory · b80f0f6c
      Steven Rostedt (VMware) 提交于
      Relying on free_reserved_area() to call ftrace to free init memory proved to
      not be sufficient. The issue is that on x86, when debug_pagealloc is
      enabled, the init memory is not freed, but simply set as not present. Since
      ftrace was uninformed of this, starting function tracing still tries to
      update pages that are not present according to the page tables, causing
      ftrace to bug, as well as killing the kernel itself.
      
      Instead of relying on free_reserved_area(), have init/main.c call ftrace
      directly just before it frees the init memory. Then it needs to use
      __init_begin and __init_end to know where the init memory location is.
      Looking at all archs (and testing what I can), it appears that this should
      work for each of them.
      Reported-by: Nkernel test robot <xiaolong.ye@intel.com>
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      b80f0f6c
  9. 01 4月, 2017 5 次提交
    • S
      ftrace: Create separate t_func_next() to simplify the function / hash logic · 5bd84629
      Steven Rostedt (VMware) 提交于
      I noticed that if I use dd to read the set_ftrace_filter file that the first
      hash command is repeated.
      
       # cd /sys/kernel/debug/tracing
       # echo schedule > set_ftrace_filter
       # echo do_IRQ >> set_ftrace_filter
       # echo schedule:traceoff >> set_ftrace_filter
       # echo do_IRQ:traceoff >> set_ftrace_filter
      
       # cat set_ftrace_filter
       schedule
       do_IRQ
       schedule:traceoff:unlimited
       do_IRQ:traceoff:unlimited
      
       # dd if=set_ftrace_filter bs=1
       schedule
       do_IRQ
       schedule:traceoff:unlimited
       schedule:traceoff:unlimited
       do_IRQ:traceoff:unlimited
       98+0 records in
       98+0 records out
       98 bytes copied, 0.00265011 s, 37.0 kB/s
      
      This is due to the way t_start() calls t_next() as well as the seq_file
      calls t_next() and the state is slightly different between the two. Namely,
      t_start() will call t_next() with a local "pos" variable.
      
      By separating out the function listing from t_next() into its own function,
      we can have better control of outputting the functions and the hash of
      triggers. This simplifies the code.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      5bd84629
    • S
      ftrace: Update func_pos in t_start() when all functions are enabled · 43ff926a
      Steven Rostedt (VMware) 提交于
      If all functions are enabled, there's a comment displayed in the file to
      denote that:
      
        # cd /sys/kernel/debug/tracing
        # cat set_ftrace_filter
       #### all functions enabled ####
      
      If a function trigger is set, those are displayed as well:
      
        # echo schedule:traceoff >> /debug/tracing/set_ftrace_filter
        # cat set_ftrace_filter
       #### all functions enabled ####
       schedule:traceoff:unlimited
      
      But if you read that file with dd, the output can change:
      
        # dd if=/debug/tracing/set_ftrace_filter bs=1
       #### all functions enabled ####
       32+0 records in
       32+0 records out
       32 bytes copied, 7.0237e-05 s, 456 kB/s
      
      This is because the "pos" variable is updated for the comment, but func_pos
      is not. "func_pos" is used by the triggers (or hashes) to know how many
      functions were printed and it bases its index from the pos - func_pos.
      func_pos should be 1 to count for the comment printed. But since it is not,
      t_hash_start() thinks that one trigger was already printed.
      
      The cat gets to t_hash_start() via t_next() and not t_start() which updates
      both pos and func_pos.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      43ff926a
    • S
      ftrace: Return NULL at end of t_start() instead of calling t_hash_start() · 2d71d989
      Steven Rostedt (VMware) 提交于
      The loop in t_start() of calling t_next() will call t_hash_start() if the
      pos is beyond the functions and enters the hash items. There's no reason to
      check if p is NULL and call t_hash_start(), as that would be redundant.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      2d71d989
    • S
      ftrace: Assign iter->hash to filter or notrace hashes on seq read · c20489da
      Steven Rostedt (VMware) 提交于
      Instead of testing if the hash to use is the filter_hash or the notrace_hash
      at each iteration, do the test at open, and set the iter->hash to point to
      the corresponding filter or notrace hash. Then use that directly instead of
      testing which hash needs to be used each iteration.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      c20489da
    • S
      ftrace: Clean up __seq_open_private() return check · c1bc5919
      Steven Rostedt (VMware) 提交于
      The return status check of __seq_open_private() is rather strange:
      
      	iter = __seq_open_private();
      	if (iter) {
      		/* do stuff */
      	}
      
      	return iter ? 0 : -ENOMEM;
      
      It makes much more sense to do the return of failure right away:
      
      	iter = __seq_open_private();
      	if (!iter)
      		return -ENOMEM;
      
      	/* do stuff */
      
      	return 0;
      
      This clean up will make updates to this code a bit nicer.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      c1bc5919
  10. 29 3月, 2017 1 次提交
    • S
      ftrace/x86: Do no run CPU sync when there is only one CPU online · 2b87965a
      Steven Rostedt (VMware) 提交于
      Moving enabling of function tracing to early boot, even before scheduling is
      enabled, means that it is not safe to enable interrupts. When function
      tracing was enabled at boot up, it use to happen after scheduling and the
      other CPUs were brought up. That required running a sync across all CPUs
      when modifying the function hook locations in the code. To do the
      synchronization, interrupts had to be enabled. Now function tracing can be
      started before the other CPUs are brought up, and enabling interrupts in
      that case is dangerous. As only tho boot CPU is active, there is no reason
      to run the synchronization. If the online CPU count is one, do not bother
      doing the synchronization. This removes the need to enable interrupts.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      2b87965a
  11. 25 3月, 2017 6 次提交
    • S
      tracing: Move trace_handle_return() out of line · af0009fc
      Steven Rostedt (VMware) 提交于
      Currently trace_handle_return() looks like this:
      
       static inline enum print_line_t trace_handle_return(struct trace_seq *s)
       {
              return trace_seq_has_overflowed(s) ?
                      TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
       }
      
      Where trace_seq_overflowed(s) is:
      
       static inline bool trace_seq_has_overflowed(struct trace_seq *s)
       {
      	return s->full || seq_buf_has_overflowed(&s->seq);
       }
      
      And seq_buf_has_overflowed(&s->seq) is:
      
       static inline bool
       seq_buf_has_overflowed(struct seq_buf *s)
       {
      	return s->len > s->size;
       }
      
      Making trace_handle_return() into:
      
       return (s->full || (s->seq->len > s->seq->size)) ?
                 TRACE_TYPE_PARTIAL_LINE :
                 TRACE_TYPE_HANDLED;
      
      One would think this is not an issue to keep as an inline. But because this
      is used in the TRACE_EVENT() macro, it is extended for every tracepoint in
      the system. Taking a look at a single tracepoint x86_irq_vector (was the
      first one I randomly chosen). As trace_handle_return is used in the
      TRACE_EVENT() macro of trace_raw_output_##call() we disassemble
      trace_raw_output_x86_irq_vector and do a diff:
      
      - is the original
      + is the out-of-line code
      
      I removed identical lines that were different just due to different
      addresses.
      
      --- /tmp/irq-vec-orig	2017-03-16 09:12:48.569384851 -0400
      +++ /tmp/irq-vec-ool	2017-03-16 09:13:39.378153385 -0400
      @@ -6,27 +6,23 @@
              53                      push   %rbx
              48 89 fb                mov    %rdi,%rbx
              4c 8b a7 c0 20 00 00    mov    0x20c0(%rdi),%r12
              e8 f7 72 13 00          callq  ffffffff81155c80 <trace_raw_output_prep>
              83 f8 01                cmp    $0x1,%eax
              74 05                   je     ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
              5b                      pop    %rbx
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
              41 8b 54 24 08          mov    0x8(%r12),%edx
      -       48 8d bb 98 10 00 00    lea    0x1098(%rbx),%rdi
      +       48 81 c3 98 10 00 00    add    $0x1098,%rbx
      -       48 c7 c6 7b 8a a0 81    mov    $0xffffffff81a08a7b,%rsi
      +       48 c7 c6 ab 8a a0 81    mov    $0xffffffff81a08aab,%rsi
      -       e8 c5 85 13 00          callq  ffffffff81156f70 <trace_seq_printf>
      
       === here's the start of the main difference ===
      
      +       48 89 df                mov    %rbx,%rdi
      +       e8 62 7e 13 00          callq  ffffffff81156810 <trace_seq_printf>
      -       8b 93 b8 20 00 00       mov    0x20b8(%rbx),%edx
      -       31 c0                   xor    %eax,%eax
      -       85 d2                   test   %edx,%edx
      -       75 11                   jne    ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
      -       48 8b 83 a8 20 00 00    mov    0x20a8(%rbx),%rax
      -       48 39 83 a0 20 00 00    cmp    %rax,0x20a0(%rbx)
      -       0f 93 c0                setae  %al
      +       48 89 df                mov    %rbx,%rdi
      +       e8 4a c5 12 00          callq  ffffffff8114af00 <trace_handle_return>
              5b                      pop    %rbx
      -       0f b6 c0                movzbl %al,%eax
      
       === end ===
      
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
      
      If you notice, the original has 22 bytes of text more than the out of line
      version. As this is for every TRACE_EVENT() defined in the system, this can
      become quite large.
      
         text	   data	    bss	    dec	    hex	filename
      8690305	5450490	1298432	15439227	 eb957b	vmlinux-orig
      8681725	5450490	1298432	15430647	 eb73f7	vmlinux-handle
      
      This change has a total of 8580 bytes in savings.
      
       $ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      324
      
      That's 324 tracepoints. But this does not include modules (which contain
      many more tracepoints). For an allyesconfig build:
      
       $ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      1401
      
      That's 1401 tracepoints giving us:
      
         text    data     bss     dec     hex filename
      137920629       140221067       53264384        331406080       13c0db00 vmlinux-allyes-orig
      137827709       140221067       53264384        331313160       13bf7008 vmlinux-allyes-handle
      
      92920 bytes in savings!!!
      
      Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.orgReported-by: NAndi Kleen <andi@firstfloor.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      af0009fc
    • S
      ftrace: Allow for function tracing to record init functions on boot up · 42c269c8
      Steven Rostedt (VMware) 提交于
      Adding a hook into free_reserve_area() that informs ftrace that boot up init
      text is being free, lets ftrace safely remove those init functions from its
      records, which keeps ftrace from trying to modify text that no longer
      exists.
      
      Note, this still does not allow for tracing .init text of modules, as
      modules require different work for freeing its init code.
      
      Link: http://lkml.kernel.org/r/1488502497.7212.24.camel@linux.intel.com
      
      Cc: linux-mm@kvack.org
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Requested-by: NTodd Brandt <todd.e.brandt@linux.intel.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      42c269c8
    • S
      ftrace: Have function tracing start in early boot up · dbeafd0d
      Steven Rostedt (VMware) 提交于
      Register the function tracer right after the tracing buffers are initialized
      in early boot up. This will allow function tracing to begin early if it is
      enabled via the kernel command line.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      dbeafd0d
    • S
      tracing: Postpone tracer start-up tests till the system is more robust · 9afecfbb
      Steven Rostedt (VMware) 提交于
      As tracing can now be enabled very early in boot up, even before some
      critical system services (like scheduling), do not run the tracer selftests
      until after early_initcall() is performed. If a tracer is registered before
      such time, it is saved off in a list and the test is run when the system is
      able to handle more diverse functions.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      9afecfbb
    • S
      ftrace: Move ftrace_init() to right after memory initialization · f631718d
      Steven Rostedt (VMware) 提交于
      Initialize the ftrace records immediately after memory initialization, as
      that is all that is required for the records to be created. This will allow
      for future work to get function tracing started earlier in the boot process.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      f631718d
    • S
      tracing: Split tracing initialization into two for early initialization · e725c731
      Steven Rostedt (VMware) 提交于
      Create an early_trace_init() function that will initialize the buffers and
      allow for ealier use of trace_printk(). This will also allow for future work
      to have function tracing start earlier at boot up.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      e725c731
  12. 20 3月, 2017 6 次提交
    • L
      Linux 4.11-rc3 · 97da3854
      Linus Torvalds 提交于
      97da3854
    • L
      mm/swap: don't BUG_ON() due to uninitialized swap slot cache · 452b94b8
      Linus Torvalds 提交于
      This BUG_ON() triggered for me once at shutdown, and I don't see a
      reason for the check.  The code correctly checks whether the swap slot
      cache is usable or not, so an uninitialized swap slot cache is not
      actually problematic afaik.
      
      I've temporarily just switched the BUG_ON() to a WARN_ON_ONCE(), since
      I'm not sure why that seemingly pointless check was there.  I suspect
      the real fix is to just remove it entirely, but for now we'll warn about
      it but not bring the machine down.
      
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      452b94b8
    • L
      Merge tag 'powerpc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · a07a6e41
      Linus Torvalds 提交于
      Pull more powerpc fixes from Michael Ellerman:
       "A couple of minor powerpc fixes for 4.11:
      
         - wire up statx() syscall
      
         - don't print a warning on memory hotplug when HPT resizing isn't
           available
      
        Thanks to: David Gibson, Chandan Rajendra"
      
      * tag 'powerpc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/pseries: Don't give a warning when HPT resizing isn't available
        powerpc: Wire up statx() syscall
      a07a6e41
    • L
      Merge branch 'parisc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 4571bc5a
      Linus Torvalds 提交于
      Pull parisc fixes from Helge Deller:
      
       - Mikulas Patocka added support for R_PARISC_SECREL32 relocations in
         modules with CONFIG_MODVERSIONS.
      
       - Dave Anglin optimized the cache flushing for vmap ranges.
      
       - Arvind Yadav provided a fix for a potential NULL pointer dereference
         in the parisc perf code (and some code cleanups).
      
       - I wired up the new statx system call, fixed some compiler warnings
         with the access_ok() macro and fixed shutdown code to really halt a
         system at shutdown instead of crashing & rebooting.
      
      * 'parisc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix system shutdown halt
        parisc: perf: Fix potential NULL pointer dereference
        parisc: Avoid compiler warnings with access_ok()
        parisc: Wire up statx system call
        parisc: Optimize flush_kernel_vmap_range and invalidate_kernel_vmap_range
        parisc: support R_PARISC_SECREL32 relocation in modules
      4571bc5a
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 8aa34172
      Linus Torvalds 提交于
      Pull SCSI target fixes from Nicholas Bellinger:
       "The bulk of the changes are in qla2xxx target driver code to address
        various issues found during Cavium/QLogic's internal testing (stable
        CC's included), along with a few other stability and smaller
        miscellaneous improvements.
      
        There are also a couple of different patch sets from Mike Christie,
        which have been a result of his work to use target-core ALUA logic
        together with tcm-user backend driver.
      
        Finally, a patch to address some long standing issues with
        pass-through SCSI export of TYPE_TAPE + TYPE_MEDIUM_CHANGER devices,
        which will make folks using physical (or virtual) magnetic tape happy"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (28 commits)
        qla2xxx: Update driver version to 9.00.00.00-k
        qla2xxx: Fix delayed response to command for loop mode/direct connect.
        qla2xxx: Change scsi host lookup method.
        qla2xxx: Add DebugFS node to display Port Database
        qla2xxx: Use IOCB interface to submit non-critical MBX.
        qla2xxx: Add async new target notification
        qla2xxx: Export DIF stats via debugfs
        qla2xxx: Improve T10-DIF/PI handling in driver.
        qla2xxx: Allow relogin to proceed if remote login did not finish
        qla2xxx: Fix sess_lock & hardware_lock lock order problem.
        qla2xxx: Fix inadequate lock protection for ABTS.
        qla2xxx: Fix request queue corruption.
        qla2xxx: Fix memory leak for abts processing
        qla2xxx: Allow vref count to timeout on vport delete.
        tcmu: Convert cmd_time_out into backend device attribute
        tcmu: make cmd timeout configurable
        tcmu: add helper to check if dev was configured
        target: fix race during implicit transition work flushes
        target: allow userspace to set state to transitioning
        target: fix ALUA transition timeout handling
        ...
      8aa34172
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 1b8df619
      Linus Torvalds 提交于
      Pull device-dax fixes from Dan Williams:
       "The device-dax driver was not being careful to handle falling back to
        smaller fault-granularity sizes.
      
        The driver already fails fault attempts that are smaller than the
        device's alignment, but it also needs to handle the cases where a
        larger page mapping could be established. For simplicity of the
        immediate fix the implementation just signals VM_FAULT_FALLBACK until
        fault-size == device-alignment.
      
        One fix is for -stable to address pmd-to-pte fallback from the
        original implementation, another fix is for the new (introduced in
        4.11-rc1) pud-to-pmd regression, and a typo fix comes along for the
        ride.
      
        These have received a build success notification from the kbuild
        robot"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        device-dax: fix debug output typo
        device-dax: fix pud fault fallback handling
        device-dax: fix pmd/pte fault fallback handling
      1b8df619
  13. 19 3月, 2017 1 次提交