• S
    tracing: Move trace_handle_return() out of line · af0009fc
    Steven Rostedt (VMware) 提交于
    Currently trace_handle_return() looks like this:
    
     static inline enum print_line_t trace_handle_return(struct trace_seq *s)
     {
            return trace_seq_has_overflowed(s) ?
                    TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
     }
    
    Where trace_seq_overflowed(s) is:
    
     static inline bool trace_seq_has_overflowed(struct trace_seq *s)
     {
    	return s->full || seq_buf_has_overflowed(&s->seq);
     }
    
    And seq_buf_has_overflowed(&s->seq) is:
    
     static inline bool
     seq_buf_has_overflowed(struct seq_buf *s)
     {
    	return s->len > s->size;
     }
    
    Making trace_handle_return() into:
    
     return (s->full || (s->seq->len > s->seq->size)) ?
               TRACE_TYPE_PARTIAL_LINE :
               TRACE_TYPE_HANDLED;
    
    One would think this is not an issue to keep as an inline. But because this
    is used in the TRACE_EVENT() macro, it is extended for every tracepoint in
    the system. Taking a look at a single tracepoint x86_irq_vector (was the
    first one I randomly chosen). As trace_handle_return is used in the
    TRACE_EVENT() macro of trace_raw_output_##call() we disassemble
    trace_raw_output_x86_irq_vector and do a diff:
    
    - is the original
    + is the out-of-line code
    
    I removed identical lines that were different just due to different
    addresses.
    
    --- /tmp/irq-vec-orig	2017-03-16 09:12:48.569384851 -0400
    +++ /tmp/irq-vec-ool	2017-03-16 09:13:39.378153385 -0400
    @@ -6,27 +6,23 @@
            53                      push   %rbx
            48 89 fb                mov    %rdi,%rbx
            4c 8b a7 c0 20 00 00    mov    0x20c0(%rdi),%r12
            e8 f7 72 13 00          callq  ffffffff81155c80 <trace_raw_output_prep>
            83 f8 01                cmp    $0x1,%eax
            74 05                   je     ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
            5b                      pop    %rbx
            41 5c                   pop    %r12
            5d                      pop    %rbp
            c3                      retq
            41 8b 54 24 08          mov    0x8(%r12),%edx
    -       48 8d bb 98 10 00 00    lea    0x1098(%rbx),%rdi
    +       48 81 c3 98 10 00 00    add    $0x1098,%rbx
    -       48 c7 c6 7b 8a a0 81    mov    $0xffffffff81a08a7b,%rsi
    +       48 c7 c6 ab 8a a0 81    mov    $0xffffffff81a08aab,%rsi
    -       e8 c5 85 13 00          callq  ffffffff81156f70 <trace_seq_printf>
    
     === here's the start of the main difference ===
    
    +       48 89 df                mov    %rbx,%rdi
    +       e8 62 7e 13 00          callq  ffffffff81156810 <trace_seq_printf>
    -       8b 93 b8 20 00 00       mov    0x20b8(%rbx),%edx
    -       31 c0                   xor    %eax,%eax
    -       85 d2                   test   %edx,%edx
    -       75 11                   jne    ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
    -       48 8b 83 a8 20 00 00    mov    0x20a8(%rbx),%rax
    -       48 39 83 a0 20 00 00    cmp    %rax,0x20a0(%rbx)
    -       0f 93 c0                setae  %al
    +       48 89 df                mov    %rbx,%rdi
    +       e8 4a c5 12 00          callq  ffffffff8114af00 <trace_handle_return>
            5b                      pop    %rbx
    -       0f b6 c0                movzbl %al,%eax
    
     === end ===
    
            41 5c                   pop    %r12
            5d                      pop    %rbp
            c3                      retq
    
    If you notice, the original has 22 bytes of text more than the out of line
    version. As this is for every TRACE_EVENT() defined in the system, this can
    become quite large.
    
       text	   data	    bss	    dec	    hex	filename
    8690305	5450490	1298432	15439227	 eb957b	vmlinux-orig
    8681725	5450490	1298432	15430647	 eb73f7	vmlinux-handle
    
    This change has a total of 8580 bytes in savings.
    
     $ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
    324
    
    That's 324 tracepoints. But this does not include modules (which contain
    many more tracepoints). For an allyesconfig build:
    
     $ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
    1401
    
    That's 1401 tracepoints giving us:
    
       text    data     bss     dec     hex filename
    137920629       140221067       53264384        331406080       13c0db00 vmlinux-allyes-orig
    137827709       140221067       53264384        331313160       13bf7008 vmlinux-allyes-handle
    
    92920 bytes in savings!!!
    
    Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.orgReported-by: NAndi Kleen <andi@firstfloor.org>
    Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
    af0009fc
trace.c 194.4 KB