1. 12 4月, 2018 1 次提交
  2. 06 4月, 2018 2 次提交
  3. 05 4月, 2018 2 次提交
  4. 03 4月, 2018 2 次提交
  5. 31 3月, 2018 6 次提交
    • A
      selftests/bpf: Selftest for sys_bind post-hooks. · 1d436885
      Andrey Ignatov 提交于
      Add selftest for attach types `BPF_CGROUP_INET4_POST_BIND` and
      `BPF_CGROUP_INET6_POST_BIND`.
      
      The main things tested are:
      * prog load behaves as expected (valid/invalid accesses in prog);
      * prog attach behaves as expected (load- vs attach-time attach types);
      * `BPF_CGROUP_INET_SOCK_CREATE` can be attached in a backward compatible
        way;
      * post-hooks return expected result and errno.
      
      Example:
        # ./test_sock
        Test case: bind4 load with invalid access: src_ip6 .. [PASS]
        Test case: bind4 load with invalid access: mark .. [PASS]
        Test case: bind6 load with invalid access: src_ip4 .. [PASS]
        Test case: sock_create load with invalid access: src_port .. [PASS]
        Test case: sock_create load w/o expected_attach_type (compat mode) ..
        [PASS]
        Test case: sock_create load w/ expected_attach_type .. [PASS]
        Test case: attach type mismatch bind4 vs bind6 .. [PASS]
        Test case: attach type mismatch bind6 vs bind4 .. [PASS]
        Test case: attach type mismatch default vs bind4 .. [PASS]
        Test case: attach type mismatch bind6 vs sock_create .. [PASS]
        Test case: bind4 reject all .. [PASS]
        Test case: bind6 reject all .. [PASS]
        Test case: bind6 deny specific IP & port .. [PASS]
        Test case: bind4 allow specific IP & port .. [PASS]
        Test case: bind4 allow all .. [PASS]
        Test case: bind6 allow all .. [PASS]
        Summary: 16 PASSED, 0 FAILED
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1d436885
    • A
      selftests/bpf: Selftest for sys_connect hooks · 622adafb
      Andrey Ignatov 提交于
      Add selftest for BPF_CGROUP_INET4_CONNECT and BPF_CGROUP_INET6_CONNECT
      attach types.
      
      Try to connect(2) to specified IP:port and test that:
      * remote IP:port pair is overridden;
      * local end of connection is bound to specified IP.
      
      All combinations of IPv4/IPv6 and TCP/UDP are tested.
      
      Example:
        # tcpdump -pn -i lo -w connect.pcap 2>/dev/null &
        [1] 478
        # strace -qqf -e connect -o connect.trace ./test_sock_addr.sh
        Wait for testing IPv4/IPv6 to become available ... OK
        Load bind4 with invalid type (can pollute stderr) ... REJECTED
        Load bind4 with valid type ... OK
        Attach bind4 with invalid type ... REJECTED
        Attach bind4 with valid type ... OK
        Load connect4 with invalid type (can pollute stderr) libbpf: load bpf \
          program failed: Permission denied
        libbpf: -- BEGIN DUMP LOG ---
        libbpf:
        0: (b7) r2 = 23569
        1: (63) *(u32 *)(r1 +24) = r2
        2: (b7) r2 = 16777343
        3: (63) *(u32 *)(r1 +4) = r2
        invalid bpf_context access off=4 size=4
        [ 1518.404609] random: crng init done
      
        libbpf: -- END LOG --
        libbpf: failed to load program 'cgroup/connect4'
        libbpf: failed to load object './connect4_prog.o'
        ... REJECTED
        Load connect4 with valid type ... OK
        Attach connect4 with invalid type ... REJECTED
        Attach connect4 with valid type ... OK
        Test case #1 (IPv4/TCP):
                Requested: bind(192.168.1.254, 4040) ..
                   Actual: bind(127.0.0.1, 4444)
                Requested: connect(192.168.1.254, 4040) from (*, *) ..
                   Actual: connect(127.0.0.1, 4444) from (127.0.0.4, 56068)
        Test case #2 (IPv4/UDP):
                Requested: bind(192.168.1.254, 4040) ..
                   Actual: bind(127.0.0.1, 4444)
                Requested: connect(192.168.1.254, 4040) from (*, *) ..
                   Actual: connect(127.0.0.1, 4444) from (127.0.0.4, 56447)
        Load bind6 with invalid type (can pollute stderr) ... REJECTED
        Load bind6 with valid type ... OK
        Attach bind6 with invalid type ... REJECTED
        Attach bind6 with valid type ... OK
        Load connect6 with invalid type (can pollute stderr) libbpf: load bpf \
          program failed: Permission denied
        libbpf: -- BEGIN DUMP LOG ---
        libbpf:
        0: (b7) r6 = 0
        1: (63) *(u32 *)(r1 +12) = r6
        invalid bpf_context access off=12 size=4
      
        libbpf: -- END LOG --
        libbpf: failed to load program 'cgroup/connect6'
        libbpf: failed to load object './connect6_prog.o'
        ... REJECTED
        Load connect6 with valid type ... OK
        Attach connect6 with invalid type ... REJECTED
        Attach connect6 with valid type ... OK
        Test case #3 (IPv6/TCP):
                Requested: bind(face:b00c:1234:5678::abcd, 6060) ..
                   Actual: bind(::1, 6666)
                Requested: connect(face:b00c:1234:5678::abcd, 6060) from (*, *)
                   Actual: connect(::1, 6666) from (::6, 37458)
        Test case #4 (IPv6/UDP):
                Requested: bind(face:b00c:1234:5678::abcd, 6060) ..
                   Actual: bind(::1, 6666)
                Requested: connect(face:b00c:1234:5678::abcd, 6060) from (*, *)
                   Actual: connect(::1, 6666) from (::6, 39315)
        ### SUCCESS
        # egrep 'connect\(.*AF_INET' connect.trace | \
        > egrep -vw 'htons\(1025\)' | fold -b -s -w 72
        502   connect(7, {sa_family=AF_INET, sin_port=htons(4040),
        sin_addr=inet_addr("192.168.1.254")}, 128) = 0
        502   connect(8, {sa_family=AF_INET, sin_port=htons(4040),
        sin_addr=inet_addr("192.168.1.254")}, 128) = 0
        502   connect(9, {sa_family=AF_INET6, sin6_port=htons(6060),
        inet_pton(AF_INET6, "face:b00c:1234:5678::abcd", &sin6_addr),
        sin6_flowinfo=0, sin6_scope_id=0}, 128) = 0
        502   connect(10, {sa_family=AF_INET6, sin6_port=htons(6060),
        inet_pton(AF_INET6, "face:b00c:1234:5678::abcd", &sin6_addr),
        sin6_flowinfo=0, sin6_scope_id=0}, 128) = 0
        # fg
        tcpdump -pn -i lo -w connect.pcap 2> /dev/null
        # tcpdump -r connect.pcap -n tcp | cut -c 1-72
        reading from file connect.pcap, link-type EN10MB (Ethernet)
        17:57:40.383533 IP 127.0.0.4.56068 > 127.0.0.1.4444: Flags [S], seq 1333
        17:57:40.383566 IP 127.0.0.1.4444 > 127.0.0.4.56068: Flags [S.], seq 112
        17:57:40.383589 IP 127.0.0.4.56068 > 127.0.0.1.4444: Flags [.], ack 1, w
        17:57:40.384578 IP 127.0.0.1.4444 > 127.0.0.4.56068: Flags [R.], seq 1,
        17:57:40.403327 IP6 ::6.37458 > ::1.6666: Flags [S], seq 406513443, win
        17:57:40.403357 IP6 ::1.6666 > ::6.37458: Flags [S.], seq 2448389240, ac
        17:57:40.403376 IP6 ::6.37458 > ::1.6666: Flags [.], ack 1, win 342, opt
        17:57:40.404263 IP6 ::1.6666 > ::6.37458: Flags [R.], seq 1, ack 1, win
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      622adafb
    • A
      selftests/bpf: Selftest for sys_bind hooks · e50b0a6f
      Andrey Ignatov 提交于
      Add selftest to work with bpf_sock_addr context from
      `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` programs.
      
      Try to bind(2) on IP:port and apply:
      * loads to make sure context can be read correctly, including narrow
        loads (byte, half) for IP and full-size loads (word) for all fields;
      * stores to those fields allowed by verifier.
      
      All combination from IPv4/IPv6 and TCP/UDP are tested.
      
      Both scenarios are tested:
      * valid programs can be loaded and attached;
      * invalid programs can be neither loaded nor attached.
      
      Test passes when expected data can be read from context in the
      BPF-program, and after the call to bind(2) socket is bound to IP:port
      pair that was written by BPF-program to the context.
      
      Example:
        # ./test_sock_addr
        Attached bind4 program.
        Test case #1 (IPv4/TCP):
                Requested: bind(192.168.1.254, 4040) ..
                   Actual: bind(127.0.0.1, 4444)
        Test case #2 (IPv4/UDP):
                Requested: bind(192.168.1.254, 4040) ..
                   Actual: bind(127.0.0.1, 4444)
        Attached bind6 program.
        Test case #3 (IPv6/TCP):
                Requested: bind(face:b00c:1234:5678::abcd, 6060) ..
                   Actual: bind(::1, 6666)
        Test case #4 (IPv6/UDP):
                Requested: bind(face:b00c:1234:5678::abcd, 6060) ..
                   Actual: bind(::1, 6666)
        ### SUCCESS
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      e50b0a6f
    • A
      libbpf: Support expected_attach_type at prog load · d7be143b
      Andrey Ignatov 提交于
      Support setting `expected_attach_type` at prog load time in both
      `bpf/bpf.h` and `bpf/libbpf.h`.
      
      Since both headers already have API to load programs, new functions are
      added not to break backward compatibility for existing ones:
      * `bpf_load_program_xattr()` is added to `bpf/bpf.h`;
      * `bpf_prog_load_xattr()` is added to `bpf/libbpf.h`.
      
      Both new functions accept structures, `struct bpf_load_program_attr` and
      `struct bpf_prog_load_attr` correspondingly, where new fields can be
      added in the future w/o changing the API.
      
      Standard `_xattr` suffix is used to name the new API functions.
      
      Since `bpf_load_program_name()` is not used as heavily as
      `bpf_load_program()`, it was removed in favor of more generic
      `bpf_load_program_xattr()`.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      d7be143b
    • L
      tc-testing: Add newline when writing test case files · c0b6edef
      Lucas Bates 提交于
      When using the -i feature to generate random ID numbers for test
      cases in tdc, the function that writes the JSON to file doesn't
      add a newline character to the end of the file, so we have to
      add our own.
      Signed-off-by: NLucas Bates <lucasb@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0b6edef
    • R
      tc-testing: add connmark action tests · 1dad0f9f
      Roman Mashak 提交于
      Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1dad0f9f
  6. 29 3月, 2018 5 次提交
  7. 28 3月, 2018 8 次提交
  8. 27 3月, 2018 4 次提交
    • L
      tc-testing: Correct compound statements for namespace execution · cd464197
      Lucas Bates 提交于
      If tdc is executing test cases inside a namespace, only the
      first command in a compound statement will be executed inside
      the namespace by tdc. As a result, the subsequent commands
      are not executed inside the namespace and the test will fail.
      
      Example:
      
      for i in {x..y}; do args="foo"; done && tc actions add $args
      
      The namespace execution feature will prepend 'ip netns exec'
      to the command:
      
      ip netns exec tcut for i in {x..y}; do args="foo"; done && \
        tc actions add $args
      
      So the actual tc command is not parsed by the shell as being
      part of the namespace execution.
      
      Enclosing these compound statements inside a bash invocation
      with proper escape characters resolves the problem by creating
      a subshell inside the namespace.
      Signed-off-by: NLucas Bates <lucasb@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd464197
    • F
      tools/thermal: tmon: fix for segfault · 6c59f64b
      Frank Asseg 提交于
      Fixes a segfault occurring when e.g. <TAB> is pressed multiple times in the
      ncurses tmon application. The segfault is caused by incrementing
      cur_thermal_record in the main function without checking if it's value reached
      NR_THERMAL_RECORD immediately. Since the boundary check only occurred in
      update_thermal_data a race condition existed, which lead to an attempted read
      beyond the last element of the trec array.
      
      The fix was implemented by moving the cur_thermal_record incrementation to the
      update_thermal_data function using a temporary variable on which the boundary
      condition is checked before updating cur_thread_record, so that the variable is
      never incremented beyond the trec array's boundary.
      
      It seems the segfault does not occur on every machine: On a HP EliteBook G4 the
      segfault happens, while it does not happen on a Thinkpad T540p.
      Signed-off-by: NFrank Asseg <frank.asseg@objecthunter.net>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      6c59f64b
    • J
      objtool: Add Clang support · 3c1f0583
      Josh Poimboeuf 提交于
      Since the ORC unwinder was made the default on x86_64, Clang-built
      defconfig kernels have triggered some new objtool warnings:
      
        drivers/gpu/drm/i915/i915_gpu_error.o: warning: objtool: i915_error_printf()+0x6c: return with modified stack frame
        drivers/gpu/drm/i915/intel_display.o: warning: objtool: pipe_config_err()+0xa6: return with modified stack frame
      
      The problem is that objtool has never seen clang-built binaries before.
      
      Shockingly enough, objtool is apparently able to follow the code flow
      mostly fine, except for one instruction sequence.  Instead of a LEAVE
      instruction, clang restores RSP and RBP the long way:
      
         67c:   48 89 ec                mov    %rbp,%rsp
         67f:   5d                      pop    %rbp
      
      Teach objtool about this new code sequence.
      Reported-and-test-by: NMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/fce88ce81c356eedcae7f00ed349cfaddb3363cc.1521741586.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3c1f0583
    • M
      selftests: Print the test we're running to /dev/kmsg · 88893cf7
      Michael Ellerman 提交于
      Some tests cause the kernel to print things to the kernel log
      buffer (ie. printk), in particular oops and warnings etc. However when
      running all the tests in succession it's not always obvious which
      test(s) caused the kernel to print something.
      
      We can narrow it down by printing which test directory we're running
      in to /dev/kmsg, if it's writable.
      
      Example output:
      
        [  170.149149] kselftest: Running tests in powerpc
        [  305.300132] kworker/dying (71) used greatest stack depth: 7776 bytes
                       left
        [  808.915456] kselftest: Running tests in pstore
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NShuah Khan <shuahkh@osg.samsung.com>
      88893cf7
  9. 26 3月, 2018 2 次提交
  10. 25 3月, 2018 1 次提交
  11. 24 3月, 2018 7 次提交
    • A
      perf annotate: Use absolute addresses to calculate jump target offsets · 980b68ec
      Arnaldo Carvalho de Melo 提交于
      These types of jumps were confusing the annotate browser:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
        Percent│ffffffff81a00020:   swapgs
        <SNIP>
               │ffffffff81a00128: ↓ jae    ffffffff81a00139 <syscall_return_via_sysret+0x53>
        <SNIP>
               │ffffffff81a00155: → jmpq   *0x825d2d(%rip)   # ffffffff82225e88 <pv_cpu_ops+0xe8>
      
      I.e. the syscall_return_via_sysret function is actually "inside" the
      entry_SYSCALL_64 function, and the offsets in jumps like these (+0x53)
      are relative to syscall_return_via_sysret, not to syscall_return_via_sysret.
      
      Or this may be some artifact in how the assembler marks the start and
      end of a function and how this ends up in the ELF symtab for vmlinux,
      i.e. syscall_return_via_sysret() isn't "inside" entry_SYSCALL_64, but
      just right after it.
      
      From readelf -sw vmlinux:
      
       80267: ffffffff81a00020   315 NOTYPE  GLOBAL DEFAULT    1 entry_SYSCALL_64
         316: ffffffff81a000e6     0 NOTYPE  LOCAL  DEFAULT    1 syscall_return_via_sysret
      
       0xffffffff81a00020 + 315 > 0xffffffff81a000e6
      
      So instead of looking for offsets after that last '+' sign, calculate
      offsets for jump target addresses that are inside the function being
      disassembled from the absolute address, 0xffffffff81a00139 in this case,
      subtracting from it the objdump address for the start of the function
      being disassembled, entry_SYSCALL_64() in this case.
      
      So, before this patch:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
      Percent│       pop    %r10
             │       pop    %r9
             │       pop    %r8
             │       pop    %rax
             │       pop    %rsi
             │       pop    %rdx
             │       pop    %rsi
             │       mov    %rsp,%rdi
             │       mov    %gs:0x5004,%rsp
             │       pushq  0x28(%rdi)
             │       pushq  (%rdi)
             │       push   %rax
             │     ↑ jmp    6c
             │       mov    %cr3,%rdi
             │     ↑ jmp    62
             │       mov    %rdi,%rax
             │       and    $0x7ff,%rdi
             │       bt     %rdi,%gs:0x2219a
             │     ↑ jae    53
             │       btr    %rdi,%gs:0x2219a
             │       mov    %rax,%rdi
             │     ↑ jmp    5b
      
      After:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
        0.65 │     → jne    swapgs_restore_regs_and_return_to_usermode
             │       pop    %r10
             │       pop    %r9
             │       pop    %r8
             │       pop    %rax
             │       pop    %rsi
             │       pop    %rdx
             │       pop    %rsi
             │       mov    %rsp,%rdi
             │       mov    %gs:0x5004,%rsp
             │       pushq  0x28(%rdi)
             │       pushq  (%rdi)
             │       push   %rax
             │     ↓ jmp    132
             │       mov    %cr3,%rdi
             │    ┌──jmp    128
             │    │  mov    %rdi,%rax
             │    │  and    $0x7ff,%rdi
             │    │  bt     %rdi,%gs:0x2219a
             │    │↓ jae    119
             │    │  btr    %rdi,%gs:0x2219a
             │    │  mov    %rax,%rdi
             │    │↓ jmp    121
             │119:│  mov    %rax,%rdi
             │    │  bts    $0x3f,%rdi
             │121:│  or     $0x800,%rdi
             │128:└─→or     $0x1000,%rdi
             │       mov    %rdi,%cr3
             │132:   pop    %rax
             │       pop    %rdi
             │       pop    %rsp
             │     → jmpq   *0x825d2d(%rip)        # ffffffff82225e88 <pv_cpu_ops+0xe8>
      
      With those at least navigating to the right destination, an improvement
      for these cases seems to be to be to somehow mark those inner functions,
      which in this case could be:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
             │syscall_return_via_sysret:
             │       pop    %r15
             │       pop    %r14
             │       pop    %r13
             │       pop    %r12
             │       pop    %rbp
             │       pop    %rbx
             │       pop    %rsi
             │       pop    %r10
             │       pop    %r9
             │       pop    %r8
             │       pop    %rax
             │       pop    %rsi
             │       pop    %rdx
             │       pop    %rsi
             │       mov    %rsp,%rdi
             │       mov    %gs:0x5004,%rsp
             │       pushq  0x28(%rdi)
             │       pushq  (%rdi)
             │       push   %rax
             │     ↓ jmp    132
             │       mov    %cr3,%rdi
             │    ┌──jmp    128
             │    │  mov    %rdi,%rax
             │    │  and    $0x7ff,%rdi
             │    │  bt     %rdi,%gs:0x2219a
             │    │↓ jae    119
             │    │  btr    %rdi,%gs:0x2219a
             │    │  mov    %rax,%rdi
             │    │↓ jmp    121
             │119:│  mov    %rax,%rdi
             │    │  bts    $0x3f,%rdi
             │121:│  or     $0x800,%rdi
             │128:└─→or     $0x1000,%rdi
             │       mov    %rdi,%cr3
             │132:   pop    %rax
             │       pop    %rdi
             │       pop    %rsp
             │     → jmpq   *0x825d2d(%rip)        # ffffffff82225e88 <pv_cpu_ops+0xe8>
      
      This all gets much better viewed if one uses 'perf report --ignore-vmlinux'
      forcing the usage of /proc/kcore + /proc/kallsyms, when the above
      actually gets down to:
      
        # perf report --ignore-vmlinux
        ## do '/64', will show the function names containing '64',
        ## navigate to /entry_SYSCALL_64_after_hwframe.annotation,
        ## press 'A' to annotate, then 'P' to print that annotation
        ## to a file
        ## From another xterm (or see on screen, this 'P' thing is for
        ## getting rid of those right side scroll bars/spaces):
        # cat /entry_SYSCALL_64_after_hwframe.annotation
        entry_SYSCALL_64_after_hwframe() /proc/kcore
        Event: cycles:ppp
      
        Percent
                    Disassembly of section load0:
      
                    ffffffff9aa00044 <load0>:
         11.97        push   %rax
          4.85        push   %rdi
                      push   %rsi
          2.59        push   %rdx
          2.27        push   %rcx
          0.32        pushq  $0xffffffffffffffda
          1.29        push   %r8
                      xor    %r8d,%r8d
          1.62        push   %r9
          0.65        xor    %r9d,%r9d
          1.62        push   %r10
                      xor    %r10d,%r10d
          5.50        push   %r11
                      xor    %r11d,%r11d
          3.56        push   %rbx
                      xor    %ebx,%ebx
          4.21        push   %rbp
                      xor    %ebp,%ebp
          2.59        push   %r12
          0.97        xor    %r12d,%r12d
          3.24        push   %r13
                      xor    %r13d,%r13d
          2.27        push   %r14
                      xor    %r14d,%r14d
          4.21        push   %r15
                      xor    %r15d,%r15d
          0.97        mov    %rsp,%rdi
          5.50      → callq  do_syscall_64
         14.56        mov    0x58(%rsp),%rcx
          7.44        mov    0x80(%rsp),%r11
          0.32        cmp    %rcx,%r11
                    → jne    swapgs_restore_regs_and_return_to_usermode
          0.32        shl    $0x10,%rcx
          0.32        sar    $0x10,%rcx
          3.24        cmp    %rcx,%r11
                    → jne    swapgs_restore_regs_and_return_to_usermode
          2.27        cmpq   $0x33,0x88(%rsp)
          1.29      → jne    swapgs_restore_regs_and_return_to_usermode
                      mov    0x30(%rsp),%r11
          8.74        cmp    %r11,0x90(%rsp)
                    → jne    swapgs_restore_regs_and_return_to_usermode
          0.32        test   $0x10100,%r11
                    → jne    swapgs_restore_regs_and_return_to_usermode
          0.32        cmpq   $0x2b,0xa0(%rsp)
          0.65      → jne    swapgs_restore_regs_and_return_to_usermode
      
      I.e. using kallsyms makes the function start/end be done differently
      than using what is in the vmlinux ELF symtab and actually the hits
      goes to entry_SYSCALL_64_after_hwframe, which is a GLOBAL() after the
      start of entry_SYSCALL_64:
      
        ENTRY(entry_SYSCALL_64)
                UNWIND_HINT_EMPTY
        <SNIP>
                pushq   $__USER_CS                      /* pt_regs->cs */
                pushq   %rcx                            /* pt_regs->ip */
        GLOBAL(entry_SYSCALL_64_after_hwframe)
                pushq   %rax                            /* pt_regs->orig_ax */
      
                PUSH_AND_CLEAR_REGS rax=$-ENOSYS
      
      And it goes and ends at:
      
                cmpq    $__USER_DS, SS(%rsp)            /* SS must match SYSRET */
                jne     swapgs_restore_regs_and_return_to_usermode
      
                /*
                 * We win! This label is here just for ease of understanding
                 * perf profiles. Nothing jumps here.
                 */
        syscall_return_via_sysret:
                /* rcx and r11 are already restored (see code above) */
                UNWIND_HINT_EMPTY
                POP_REGS pop_rdi=0 skip_r11rcx=1
      
      So perhaps some people should really just play with '--ignore-vmlinux'
      to force /proc/kcore + kallsyms.
      
      One idea is to do both, i.e. have a vmlinux annotation and a
      kcore+kallsyms one, when possible, and even show the patched location,
      etc.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-r11knxv8voesav31xokjiuo6@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      980b68ec
    • A
      perf annotate: Defer searching for comma in raw line till it is needed · c448234c
      Arnaldo Carvalho de Melo 提交于
      That strchr() in jump__scnprintf() needs to be nuked somehow, as it,
      IIRC is already done in jump__parse() and if needed at scnprintf() time,
      should be stashed in the struct filled in parse() time.
      
      For now jus defer it to just before where it is used.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-j0t5hagnphoz9xw07bh3ha3g@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c448234c
    • A
      perf annotate: Support jumping from one function to another · e4cc91b8
      Arnaldo Carvalho de Melo 提交于
      For instance:
      
        entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
          5.50 │     → callq  do_syscall_64
         14.56 │       mov    0x58(%rsp),%rcx
          7.44 │       mov    0x80(%rsp),%r11
          0.32 │       cmp    %rcx,%r11
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          0.32 │       shl    $0x10,%rcx
          0.32 │       sar    $0x10,%rcx
          3.24 │       cmp    %rcx,%r11
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          2.27 │       cmpq   $0x33,0x88(%rsp)
          1.29 │     → jne    swapgs_restore_regs_and_return_to_usermode
               │       mov    0x30(%rsp),%r11
          8.74 │       cmp    %r11,0x90(%rsp)
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          0.32 │       test   $0x10100,%r11
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          0.32 │       cmpq   $0x2b,0xa0(%rsp)
          0.65 │     → jne    swapgs_restore_regs_and_return_to_usermode
      
      It'll behave just like a "call" instruction, i.e. press enter or right
      arrow over one such line and the browser will navigate to the annotated
      disassembly of that function, which when exited, via left arrow or esc,
      will come back to the calling function.
      
      Now to support jump to an offset on a different function...
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-78o508mqvr8inhj63ddtw7mo@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e4cc91b8
    • A
      perf annotate: Add "_local" to jump/offset validation routines · 2eff0611
      Arnaldo Carvalho de Melo 提交于
      Because they all really check if we can access data structures/visual
      constructs where a "jump" instruction targets code in the same function,
      i.e. things like:
      
        __pthread_mutex_lock  /usr/lib64/libpthread-2.26.so
        1.95 │       mov    __pthread_force_elision,%ecx
             │    ┌──test   %ecx,%ecx
        0.07 │    ├──je     60
             │    │  test   $0x300,%esi
             │    │↓ jne    60
             │    │  or     $0x100,%esi
             │    │  mov    %esi,0x10(%rdi)
             │ 42:│  mov    %esi,%edx
             │    │  lea    0x16(%r8),%rsi
             │    │  mov    %r8,%rdi
             │    │  and    $0x80,%edx
             │    │  add    $0x8,%rsp
             │    │→ jmpq   __lll_lock_elision
             │    │  nop
        0.29 │ 60:└─→and    $0x80,%esi
        0.07 │       mov    $0x1,%edi
        0.29 │       xor    %eax,%eax
        2.53 │       lock   cmpxchg %edi,(%r8)
      
      And not things like that "jmpq __lll_lock_elision", that instead should behave
      like a "call" instruction and "jump" to the disassembly of "___lll_lock_elision".
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-3cwx39u3h66dfw9xjrlt7ca2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2eff0611
    • P
      perf python: Reference Py_None before returning it · 83428f2f
      Petr Machata 提交于
      Python None objects are handled just like all the other objects with
      respect to their reference counting. Before returning Py_None, its
      reference count thus needs to be bumped.
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Machata <petrm@mellanox.com>
      Link: http://lkml.kernel.org/r/b1e565ecccf68064d8d54f37db5d028dda8fa522.1521675563.git.petrm@mellanox.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      83428f2f
    • D
      tc-testing: add selftests for 'bpf' action · 440ea4ae
      Davide Caratti 提交于
      Test d959: Add cBPF action with valid bytecode
      Test f84a: Add cBPF action with invalid bytecode
      Test e939: Add eBPF action with valid object-file
      Test 282d: Add eBPF action with invalid object-file
      Test d819: Replace cBPF bytecode and action control
      Test 6ae3: Delete cBPF action
      Test 3e0d: List cBPF actions
      Test 55ce: Flush BPF actions
      Test ccc3: Add cBPF action with duplicate index
      Test 89c7: Add cBPF action with invalid index
      Test 7ab9: Add cBPF action with cookie
      
      Changes since v1:
       - use index=2^32-1 in test ccc3, add tests 7a89, 89c7 (thanks Roman Mashak)
       - added test 282d
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      440ea4ae
    • J
      bpftool: Adjust to new print_bpf_insn interface · 337682ca
      Jiri Olsa 提交于
      Change bpftool to skip the removed struct bpf_verifier_env
      argument in print_bpf_insn. It was passed as NULL anyway.
      
      No functional change intended.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      337682ca