1. 22 Nov, 2018 2 commits
    • D
      perf bench: Add epoll parallel epoll_wait benchmark · 121dd9ea
      Davidlohr Bueso authored
      This program benchmarks concurrent epoll_wait(2) for file descriptors
      that are monitored with EPOLLIN along various semantics, by a
      single epoll instance. Such conditions can be found when using
      single/combined or multiple queuing when load balancing.
      
      Each thread has a number of private, nonblocking file descriptors,
      referred to as fdmap. A writer thread will constantly be writing to the
      fdmaps of all threads, minimizing each thread's chances of epoll_wait
      not finding any ready read events and blocking as this is not what we
      want to stress. Full details in the start of the C file.
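
      The setup described above can be sketched in a few lines. This is a
      minimal, single-threaded illustration and not the benchmark's code: the
      function name count_ready_fds() and the MAX_FDS limit are assumptions made
      for this example, and eventfd() is used here simply as a convenient
      nonblocking descriptor. Each descriptor is written once so that
      epoll_wait(2) always finds ready read events and never blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_FDS 64	/* hypothetical upper bound for this sketch */

/*
 * Create nfds nonblocking eventfds, register each with EPOLLIN on a single
 * epoll instance, make each one readable with a single write, and return
 * how many ready events one epoll_wait() call reports (-1 on error).
 */
int count_ready_fds(int nfds)
{
	struct epoll_event ev, events[MAX_FDS];
	int fds[MAX_FDS];
	uint64_t one = 1;
	int epfd, i, created = 0, nready = -1;

	assert(nfds > 0 && nfds <= MAX_FDS);

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	for (i = 0; i < nfds; i++) {
		fds[i] = eventfd(0, EFD_NONBLOCK);
		if (fds[i] < 0)
			goto out;
		created++;

		ev.events = EPOLLIN;
		ev.data.fd = fds[i];
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0)
			goto out;

		/* The "writer" side: bump the counter so the fd is readable. */
		if (write(fds[i], &one, sizeof(one)) != (ssize_t)sizeof(one))
			goto out;
	}

	/* Timeout 0: every fd is already readable, so this never blocks. */
	nready = epoll_wait(epfd, events, MAX_FDS, 0);
out:
	for (i = 0; i < created; i++)
		close(fds[i]);
	close(epfd);
	return nready;
}
```

      With level-triggered EPOLLIN, every descriptor that has unread data is
      reported, so the return value equals the number of descriptors created.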
      
      Committer testing:
      
        # perf bench
        Usage:
      	perf bench [<common options>] <collection> <benchmark> [<options>]
      
              # List of all available benchmark collections:
      
               sched: Scheduler and IPC benchmarks
                 mem: Memory access benchmarks
                numa: NUMA scheduling and MM benchmarks
               futex: Futex stressing benchmarks
               epoll: Epoll stressing benchmarks
                 all: All benchmarks
      
        # perf bench epoll
      
              # List of available benchmarks for collection 'epoll':
      
                wait: Benchmark epoll concurrent epoll_waits
                 all: Run all futex benchmarks
      
        # perf bench epoll wait
        # Running 'epoll/wait' benchmark:
        Run summary [PID 19295]: 3 threads monitoring on 64 file-descriptors for 8 secs.
      
        [thread  0] fdmap: 0xdaa650 ... 0xdaa74c [ 328241 ops/sec ]
        [thread  1] fdmap: 0xdaa900 ... 0xdaa9fc [ 351695 ops/sec ]
        [thread  2] fdmap: 0xdaabb0 ... 0xdaacac [ 381423 ops/sec ]
      
        Averaged 353786 operations/sec (+- 4.35%), total secs = 8
        #
      
      Committer notes:
      
      Fix the build on debian:experimental-x-mips, debian:experimental-x-mipsel
      and others:
      
          CC       /tmp/build/perf/bench/epoll-wait.o
        bench/epoll-wait.c: In function 'writerfn':
        bench/epoll-wait.c:399:12: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
          printinfo("exiting writer-thread (total full-loops: %ld)\n", iter);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~
        bench/epoll-wait.c:86:31: note: in definition of macro 'printinfo'
          do { if (__verbose) { printf(fmt, ## arg); fflush(stdout); } } while (0)
                                       ^~~
        cc1: all warnings being treated as errors
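
      The warning above comes from pairing '%ld' with a size_t, which is
      'unsigned int' on these 32-bit targets. A minimal sketch of
      width-independent formatting follows; the function names are hypothetical
      and this is not the fixup that was applied. It uses '%zu' for size_t and
      the inttypes.h PRIu64 macro for a value widened to uint64_t.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Format a size_t loop counter; returns snprintf()'s result. */
int format_full_loops(char *buf, size_t len, size_t iter)
{
	assert(buf != NULL);
	/* %zu matches size_t on both 32-bit and 64-bit targets. */
	return snprintf(buf, len, "total full-loops: %zu", iter);
}

/* Format a value widened to uint64_t with the inttypes.h macro. */
int format_limit(char *buf, size_t len, uint64_t limit)
{
	assert(buf != NULL);
	return snprintf(buf, len, "limit: %" PRIu64, limit);
}
```

      Both forms compile without format warnings regardless of whether 'long'
      is 32 or 64 bits wide.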
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Link: http://lkml.kernel.org/r/20181106152226.20883-2-dave@stgolabs.net
      Link: http://lkml.kernel.org/r/20181106182349.thdkpvshkna5vd7o@linux-r8p5
      [ Applied above fixup as per Davidlohr's request ]
      [ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
      [ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      tools build feature: Check if eventfd() is available · 11c6cbe7
      Arnaldo Carvalho de Melo authored
      A new 'perf bench epoll' will use this, and to disable it for older
      systems, add a feature test for this API.
      
      This is just a simple program that if successfully compiled, means that
      the feature is present, at least at the library level, in a build that
      sets the output directory to /tmp/build/perf (using O=/tmp/build/perf),
      we end up with:
      
        $ ls -la /tmp/build/perf/feature/test-eventfd*
        -rwxrwxr-x. 1 acme acme 8176 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.bin
        -rw-rw-r--. 1 acme acme  588 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.d
        -rw-rw-r--. 1 acme acme    0 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.make.output
        $ ldd /tmp/build/perf/feature/test-eventfd.bin
      	  linux-vdso.so.1 (0x00007fff3bf3f000)
      	  libc.so.6 => /lib64/libc.so.6 (0x00007fa984061000)
      	  /lib64/ld-linux-x86-64.so.2 (0x00007fa984417000)
        $ grep eventfd -A 2 -B 2 /tmp/build/perf/FEATURE-DUMP
        feature-dwarf=1
        feature-dwarf_getlocations=1
        feature-eventfd=1
        feature-fortify-source=1
        feature-sync-compare-and-swap=1
        $
      
      The main thing here is that in the end we'll have -DHAVE_EVENTFD in
      CFLAGS, and then the 'perf bench' entry needing that API can be
      selectively pruned.
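
      As a rough illustration of how code can key off that define, the sketch
      below guards an eventfd() user with HAVE_EVENTFD. The function name
      epoll_bench_usable() is hypothetical, and defining HAVE_EVENTFD by hand on
      Linux is an assumption made only so the example builds standalone; in the
      real build the define arrives through CFLAGS.

```c
#include <unistd.h>

/*
 * Assumption for this sketch only: on Linux, define HAVE_EVENTFD ourselves.
 * In the perf build the feature test result adds -DHAVE_EVENTFD to CFLAGS.
 */
#if defined(__linux__) && !defined(HAVE_EVENTFD)
#define HAVE_EVENTFD 1
#endif

#ifdef HAVE_EVENTFD
#include <sys/eventfd.h>

/* Returns 1 if an eventfd could be created, 0 otherwise. */
int epoll_bench_usable(void)
{
	int fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}
#else
/* Without the API, the code that needs it is compiled out. */
int epoll_bench_usable(void)
{
	return 0;
}
#endif
```

      The feature test itself only proves that the API is present at the
      library level, so a runtime check like the one above can still fail on a
      kernel that lacks the syscall.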
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-wkeldwob7dpx6jvtuzl8164k@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 21 Nov, 2018 19 commits
    • D
      perf bench: Move HAVE_PTHREAD_ATTR_SETAFFINITY_NP into bench.h · d47d77c3
      Davidlohr Bueso authored
      Both futex and epoll need this call, and it can cause build failures on
      systems that don't have pthread_attr_setaffinity_np().
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Link: http://lkml.kernel.org/r/20181109210719.pr7ohayuwqmfp2wl@linux-r8p5
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf script: Share code and output format for uregs and iregs output · 9add8fe8
      Milian Wolff authored
      The iregs output was missing the newline at end as well as the leading
      ABI output. This made it hard to compare the iregs and uregs values.
      Instead, use a single function to output the register values and use it
      for both, iregs and uregs, to ensure the output is consistent.
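
      The idea of one shared formatter can be sketched as follows. This is a
      simplified illustration, not perf's code: struct regs_sample,
      format_regs() and MAX_REGS are hypothetical names, and the spacing is
      reduced compared to the real output. What matters is that both register
      sets go through the same function, so both get the leading ABI field and
      the trailing newline.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

#define MAX_REGS 4	/* hypothetical limit for this sketch */

/* Hypothetical, simplified stand-in for a sampled register set. */
struct regs_sample {
	uint64_t abi;
	int nr;
	const char *names[MAX_REGS];
	uint64_t vals[MAX_REGS];
};

/*
 * Single formatter used for both iregs and uregs: leading ABI field, each
 * register, then a terminating newline. Returns the number of characters
 * written (excluding the NUL), or -1 if buf is too small.
 */
int format_regs(char *buf, size_t len, const struct regs_sample *regs)
{
	size_t off;
	int i, n;

	assert(regs->nr >= 0 && regs->nr <= MAX_REGS);

	n = snprintf(buf, len, "ABI:%" PRIu64, regs->abi);
	if (n < 0 || (size_t)n >= len)
		return -1;
	off = (size_t)n;

	for (i = 0; i < regs->nr; i++) {
		n = snprintf(buf + off, len - off, " %s:0x%" PRIx64,
			     regs->names[i], regs->vals[i]);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	/* Room is needed for both the newline and the terminating NUL. */
	if (off + 1 >= len)
		return -1;
	buf[off++] = '\n';
	buf[off] = '\0';
	return (int)off;
}
```

      Calling format_regs() once per register set yields one line each, which
      is what makes the two sets easy to compare side by side.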
      
      Before:
      
        perf  7049 [-01]  1343.354347:          1 cycles:ppp:
              ffffffffa7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
          AX:0x80000000    BX:0x0    CX:0x0    DX:0x7    SI:0xf    DI:0x286    BP:0xffff95bc8213a460    SP:0xffffacbf0ba97d18    IP:0xffffffffa7bc21cd FLAGS:0x28e    CS:0x10    SS:0x18    R8:0x2    R9:0x21440   R10:0x33816fb3b8c   R11:0x1   R12:0xffff95bc8213a460   R13:0xffff95bc8213a400   R14:0xffff95bc8213a400   R15:0x1  ABI:2    AX:0xffffffffffffffda    BX:0xffffffffffffffff    CX:0x7f84ad85798b    DX:0x560209699d50    SI:0x7ffe2c7a6820    DI:0x7ffe2c7a8c9b    BP:0x7ffe2c7a20d0    SP:0x7ffe2c7a2058    IP:0x7f84ad85798b FLAGS:0x206    CS:0x33    SS:0x2b    R8:0x7ffe2c7a2030    R9:0x7f84ae55f010   R10:0x8   R11:0x206   R12:0xffffffffffffffff   R13:0xffffffffffffffff   R14:0xffffffffffffffff   R15:0xffffffffffffffff
      
        perf  7049 [-01]  1343.354363:          1 cycles:ppp:
              ...
      
      After:
      
        perf  7049 [-01]  1343.354347:          1 cycles:ppp:
              ffffffffa7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
              ffffffffa840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
          ABI:2    AX:0x80000000    BX:0x0    CX:0x0    DX:0x7    SI:0xf    DI:0x286    BP:0xffff95bc8213a460    SP:0xffffacbf0ba97d18    IP:0xffffffffa7bc21cd FLAGS:0x28e    CS:0x10    SS:0x18    R8:0x2    R9:0x21440   R10:0x33816fb3b8c   R11:0x1   R12:0xffff95bc8213a460   R13:0xffff95bc8213a400   R14:0xffff95bc8213a400   R15:0x1
          ABI:2    AX:0xffffffffffffffda    BX:0xffffffffffffffff    CX:0x7f84ad85798b    DX:0x560209699d50    SI:0x7ffe2c7a6820    DI:0x7ffe2c7a8c9b    BP:0x7ffe2c7a20d0    SP:0x7ffe2c7a2058    IP:0x7f84ad85798b FLAGS:0x206    CS:0x33    SS:0x2b    R8:0x7ffe2c7a2030    R9:0x7f84ae55f010   R10:0x8   R11:0x206   R12:0xffffffffffffffff   R13:0xffffffffffffffff   R14:0xffffffffffffffff   R15:0xffffffffffffffff
      
        perf  7049 [-01]  1343.354363:          1 cycles:ppp:
              ...
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181107223437.9071-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf bpf: Reduce the hardcoded .max_entries for pid_maps · 0f7c2de5
      Arnaldo Carvalho de Melo authored
      While working on augmented syscalls I ran into this error:
      
        # trace -vv --filter-pids 2469,1663 -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
        <SNIP>
        libbpf: map 0 is "__augmented_syscalls__"
        libbpf: map 1 is "__bpf_stdout__"
        libbpf: map 2 is "pids_filtered"
        libbpf: map 3 is "syscalls"
        libbpf: collecting relocating info for: '.text'
        libbpf: relo for 13 value 84 name 133
        libbpf: relocation: insn_idx=3
        libbpf: relocation: find map 3 (pids_filtered) for insn 3
        libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
        libbpf: relo for 8 value 0 name 0
        libbpf: relocation: insn_idx=1
        libbpf: relo for 8 value 0 name 0
        libbpf: relocation: insn_idx=3
        libbpf: relo for 9 value 28 name 178
        libbpf: relocation: insn_idx=36
        libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 36
        libbpf: collecting relocating info for: 'raw_syscalls:sys_exit'
        libbpf: relo for 8 value 0 name 0
        libbpf: relocation: insn_idx=0
        libbpf: relo for 8 value 0 name 0
        libbpf: relocation: insn_idx=2
        bpf: config program 'raw_syscalls:sys_enter'
        bpf: config program 'raw_syscalls:sys_exit'
        libbpf: create map __bpf_stdout__: fd=3
        libbpf: create map __augmented_syscalls__: fd=4
        libbpf: create map syscalls: fd=5
        libbpf: create map pids_filtered: fd=6
        libbpf: added 13 insn from .text to prog raw_syscalls:sys_enter
        libbpf: added 13 insn from .text to prog raw_syscalls:sys_exit
        libbpf: load bpf program failed: Operation not permitted
        libbpf: failed to load program 'raw_syscalls:sys_exit'
        libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
        bpf: load objects failed: err=-4009: (Incorrect kernel version)
        event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                             \___ Failed to load program for unknown reason
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf trace [<options>] [<command>]
            or: perf trace [<options>] -- <command> [<options>]
            or: perf trace record [<options>] [<command>]
            or: perf trace record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event/syscall selector. use 'perf list' to list available events
      
      If I then try to use strace (perf trace'ing 'perf trace' needs some more work
      before it's possible) to get a bit more info I get:
      
        # strace -e bpf trace --filter-pids 2469,1663 -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
        bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERF_EVENT_ARRAY, key_size=4, value_size=4, max_entries=4, map_flags=0, inner_map_fd=0, map_name="__bpf_stdout__", map_ifindex=0}, 72) = 3
        bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERF_EVENT_ARRAY, key_size=4, value_size=4, max_entries=4, map_flags=0, inner_map_fd=0, map_name="__augmented_sys", map_ifindex=0}, 72) = 4
        bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=1, max_entries=500, map_flags=0, inner_map_fd=0, map_name="syscalls", map_ifindex=0}, 72) = 5
        bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=512, map_flags=0, inner_map_fd=0, map_name="pids_filtered", map_ifindex=0}, 72) = 6
        bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=57, insns=0x1223f50, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(4, 18, 10), prog_flags=0, prog_name="sys_enter", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = 7
        bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=18, insns=0x1224120, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(4, 18, 10), prog_flags=0, prog_name="sys_exit", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = -1 EPERM (Operation not permitted)
        bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=18, insns=0x1224120, license="GPL", log_level=1, log_size=262144, log_buf="", kern_version=KERNEL_VERSION(4, 18, 10), prog_flags=0, prog_name="sys_exit", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = -1 EPERM (Operation not permitted)
        bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=18, insns=0x1224120, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(4, 18, 10), prog_flags=0, prog_name="sys_exit", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = -1 EPERM (Operation not permitted)
        event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                             \___ Failed to load program for unknown reason
        <SNIP similar output as without 'strace'>
        #
      
      I managed to create the maps, etc, but then installing the "sys_exit" hook into
      the "raw_syscalls:sys_exit" tracepoint somehow gets -EPERMed...
      
      I then go and try reducing the size of this new table:
      
        +++ b/tools/perf/examples/bpf/augmented_raw_syscalls.c
        @@ -47,6 +47,17 @@ struct augmented_filename {
         #define SYS_OPEN 2
         #define SYS_OPENAT 257
      
        +struct syscall {
        +       bool    filtered;
        +};
        +
        +struct bpf_map SEC("maps") syscalls = {
        +       .type        = BPF_MAP_TYPE_ARRAY,
        +       .key_size    = sizeof(int),
        +       .value_size  = sizeof(struct syscall),
        +       .max_entries = 500,
        +};
      
      And after reducing that .max_entries a tad, it works. So yeah, the "unknown
      reason" should be related to the number of bytes all this is taking. Reduce the
      default for pid_map()s so that we can have a "syscalls" map with enough slots
      for all syscalls in most arches. And take note of this error message and
      improve it :-)
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Edward Cree <ecree@solarflare.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lkml.kernel.org/n/tip-yjzhak8asumz9e9hts2dgplp@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf script: Add newline after uregs output · b07d16f7
      Milian Wolff authored
      This change makes it much easier to distinguish between
      consecutive samples by keeping the empty line between them, like we see
      when we do not enable uregs output.
      
      Before:
      
        cpp-inlining 28298 [-01] 54837.342780:    3068085 cycles:pp:
                    7ffff7c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
                    ...
         ABI:2    AX:0x0    BX:0x40f56cf6    CX:0x294a3ae7    ...
        cpp-inlining 28298 [-01] 54837.344493:    2881929 cycles:pp:
                    7ffff7c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
                    ...
         ABI:2    AX:0x40d440c7    BX:0x40d440c7    CX:0x4d45e5da    ...
      
      After:
      
        cpp-inlining 28298 [-01] 54837.342780:    3068085 cycles:pp:
                    7ffff7c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
                    ...
         ABI:2    AX:0x0    BX:0x40f56cf6    CX:0x294a3ae7    ...
      
        cpp-inlining 28298 [-01] 54837.344493:    2881929 cycles:pp:
                    7ffff7c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
                    ...
         ABI:2    AX:0x40d440c7    BX:0x40d440c7    CX:0x4d45e5da    ...
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181107093705.16346-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      Revert "perf augmented_syscalls: Drop 'write', 'poll' for testing without self pid filter" · 4aa792de
      Arnaldo Carvalho de Melo authored
      Now that we have the "filtered_pids" logic in place, no need to do this
      rough filter to avoid the feedback loop from 'perf trace's own syscalls,
      revert it.
      
      This reverts commit 7ed71f124284359676b6496ae7db724fee9da753.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-88vh02cnkam0vv5f9vp02o3h@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf augmented_syscalls: Remove example hardcoded set of filtered pids · e312747b
      Arnaldo Carvalho de Melo authored
      Now that 'perf trace' fills in that "filtered_pids" BPF map, remove the
      set of filtered pids used as an example to test that feature.
      
      That feature works like this:
      
      Starting a system wide 'strace' like 'perf trace' augmented session we
      noticed that lots of events take place for a pid, which ends up being
      the feedback loop of perf trace's syscalls being processed by the
      'gnome-terminal' process:
      
        # perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c
           0.391 ( 0.002 ms): gnome-terminal/2469 read(fd: 17</dev/ptmx>, buf: 0x564b79f750bc, count: 8176) = 453
           0.394 ( 0.001 ms): gnome-terminal/2469 read(fd: 17</dev/ptmx>, buf: 0x564b79f75280, count: 7724) = -1 EAGAIN Resource temporarily unavailable
           0.438 ( 0.001 ms): gnome-terminal/2469 read(fd: 4<anon_inode:[eventfd]>, buf: 0x7fffc696aeb0, count: 16) = 8
           0.519 ( 0.001 ms): gnome-terminal/2469 read(fd: 17</dev/ptmx>, buf: 0x564b79f75280, count: 7724) = 114
           0.522 ( 0.001 ms): gnome-terminal/2469 read(fd: 17</dev/ptmx>, buf: 0x564b79f752f1, count: 7611) = -1 EAGAIN Resource temporarily unavailable
        ^C
      
      So we can use --filter-pids to get rid of that one. In this case, what
      implements that functionality is the "filtered_pids" BPF map that
      tools/perf/examples/bpf/augmented_raw_syscalls.c created; the 'perf trace'
      BPF loader noticed it and created an associated "struct bpf_map", which then
      got populated by 'perf trace':
      
        # perf trace --filter-pids 2469 -e tools/perf/examples/bpf/augmented_raw_syscalls.c
           0.020 ( 0.002 ms): gnome-shell/1663 epoll_pwait(epfd: 12<anon_inode:[eventpoll]>, events: 0x7ffd8f3ef960, maxevents: 32, sigsetsize: 8) = 1
           0.025 ( 0.002 ms): gnome-shell/1663 read(fd: 24</dev/input/event4>, buf: 0x560c01bb8240, count: 8112) = 48
           0.029 ( 0.001 ms): gnome-shell/1663 read(fd: 24</dev/input/event4>, buf: 0x560c01bb8258, count: 8088) = -1 EAGAIN Resource temporarily unavailable
           0.032 ( 0.001 ms): gnome-shell/1663 read(fd: 24</dev/input/event4>, buf: 0x560c01bb8240, count: 8112) = -1 EAGAIN Resource temporarily unavailable
           0.040 ( 0.003 ms): gnome-shell/1663 recvmsg(fd: 46<socket:[35893]>, msg: 0x7ffd8f3ef950) = -1 EAGAIN Resource temporarily unavailable
          21.529 ( 0.002 ms): gnome-shell/1663 epoll_pwait(epfd: 5<anon_inode:[eventpoll]>, events: 0x7ffd8f3ef960, maxevents: 32, sigsetsize: 8) = 1
          21.533 ( 0.004 ms): gnome-shell/1663 recvmsg(fd: 82<socket:[42826]>, msg: 0x7ffd8f3ef7b0, flags: DONTWAIT|CMSG_CLOEXEC) = 236
          21.581 ( 0.006 ms): gnome-shell/1663 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_BUSY, arg: 0x7ffd8f3ef060) = 0
          21.605 ( 0.020 ms): gnome-shell/1663 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_CREATE, arg: 0x7ffd8f3eeea0) = 0
          21.626 ( 0.119 ms): gnome-shell/1663 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_SET_DOMAIN, arg: 0x7ffd8f3eee94) = 0
          21.746 ( 0.081 ms): gnome-shell/1663 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_PWRITE, arg: 0x7ffd8f3eeea0) = 0
        ^C
      
      Oops, yet another gnome process that is involved with the output that
      'perf trace' generates, let's filter that out too:
      
        # perf trace --filter-pids 2469,1663 -e tools/perf/examples/bpf/augmented_raw_syscalls.c
               ? (         ): wpa_supplicant/1366  ... [continued]: select()) = 0 Timeout
           0.006 ( 0.002 ms): wpa_supplicant/1366 clock_gettime(which_clock: BOOTTIME, tp: 0x7fffe5b1e430) = 0
           0.011 ( 0.001 ms): wpa_supplicant/1366 clock_gettime(which_clock: BOOTTIME, tp: 0x7fffe5b1e3e0) = 0
           0.014 ( 0.001 ms): wpa_supplicant/1366 clock_gettime(which_clock: BOOTTIME, tp: 0x7fffe5b1e430) = 0
               ? (         ): gmain/1791  ... [continued]: poll()) = 0 Timeout
           0.017 (         ): wpa_supplicant/1366 select(n: 6, inp: 0x55646fed3ad0, outp: 0x55646fed3b60, exp: 0x55646fed3bf0, tvp: 0x7fffe5b1e4a0) ...
         157.879 ( 0.019 ms): gmain/1791 inotify_add_watch(fd: 8<anon_inode:inotify>, pathname: , mask: 16789454) = -1 ENOENT No such file or directory
               ? (         ): cupsd/1001  ... [continued]: epoll_pwait()) = 0
               ? (         ): gsd-color/1908  ... [continued]: poll()) = 0 Timeout
         499.615 (         ): cupsd/1001 epoll_pwait(epfd: 4<anon_inode:[eventpoll]>, events: 0x557a21166500, maxevents: 4096, timeout: 1000, sigsetsize: 8) ...
         586.593 ( 0.004 ms): gsd-color/1908 recvmsg(fd: 3<socket:[38074]>, msg: 0x7ffdef34e800) = -1 EAGAIN Resource temporarily unavailable
               ? (         ): fwupd/2230  ... [continued]: poll()) = 0 Timeout
               ? (         ): rtkit-daemon/906  ... [continued]: poll()) = 0 Timeout
               ? (         ): rtkit-daemon/907  ... [continued]: poll()) = 1
         724.603 ( 0.007 ms): rtkit-daemon/907 read(fd: 6<anon_inode:[eventfd]>, buf: 0x7f05ff768d08, count: 8) = 8
               ? (         ): ssh/5461  ... [continued]: select()) = 1
         810.431 ( 0.002 ms): ssh/5461 clock_gettime(which_clock: BOOTTIME, tp: 0x7ffd7f39f870) = 0
         ^C
      
      Several syscall exit events for syscalls in flight when 'perf trace' started, etc. Saner :-)
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-c3tu5yg204p5mvr9kvwew07n@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf trace: Fill in BPF "filtered_pids" map when present · a9964c43
      Arnaldo Carvalho de Melo authored
      This makes the augmented_syscalls support the --filter-pids and
      auto-filtered feedback loop pids just like when working without BPF,
      i.e. with just raw_syscalls:sys_{enter,exit} and tracepoint filters.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-zc5n453sxxm0tz1zfwwelyti@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf trace: See if there is a map named "filtered_pids" · 744fafc7
      Arnaldo Carvalho de Melo authored
      Look up the first map named "filtered_pids" and, if augmenting
      syscalls, i.e. if a BPF event is present and the
      "__augmented_syscalls__" map is present, then fill in that map with the pids
      to filter, be it feedback loop ones (perf trace's pid, its father if it
      is "sshd", more auto-filtered in the future) or the ones explicitly
      stated in the tool command line via --filter-pids.
      
      The code to actually fill in the map comes next.
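
      How the list of pids to filter might be assembled can be sketched in
      plain userspace C. This is an illustration under assumptions, not the
      'perf trace' implementation: collect_filtered_pids() is a hypothetical
      name, only the tool's own pid is modelled as a feedback-loop pid (the
      "sshd" parent case is left out), and the result is stored in an array
      rather than in a BPF map.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill pids[] with the tool's own pid followed by every pid found in a
 * comma-separated list such as "2469,1663". Returns the number of entries
 * stored, or -1 on a malformed list or if pids[] is too small.
 */
int collect_filtered_pids(const char *list, pid_t *pids, int max)
{
	const char *p = list;
	int n = 0;

	assert(pids != NULL);
	if (max < 1)
		return -1;

	/* Feedback-loop pid: the tracer itself. */
	pids[n++] = getpid();

	while (p && *p) {
		char *end;
		long v = strtol(p, &end, 10);

		/* Reject empty fields, non-numeric text and non-positive pids. */
		if (end == p || v <= 0 || (*end != ',' && *end != '\0'))
			return -1;
		if (n == max)
			return -1;
		pids[n++] = (pid_t)v;
		p = (*end == ',') ? end + 1 : end;
	}
	return n;
}
```

      Each collected pid would then become one key in the map, so the BPF side
      only needs a lookup to decide whether to drop an event.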
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-rhzytmw7qpe6lqyjxi1ded9t@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf trace: Add "_from_option" suffix to trace__set_filter() · 6a0b3aba
      Arnaldo Carvalho de Melo authored
      As we'll need that name for a new function to set filters for both
      tracepoints and BPF maps for filtering pids.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-mdkck6hf3fnd21rz2766280q@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf evlist: Rename perf_evlist__set_filter* to perf_evlist__set_tp_filter* · 7ad92a33
      Arnaldo Carvalho de Melo authored
      To better reflect that this is a tracepoint filter, as opposed, for
      instance, to map-based BPF filters.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-9138svli6ddcphrr3ymy9oy3@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf augmented_syscalls: Use pid_filter · ed9a77ba
      Arnaldo Carvalho de Melo authored
      Just to test filtering a bunch of pids. Now it's time to go and get that
      hooked up in 'perf trace': right after we load the bpf program, if we
      find a "pids_filtered" map defined, we'll populate it with the filtered
      pids.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-1i9s27wqqdhafk3fappow84x@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf augmented_syscalls: Drop 'write', 'poll' for testing without self pid filter · 77ecb640
      Arnaldo Carvalho de Melo authored
      When testing system wide tracing without filtering the syscalls called
      by 'perf trace' itself we get into a feedback loop, drop for now those
      two syscalls, that are the ones that 'perf trace' does in its loop for
      writing the syscalls it intercepts, to help with testing till we get
      that filtering in place.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-rkbu536af66dbsfx51sr8yof@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf bpf: Add simple pid_filter class accessible to BPF proggies · 8008aab0
      Arnaldo Carvalho de Melo authored
      Will be used in the augmented_raw_syscalls.c to implement 'perf trace
      --filter-pids'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-9sybmz4vchlbpqwx2am13h9e@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf bpf: Add defines for map insertion/lookup · 382b55db
      Arnaldo Carvalho de Melo authored
      Starting with a helper for a basic pid_map(), a hash using a pid as a
      key.
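
      To make "a hash using a pid as a key" concrete, here is a small userspace
      stand-in. It is only an analogy under stated assumptions: the real map
      lives in the kernel and is accessed through BPF helpers, whereas the
      struct, the function names, the PID_MAP_SLOTS capacity and the linear
      probing below are all invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PID_MAP_SLOTS 64	/* hypothetical capacity, must be a power of two */

/* Userspace stand-in for a pid-keyed hash map with a boolean value. */
struct pid_map {
	int32_t keys[PID_MAP_SLOTS];	/* 0 marks an empty slot */
	bool	vals[PID_MAP_SLOTS];
};

void pid_map_init(struct pid_map *m)
{
	memset(m, 0, sizeof(*m));
}

/* Multiplicative hash, reduced to a slot index. */
static unsigned int pid_map_slot(int32_t pid)
{
	return ((uint32_t)pid * 2654435761u) & (PID_MAP_SLOTS - 1);
}

/* Insert or update an entry; returns 0 on success, -1 if the table is full. */
int pid_map_insert(struct pid_map *m, int32_t pid, bool val)
{
	unsigned int i, s = pid_map_slot(pid);

	assert(pid > 0);
	for (i = 0; i < PID_MAP_SLOTS; i++) {
		if (m->keys[s] == 0 || m->keys[s] == pid) {
			m->keys[s] = pid;
			m->vals[s] = val;
			return 0;
		}
		s = (s + 1) & (PID_MAP_SLOTS - 1);	/* linear probing */
	}
	return -1;
}

/* Returns the stored value, or false when the pid is not in the map. */
bool pid_map_lookup(const struct pid_map *m, int32_t pid)
{
	unsigned int i, s = pid_map_slot(pid);

	for (i = 0; i < PID_MAP_SLOTS; i++) {
		if (m->keys[s] == pid)
			return m->vals[s];
		if (m->keys[s] == 0)
			return false;
		s = (s + 1) & (PID_MAP_SLOTS - 1);
	}
	return false;
}
```

      A filter built on such a map reduces to one lookup per event: if the
      current pid is found, the event is skipped.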
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-gdwvq53wltvq6b3g5tdmh0cw@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf augmented_syscalls: Remove needless linux/socket.h include · 66067538
      Arnaldo Carvalho de Melo authored
      Leftover from when we started augmented_raw_syscalls.c from
      tools/perf/examples/bpf/augmented_syscalls.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: e58a0322dbac ("perf examples bpf: Start augmenting raw_syscalls:sys_{start,exit}")
      Link: https://lkml.kernel.org/n/tip-pmts9ls2skh8n3zisb4txudd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      66067538
    • A
      perf augmented_syscalls: Filter on a hard coded pid · 55f127b4
      Arnaldo Carvalho de Melo committed
      Just to show where we'll hook pid based filters, and what we use to
      obtain the current pid, using a BPF getpid() equivalent.
      
      Now we need to remove that hardcoded PID with a BPF hash map, so that we
      start by filtering 'perf trace's own PID, implement the --filter-pid
      functionality, etc.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-oshrcgcekiyhd0whwisxfvtv@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      55f127b4
    • A
      perf bpf: Add unistd.h to the headers accessible to bpf proggies · 1475d35c
      Arnaldo Carvalho de Melo committed
      Start with a getpid() function wrapping BPF_FUNC_get_current_pid_tgid,
      idea is to mimic the system headers.
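
      As a rough illustration of what such a wrapper has to deal with: the
      bpf_get_current_pid_tgid() helper returns one 64-bit value, with the
      tgid in the upper 32 bits and the pid (thread id) in the lower 32
      bits. The helper names below are hypothetical and are not taken from
      this commit; this is a plain userspace sketch of the unpacking only.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not from the commit): split the packed 64-bit
 * value that bpf_get_current_pid_tgid() returns.
 */
static inline uint32_t sketch_tgid_of(uint64_t pid_tgid)
{
	/* Upper 32 bits: thread group id, what userspace getpid() reports. */
	return (uint32_t)(pid_tgid >> 32);
}

static inline uint32_t sketch_tid_of(uint64_t pid_tgid)
{
	/* Lower 32 bits: kernel pid, i.e. the id of the calling thread. */
	return (uint32_t)pid_tgid;
}
```

      For a single-threaded process both halves carry the same number, so
      the distinction only matters when filtering multi-threaded workloads.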
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-zo8hv22onidep7tm785dzxfk@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1475d35c
    • I
      Merge tag 'perf-urgent-for-mingo-4.20-20181121' of... · b1a9d7b0
      Ingo Molnar committed
      Merge tag 'perf-urgent-for-mingo-4.20-20181121' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes:
      
      - Update kernel ABI headers, one of them led to a small change in
        the ioctl 'cmd' beautifier in 'perf trace' to support the new ISO7816
        commands. (Arnaldo Carvalho de Melo)
      
      - Restore proper cwd on return from mnt namespace (Jiri Olsa)
      
      - Add feature check for the get_current_dir_name() function used in the
        namespace fix from Jiri, that is not available in systems such as
        Alpine Linux, which uses the musl libc (Arnaldo Carvalho de Melo)
      
      - Fix crash in 'perf record' when synthesizing the unit for events such
        as 'cpu-clock' (Jiri Olsa)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b1a9d7b0
    • P
      perf/x86/intel: Fix regression by default disabling perfmon v4 interrupt handling · 2a5bf23d
      Peter Zijlstra committed
      Kyle Huey reported that 'rr', a replay debugger, broke due to the following commit:
      
        af3bdb99 ("perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler")
      
      Rework the 'disable_counter_freezing' __setup() parameter such that we
      can explicitly enable/disable it and switch to default disabled.
      
      To this purpose, rename the parameter to "perf_v4_pmi=" which is a much
      better description and allows requiring a bool argument.
      
      [ mingo: Improved the changelog some more. ]
      Reported-by: Kyle Huey <me@kylehuey.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert O'Callahan <robert@ocallahan.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/20181120170842.GZ2131@hirez.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2a5bf23d
  3. 20 Nov, 2018 6 commits
    • A
      perf tools beauty ioctl: Support new ISO7816 commands · a4243e14
      Arnaldo Carvalho de Melo committed
      Introduced in:
      
        ad8c0eaa ("tty/serial_core: add ISO7816 infrastructure")
      
      Now 'perf trace' will be able to pretty-print the 'cmd' ioctl arg when
      used in capable systems with software emitting those commands.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-7bds48dhckfnleie08mit314@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a4243e14
    • A
      tools uapi asm-generic: Synchronize ioctls.h · 83d9bdea
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        ad8c0eaa ("tty/serial_core: add ISO7816 infrastructure")
      
      That change implies a change to be made in tools/perf/trace/beauty/ioctl.c to
      make the 'perf trace' ioctl syscall argument beautifier support these new
      commands: TIOCGISO7816 and TIOCSISO7816.
      
      This is not yet done automatically by a script like is done for some
      other headers, for instance:
      
        $ tools/perf/trace/beauty/drm_ioctl.sh | head
        #ifndef DRM_COMMAND_BASE
        #define DRM_COMMAND_BASE                0x40
        #endif
        static const char *drm_ioctl_cmds[] = {
      	[0x00] = "VERSION",
      	[0x01] = "GET_UNIQUE",
      	[0x02] = "GET_MAGIC",
      	[0x03] = "IRQ_BUSID",
      	[0x04] = "GET_MAP",
      	[0x05] = "GET_CLIENT",
        $
      
      So we will need to change tools/perf/trace/beauty/ioctl.c in a follow up
      patch until we switch to a generator script.
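
      For readers unfamiliar with the table shape shown above, the sketch
      below shows one way such a designated-initializer string table could
      be consulted. The table holds a few of the entries from the output
      above; the table and function names are hypothetical and this is not
      the code in tools/perf/trace/beauty/.

```c
#include <stddef.h>

/* A few entries copied from the generated output above; 0x03 and 0x04
 * are left out on purpose to show how holes behave. */
static const char *sketch_ioctl_cmds[] = {
	[0x00] = "VERSION",
	[0x01] = "GET_UNIQUE",
	[0x02] = "GET_MAGIC",
	[0x05] = "GET_CLIENT",
};

/* Hypothetical lookup: returns the name for a command number, or NULL
 * when the number is out of range or falls in a hole of the table. */
static const char *sketch_ioctl_name(unsigned int nr)
{
	size_t n = sizeof(sketch_ioctl_cmds) / sizeof(sketch_ioctl_cmds[0]);

	if (nr >= n)
		return NULL;
	return sketch_ioctl_cmds[nr];	/* unset slots are NULL */
}
```

      A caller would typically fall back to printing the raw number when
      the lookup returns NULL.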
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-zin76fe6iykqsilvo6u47f9o@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      83d9bdea
    • A
      tools arch x86: Update tools's copy of cpufeatures.h · 65e259d5
      Arnaldo Carvalho de Melo committed
      To get the changes in the following csets:
      
        ace6485a ("x86/cpufeatures: Enumerate MOVDIR64B instruction")
        33823f4d ("x86/cpufeatures: Enumerate MOVDIRI instruction")
      
      No tools were affected, copy it to silence this perf tool build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
        diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Link: https://lkml.kernel.org/n/tip-83kcyqa1qkxkhm1s7q3hbpel@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      65e259d5
    • A
      tools headers uapi: Synchronize i915_drm.h · 53f00f45
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        900ccf30 ("drm/i915: Only force GGTT coherency w/a on required chipsets")
      
      No changes are required in tools/ nor does anything get automatically
      generated to be used in the 'perf trace' syscall arg beautifiers.
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
        diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-t2vor2wegv41gt5n49095kly@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      53f00f45
    • J
      perf tools: Restore proper cwd on return from mnt namespace · b01c1f69
      Jiri Olsa committed
      When reporting on 'record' server we try to retrieve/use the mnt
      namespace of the profiled tasks. We use following API with cookie to
      hold the return namespace, roughly:
      
        nsinfo__mountns_enter(struct nsinfo *nsi, struct nscookie *nc)
          setns(newns, 0);
        ...
        new ns related open..
        ...
        nsinfo__mountns_exit(struct nscookie *nc)
          setns(nc->oldns)
      
      Once finished we setns to old namespace, which also sets the current
      working directory (cwd) to "/", trashing the cwd we had.
      
      This is mostly fine, because we use absolute paths almost everywhere,
      but it screws up 'perf diff':
      
        # perf diff
        failed to open perf.data: No such file or directory  (try 'perf record' first)
        ...
      
      Adding the current working directory to be part of the cookie and
      restoring it in the nsinfo__mountns_exit call.
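
      The save/restore idea can be sketched in plain userspace C as
      follows. The struct and function names are hypothetical, the buffer
      size is an assumption, and a chdir("/") by the caller stands in for
      the cwd reset that setns() causes; this is not the nsinfo code.

```c
#include <stddef.h>
#include <unistd.h>

#define SKETCH_CWD_MAX 4096	/* assumed upper bound for a path */

/* Hypothetical cookie: remembers the cwd at "enter" time. */
struct sketch_cwd_cookie {
	char oldcwd[SKETCH_CWD_MAX];
	int  valid;		/* non-zero once oldcwd holds a path */
};

/* Record the current working directory before it gets reset. */
static int sketch_cwd_save(struct sketch_cwd_cookie *c)
{
	c->valid = getcwd(c->oldcwd, sizeof(c->oldcwd)) != NULL;
	return c->valid ? 0 : -1;
}

/* Return to the recorded directory; the cookie is single use. */
static int sketch_cwd_restore(struct sketch_cwd_cookie *c)
{
	if (!c->valid)
		return -1;
	c->valid = 0;
	return chdir(c->oldcwd);
}
```

      Calling the save helper before entering the other namespace and the
      restore helper after leaving it keeps relative paths such as
      perf.data resolvable.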
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 843ff37b ("perf symbols: Find symbols in different mount namespace")
      Link: http://lkml.kernel.org/r/20181101170001.30019-1-jolsa@kernel.org
      [ No need to check for NULL args for free(), use zfree() for struct members ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b01c1f69
    • A
      tools build feature: Check if get_current_dir_name() is available · 8feb8efe
      Arnaldo Carvalho de Melo committed
      As the namespace support code will use this, which is not available in
      some non _GNU_SOURCE libraries such as Android's bionic used in my
      container build tests (r12b and r15c at the moment).
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-x56ypm940pwclwu45d7jfj47@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8feb8efe
  4. 19 Nov, 2018 13 commits
    • L
      Linux 4.20-rc3 · 9ff01193
      Linus Torvalds committed
      9ff01193
    • L
      Merge tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 25e19c1f
      Linus Torvalds committed
      Pull libnvdimm fixes from Dan Williams:
       "A small batch of fixes for v4.20-rc3.
      
        The overflow continuation fix addresses something that has been broken
        for several releases. Arguably it could wait even longer, but it's a
        one line fix and this finishes the last of the known address range
        scrub bug reports. The revert addresses a lockdep regression. The unit
        tests are not critical to fix, but no reason to hold this fix back.
      
        Summary:
      
         - Address Range Scrub overflow continuation handling has been broken
           since it was initially merged. It was only recently that error
           injection and platform-BIOS support enabled this corner case to be
           exercised.
      
         - The recent attempt to provide more isolation for the kernel Address
           Range Scrub state machine from userspace initiated sessions
           triggers a lockdep report. Revert and try again at the next merge
           window.
      
         - Fix a kasan reported buffer overflow in libnvdimm unit test
           infrastructure (nfit_test)"
      
      * tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        Revert "acpi, nfit: Further restrict userspace ARS start requests"
        acpi, nfit: Fix ARS overflow continuation
        tools/testing/nvdimm: Fix the array size for dimm devices.
      25e19c1f
    • L
      Merge branch 'akpm' (patches from Andrew) · c67a98c0
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
        mm, page_alloc: check for max order in hot path
        scripts/spdxcheck.py: make python3 compliant
        tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
        lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
        mm/vmstat.c: fix NUMA statistics updates
        mm/gup.c: fix follow_page_mask() kerneldoc comment
        ocfs2: free up write context when direct IO failed
        scripts/faddr2line: fix location of start_kernel in comment
        mm: don't reclaim inodes with many attached pages
        mm, memory_hotplug: check zone_movable in has_unmovable_pages
        mm/swapfile.c: use kvzalloc for swap_info_struct allocation
        MAINTAINERS: update OMAP MMC entry
        hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
        kernel/sched/psi.c: simplify cgroup_move_task()
        z3fold: fix possible reclaim races
      c67a98c0
    • L
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 03582f33
      Linus Torvalds committed
      Pull scheduler fix from Ingo Molnar:
       "Fix an exec() related scalability/performance regression, which was
        caused by incorrectly calculating load and migrating tasks on exec()
        when they shouldn't be"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Fix cpu_util_wake() for 'execl' type workloads
      03582f33
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b53e27f6
      Linus Torvalds committed
      Pull perf fixes from Ingo Molnar:
       "Fix uncore PMU enumeration for CoffeeLake CPUs"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Support CoffeeLake 8th CBOX
        perf/x86/intel/uncore: Add more IMC PCI IDs for KabyLake and CoffeeLake CPUs
      b53e27f6
    • L
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 743a4863
      Linus Torvalds committed
      Pull EFI fixes from Ingo Molnar:
       "Misc fixes: two warning splat fixes, a leak fix and persistent memory
        allocation fixes for ARM"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi: Permit calling efi_mem_reserve_persistent() from atomic context
        efi/arm: Defer persistent reservations until after paging_init()
        efi/arm/libstub: Pack FDT after populating it
        efi/arm: Revert deferred unmap of early memmap mapping
        efi: Fix debugobjects warning on 'efi_rts_work'
      743a4863
    • L
      Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm · cfaa9f02
      Linus Torvalds committed
      Pull ARM spectre updates from Russell King:
       "These are the currently known final bits that resolve the Spectre
        issues. big.Little systems used to be sufficiently identical in that
        there were no differences between individual CPUs in the system that
        mattered to the kernel. With the advent of the Spectre problem, the
        CPUs now have differences in how the workaround is applied.
      
        As a result of previous Spectre patches, these systems ended up
        reporting quite a lot of:
      
           "CPUx: Spectre v2: incorrect context switching function, system vulnerable"
      
        messages due to the action of the big.Little switcher causing the CPUs
        to be re-initialised regularly. This series resolves that issue by
        making the CPU vtable unique to each CPU.
      
        However, since this is used very early, before per-cpu is setup,
        per-cpu can't be used. We also have a problem that two of the methods
        are not called from preempt-safe paths, but thankfully these remain
        identical between all CPUs in the system. To make sure, we validate
        that these are identical during boot"
      
      * 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: spectre-v2: per-CPU vtables to work around big.Little systems
        ARM: add PROC_VTABLE and PROC_TABLE macros
        ARM: clean up per-processor check_bugs method call
        ARM: split out processor lookup
        ARM: make lookup_processor_type() non-__init
      cfaa9f02
    • M
      mm, page_alloc: check for max order in hot path · c63ae43b
      Michal Hocko committed
      Konstantin has noticed that kvmalloc might trigger the following
      warning:
      
        WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
        [...]
        Call Trace:
         fragmentation_index+0x76/0x90
         compaction_suitable+0x4f/0xf0
         shrink_node+0x295/0x310
         node_reclaim+0x205/0x250
         get_page_from_freelist+0x649/0xad0
         __alloc_pages_nodemask+0x12a/0x2a0
         kmalloc_large_node+0x47/0x90
         __kmalloc_node+0x22b/0x2e0
         kvmalloc_node+0x3e/0x70
         xt_alloc_table_info+0x3a/0x80 [x_tables]
         do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
         nf_setsockopt+0x44/0x60
         SyS_setsockopt+0x6f/0xc0
         do_syscall_64+0x67/0x120
         entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      The problem is that we only check for an out of bound order in the slow
      path and the node reclaim might happen from the fast path already.  This
      is fixable by making sure that kvmalloc doesn't ever use kmalloc for
      requests that are larger than KMALLOC_MAX_SIZE but this also shows that
      the code is rather fragile.  A recent UBSAN report underlines that:
      
        UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
        shift exponent 51 is too large for 32-bit type 'int'
        CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0xd2/0x148 lib/dump_stack.c:113
         ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
         __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
         __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
         zone_watermark_fast mm/page_alloc.c:3216 [inline]
         get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
         __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
         alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
         alloc_pages include/linux/gfp.h:509 [inline]
         __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
         dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
         raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
         raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
         fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
         fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
         __blkdev_driver_ioctl block/ioctl.c:303 [inline]
         blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
         block_ioctl+0x105/0x150 fs/block_dev.c:1883
         vfs_ioctl fs/ioctl.c:46 [inline]
         do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
         ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
         __do_sys_ioctl fs/ioctl.c:709 [inline]
         __se_sys_ioctl fs/ioctl.c:707 [inline]
         __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
         do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Note that this is not a kvmalloc path.  It is just that the fast path
      really depends on having sanitized order as well.  Therefore move the
      order check to the fast path.
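
      The reason an unchecked order is dangerous can be shown with a small
      userspace sketch: shifting by an exponent as large as the 51 seen in
      the UBSAN report is undefined for a 32-bit int, so the order has to
      be rejected before any shift is evaluated. The limit and the function
      names below are assumptions for illustration, not the kernel's code.

```c
#include <stdbool.h>

/* Assumed limit for illustration; the kernel's value depends on the
 * configuration. */
#define SKETCH_MAX_ORDER 11U

static bool sketch_order_is_valid(unsigned int order)
{
	return order < SKETCH_MAX_ORDER;
}

/* Number of pages for an allocation order, or 0 if the order is
 * rejected. The check runs first, so the shift never sees a bogus
 * exponent. */
static unsigned long sketch_pages_for_order(unsigned int order)
{
	if (!sketch_order_is_valid(order))
		return 0;
	return 1UL << order;
}
```

      Placing the validity check at the top of the earliest common entry
      point is what protects every later user of the order value.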
      
      Link: http://lkml.kernel.org/r/20181113094305.GM15120@dhcp22.suse.cz
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reported-by: Kyungtae Kim <kt0755@gmail.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Byoungyoung Lee <lifeasageek@gmail.com>
      Cc: "Dae R. Jeong" <threeearcat@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c63ae43b
    • U
      scripts/spdxcheck.py: make python3 compliant · 6f4d29df
      Uwe Kleine-König committed
      Without this change the following happens when using Python3 (3.6.6):
      
      	$ echo "GPL-2.0" | python3 scripts/spdxcheck.py -
      	FAIL: 'str' object has no attribute 'decode'
      	Traceback (most recent call last):
      	  File "scripts/spdxcheck.py", line 253, in <module>
      	    parser.parse_lines(sys.stdin, args.maxlines, '-')
      	  File "scripts/spdxcheck.py", line 171, in parse_lines
      	    line = line.decode(locale.getpreferredencoding(False), errors='ignore')
      	AttributeError: 'str' object has no attribute 'decode'
      
      So as the line is already a string, there is no need to decode it and
      the line can be dropped.
      
      /usr/bin/python on Arch is Python 3.  So this would indeed be worth
      going into 4.19.
      
      Link: http://lkml.kernel.org/r/20181023070802.22558-1-u.kleine-koenig@pengutronix.de
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Joe Perches <joe@perches.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6f4d29df
    • Y
      tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset · 1a413646
      Yufen Yu committed
      Other filesystems such as ext4, f2fs and ubifs all return ENXIO when
      lseek (SEEK_DATA or SEEK_HOLE) requests a negative offset.
      
      man 2 lseek says
      
      :      EINVAL whence  is  not  valid.   Or: the resulting file offset would be
      :             negative, or beyond the end of a seekable device.
      :
      :      ENXIO  whence is SEEK_DATA or SEEK_HOLE, and the file offset is  beyond
      :             the end of the file.
      
      Make tmpfs return ENXIO under these circumstances as well.  After this,
      tmpfs also passes xfstests's generic/448.
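
      A small userspace probe for this behaviour could look like the sketch
      below. The function name is hypothetical; the errno that comes back
      depends on the filesystem backing the descriptor and on the kernel
      version, so the sketch reports it rather than assuming a value.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes SEEK_DATA in glibc headers */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3	/* Linux value, in case the headers hide it */
#endif

/* Hypothetical probe: returns the errno produced by seeking to a
 * negative offset with SEEK_DATA, or 0 if the call unexpectedly
 * succeeded. */
static int sketch_seek_data_negative_errno(int fd)
{
	errno = 0;
	if (lseek(fd, (off_t)-1, SEEK_DATA) == (off_t)-1)
		return errno;
	return 0;
}
```

      Run against a file on tmpfs, a kernel with this change is expected to
      report ENXIO, matching the other filesystems named above.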
      
      [akpm@linux-foundation.org: rewrite changelog]
      Link: http://lkml.kernel.org/r/1540434176-14349-1-git-send-email-yuyufen@huawei.com
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a413646
    • A
      lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn · 1c23b410
      Arnd Bergmann committed
      gcc-8 complains about the prototype for this function:
      
        lib/ubsan.c:432:1: error: ignoring attribute 'noreturn' in declaration of a built-in function '__ubsan_handle_builtin_unreachable' because it conflicts with attribute 'const' [-Werror=attributes]
      
      This is actually a GCC bug. In GCC internals,
      __ubsan_handle_builtin_unreachable() is declared with both 'noreturn' and
      'const' attributes instead of only 'noreturn':
      
         https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84210
      
      Work around this by removing the noreturn attribute.
      
      [aryabinin: add information about GCC bug in changelog]
      Link: http://lkml.kernel.org/r/20181107144516.4587-1-aryabinin@virtuozzo.com
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: Olof Johansson <olof@lixom.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c23b410
    • J
      mm/vmstat.c: fix NUMA statistics updates · 13c9aaf7
      Janne Huttunen committed
      Scan through the whole array to see if an update is needed.  While we're
      at it, use sizeof() to be safe against any possible type changes in the
      future.
      
      The bug here is that we wouldn't sync per-cpu counters into global ones
      if there was an update of numa_stats for higher cpus.  Highly
      theoretical one though because it is much more probable that zone_stats
      are updated so we would refresh anyway.  So I wouldn't bother to mark
      this for stable, yet something nice to fix.
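
      The class of bug described here, scanning only part of an array
      because an element count was used where a byte count was needed, can
      be reproduced in a few lines of userspace C. Everything below is a
      hypothetical sketch: the struct, the item count and the helpers are
      made up and the kernel code differs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_ITEMS 6	/* hypothetical number of counters */

struct sketch_diffs {
	uint16_t numa_diff[SKETCH_NR_ITEMS];	/* two bytes per counter */
};

/* True if any of the first len bytes at p is non-zero. */
static bool sketch_any_nonzero(const void *p, size_t len)
{
	const unsigned char *b = p;

	for (size_t i = 0; i < len; i++)
		if (b[i])
			return true;
	return false;
}

/* Buggy variant: passes the element count, so only the first half of
 * the 16-bit counters is inspected. */
static bool sketch_need_update_buggy(const struct sketch_diffs *d)
{
	return sketch_any_nonzero(d->numa_diff, SKETCH_NR_ITEMS);
}

/* Fixed variant: sizeof() covers the whole array and keeps doing so if
 * the element type changes later. */
static bool sketch_need_update_fixed(const struct sketch_diffs *d)
{
	return sketch_any_nonzero(d->numa_diff, sizeof(d->numa_diff));
}
```

      With only the last counter set, the buggy variant reports that no
      update is needed while the fixed one notices the pending change.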
      
      [mhocko@suse.com: changelog enhancement]
      Link: http://lkml.kernel.org/r/1541601517-17282-1-git-send-email-janne.huttunen@nokia.com
      Fixes: 1d90ca89 ("mm: update NUMA counter threshold size")
      Signed-off-by: Janne Huttunen <janne.huttunen@nokia.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      13c9aaf7