1. 09 Aug, 2018: 28 commits
  2. 03 Aug, 2018: 6 commits
    • perf auxtrace: Support for perf report -D for s390 · b96e6615
      Committed by Thomas Richter
      Add initial support for s390 auxiliary traces using the CPU-Measurement
      Sampling Facility.
      
      Support and ignore PERF_RECORD_AUXTRACE_INFO records in the perf data
      file. Later patches will show the contents of the auxiliary traces.

      Set up the auxtrace queues and data structures for s390.  A raw dump of
      the perf.data file no longer shows an error when an auxtrace event is
      encountered.
      
      Output before:
      
        [root@s35lp76 perf]# ./perf report -D -i perf.data.auxtrace
        0x128 [0x10]: failed to process type: 70
        Error:
        failed to process sample
      
        0x128 [0x10]: event: 70
        .
        . ... raw event: size 16 bytes
        .  0000:  00 00 00 46 00 00 00 10 00 00 00 00 00 00 00 00  ...F............
      
        0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 0
        [root@s35lp76 perf]#
      
      Output after:
      
         # ./perf report -D -i perf.data.auxtrace |fgrep PERF_RECORD_AUXTRACE
        0 0 0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 5
        0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000
      	   offset: 0  ref: 0  idx: 4  tid: -1  cpu: 4
        ....
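      The "Output before" listing shows the 16-byte record as raw bytes. The
      sketch below decodes the 8-byte header that starts every perf.data
      record. It is a minimal illustration, not code from this patch. The
      struct mirrors struct perf_event_header from <linux/perf_event.h>, and
      the bytes are read as big-endian because the dump comes from an s390
      host. The names event_header, be32(), be16() and decode_header_be() are
      hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as struct perf_event_header in <linux/perf_event.h>. */
struct event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Assemble big-endian integers byte by byte, independent of host endianness. */
static uint32_t be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static uint16_t be16(const unsigned char *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/* Decode the header at the start of a record written by a big-endian host. */
static struct event_header decode_header_be(const unsigned char *raw)
{
	struct event_header h;

	h.type = be32(raw);
	h.misc = be16(raw + 4);
	h.size = be16(raw + 6);
	return h;
}
```

      Decoding "00 00 00 46 00 00 00 10" yields type 70, misc 0 and size 16.
      These match the "event: 70" and "[0x10]" values in the dump.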
      
      Additional notes about the underlying hardware and software
      implementation, provided by Hendrik Brueckner (see Link: below).
      
      =============================================================================
      
      The CPU-Measurement Facility (CPU-MF) provides a set of functions to obtain
      performance information on the mainframe.  It was introduced years ago
      with the System z10 for the z/Architecture, that is, 64-bit.
      For Linux, there are two facilities of interest: the counter facility and
      the sampling facility.  The counter facility provides hardware counters
      for instructions, cycles, crypto activities, and many more.
      
      The sampling facility is a hardware sampler that when started will write
      samples at a particular interval into a sampling buffer.  At some point,
      for example, if a sample block is full, it generates an interrupt to collect
      samples (while the sampler continues to run).
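      A toy model may help picture this flow. The sketch below is only an
      illustration under simplifying assumptions and does not reflect the
      CPU-MF buffer format. Block and sample sizes are made up, and the
      "interrupt" is a plain function call made when a block fills up.
      Sampling then continues in the next block. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy sizes, chosen for illustration only. */
#define SAMPLES_PER_BLOCK 4
#define NR_BLOCKS 2

struct sample_block {
	unsigned long samples[SAMPLES_PER_BLOCK];
	size_t used;
};

struct sampler {
	struct sample_block blocks[NR_BLOCKS];
	size_t cur;               /* block currently being filled */
	size_t interrupts;        /* number of "block full" notifications */
	unsigned long collected;  /* samples handed over to software */
};

/* Stand-in for the interrupt handler: collect a full block and recycle it. */
static void block_full_interrupt(struct sampler *s, struct sample_block *b)
{
	s->interrupts++;
	s->collected += b->used;
	b->used = 0;
}

/* Called once per sampling interval; the sampler keeps running across blocks. */
static void take_sample(struct sampler *s, unsigned long value)
{
	struct sample_block *b = &s->blocks[s->cur];

	b->samples[b->used++] = value;
	if (b->used == SAMPLES_PER_BLOCK) {
		s->cur = (s->cur + 1) % NR_BLOCKS;  /* keep sampling into the next block */
		block_full_interrupt(s, b);
	}
}
```

      Ten samples at four per block trigger two "interrupts" and collect
      eight samples. Two samples remain in the block currently being filled.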
      
      A few years ago, I started to provide a perf PMU to use the counter
      and sampling facilities.  Recently, the device driver was updated to also
      "export" the sampling buffer into the AUX area.  Thomas has now completed
      the related perf work to interpret and process this AUX data.
      
      If you are interested in more details about the sampling facility, have a
      look at:
      
      - The Load-Program-Parameter and the CPU-Measurement Facilities, SA23-2260-05
        http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a
      
      and, to learn how to use it for Linux on Z, have a look at chapter 54,
      "Using the CPU-measurement facilities" in the:
      
      - Device Drivers, Features, and Commands, SC33-8411-34
        http://public.dhe.ibm.com/software/dw/linux390/docu/l416dd34.pdf
      
      =============================================================================
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20180803100758.GA28475@linux.ibm.com
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180802074622.13641-2-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Use perf_evsel__sc_tp_{uint,ptr} for "id"/"args" handling syscalls:* events · f3acd886
      Committed by Arnaldo Carvalho de Melo
      Now it looks just about the same as for the trace__sys_{enter,exit}.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-y59may7zx1eccnp4m3qm4u0b@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Setup struct syscall_tp for syscalls:sys_{enter,exit}_NAME events · d32855fa
      Committed by Arnaldo Carvalho de Melo
      Mapping "__syscall_nr" to "id" and setting up "args" from the offset of
      "__syscall_nr" + sizeof(u64), as the payload for syscalls:* is the same
      as for raw_syscalls:*, just the fields have different names.
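      The offset arithmetic above can be sketched in C. This is a simplified
      illustration, not the perf trace code. The names struct syscall_layout,
      layout_from_syscall_nr(), read_id() and read_arg() are hypothetical.
      The usage assumes "__syscall_nr" is a 4-byte int at offset 8 of the raw
      payload, which places "args" at offset 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Where "id" and "args" live inside a raw tracepoint payload. */
struct syscall_layout {
	size_t id_offset;
	size_t id_size;      /* 4 for an int __syscall_nr, 8 for a long id */
	size_t args_offset;
};

/* syscalls:sys_{enter,exit}_NAME: "id" is "__syscall_nr", args start one u64 later. */
static struct syscall_layout layout_from_syscall_nr(size_t nr_offset, size_t nr_size)
{
	struct syscall_layout l = {
		.id_offset = nr_offset,
		.id_size = nr_size,
		.args_offset = nr_offset + sizeof(uint64_t),
	};
	return l;
}

/* Read the syscall number; memcpy avoids unaligned loads. */
static int64_t read_id(const struct syscall_layout *l, const unsigned char *raw)
{
	if (l->id_size == sizeof(int32_t)) {
		int32_t nr;

		memcpy(&nr, raw + l->id_offset, sizeof(nr));
		return nr;
	}

	int64_t id;

	memcpy(&id, raw + l->id_offset, sizeof(id));
	return id;
}

/* Read the idx-th syscall argument; each argument occupies one u64 slot. */
static uint64_t read_arg(const struct syscall_layout *l, const unsigned char *raw, size_t idx)
{
	uint64_t v;

	memcpy(&v, raw + l->args_offset + idx * sizeof(uint64_t), sizeof(v));
	return v;
}
```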
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ogeenrpviwcpwl3oy1l55f3m@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Allow setting up a syscall_tp struct without a format_field · aa823f58
      Committed by Arnaldo Carvalho de Melo
      To avoid having to ask libtraceevent to find a field by name when
      handling each tracepoint event, we set up a struct syscall_tp with
      a tp_field struct holding an extractor function plus the offset for the
      "id", "args" and "ret" fields of the raw_syscalls:sys_{enter,exit}
      tracepoints.

      Now that we want to do the same with the individual
      syscalls:sys_{enter,exit}_NAME syscall tracepoints, where we have "id" as
      "__syscall_nr" and "args" as the actual series of per-syscall parameters,
      we need more flexibility from the routines that set up these
      pre-looked-up syscall tracepoint arg fields.
      
      The next cset will use it.
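      A pre-looked-up field of this kind can be sketched as an offset plus an
      extractor function pointer. Here it is set up from an explicit offset
      and size rather than from a libtraceevent format_field. This is a
      simplified sketch, not the perf trace implementation. The names
      struct raw_sample, field_init_uint(), field_init_ptr() and the
      extractor functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified sample: only the raw tracepoint payload matters here. */
struct raw_sample {
	const unsigned char *raw_data;
};

/* A pre-looked-up tracepoint field: where it lives plus how to extract it. */
struct tp_field {
	size_t offset;
	union {
		uint64_t (*integer)(const struct tp_field *field, const struct raw_sample *sample);
		const void *(*pointer)(const struct tp_field *field, const struct raw_sample *sample);
	};
};

/* Extractors; memcpy avoids unaligned loads from the raw payload. */
static uint64_t field_u32(const struct tp_field *field, const struct raw_sample *sample)
{
	uint32_t v;

	memcpy(&v, sample->raw_data + field->offset, sizeof(v));
	return v;
}

static uint64_t field_u64(const struct tp_field *field, const struct raw_sample *sample)
{
	uint64_t v;

	memcpy(&v, sample->raw_data + field->offset, sizeof(v));
	return v;
}

static const void *field_ptr(const struct tp_field *field, const struct raw_sample *sample)
{
	return sample->raw_data + field->offset;
}

/* Set up an integer field from an explicit offset and size, no format_field needed. */
static int field_init_uint(struct tp_field *field, size_t offset, size_t size)
{
	field->offset = offset;
	switch (size) {
	case sizeof(uint32_t):
		field->integer = field_u32;
		return 0;
	case sizeof(uint64_t):
		field->integer = field_u64;
		return 0;
	default:
		return -1;	/* unsupported field size */
	}
}

/* Set up a pointer field, e.g. "args", from an explicit offset. */
static void field_init_ptr(struct tp_field *field, size_t offset)
{
	field->offset = offset;
	field->pointer = field_ptr;
}
```

      For a syscalls:sys_enter_NAME event, "id" would come from
      field_init_uint() at the "__syscall_nr" offset with size 4. "args"
      would come from field_init_ptr() at that offset plus sizeof(uint64_t).
      This mirrors the mapping described in commit d32855fa.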
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-v59q5e0jrlzkpl9a1c7t81ni@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Rename some syscall_tp methods to raw_syscall · 63f11c80
      Committed by Arnaldo Carvalho de Melo
      Because raw_syscalls have the field for the syscall number as 'id' while
      the syscalls:sys_{enter,exit}_NAME tracepoints have it as '__syscall_nr'...

      Since we want to support both, so that we can enable just a
      syscalls:sys_{enter,exit}_NAME tracepoint instead of asking for
      raw_syscalls:sys_{enter,exit} plus filters, make the method names for
      each kind of tracepoint more explicit.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-4rixbfzco6tsry0w9ghx3ktb@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Use beautifiers on syscalls:sys_enter_ handlers · a98392bb
      Committed by Arnaldo Carvalho de Melo
      We were using the beautifiers only when processing the
      raw_syscalls:sys_enter events, but we can also use them for the
      syscalls:sys_enter_NAME events, as the layout is the same.

      Some more tweaking is needed because we're processing these events
      straight away. There is no buffering at sys_enter_NAME to wait for
      things like vfs_getname to provide pointer contents and then flush
      at sys_exit_NAME. So we need to state in the syscall_arg that this is
      unbuffered, print just the pointer values, and beautify only the
      non-pointer syscall args.

      This just shows an alternative way of processing tracepoints that we
      will end up using when creating "tracepoint" payloads that already copy
      pointer contents directly in the kernel using eBPF. These may be chunks
      rather than whole contents: just the end of a filename, or just the
      start of a read/write buffer, etc.
      
      E.g.:
      
        # perf trace -e syscalls:*enter*sleep,*sleep sleep 1
           0.303 (         ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffc93d5ecc0
           0.305 (1000.229 ms): sleep/8746 nanosleep(rqtp: 0x7ffc93d5ecc0) = 0
        # perf trace -e syscalls:*_*sleep,*sleep sleep 1
           0.288 (         ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffecde87e40
           0.289 (         ): sleep/8748 nanosleep(rqtp: 0x7ffecde87e40) ...
        1000.479 (         ): syscalls:sys_exit_nanosleep:0x0
           0.289 (1000.208 ms): sleep/8748  ... [continued]: nanosleep()) = 0
        #
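      The unbuffered rule can be sketched as follows. This is a hypothetical
      helper, not the perf trace beautifier API. The names struct arg_desc,
      format_arg() and beautify_clockid() are made up, and only three Linux
      clock ids are mapped, for illustration. When unbuffered, pointer
      arguments are printed as raw values, as in "rqtp: 0x7ffc93d5ecc0" above.
      Non-pointer arguments still go through a beautifier. The buffered
      handling of pointer contents (e.g. from vfs_getname) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*beautifier_fn)(char *buf, size_t size, unsigned long long val);

/* Hypothetical description of one syscall argument. */
struct arg_desc {
	const char *name;
	bool is_pointer;
	beautifier_fn beautify;	/* may be NULL: print the value in decimal */
};

/* Example beautifier for clock ids (Linux values for these three clocks). */
static int beautify_clockid(char *buf, size_t size, unsigned long long val)
{
	switch (val) {
	case 0: return snprintf(buf, size, "REALTIME");
	case 1: return snprintf(buf, size, "MONOTONIC");
	case 7: return snprintf(buf, size, "BOOTTIME");
	default: return snprintf(buf, size, "%llu", val);
	}
}

/*
 * Format "name: value". When unbuffered, pointer contents are not available,
 * so pointers are printed as raw values and only non-pointers are beautified.
 */
static int format_arg(char *buf, size_t size, const struct arg_desc *arg,
		      unsigned long long val, bool unbuffered)
{
	int n;

	if (arg->is_pointer && unbuffered)
		return snprintf(buf, size, "%s: %#llx", arg->name, val);

	n = snprintf(buf, size, "%s: ", arg->name);
	if (n < 0 || (size_t)n >= size)
		return n;	/* error or truncated */
	if (arg->beautify)
		return n + arg->beautify(buf + n, size - n, val);
	return n + snprintf(buf + n, size - n, "%llu", val);
}
```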
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-jehyd2zwhw00z3p7v7mg9632@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 02 Aug, 2018: 4 commits
    • perf trace: Associate vfs_getname()'ed pathname with fd returned from 'openat' · 6a648b53
      Committed by Arnaldo Carvalho de Melo
      When the vfs_getname() wannabe tracepoint is in place:
      
        # perf probe -l
          probe:vfs_getname    (on getname_flags:73@acme/git/linux/fs/namei.c with pathname)
        #
      
      'perf trace' will use it to get the pathname when it is copied from
      userspace to the kernel. Right after syscalls:sys_enter_open, the
      pathname is captured by 'probe:vfs_getname' and stashed. Then, at
      syscalls:sys_exit_open time, if the 'open' return is not -1 (the open
      syscall succeeded), that pathname is associated with the return value,
      i.e. the fd.

      We were not doing this for the 'openat' syscall, which caused 'perf
      trace' to fall back to using /proc to resolve the fd's pathname. Change
      it so that we use what we got from probe:vfs_getname. This reduces the
      cost of beautifying 'openat', avoids the syscalls performed to read
      procfs state, and avoids some possible races in the process.
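      The bookkeeping described above can be sketched as follows. It is a
      simplified, hypothetical model rather than the perf trace code. It keeps
      one pending pathname per thread and a small fixed-size fd table
      (MAX_FDS is an arbitrary assumption), and the handler names are
      invented for illustration. One exit handler serves both 'open' and
      'openat', since both return the new fd on success.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FDS 64	/* illustrative fixed-size fd table */

/* Hypothetical per-thread state kept by the tracer. */
struct thread_trace {
	char pending_name[256];	/* stashed at probe:vfs_getname time */
	int have_pending;
	char *fd_paths[MAX_FDS];	/* fd -> pathname, NULL if unknown */
};

static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* probe:vfs_getname hit: remember the pathname copied from userspace. */
static void on_vfs_getname(struct thread_trace *t, const char *pathname)
{
	snprintf(t->pending_name, sizeof(t->pending_name), "%s", pathname);
	t->have_pending = 1;
}

/* sys_exit_open or sys_exit_openat: on success, ret is the new fd. */
static void on_open_exit(struct thread_trace *t, long ret)
{
	if (t->have_pending && ret >= 0 && ret < MAX_FDS) {
		free(t->fd_paths[ret]);
		t->fd_paths[ret] = copy_string(t->pending_name);
	}
	t->have_pending = 0;	/* consume the stash even if the open failed */
}

/* Look up the pathname for an fd without reading /proc. */
static const char *fd_path(const struct thread_trace *t, int fd)
{
	return (fd >= 0 && fd < MAX_FDS) ? t->fd_paths[fd] : NULL;
}
```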
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-xnp44ao3bkb6ejeczxfnjwsh@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge tag 'perf-core-for-mingo-4.19-20180801' of... · ec2cb7a5
      Committed by Ingo Molnar
      Merge tag 'perf-core-for-mingo-4.19-20180801' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf trace: (Arnaldo Carvalho de Melo)
      
      - Do not require --no-syscalls to suppress strace-like output, i.e.

           # perf trace -e sched:*switch

        will show just sched:sched_switch events, not strace-like formatted
        syscall events; use --syscalls to get the previous behaviour.

        If instead:

           # perf trace

        is used, i.e. no events specified, then --syscalls is implied and
        system-wide strace-like formatting will be applied to all syscalls.
      
        The behaviour when just a syscall subset is used with '-e' is unchanged:
      
           # perf trace -e *sleep,sched:*switch
      
        will work as before: just the 'nanosleep' syscall will be strace-like
        formatted plus the sched:sched_switch tracepoint event, system wide.
      
      - Allow string table generators to use a default header dir, so that
        they can be run without parameters to print the table they generate
        to stdout, e.g.:
      
          $ tools/perf/trace/beauty/kvm_ioctl.sh
          static const char *kvm_ioctl_cmds[] = {
              [0x00] = "GET_API_VERSION",
              [0x01] = "CREATE_VM",
              [0x02] = "GET_MSR_INDEX_LIST",
              [0x03] = "CHECK_EXTENSION",
      <BIG SNIP>
              [0xe0] = "CREATE_DEVICE",
              [0xe1] = "SET_DEVICE_ATTR",
              [0xe2] = "GET_DEVICE_ATTR",
              [0xe3] = "HAS_DEVICE_ATTR",
          };
          $
      
        See 'ls tools/perf/trace/beauty/*.sh' to see the available string
        table generators.
      
      - Add a generator for IPPROTO_ socket's protocol constants.
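      A generated table like kvm_ioctl_cmds above is a sparse array built with
      designated initializers. A lookup therefore needs a bounds check and must
      cope with holes. The sketch below keeps only the entries shown above, as
      an excerpt. table_lookup() is a hypothetical helper, not the perf API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Excerpt of the generated table; the real one has many more entries. */
static const char *kvm_ioctl_cmds[] = {
	[0x00] = "GET_API_VERSION",
	[0x01] = "CREATE_VM",
	[0x02] = "GET_MSR_INDEX_LIST",
	[0x03] = "CHECK_EXTENSION",
	[0xe0] = "CREATE_DEVICE",
	[0xe1] = "SET_DEVICE_ATTR",
	[0xe2] = "GET_DEVICE_ATTR",
	[0xe3] = "HAS_DEVICE_ATTR",
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Return the name for cmd, or NULL if cmd is out of range or a hole. */
static const char *table_lookup(const char **table, size_t nr, unsigned int cmd)
{
	return cmd < nr ? table[cmd] : NULL;
}
```

      The array length is the highest designated index plus one (0xe4 here).
      Any index below that which was not initialized reads as NULL.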
      
      perf record: (Kan Liang)
      
      - Fix erroring out when applying an initial delay while using LBR. The
        error came from the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY event used
        to track PERF_RECORD_MMAP events while waiting for the initial delay.
        Such events fail when configured with PERF_SAMPLE_BRANCH_STACK in
        perf_event_attr.sample_type.
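      One way to avoid the failure is to clear PERF_SAMPLE_BRANCH_STACK on the
      dummy event. The sketch below assumes the tracking event derives its
      sample_type from the main event. It is not necessarily how the fix was
      implemented. setup_tracking_dummy() is a hypothetical helper, while
      struct perf_event_attr and the constants come from <linux/perf_event.h>.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: build the dummy event that tracks PERF_RECORD_MMAP
 * during the initial delay, keeping the main event's sample_type minus the
 * branch stack bit.
 */
static void setup_tracking_dummy(struct perf_event_attr *dummy,
				 const struct perf_event_attr *main_attr)
{
	memset(dummy, 0, sizeof(*dummy));
	dummy->size = sizeof(*dummy);
	dummy->type = PERF_TYPE_SOFTWARE;
	dummy->config = PERF_COUNT_SW_DUMMY;
	dummy->mmap = 1;	/* ask for PERF_RECORD_MMAP events */
	/* Never request branch stacks on the dummy tracking event. */
	dummy->sample_type = main_attr->sample_type & ~(__u64)PERF_SAMPLE_BRANCH_STACK;
	dummy->branch_sample_type = 0;
}
```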
      
      perf c2c: (Jiri Olsa)
      
      - Fix a report crash for an empty browser when processing a perf.data
        file without events of interest, either because they were not asked
        for in 'perf record' or because the workload didn't trigger them.
      
      perf list: (Michael Petlan)
      
      - Align metric group description format with PMU event description.
      
      perf tests: (Sandipan Das)
      
      - Fix indexing when invoking subtests, which caused BPF tests to
        get results for the next test in the list, with the last one
        reporting a failure.
      
      eBPF:
      
      - Fix installation directory for header files included from eBPF proggies,
        avoiding clashing with relative paths used to build other software projects
        such as glibc. (Thomas Richter)
      
      - Show better message when failing to load an object. (Arnaldo Carvalho de Melo)
      
      General: (Christophe Leroy)
      
      - Allow overriding MAX_NR_CPUS at compile time, to make the tooling
        usable on systems with less memory. In time this has to be changed
        to allocate properly based on _NPROCESSORS_ONLN.
      
      Architecture specific:
      
      - Update arm64's ThunderX2 implementation defined pmu core events (Ganapatrao Kulkarni)
      
      - Fix complex event name parsing in 'perf test' for PowerPC, where the 'umask' event
        modifier isn't present. (Sandipan Das)
      
      CoreSight ARM hardware tracing: (Leo Yan)
      
      - Fix start tracing packet handling.
      
      - Support dummy address value for CS_ETM_TRACE_ON packet.
      
      - Generate branch sample when receiving a CS_ETM_TRACE_ON packet.
      
      - Generate branch sample for CS_ETM_TRACE_ON packet.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • 16e0e6a8
    • perf trace: Do not require --no-syscalls to suppress strace like output · b912885a
      Committed by Arnaldo Carvalho de Melo
      So far the --syscalls option was the default, requiring an explicit
      --no-syscalls when wanting to process just some other event. Invert
      that: assume --syscalls only when no other event was specified, and
      allow enabling it explicitly when wanting to see all syscalls together
      with some other event.

      E.g.:
      
      The existing default is maintained for a single workload:
      
        # perf trace sleep 1
      <SNIP>
           0.264 ( 0.003 ms): sleep/12762 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f62cbf04000
           0.271 ( 0.001 ms): sleep/12762 close(fd: 3) = 0
           0.295 (1000.130 ms): sleep/12762 nanosleep(rqtp: 0x7ffd15194fd0) = 0
        1000.469 ( 0.006 ms): sleep/12762 close(fd: 1) = 0
        1000.480 ( 0.004 ms): sleep/12762 close(fd: 2) = 0
        1000.502 (         ): sleep/12762 exit_group()
        #
      
      For a pid:
      
        # pidof ssh
        7826 3961 3226 2628 2493
        # perf trace -p 3961
               ? (         ):  ... [continued]: select()) = 1
           0.023 ( 0.005 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce870               ) = 0
           0.036 ( 0.009 ms): read(fd: 5</dev/pts/7>, buf: 0x7ffcc8fca7b0, count: 16384             ) = 3
           0.060 ( 0.004 ms): getpid(                                                               ) = 3961 (ssh)
           0.079 ( 0.004 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce8e0               ) = 0
           0.088 ( 0.003 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce7c0               ) = 0
      <SNIP>
      
      For system-wide, thread, cgroup, user, etc. targets, when no event is
      specified, the existing behaviour is maintained, i.e. --syscalls is selected.
      
      When some event is specified, then --no-syscalls doesn't need to be
      specified:
      
        # perf trace -e tcp:tcp_probe ssh localhost
           0.000 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=53 snd_nxt=0xb67ce8f7 snd_una=0xb67ce8f7 snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=43690
           0.010 tcp:tcp_probe:src=[::1]:39074 dest=[::1]:22 mark=0 length=32 snd_nxt=0xa8f9ef38 snd_una=0xa8f9ef23 snd_cwnd=10 ssthresh=2147483647 snd_wnd=43690 srtt=31 rcv_wnd=43776
           4.525 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=1240 snd_nxt=0xb67ce90c snd_una=0xb67ce90c snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=43776
           7.242 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=80 snd_nxt=0xb67ced44 snd_una=0xb67ce90c snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=174720
        The authenticity of host 'localhost (::1)' can't be established.
        ECDSA key fingerprint is SHA256:TKZS58923458203490asekfjaklskljmkjfgPMBfHzY.
        ECDSA key fingerprint is MD5:d8:29:54:40:71:fa:b8:44:89:52:64:8a:35:42:d0:e8.
        Are you sure you want to continue connecting (yes/no)?
      ^C
        #
      
      To get the previous behaviour, just use --syscalls to get all syscalls
      formatted strace-like plus the specified extra events:
      
        # trace -e sched:*switch --syscalls sleep 1
        <SNIP>
           0.160 ( 0.003 ms): sleep/12877 mprotect(start: 0x7fdfe2361000, len: 4096, prot: READ) = 0
           0.164 ( 0.009 ms): sleep/12877 munmap(addr: 0x7fdfe2345000, len: 113155) = 0
           0.211 ( 0.001 ms): sleep/12877 brk() = 0x55d3ce68e000
           0.212 ( 0.002 ms): sleep/12877 brk(brk: 0x55d3ce6af000) = 0x55d3ce6af000
           0.215 ( 0.001 ms): sleep/12877 brk() = 0x55d3ce6af000
           0.219 ( 0.004 ms): sleep/12877 open(filename: 0xe1f07c00, flags: CLOEXEC) = 3
           0.225 ( 0.001 ms): sleep/12877 fstat(fd: 3, statbuf: 0x7fdfe2138aa0) = 0
           0.227 ( 0.003 ms): sleep/12877 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7fdfdb1b8000
           0.234 ( 0.001 ms): sleep/12877 close(fd: 3) = 0
           0.257 (         ): sleep/12877 nanosleep(rqtp: 0x7fffb36b6020) ...
           0.260 (         ): sched:sched_switch:prev_comm=sleep prev_pid=12877 prev_prio=120 prev_state=D ==> next_comm=swapper/3 next_pid=0 next_prio=120
           0.257 (1000.134 ms): sleep/12877  ... [continued]: nanosleep()) = 0
        1000.428 ( 0.006 ms): sleep/12877 close(fd: 1) = 0
        1000.440 ( 0.004 ms): sleep/12877 close(fd: 2) = 0
        1000.461 (         ): sleep/12877 exit_group()
        #
      
      When specifying just some syscalls, the behaviour doesn't change, i.e.:
      
        # trace -e nanosleep -e sched:*switch sleep 1
           0.000 (         ): sleep/14974 nanosleep(rqtp: 0x7ffc344ba9c0                                        ) ...
           0.007 (         ): sched:sched_switch:prev_comm=sleep prev_pid=14974 prev_prio=120 prev_state=D ==> next_comm=swapper/2 next_pid=0 next_prio=120
           0.000 (1000.139 ms): sleep/14974  ... [continued]: nanosleep()) = 0
        #
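      The selection rules in these examples can be summarized as a small
      decision function. This is a hypothetical model of the behaviour this
      commit describes, not the perf trace option parsing code. In
      particular, how an explicit --syscalls combines with '-e <syscall>' is
      an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the relevant command line state. */
struct trace_opts {
	bool syscalls_opt;       /* --syscalls given explicitly */
	int nr_syscall_events;   /* syscalls named with -e, e.g. "nanosleep" */
	int nr_other_events;     /* non-syscall events, e.g. "sched:*switch" */
};

enum syscall_mode { TRACE_NO_SYSCALLS, TRACE_SOME_SYSCALLS, TRACE_ALL_SYSCALLS };

static enum syscall_mode decide_syscall_mode(const struct trace_opts *o)
{
	if (o->syscalls_opt)
		return TRACE_ALL_SYSCALLS;   /* previous behaviour, on request */
	if (o->nr_syscall_events > 0)
		return TRACE_SOME_SYSCALLS;  /* unchanged: only the named syscalls */
	if (o->nr_other_events == 0)
		return TRACE_ALL_SYSCALLS;   /* no events at all: --syscalls implied */
	return TRACE_NO_SYSCALLS;            /* other events only: no strace output */
}
```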
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-om2fulll97ytnxv40ler8jkf@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 01 Aug, 2018: 2 commits
    • perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h · 822c2621
      Committed by Arnaldo Carvalho de Melo
      The next example scripts need the definitions for the BPF functions, i.e.
      things like BPF_FUNC_probe_read, and in time will require lots of other
      definitions found in uapi/linux/bpf.h. So include it from the bpf.h file
      that is included by eBPF scripts built with clang via '-e bpf_script.c',
      like in this example:
      
        $ tail -8 tools/perf/examples/bpf/5sec.c
        #include <bpf.h>
      
        int probe(hrtimer_nanosleep, rqtp->tv_sec)(void *ctx, int err, long sec)
        {
      	return sec == 5;
        }
      
        license(GPL);
        $
      
      That 'bpf.h' include in the 5sec.c eBPF example will come from a set of
      header files crafted for building eBPF objects, which on an end-user
      system will come from:
      
        /usr/lib/perf/include/bpf/bpf.h
      
      And will include <uapi/linux/bpf.h> either from the place where the
      kernel was built, or from a kernel-devel rpm package like:
      
        -working-directory /lib/modules/4.17.9-100.fc27.x86_64/build
      
      That is set up by tools/perf/util/llvm-utils.c, and can be overridden
      by setting the 'kbuild-dir' variable in the [llvm] section of the
      ~/.perfconfig file, like:
      
        # cat ~/.perfconfig
        [llvm]
             kbuild-dir = /home/foo/git/build/linux
      
      This usually doesn't need any change, just documenting here my findings
      while working with this code.
      
      In the future we may want to just use what is in
      /usr/include/linux/bpf.h, which comes from the UAPI provided by the
      kernel sources. For now, we avoid picking up the kernel's non-UAPI
      "linux/bpf.h" file, which would cause clang to fail and is not what we
      want anyway (no BPF function definitions, etc.). We do this explicitly
      by asking for "uapi/linux/bpf.h".
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-zd8zeyhr2sappevojdem9xxt@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Allow overriding MAX_NR_CPUS at compile time · 21b8732e
      Committed by Christophe Leroy
      After a kernel update, the perf tool no longer runs on my 32MB RAM
      powerpc board, but still runs on a 128MB RAM board:
      
        ~# strace perf
        execve("/usr/sbin/perf", ["perf"], [/* 12 vars */]) = -1 ENOMEM (Cannot allocate memory)
        --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
        +++ killed by SIGSEGV +++
        Segmentation fault
      
      objdump -x shows that the .bss section has a huge size of 24 Mbytes:
      
       27 .bss          016baca8  101cebb8  101cebb8  001cd988  2**3
      
      In particular, the following objects are quite big:
      
        10205f80 l     O .bss	00140000     runtime_cycles_stats
        10345f80 l     O .bss	00140000     runtime_stalled_cycles_front_stats
        10485f80 l     O .bss	00140000     runtime_stalled_cycles_back_stats
        105c5f80 l     O .bss	00140000     runtime_branches_stats
        10705f80 l     O .bss	00140000     runtime_cacherefs_stats
        10845f80 l     O .bss	00140000     runtime_l1_dcache_stats
        10985f80 l     O .bss	00140000     runtime_l1_icache_stats
        10ac5f80 l     O .bss	00140000     runtime_ll_cache_stats
        10c05f80 l     O .bss	00140000     runtime_itlb_cache_stats
        10d45f80 l     O .bss	00140000     runtime_dtlb_cache_stats
        10e85f80 l     O .bss	00140000     runtime_cycles_in_tx_stats
        10fc5f80 l     O .bss	00140000     runtime_transaction_stats
        11105f80 l     O .bss	00140000     runtime_elision_stats
        11245f80 l     O .bss	00140000     runtime_topdown_total_slots
        11385f80 l     O .bss	00140000     runtime_topdown_slots_retired
        114c5f80 l     O .bss	00140000     runtime_topdown_slots_issued
        11605f80 l     O .bss	00140000     runtime_topdown_fetch_bubbles
        11745f80 l     O .bss	00140000     runtime_topdown_recovery_bubbles
      
      This is due to commit 4d255766 ("perf: Bump max number of cpus
      to 1024"), because many tables are sized with MAX_NR_CPUS.
      
      This patch makes it possible to redefine MAX_NR_CPUS via
      
        $ make EXTRA_CFLAGS=-DMAX_NR_CPUS=1
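      The numbers above can be checked with a little arithmetic. In the sketch
      below, the #ifndef guard shows one common way to make such a limit
      overridable from EXTRA_CFLAGS. large_tables_bytes() is a hypothetical
      helper that assumes, as a simplification, that each of the 18 listed
      tables grows linearly with the CPU count.

```c
#include <assert.h>

/* Compile-time default, overridable e.g. with EXTRA_CFLAGS=-DMAX_NR_CPUS=1 */
#ifndef MAX_NR_CPUS
#define MAX_NR_CPUS 1024
#endif

/* Figures from the objdump output above, taken from a 1024-CPU build. */
#define OBSERVED_NR_CPUS    1024UL
#define OBSERVED_TABLE_SIZE 0x140000UL  /* each runtime_*_stats object */
#define NR_LARGE_TABLES     18UL        /* number of objects listed */

/* Assumes every listed table scales linearly with the CPU count. */
static unsigned long large_tables_bytes(unsigned long nr_cpus)
{
	return NR_LARGE_TABLES * (OBSERVED_TABLE_SIZE / OBSERVED_NR_CPUS) * nr_cpus;
}

/* Size of those tables for the MAX_NR_CPUS this file was compiled with. */
static unsigned long configured_tables_bytes(void)
{
	return large_tables_bytes(MAX_NR_CPUS);
}
```

      With 1024 CPUs the 18 tables take 18 * 0x140000 = 23592960 bytes. That
      is about 99% of the 0x16baca8 (23833768) byte .bss. Under the linearity
      assumption, -DMAX_NR_CPUS=1 would shrink them to 23040 bytes.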
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20170922112043.8349468C57@po15668-vm-win7.idsi0.si.c-s.fr
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>