1. 23 March 2015, 1 commit
  2. 25 February 2015, 9 commits
    • perf/x86/intel: Enable conflicting event scheduling for CQM · 59bf7fd4
      Committed by Matt Fleming
      We can leverage the workqueue that we use for RMID rotation to support
      scheduling of conflicting monitoring events. Allowing events that
      monitor conflicting things is done at various other places in the perf
      subsystem, so there's precedent there.
      
      An example of two conflicting events would be monitoring a cgroup and
      simultaneously monitoring a task within that cgroup.
      
      This uses the cache_groups list as a queuing mechanism, where every
      event that reaches the front of the list gets the chance to be scheduled
      in, possibly descheduling any conflicting events that are running.
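
      For illustration, a minimal sketch of the queueing idea using the kernel
      list API, with assumed, simplified structures (this is not the driver
      source, which queues struct perf_event cache groups):

        #include <linux/list.h>
        #include <linux/types.h>

        /* Simplified stand-in for a CQM cache group (assumed layout). */
        struct cqm_group {
                struct list_head entry;
                u32 rmid;        /* 0 here means "not currently monitored" */
                bool conflicts;  /* conflicts with the group at the head? */
        };

        static LIST_HEAD(cache_groups);

        /*
         * Give the group at the front of the queue a turn with @spare_rmid,
         * descheduling one conflicting group if necessary, then rotate.
         */
        static void sched_in_head_group(u32 spare_rmid)
        {
                struct cqm_group *head, *tmp;

                if (list_empty(&cache_groups))
                        return;

                head = list_first_entry(&cache_groups, struct cqm_group, entry);

                /* Deschedule a conflicting group, if any, so the head can run. */
                list_for_each_entry(tmp, &cache_groups, entry) {
                        if (tmp != head && tmp->rmid && tmp->conflicts) {
                                tmp->rmid = 0;
                                break;
                        }
                }

                if (!head->rmid)
                        head->rmid = spare_rmid;

                /* Move the head to the tail so every group eventually gets a turn. */
                list_move_tail(&head->entry, &cache_groups);
        }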
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-10-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      59bf7fd4
    • perf/x86/intel: Perform rotation on Intel CQM RMIDs · bff671db
      Committed by Matt Fleming
      There are many use cases where people will want to monitor more tasks
      than there are RMIDs in the hardware, meaning that we have to perform
      some kind of multiplexing.
      
      We do this by "rotating" the RMIDs in a workqueue, and assigning an RMID
      to a waiting event when the RMID becomes unused.
      
      This scheme reserves one RMID at all times for rotation. When we need to
      schedule a new event we give it the reserved RMID, pick a victim event
      from the front of the global CQM list and wait for the victim's RMID to
      drop to zero occupancy, before it becomes the new reserved RMID.
      
      We put the victim's RMID onto the limbo list, where it resides for a
      "minimum queue time", which is intended to save ourselves an expensive
      SMP IPI when the RMID is unlikely to have an occupancy value below
      __intel_cqm_threshold.
      
      If we fail to recycle an RMID even after waiting the minimum queue time,
      then we need to increment __intel_cqm_threshold. There is an upper bound
      on this threshold, __intel_cqm_max_threshold, which is programmable from
      userland as /sys/devices/intel_cqm/max_recycling_threshold.
      
      The comments above __intel_cqm_rmid_rotate() have more details.
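
      A hedged sketch of the limbo handling, with assumed structure names (the
      real logic lives around __intel_cqm_rmid_rotate()), showing the
      minimum-queue-time skip and the threshold bump when recycling fails:

        #include <linux/jiffies.h>
        #include <linux/list.h>
        #include <linux/types.h>

        struct limbo_rmid {
                struct list_head entry;
                u32 rmid;
                unsigned long queue_time;  /* jiffies when the RMID entered limbo */
        };

        static LIST_HEAD(limbo_lru);
        static u64 __intel_cqm_threshold;
        static u64 __intel_cqm_max_threshold;  /* sysfs: max_recycling_threshold */

        /* @read_occupancy: returns the current LLC occupancy of an RMID. */
        static bool try_to_recycle_rmids(unsigned long min_queue_time,
                                         u64 (*read_occupancy)(u32 rmid))
        {
                struct limbo_rmid *r;
                bool recycled = false;

                list_for_each_entry(r, &limbo_lru, entry) {
                        /* Too fresh: skip, to avoid a pointless expensive read. */
                        if (time_before(jiffies, r->queue_time + min_queue_time))
                                continue;

                        if (read_occupancy(r->rmid) <= __intel_cqm_threshold) {
                                recycled = true;  /* clean enough to hand out */
                                break;
                        }
                }

                /*
                 * Nothing recyclable even after the minimum queue time: become
                 * more tolerant of stale occupancy, up to the user-set maximum.
                 */
                if (!recycled && __intel_cqm_threshold < __intel_cqm_max_threshold)
                        __intel_cqm_threshold++;

                return recycled;
        }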
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-9-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bff671db
    • perf/x86/intel: Support task events with Intel CQM · bfe1fcd2
      Committed by Matt Fleming
      Add support for task events as well as system-wide events. This change
      has a big impact on the way that we gather LLC occupancy values in
      intel_cqm_event_read().
      
      Currently, for system-wide (per-cpu) events we defer processing to
      userspace, which knows how to discard all but one cpu result per package.
      
      Things aren't so simple for task events because we need to do the value
      aggregation ourselves. To do this, we defer updating the LLC occupancy
      value in event->count from intel_cqm_event_read() and do an SMP
      cross-call to read values for all packages in intel_cqm_event_count().
      We need to ensure that we only do this for one task event per cache
      group, otherwise we'll report duplicate values.
      
      If we're a system-wide event we want to fall back to the default
      perf_event_count() implementation. Refactor this into a common function
      so that we don't duplicate the code.
      
      Also, introduce PERF_TYPE_INTEL_CQM, since we need a way to track an
      event's task (if the event isn't per-cpu) inside of the Intel CQM PMU
      driver.  This task information is only available in the upper layers of
      the perf infrastructure.
      
      Other perf backends stash the target task in event->hw.*target so we
      need to do something similar. The task is used to determine whether
      events should share a cache group and an RMID.
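
      A hedged sketch of the SMP cross-call aggregation described above, with
      simplified names and a placeholder standing in for the per-package MSR
      read:

        #include <linux/atomic.h>
        #include <linux/cpumask.h>
        #include <linux/smp.h>
        #include <linux/types.h>

        struct rmid_read {
                u32 rmid;
                atomic64_t value;
        };

        /* Placeholder for the MSR read of this package's LLC occupancy. */
        static u64 __rmid_read_stub(u32 rmid)
        {
                return 0;
        }

        /* Runs on one CPU in each package via the cross-call below. */
        static void cqm_count_one_package(void *info)
        {
                struct rmid_read *rr = info;

                atomic64_add(__rmid_read_stub(rr->rmid), &rr->value);
        }

        static u64 cqm_aggregate_count(u32 rmid, const struct cpumask *one_cpu_per_pkg)
        {
                struct rmid_read rr = {
                        .rmid  = rmid,
                        .value = ATOMIC64_INIT(0),
                };

                /* SMP cross-call; wait so every package contributes its value. */
                on_each_cpu_mask(one_cpu_per_pkg, cqm_count_one_package, &rr, 1);

                return atomic64_read(&rr.value);
        }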
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Cc: linux-api@vger.kernel.org
      Link: http://lkml.kernel.org/r/1422038748-21397-8-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bfe1fcd2
    • perf/x86/intel: Implement LRU monitoring ID allocation for CQM · 35298e55
      Committed by Matt Fleming
      It's possible to run into issues with re-using unused monitoring IDs
      because there may be stale cachelines associated with an ID from a
      previous allocation. This can cause the LLC occupancy values to be
      inaccurate.
      
      To attempt to mitigate this problem we place the IDs on a least recently
      used list, essentially a FIFO. The basic idea is that the longer the
      time period between ID re-use, the lower the probability that stale
      cachelines exist in the cache.
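
      A minimal sketch of the idea, with assumed names: freed IDs go to the
      tail of a FIFO and allocations come from the head, so the ID handed out
      is always the one that has been unused the longest:

        #include <linux/list.h>
        #include <linux/types.h>

        struct cqm_rmid_entry {
                u32 rmid;
                struct list_head list;
        };

        static LIST_HEAD(cqm_rmid_free_lru);

        static struct cqm_rmid_entry *get_rmid(void)
        {
                struct cqm_rmid_entry *entry;

                if (list_empty(&cqm_rmid_free_lru))
                        return NULL;  /* caller must wait or multiplex */

                /* Oldest first: its stale cachelines have had the longest to age out. */
                entry = list_first_entry(&cqm_rmid_free_lru, struct cqm_rmid_entry, list);
                list_del(&entry->list);
                return entry;
        }

        static void put_rmid(struct cqm_rmid_entry *entry)
        {
                /* Return to the tail so this ID is re-used as late as possible. */
                list_add_tail(&entry->list, &cqm_rmid_free_lru);
        }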
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-7-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      35298e55
    • perf/x86/intel: Add Intel Cache QoS Monitoring support · 4afbb24c
      Committed by Matt Fleming
      Future Intel Xeon processors support a Cache QoS Monitoring feature that
      allows tracking of the LLC occupancy for a task or task group, i.e. the
      amount of data pulled into the LLC for the task (group).
      
      Currently the PMU only supports per-cpu events. We create an event for
      each cpu and read out all the LLC occupancy values.
      
      Because this results in duplicate values being written out to userspace,
      we also export a .per-pkg event file so that the perf tools only
      accumulate values for one cpu per package.
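
      For reference, a hedged sketch of how the LLC occupancy for one RMID is
      read from the hardware; the MSR numbers and event ID follow the SDM,
      while cqm_l3_scale stands in for the CPUID-reported upscaling factor:

        #include <asm/msr.h>
        #include <linux/types.h>

        #define MSR_IA32_QM_EVTSEL    0x0c8d
        #define MSR_IA32_QM_CTR       0x0c8e
        #define QOS_L3_OCCUP_EVENT_ID 0x01

        /* Bytes per reported unit, from CPUID.0xF.1:EBX (assumed variable). */
        static unsigned int cqm_l3_scale = 1;

        static u64 read_llc_occupancy(u32 rmid)
        {
                u64 val;

                /* Select the L3 occupancy event for this RMID, then read it. */
                wrmsr(MSR_IA32_QM_EVTSEL, QOS_L3_OCCUP_EVENT_ID, rmid);
                rdmsrl(MSR_IA32_QM_CTR, val);

                /* The top two bits flag error / data-unavailable conditions. */
                if (val & (3ULL << 62))
                        return 0;

                return val * cqm_l3_scale;
        }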
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-6-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4afbb24c
    • x86: Add support for Intel Cache QoS Monitoring (CQM) detection · cbc82b17
      Committed by Peter P Waskiewicz Jr
      This patch adds support for the new Cache QoS Monitoring (CQM)
      feature found in future Intel Xeon processors.  It adds the new
      values used to track CQM resources to the cpuinfo_x86 structure,
      plus the CPUID detection routines for CQM.
      
      CQM allows a process, or set of processes, to be tracked by the CPU
      to determine the cache usage of that task group.  Using this data
      from the CPU, software can be written to extract it and
      report cache usage and occupancy for a particular process, or
      group of processes.
      
      More information about Cache QoS Monitoring can be found in the
      Intel (R) x86 Architecture Software Developer Manual, section 17.14.
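
      A small user-space sketch of the same detection using GCC/Clang's
      <cpuid.h> (the kernel's detection reads the same CPUID leaves and stores
      the results in cpuinfo_x86):

        #include <cpuid.h>
        #include <stdio.h>

        int main(void)
        {
                unsigned int eax, ebx, ecx, edx;

                if (__get_cpuid_max(0, NULL) < 0xf) {
                        puts("CPUID leaf 0xF not available");
                        return 0;
                }

                /* CPUID.(EAX=07H,ECX=0):EBX bit 12 advertises QoS Monitoring. */
                __cpuid_count(0x7, 0, eax, ebx, ecx, edx);
                if (!(ebx & (1u << 12))) {
                        puts("CQM not supported");
                        return 0;
                }

                /* Leaf 0FH, subleaf 0: EBX = max RMID, EDX bit 1 = L3 monitoring. */
                __cpuid_count(0xf, 0, eax, ebx, ecx, edx);
                printf("max RMID: %u, L3 monitoring: %s\n",
                       ebx, (edx & (1u << 1)) ? "yes" : "no");

                /* Leaf 0FH, subleaf 1: EBX = occupancy scale (bytes per unit),
                 * ECX = max RMID for L3 occupancy monitoring. */
                __cpuid_count(0xf, 1, eax, ebx, ecx, edx);
                printf("L3 max RMID: %u, scale: %u bytes/unit\n", ecx, ebx);

                return 0;
        }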
      Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Webb <chris@arachsys.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Honeyman <stevenhoneyman@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-5-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cbc82b17
    • perf: Move cgroup init before PMU ->event_init() · 79dff51e
      Committed by Matt Fleming
      The Intel QoS PMU needs to know whether an event is part of a cgroup
      during ->event_init(), because tasks in the same cgroup share a
      monitoring ID.
      
      Move the cgroup initialisation before calling into the PMU driver.
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-4-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      79dff51e
    • perf: Add ->count() function to read per-package counters · eacd3ecc
      Committed by Matt Fleming
      For PMU drivers that record per-package counters, the ->count variable
      cannot be used to record an accurate aggregated value, since it's not
      possible to perform SMP cross-calls to cpus on other packages from the
      context in which we update ->count.
      
      Introduce a new optional ->count() accessor function that can be used to
      customize how values are collected. If a PMU driver doesn't provide a
      ->count() function, we fall back to the existing code.
      
      There is necessarily a window of staleness with this approach because
      the task that generated the counter value may not have been scheduled by
      the cpu recently.
      
      An alternative and more complex approach would be to use an hrtimer to
      periodically refresh the values from a more permissive scheduling
      context. So, we are trading a bit of accuracy for lower complexity.
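
      A hedged sketch of the ->count() dispatch described above, using
      simplified stand-in structures (the real struct pmu and
      perf_event_count() carry considerably more state):

        #include <linux/types.h>

        struct sketch_event;

        struct sketch_pmu {
                /* Optional: let the driver decide how the value is gathered. */
                u64 (*count)(struct sketch_event *event);
        };

        struct sketch_event {
                struct sketch_pmu *pmu;
                u64 count;  /* the value updated at read time */
        };

        static u64 sketch_event_count(struct sketch_event *event)
        {
                /* Drivers with per-package counters supply their own reader... */
                if (event->pmu->count)
                        return event->pmu->count(event);

                /* ...everyone else keeps the existing behaviour. */
                return event->count;
        }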
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-3-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      eacd3ecc
    • perf: Make perf_cgroup_from_task() global · 39bed6cb
      Committed by Matt Fleming
      Move perf_cgroup_from_task() from kernel/events/ to include/linux/ along
      with the necessary struct definitions, so that it can be used by the PMU
      code.
      
      When the upcoming Intel Cache Monitoring PMU driver assigns monitoring
      IDs to perf events, it needs to be able to check whether any two
      monitoring events overlap (say, a cgroup and task event), which means we
      need to be able to look up the cgroup associated with a task (if any).
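
      Roughly the helper involved, shown here as a sketch (the exact
      definitions live in include/linux/perf_event.h and depend on
      CONFIG_CGROUP_PERF):

        #include <linux/cgroup.h>
        #include <linux/kernel.h>
        #include <linux/percpu.h>
        #include <linux/sched.h>

        struct perf_cgroup_info;

        struct perf_cgroup {
                struct cgroup_subsys_state        css;
                struct perf_cgroup_info __percpu *info;
        };

        /* Map a task to its perf cgroup (if any) via the perf_event subsystem css. */
        static inline struct perf_cgroup *
        perf_cgroup_from_task(struct task_struct *task)
        {
                return container_of(task_css(task, perf_event_cgrp_id),
                                    struct perf_cgroup, css);
        }

      With the helper visible to PMU code, a driver can compare the cgroup of
      an event's target task against another event's cgroup when deciding
      whether the two events monitor overlapping sets of tasks.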
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-2-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      39bed6cb
  3. 19 February 2015, 22 commits
  4. 14 February 2015, 3 commits
    • perf trace: Support --events foo:bar --no-syscalls · 726f3234
      Committed by Arnaldo Carvalho de Melo
      I.e. support tracing just tracepoints, without the strace-like
      raw_syscalls:* handling.
      
      [acme@ssdandy linux]$ trace --no-sys --ev sched:*exec,sched:*switch,sched:*exit usleep 1
        0.048 (     ): sched:sched_process_exec:filename=/usr/bin/usleep pid=27298 old_pid=27298)
        0.369 (     ): sched:sched_switch:usleep:27298 [120] S ==> swapper/5:0 [120])
        0.452 (     ): sched:sched_process_exit:comm=usleep pid=27298 prio=120)
      [acme@ssdandy linux]$
      
      TODO: remove that (...) thing when --no-syscalls is specified.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-vn0hsixsbhm31b2rpj97r96k@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      726f3234
    • perf trace: Allow mixing with other events · 14a052df
      Committed by Arnaldo Carvalho de Melo
      Basically adopting 'perf record' --event command line argument syntax:
      
       # trace -e \!mprotect,mmap,munmap,open,close,read,fstat,access,arch_prctl --event sched:*switch,sched:*exec,sched:*exit usleep 1
        0.048 (        ): sched:sched_process_exec:filename=/bin/usleep pid=24732 old_pid=24732)
        0.078 (0.002 ms): usleep/24732 brk(                          ) = 0x78f000
        0.430 (0.002 ms): usleep/24732 brk(                          ) = 0x78f000
        0.434 (0.003 ms): usleep/24732 brk(brk: 0x7b0000             ) = 0x7b0000
        0.438 (0.001 ms): usleep/24732 brk(                          ) = 0x7b0000
        0.460 (0.004 ms): usleep/24732 nanosleep(rqtp: 0x7ffff3696a40) ...
        0.460 (        ): sched:sched_switch:prev_comm=usleep prev_pid=24732 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120)
        0.515 (0.058 ms): usleep/24732  ... [continued]: nanosleep()) = 0
        0.520 (0.000 ms): usleep/24732 exit_group(
        0.550 (        ): sched:sched_process_exit:comm=usleep pid=24732 prio=120)
       #
      
      Next steps, probably in this order:
      
      1) Use ordered_events code, the logic in trace needs the events to be
         time ordered when needed, i.e. when multiple CPUs are involved.
      
      2) Callchains!
      
      3) Automatically account for interruptions when saying how long things
         took.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-gpst8mph575yb4wgf91qibyb@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      14a052df
    • perf trace: Handle multiple threads better wrt syscalls being intermixed · e596663e
      Committed by Arnaldo Carvalho de Melo
       $ trace time taskset -c 0 usleep 1
         0.845 ( 0.021 ms): time/16722 wait4(upid: 4294967295, stat_addr: 0x7fff17f443d4, ru: 0x7fff17f44438 ) ...
         0.865 ( 0.008 ms): time/16723 execve(arg0: 140733595272004, arg1: 140733595272720, arg2: 140733595272768, arg3: 139755107218496, arg4: 7307199665339051828, arg5: 3) = -2
         2.395 ( 1.523 ms): taskset/16723 execve(arg0: 140733595272013, arg1: 140733595272720, arg2: 140733595272768, arg3: 139755107218496, arg4: 7307199665339051828, arg5: 3) = 0
         2.411 ( 0.002 ms): taskset/16723 brk(                                                                  ) = 0x1915000
         3.300 ( 0.058 ms): usleep/16723 nanosleep(rqtp: 0x7ffff4ada190                                        ) = 0
       <SNIP>
         3.305 ( 0.000 ms): usleep/16723 exit_group(
         3.363 ( 2.539 ms): time/16722  ... [continued]: wait4()) = 16723
         3.366 ( 0.001 ms): time/16722 rt_sigaction(sig: INT, act: 0x7fff17f44160, oact: 0x7fff17f44200, sigsetsize: 8) = 0
      
      We were not seeing this line:
      
        0.845 ( 0.021 ms): time/16722 wait4(upid: 4294967295, stat_addr: 0x7fff17f443d4, ru: 0x7fff17f44438 ) ...
      
      just the one when it finishes:
      
        3.363 ( 2.539 ms): time/16722  ... [continued]: wait4()) = 16723
      
      Still some issues left till we move to ordered_samples when multiple
      CPUs/threads are involved...
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-zq9x30a1ky3djqewqn2v3ja3@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e596663e
  5. 13 February 2015, 5 commits
    • perf trace: Print thread info when following children · 42052bea
      Committed by Arnaldo Carvalho de Melo
      The default for 'trace workload' is to set perf_event_attr.inherit to 1,
      i.e. to make it equivalent to 'strace -f workload', so we were ending up
      with syscalls for multiple processes mixed up. Fix it:
      
      Before:
      
        [root@ssdandy ~]# trace -e brk time usleep 1
           0.071 ( 0.002 ms): brk(              ) = 0x100e000
           0.802 ( 0.001 ms): brk(              ) = 0x1d99000
           1.132 ( 0.003 ms): brk(              ) = 0x1d99000
           1.136 ( 0.003 ms): brk(brk: 0x1dba000) = 0x1dba000
           1.140 ( 0.001 ms): brk(              ) = 0x1dba000
        0.00user 0.00system 0:00.00elapsed 63%CPU (0avgtext+0avgdata 528maxresident)k
        0inputs+0outputs (0major+181minor)pagefaults 0swaps
        [root@ssdandy ~]#
      
      After:
      
        [root@ssdandy ~]# trace -f -e brk time usleep 1
           0.072 ( 0.002 ms): time/26308 brk(               ) = 0x1e6e000
           0.860 ( 0.001 ms): usleep/26309 brk(             ) = 0xb91000
           1.193 ( 0.003 ms): usleep/26309 brk(             ) = 0xb91000
           1.197 ( 0.003 ms): usleep/26309 brk(brk: 0xbb2000) = 0xbb2000
           1.201 ( 0.001 ms): usleep/26309 brk(             ) = 0xbb2000
        0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 524maxresident)k
        0inputs+0outputs (0major+180minor)pagefaults 0swaps
        [root@ssdandy ~]#
      
      BTW: to achieve the 'strace workload' behaviour, i.e. without an explicit
      '-f', one has to use --no-inherit.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-2wu2d5n65msxoq1i7vtcaft2@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      42052bea
    • perf list: Place the header text in its right position · 619a303c
      Committed by Yunlong Song
      The header text 'List of pre-defined events (to be used in -e):' is
      placed in an improper function, which causes abnormal output, e.g.
      'perf list hw' shows no guiding text at all, and 'perf list hw
      L1-dcache*' shows the guiding text incorrectly in the middle of the
      output.
      
      Example
      Before this patch:
      
       $ perf list hw L1-dcache*
      
         branch-instructions OR branches                    [Hardware event]
         branch-misses                                      [Hardware event]
         bus-cycles                                         [Hardware event]
         cache-misses                                       [Hardware event]
         cache-references                                   [Hardware event]
         cpu-cycles OR cycles                               [Hardware event]
         instructions                                       [Hardware event]
         stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
         stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
      
       List of pre-defined events (to be used in -e):              <-- incorrect position
         L1-dcache-load-misses                              [Hardware cache event]
         L1-dcache-loads                                    [Hardware cache event]
         L1-dcache-prefetch-misses                          [Hardware cache event]
         L1-dcache-prefetches                               [Hardware cache event]
         L1-dcache-store-misses                             [Hardware cache event]
         L1-dcache-stores                                   [Hardware cache event]
      
      After this patch:
      
       $ perf list hw L1-dcache*
      
       List of pre-defined events (to be used in -e):              <-- correct position
      
         branch-instructions OR branches                    [Hardware event]
         branch-misses                                      [Hardware event]
         bus-cycles                                         [Hardware event]
         cache-misses                                       [Hardware event]
         cache-references                                   [Hardware event]
         cpu-cycles OR cycles                               [Hardware event]
         instructions                                       [Hardware event]
         stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
         stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
      
         L1-dcache-load-misses                              [Hardware cache event]
         L1-dcache-loads                                    [Hardware cache event]
         L1-dcache-prefetch-misses                          [Hardware cache event]
         L1-dcache-prefetches                               [Hardware cache event]
         L1-dcache-store-misses                             [Hardware cache event]
         L1-dcache-stores                                   [Hardware cache event]
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1423833115-11199-8-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      619a303c
    • perf: Remove the extra validity check on nr_pages · 74390aa5
      Committed by Kaixu Xia
      The function is_power_of_2() also does the check on nr_pages, so the
      first check performed is unnecessary. On the other hand, the key point
      is to ensure that @nr_pages is a power-of-two number, and since
      @nr_pages is a nonzero value in most cases, is_power_of_2() will be
      called anyway.
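
      For reference, the generic helper (essentially as defined in
      include/linux/log2.h) already rejects zero, which is what makes a
      separate zero check redundant:

        #include <linux/types.h>

        static inline __attribute__((const))
        bool is_power_of_2(unsigned long n)
        {
                /* false for n == 0, true only for 1, 2, 4, 8, ... */
                return (n != 0 && ((n & (n - 1)) == 0));
        }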
      Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Link: http://lkml.kernel.org/r/1422352512-75150-1-git-send-email-xiakaixu@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      74390aa5
    • perf tools: Fix a bug of segmentation fault · 3a03005f
      Committed by Yunlong Song
      Fix the segmentation fault in 'perf list --list-cmds', which also
      happens in other cases (e.g. record, report ...). The bug happens when
      there are no cmds to list at all.
      
      Example:
      
      Before this patch:
      
        $ perf list --list-cmds
        Segmentation fault
        $
      
      After this patch:

        $ perf list --list-cmds
        $
      
      As shown above, the result prints nothing rather than a segmentation
      fault. The null result means 'perf list' has no cmds to display at this
      time.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1423833115-11199-5-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3a03005f
    • perf build: Display make commands on V=1 · ceed252f
      Committed by Jiri Olsa
      Get more verbose output from make by displaying the executed commands.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Tested-by: Will Deacon <will.deacon@arm.com>
      Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-68v67h59zoz7ilb1ggcuff3j@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ceed252f