1. 06 Dec, 2011 10 commits
    • R
      perf, x86: Fix event scheduler for constraints with overlapping counters · bc1738f6
      Robert Richter committed
      The current x86 event scheduler fails to resolve scheduling problems
      of certain combinations of events and constraints. This happens if the
      counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight, e.g. constraints
      of the AMD family 15h pmu:
      
                              counter mask    weight
      
       amd_f15_PMC30          0x09            2  <--- overlapping counters
       amd_f15_PMC20          0x07            3
       amd_f15_PMC53          0x38            3
      
      The scheduler then fails to find an existing solution. Here is an
      example:
      
       event code     counter         failure         possible solution
      
       0x02E          PMC[3,0]        0               3
       0x043          PMC[2:0]        1               0
       0x045          PMC[2:0]        2               1
       0x046          PMC[2:0]        FAIL            2
      
      The event scheduler may not select the correct counter in the first
      cycle because it needs to know which subsequent events will be
      scheduled. It may then fail to schedule the events.
      
      To solve this, we now save the scheduler state of events with
      overlapping counter constraints. If we fail to schedule the events,
      we roll back to those states and try to use another free counter.
      
      Constraints with overlapping counters are marked with a newly
      introduced overlap flag. We set the overlap flag for such constraints
      to give the scheduler a hint which events to select for counter
      rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this.
      
      Care must be taken as the rescheduling algorithm is O(n!), which will
      increase scheduling cycles for an over-committed system dramatically.
      The number of such EVENT_CONSTRAINT_OVERLAP() macros and their counter
      masks must be kept at a minimum. Thus, the current stack is limited to
      2 states to limit the number of loops the algorithm takes in the worst
      case.
      
      On systems with no overlapping-counter constraints, this
      implementation does not increase the loop count compared to the
      previous algorithm.
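      The rollback idea described above can be sketched in user space as
      follows. This is a minimal illustration under stated assumptions, not
      the kernel implementation: struct constraint, struct saved_state and
      schedule_events() are hypothetical names, events are assumed to be
      already sorted by weight, and only the 2-state stack limit is taken
      from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_STATES 2  /* mirrors the 2-state limit mentioned in the message */

/* Hypothetical, simplified constraint: not the kernel's struct. */
struct constraint {
    uint32_t mask;     /* bit k set: counter k may be used */
    bool     overlap;  /* hint: remember this choice for a possible retry */
};

/* Snapshot taken just before an overlap event claimed a counter. */
struct saved_state {
    int      event;
    int      counter;
    uint32_t used;
};

/* Returns true and fills assign[] if every event gets its own counter. */
bool schedule_events(const struct constraint *c, int n, int *assign)
{
    struct saved_state stack[MAX_STATES];
    int depth = 0;
    uint32_t used = 0;
    int i = 0, start = 0;

    assert(c && assign);
    while (i < n) {
        int ctr = -1;

        /* Pick the lowest free counter allowed by the mask, from 'start'. */
        for (int k = start; k < 32; k++) {
            if ((c[i].mask & (1u << k)) && !(used & (1u << k))) {
                ctr = k;
                break;
            }
        }
        if (ctr < 0) {
            if (depth == 0)
                return false;            /* nothing left to retry */
            depth--;                     /* roll back to the saved state */
            i = stack[depth].event;
            used = stack[depth].used;
            start = stack[depth].counter + 1;
            continue;
        }
        if (c[i].overlap && depth < MAX_STATES)
            stack[depth++] = (struct saved_state){ i, ctr, used };
        used |= 1u << ctr;
        assign[i] = ctr;
        i++;
        start = 0;
    }
    return true;
}
```

      With the four events from the table above (mask 0x09 with the overlap
      flag set, then three events with mask 0x07), the first pass fails on
      the last event, the sketch rolls back to the first event and retries
      it on counter 3, and the remaining events then fit on counters 0 to 2.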
      
      V2:
      * Renamed redo -> overlap.
      * Reimplementation using perf scheduling helper functions.
      
      V3:
      * Added WARN_ON_ONCE() if out of save states.
      * Changed function interface of perf_sched_restore_state() to use bool
        as return value.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • R
      perf, x86: Implement event scheduler helper functions · 1e2ad28f
      Robert Richter committed
      This patch introduces x86 perf scheduler code helper functions. We
      need this to later add more complex functionality to support
      overlapping counter constraints (next patch).
      
      The algorithm is modified so that the range of weight values is now
      generated from the constraints. There shouldn't be other functional
      changes.
      
      The scheduler is controlled through the helper functions. There are
      functions to initialize, traverse the event list, find unused counters
      etc. The scheduler keeps its own state.
      
      V3:
      * Added macro for_each_set_bit_cont().
      * Changed function interfaces of perf_sched_find_counter() and
        perf_sched_next_event() to use bool as return value.
      * Added some comments to make the code easier to understand.
      
      V4:
      * Fix broken event assignment if weight of the first event is not
        wmin (perf_sched_init()).
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • P
      perf: Avoid a useless pmu_disable() in the perf-tick · 0f5a2601
      Peter Zijlstra committed
      Gleb writes:
      
       > Currently pmu is disabled and re-enabled on each timer interrupt even
       > when no rotation or frequency adjustment is needed. On Intel CPU this
       > results in two writes into PERF_GLOBAL_CTRL MSR per tick. On bare metal
       > it does not cause significant slowdown, but when running perf in a virtual
       > machine it leads to 20% slowdown on my machine.
      
      Cure this by keeping a perf_event_context::nr_freq counter that counts the
      number of active events that require frequency adjustments and use this in a
      similar fashion to the already existing nr_events != nr_active test in
      perf_rotate_context().
      
      By being able to exclude both rotation and frequency adjustments a priori for
      the common case, we can avoid the otherwise superfluous PMU disable.
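      The early-out described above can be sketched as follows. This is a
      simplified illustration, not the kernel code: struct ctx,
      tick_needs_pmu(), perf_tick() and pmu_toggles are hypothetical names;
      only the nr_events, nr_active and nr_freq counters come from the
      message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-context counters named in the message. */
struct ctx {
    int nr_events;  /* events attached to the context */
    int nr_active;  /* events currently scheduled on the PMU */
    int nr_freq;    /* active events that need frequency adjustment */
};

/* Counts disable/enable pairs; stands in for the costly MSR writes. */
int pmu_toggles;

/* True if this tick has any work that requires touching the PMU. */
bool tick_needs_pmu(const struct ctx *c)
{
    bool rotate = c->nr_events != c->nr_active;  /* not everything fits */
    bool freq = c->nr_freq > 0;                  /* periods need adjusting */

    return rotate || freq;
}

void perf_tick(const struct ctx *c)
{
    assert(c);
    if (!tick_needs_pmu(c))
        return;      /* common case: leave the PMU alone */
    pmu_toggles++;   /* placeholder for disable, adjust/rotate, enable */
}
```

      In the common case (all events active, none in frequency mode) the
      tick returns without touching the PMU at all.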
      Suggested-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-515yhoatehd3gza7we9fapaa@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • I
      Merge branch 'perf/urgent' into perf/core · d6c1c49d
      Ingo Molnar committed
      Merge reason: Add these cherry-picked commits so that future changes
                    on perf/core don't conflict.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      ftrace: Fix hash record accounting bug · ddf6e0e5
      Steven Rostedt committed
      If the set_ftrace_filter is cleared by writing just whitespace to
      it, then the filter hash refcounts will be decremented but not
      updated. This causes two bugs:
      
      1) No functions will be enabled for tracing when they all should be
      
      2) If the user clears the set_ftrace_filter twice, it will crash ftrace:
      
      ------------[ cut here ]------------
      WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7()
      Modules linked in:
      Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32
      Call Trace:
       [<ffffffff81051828>] warn_slowpath_common+0x83/0x9b
       [<ffffffff8105185a>] warn_slowpath_null+0x1a/0x1c
       [<ffffffff810ba362>] __ftrace_hash_rec_update.part.27+0x157/0x1a7
       [<ffffffff810ba6e8>] ? ftrace_regex_release+0xa7/0x10f
       [<ffffffff8111bdfe>] ? kfree+0xe5/0x115
       [<ffffffff810ba51e>] ftrace_hash_move+0x2e/0x151
       [<ffffffff810ba6fb>] ftrace_regex_release+0xba/0x10f
       [<ffffffff8112e49a>] fput+0xfd/0x1c2
       [<ffffffff8112b54c>] filp_close+0x6d/0x78
       [<ffffffff8113a92d>] sys_dup3+0x197/0x1c1
       [<ffffffff8113a9a6>] sys_dup2+0x4f/0x54
       [<ffffffff8150cac2>] system_call_fastpath+0x16/0x1b
      ---[ end trace 77a3a7ee73794a02 ]---
      
      Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian
      Reported-by: Rabin Vincent <rabin@rab.in>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      perf: Fix parsing of __print_flags() in TP_printk() · d06c27b2
      Steven Rostedt committed
      An update was made to the sched:sched_switch event that adds some
      logic to the first parameter of the __print_flags() that shows the
      state of tasks. This change causes perf to fail parsing the flags.
      
      A simple fix is needed to have the parser be able to process ops
      within the argument.
      
      Cc: stable@vger.kernel.org
      Reported-by: Andrew Vagin <avagin@openvz.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • G
      jump_label: jump_label_inc may return before the code is patched · bbbf7af4
      Gleb Natapov committed
      If cpu A calls jump_label_inc() just after atomic_add_return() is
      called by cpu B, atomic_inc_not_zero() will return a value greater than
      zero and jump_label_inc() will return to its caller before
      jump_label_update() finishes its job on cpu B.
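      One way to close such a window is to make the count non-zero only
      after the patching step has completed, so that a fast-path increment
      can succeed only once the code is known to be patched. The sketch
      below illustrates that ordering with C11 atomics in user space. It is
      an assumption about the general technique, not a copy of the patch:
      struct key, key_inc(), the patched flag and the spinlock are all
      hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a jump label key. */
struct key {
    atomic_int enabled;  /* number of users */
    bool       patched;  /* stands in for "code has been patched" */
};

/* Simple spinlock standing in for the lock that serializes updates. */
static atomic_flag update_lock = ATOMIC_FLAG_INIT;

/* Increment *v unless it is zero; returns true if it was incremented. */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

void key_inc(struct key *k)
{
    /* Fast path: a non-zero count implies patching already finished. */
    if (inc_not_zero(&k->enabled)) {
        assert(k->patched);
        return;
    }

    while (atomic_flag_test_and_set(&update_lock))
        ;  /* spin until we own the update lock */
    if (atomic_load(&k->enabled) == 0)
        k->patched = true;             /* patch first ... */
    atomic_fetch_add(&k->enabled, 1);  /* ... then publish the count */
    atomic_flag_clear(&update_lock);
}
```

      Because the count is raised from zero only after the patched flag is
      set, a concurrent caller either takes the slow path and waits for the
      lock, or sees a non-zero count that already implies patched code.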
      
      Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com
      
      Cc: stable@vger.kernel.org
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Jason Baron <jbaron@redhat.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ftrace: Remove force undef config value left for testing · c7c6ec8b
      Steven Rostedt committed
      A forced undef of a config value was used for testing and was
      accidentally left in during the final commit. This causes x86 to
      run slower than needed while running function tracing, and also
      causes the function graph selftest to fail when DYNAMIC_FTRACE
      is not set. This is because the code in MCOUNT expects the ftrace
      code to be processed with the config value set that happened to
      be forced not set.
      
      The forced config option was left in by:
          commit 6331c28c
          ftrace: Fix dynamic selftest failure on some archs
      
      Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian
      
      Cc: stable@vger.kernel.org
      Reported-by: Rabin Vincent <rabin@rab.in>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • L
      tracing: Restore system filter behavior · 27b14b56
      Li Zefan committed
      Though not all events have field 'prev_pid', it was allowed to do this:
      
        # echo 'prev_pid == 100' > events/sched/filter
      
      but commit 75b8e982 (tracing/filter: Swap
      entire filter of events) broke it without any reason.
      
      Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • I
      tracing: fix event_subsystem ref counting · cb599747
      Ilya Dryomov committed
      Fix a bug introduced by e9dbfae5, which prevents event_subsystem from
      ever being released.
      
      Ref_count was added to keep track of subsystem users, not for counting
      events.  Subsystem is created with ref_count = 1, so there is no need to
      increment it for every event, we have nr_events for that.  Fix this by
      touching ref_count only when we actually have a new user -
      subsystem_open().
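      The accounting rule described above can be sketched as follows. This
      is a simplified model, not the kernel code: struct subsystem and the
      helper names other than subsystem_open() are hypothetical, and the
      assumption that removing the last event drops the creation reference
      is mine rather than something the message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an event subsystem. */
struct subsystem {
    int  ref_count;  /* users of the subsystem: creator plus open files */
    int  nr_events;  /* events that belong to the subsystem */
    bool released;   /* set once the last reference is dropped */
};

void subsystem_init(struct subsystem *s)
{
    s->ref_count = 1;  /* the creation reference */
    s->nr_events = 0;
    s->released = false;
}

static void subsystem_put(struct subsystem *s)
{
    assert(s->ref_count > 0);
    if (--s->ref_count == 0)
        s->released = true;
}

/* Adding an event only touches nr_events, never ref_count. */
void subsystem_add_event(struct subsystem *s)
{
    s->nr_events++;
}

/* Assumption: removing the last event drops the creation reference. */
void subsystem_remove_event(struct subsystem *s)
{
    assert(s->nr_events > 0);
    if (--s->nr_events == 0)
        subsystem_put(s);
}

/* A new user of the subsystem takes a reference ... */
void subsystem_open(struct subsystem *s)
{
    s->ref_count++;
}

/* ... and gives it back when done. */
void subsystem_close(struct subsystem *s)
{
    subsystem_put(s);
}
```

      If adding an event also incremented ref_count, the count in this model
      could never fall back to zero, which matches the leak the message
      describes.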
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 05 Dec, 2011 11 commits
  3. 03 Dec, 2011 1 commit
  4. 02 Dec, 2011 3 commits
    • A
      perf test: Validate PERF_RECORD_ events and perf_sample fields · 3e7c439a
      Arnaldo Carvalho de Melo committed
      This new test will validate these new routines extracted from 'perf
      record':
      
       - perf_evlist__config_attrs
       - perf_evlist__prepare_workload
       - perf_evlist__start_workload
      
      In addition to several other perf_evlist methods.
      
      It consists of starting a simple workload, setting up just one event to
      monitor ("cycles") requesting that several PERF_SAMPLE_ fields be
      present in all events.
      
      It then will check that the expected PERF_RECORD_ events are produced
      and will sanity check all its fields.
      
      Some checks performed:
      
      . PERF_SAMPLE_TIME monotonically increases.
      
      . PERF_SAMPLE_CPU is the one requested with sched_setaffinity
      
      . PERF_SAMPLE_TID and PERF_SAMPLE_PID match the one we forked
        in perf_evlist__prepare_workload and that is stored in
        evlist->workload.pid

      . For events where these fields are also present in their
        pre-sample_id_all fields (e.g. event->mmap.pid), that those values
        are as expected too.
      
      . That we get a bunch of mmaps:
      
        PATH/libcSUFFIX
        PATH/ldSUFFIX
        [vdso]
        PATH/sleep
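      The first three checks in the list above could look roughly like the
      sketch below. It is an illustration only: struct sample_hdr and
      validate_samples() are hypothetical names rather than perf's own
      types, and the sketch treats a timestamp that goes backwards as the
      monotonicity violation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of the fields being checked. */
struct sample_hdr {
    uint64_t time;  /* PERF_SAMPLE_TIME */
    uint32_t cpu;   /* PERF_SAMPLE_CPU */
    int32_t  pid;   /* PERF_SAMPLE_TID: pid part */
    int32_t  tid;   /* PERF_SAMPLE_TID: tid part */
};

/* Returns the number of failed checks over n records. */
int validate_samples(const struct sample_hdr *s, size_t n,
                     uint32_t want_cpu, int32_t want_pid)
{
    int errs = 0;
    uint64_t prev = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i].time < prev)
            errs++;            /* timestamps must not go backwards */
        prev = s[i].time;
        if (s[i].cpu != want_cpu)
            errs++;            /* must run on the CPU we pinned to */
        if (s[i].pid != want_pid || s[i].tid != want_pid)
            errs++;            /* must belong to the forked workload */
    }
    return errs;
}
```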
      
      Example:
      
        [root@emilia ~]# taskset -c 3,4 perf test -v1 perf_sample
         6: Validate PERF_RECORD_* events & perf_sample fields:
        --- start ---
        7159480799825 3 PERF_RECORD_SAMPLE
        7159480805584 3 PERF_RECORD_SAMPLE
        7159480807814 3 PERF_RECORD_SAMPLE
        7159480810430 3 PERF_RECORD_SAMPLE
        7159480861511 3 PERF_RECORD_MMAP 8086/8086: [0x7fffffffd000(0x2000) @ 0x7fffffffd000]: //anon
        7159481052516 3 PERF_RECORD_COMM: sleep:8086
        7159481070188 3 PERF_RECORD_MMAP 8086/8086: [0x400000(0x6000) @ 0]: /bin/sleep
        7159481077104 3 PERF_RECORD_MMAP 8086/8086: [0x3d06400000(0x221000) @ 0]: /lib64/ld-2.12.so
        7159481092912 3 PERF_RECORD_MMAP 8086/8086: [0x7fff1adff000(0x1000) @ 0x7fff1adff000]: [vdso]
        7159481196779 3 PERF_RECORD_MMAP 8086/8086: [0x3d06800000(0x37f000) @ 0]: /lib64/libc-2.12.so
        7160481558435 3 PERF_RECORD_EXIT(8086:8086):(8086:8086)
        ---- end ----
        Validate PERF_RECORD_* events & perf_sample fields: Ok
        [root@emilia ~]#
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-svag18v2z4idas0dyz3umjpq@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf event: Introduce perf_event__fprintf · 482ad897
      Arnaldo Carvalho de Melo committed
      So that tools like 'perf test' can print the events when in verbose
      mode, for instance.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-xnovdqfi25nc48gy6604k7yp@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • T
      trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter · d3d9acf6
      Tejun Heo committed
      ftrace_event_call->filter is sched RCU protected but didn't use
      rcu_assign_pointer().  Use it.
      
      TODO: Add proper __rcu annotation to call->filter and all its users.
      
      -v2: Use RCU_INIT_POINTER() for %NULL clearing as suggested by Eric.
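      For readers unfamiliar with the distinction: publishing a pointer to
      concurrent readers needs a store that orders the object's
      initialization before the pointer becomes visible, while clearing it
      to NULL does not. The sketch below is a rough user-space analogue
      using C11 atomics, not the kernel RCU API: struct filter, struct call
      and the three functions are hypothetical, and an acquire load stands
      in for the reader-side dereference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical filter: pass events whose pid is at least min_pid. */
struct filter {
    int min_pid;
};

/* Hypothetical stand-in for an event call with a published filter. */
struct call {
    _Atomic(struct filter *) filter;
};

/* Analogue of publishing: the filter's fields become visible first. */
void publish_filter(struct call *c, struct filter *f)
{
    assert(c && f);
    atomic_store_explicit(&c->filter, f, memory_order_release);
}

/* Analogue of NULL clearing: there is nothing to order against. */
void clear_filter(struct call *c)
{
    assert(c);
    atomic_store_explicit(&c->filter, NULL, memory_order_relaxed);
}

/* Reader side: returns 1 if the event passes, 0 if it is filtered out. */
int filter_match(struct call *c, int pid)
{
    struct filter *f;

    f = atomic_load_explicit(&c->filter, memory_order_acquire);
    return f == NULL || pid >= f->min_pid;
}
```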
      
      Link: http://lkml.kernel.org/r/20111123164949.GA29639@google.com
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@kernel.org # (2.6.39+)
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 30 Nov, 2011 1 commit
    • A
      perf test: Allow running just a subset of the available tests · e60770a0
      Arnaldo Carvalho de Melo committed
      To obtain a list of available tests:
      
      [root@emilia linux]# perf test list
       1: vmlinux symtab matches kallsyms
       2: detect open syscall event
       3: detect open syscall event on all cpus
       4: read samples using the mmap interface
       5: parse events tests
      [root@emilia linux]#
      
      To list just a subset:
      
      [root@emilia linux]# perf test list syscall
       2: detect open syscall event
       3: detect open syscall event on all cpus
      [root@emilia linux]#
      
      To run a subset:
      
      [root@emilia linux]# perf test detect
       2: detect open syscall event: Ok
       3: detect open syscall event on all cpus: Ok
      [root@emilia linux]#
      
      Specific tests can be chosen by number:
      
      [root@emilia linux]# perf test 1 3 parse
       1: vmlinux symtab matches kallsyms: Ok
       3: detect open syscall event on all cpus: Ok
       5: parse events tests: Ok
      [root@emilia linux]#
      
      Now to write more tests!
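      The selection behaviour visible in the sessions above (a numeric
      argument picks a test by its number, any other argument matches as a
      substring of the description) can be sketched like this. It is an
      illustration inferred from the example output, not the code from the
      patch: test_descs and test_selected() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Descriptions copied from the 'perf test list' output shown above. */
const char *const test_descs[] = {
    "vmlinux symtab matches kallsyms",
    "detect open syscall event",
    "detect open syscall event on all cpus",
    "read samples using the mmap interface",
    "parse events tests",
};

/* index is 1-based, as printed by 'perf test list'. */
bool test_selected(int index, const char *desc, int argc, const char **argv)
{
    assert(index >= 1 && desc != NULL);
    if (argc == 0)
        return true;  /* no arguments: run everything */

    for (int i = 0; i < argc; i++) {
        char *end;
        long nr = strtol(argv[i], &end, 10);

        /* A fully numeric argument selects by test number only. */
        if (end != argv[i] && *end == '\0') {
            if (nr == index)
                return true;
            continue;
        }
        /* Anything else is matched against the description. */
        if (strstr(desc, argv[i]) != NULL)
            return true;
    }
    return false;
}
```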
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-nqec2145qfxdgimux28aw7v8@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 29 Nov, 2011 2 commits
    • A
      perf evlist: Always do automatic allocation of pollfd and mmap structures · 806fb630
      Arnaldo Carvalho de Melo committed
      At first tools were required to do that, but while writing the python
      bindings to simplify the API I made them auto-allocate when needed.
      
      This just makes record, stat and top use that auto allocation,
      simplifying them a bit.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-iokhcvkzzijr3keioubx8hlq@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf tools: Save some loops using perf_evlist__id2evsel · ee29be62
      Arnaldo Carvalho de Melo committed
      Since we already ask for PERF_SAMPLE_ID and use it to quickly find the
      associated evsel, add handler func + data to struct perf_evsel to avoid
      using chains of if(strcmp(event_name)) and also to avoid all the linear
      list searches via trace_event_find.
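      The dispatch idea can be sketched as below: look the evsel up by
      sample id in a small hash table and call the handler stored on it,
      instead of comparing event names. All types and functions here are
      simplified, hypothetical stand-ins (including evlist_id2evsel(), which
      only borrows its name from the subject line), not perf's actual
      structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 16

struct sample {
    uint64_t id;     /* as delivered via PERF_SAMPLE_ID */
    int      value;
};

struct evsel;
typedef void (*handler_fn)(struct evsel *evsel, const struct sample *s);

/* Hypothetical, simplified evsel carrying its own handler and data. */
struct evsel {
    uint64_t      id;
    handler_fn    handler;
    void         *data;
    struct evsel *next;  /* hash chain */
};

struct evlist {
    struct evsel *heads[HASH_SIZE];
};

void evlist_add(struct evlist *l, struct evsel *e)
{
    size_t h = e->id % HASH_SIZE;

    e->next = l->heads[h];
    l->heads[h] = e;
}

struct evsel *evlist_id2evsel(struct evlist *l, uint64_t id)
{
    for (struct evsel *e = l->heads[id % HASH_SIZE]; e; e = e->next) {
        if (e->id == id)
            return e;
    }
    return NULL;
}

/* Returns 0 if a handler ran, -1 if the id is unknown or has no handler. */
int evlist_dispatch(struct evlist *l, const struct sample *s)
{
    struct evsel *e;

    assert(l && s);
    e = evlist_id2evsel(l, s->id);
    if (e == NULL || e->handler == NULL)
        return -1;
    e->handler(e, s);
    return 0;
}

/* Example handler: accumulate sample values into an int behind e->data. */
void sum_handler(struct evsel *e, const struct sample *s)
{
    *(int *)e->data += s->value;
}
```

      Each sample then costs one hash lookup and one indirect call,
      regardless of how many event names the tool knows about.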
      
      To demonstrate the technique, convert 'perf sched' to it:
      
       # perf sched record sleep 5m
      
      And then:
      
       Performance counter stats for '/tmp/oldperf sched lat':
      
              646.929438 task-clock                #    0.999 CPUs utilized
                       9 context-switches          #    0.000 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                  20,901 page-faults               #    0.032 M/sec
           1,290,144,450 cycles                    #    1.994 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
           1,606,158,439 instructions              #    1.24  insns per cycle
             339,088,395 branches                  #  524.151 M/sec
               4,550,735 branch-misses             #    1.34% of all branches
      
             0.647524759 seconds time elapsed
      
      Versus:
      
       Performance counter stats for 'perf sched lat':
      
              473.564691 task-clock                #    0.999 CPUs utilized
                       9 context-switches          #    0.000 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                  20,903 page-faults               #    0.044 M/sec
             944,367,984 cycles                    #    1.994 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
           1,442,385,571 instructions              #    1.53  insns per cycle
             308,383,106 branches                  #  651.195 M/sec
               4,481,784 branch-misses             #    1.45% of all branches
      
             0.474215751 seconds time elapsed
      
      [root@emilia ~]#
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-1kbzpl74lwi6lavpqke2u2p3@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 28 Nov, 2011 12 commits