提交 · 3e76ac78b08479e84a3eca3fb1b3066fb8230461 · openeuler / raspberrypi-kernel

20 12月, 2011 2 次提交

perf record: Add ability to record event period · 3e76ac78

由 Andrew Vagin 提交于 12月 20, 2011

The problem is that when SAMPLE_PERIOD is not set, the kernel generates
a number of samples in proportion to an event's period. Number of these
samples may be too big and the kernel throttles all samples above a
defined limit.

E.g.: I want to trace when a process sleeps. I created a process which
sleeps for 1ms and for 4ms.  perf got 100 events in both cases.

swapper 0 [000] 1141.371830: sched_stat_sleep: comm=foo pid=1801 delay=1386750 [ns]
swapper 0 [000] 1141.369444: sched_stat_sleep: comm=foo pid=1801 delay=4499585 [ns]

In the first case a kernel want to send 4499585 events and in the second
case it wants to send 1386750 events.  perf-reports shows that process
sleeps in both places equal time.

Instead of this we can get only one sample with an attribute period. As
result we have less data transferring between kernel and user-space and
we avoid throttling of samples.

The patch "events: Don't divide events if it has field period" added a
kernel part of this functionality.
Acked-by: NArun Sharma <asharma@fb.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: devel@openvz.org
Link: http://lkml.kernel.org/r/1324391565-1369947-1-git-send-email-avagin@openvz.orgSigned-off-by: NAndrew Vagin <avagin@openvz.org>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3e76ac78

Merge branch 'for-tip' of... · 124ba940

由 Ingo Molnar 提交于 12月 20, 2011

Merge branch 'for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core

124ba940

12 12月, 2011 3 次提交

perf tools: Add ability to synthesize event according to a sample · 74eec26f

由 Andrew Vagin 提交于 11月 28, 2011

It's the counterpart of perf_session__parse_sample.

v2: fixed mistakes found by David Ahern.
v3: s/data/sample/
    s/perf_event__change_sample/perf_event__synthesize_sample
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: devel@openvz.org
Link: http://lkml.kernel.org/r/1323266161-394927-3-git-send-email-avagin@openvz.orgSigned-off-by: NAndrew Vagin <avagin@openvz.org>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

74eec26f

perf script: Implement option for system-wide profiling · 317df650

由 Robert Richter 提交于 11月 25, 2011

The option is documented in man perf-script but was not yet implemented:

       -a
           Force system-wide collection. Scripts run without a
           <command> normally use -a by default, while scripts run
           with a <command> normally don't - this option allows the
           latter to be run in system-wide mode.

As with perf record you now can profile in system-wide mode for the
runtime of a given command, e.g.:

 # perf script -a syscall-counts sleep 2

Cc: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/1322229925-10075-1-git-send-email-robert.richter@amd.comSigned-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

317df650

perf script: Fix mem leaks and NULL pointer checks around strdup()s · 38efb539

由 Robert Richter 提交于 11月 25, 2011

Fix mem leaks and missing NULL pointer checks after strdup().

And get_script_path() did not free __script_root in case of continue.

Introduce a helper function get_script_root().

Cc: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/1322217520-3287-1-git-send-email-robert.richter@amd.comSigned-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

38efb539

07 12月, 2011 8 次提交

oprofile, s390: Add event interface to the System z hardware sampling module · dd3c4670

由 Andreas Krebbel 提交于 11月 25, 2011

With this patch the OProfile Basic Mode Sampling support for System z
is enhanced with a counter file system.  That way hardware sampling
can be configured using the user space tools with only little
modifications.

With the patch by default new cpu_types (s390/z10, s390/z196) are
returned in order to indicate that we are running a CPU which provides
the hardware sampling facility.  Existing user space tools will
complain about an unknown cpu type. In order to be compatible with
existing user space tools the `cpu_type' module parameter has been
added.  Setting the parameter to `timer' will force the module to
return `timer' as cpu_type.  The module will still try to use hardware
sampling if available and the hwsampling virtual filesystem will be
also be available for configuration.  So this has a different effect
than using the generic oprofile module parameter `timer=1'.

If the basic mode sampling is enabled on the machine and the
cpu_type=timer parameter is not used the kernel module will provide
the following virtual filesystem:

/dev/oprofile/0/enabled
/dev/oprofile/0/event
/dev/oprofile/0/count
/dev/oprofile/0/unit_mask
/dev/oprofile/0/kernel
/dev/oprofile/0/user

In the counter file system only the values of 'enabled', 'count',
'kernel', and 'user' are evaluated by the kernel module. Everything
else must contain fixed values.

The 'event' value only supports a single event - HWSAMPLING with value
0.

The 'count' value specifies the hardware sampling rate as it is passed
to the CPU measurement facility.

The 'kernel' and 'user' flags can now be used to filter for samples
when using hardware sampling.

Additionally also the following file will be created:
/dev/oprofile/timer/enabled

This will always be the inverted value of /dev/oprofile/0/enabled. 0
is not accepted without hardware sampling.
Signed-off-by: NAndreas Krebbel <krebbel@linux.vnet.ibm.com>
Signed-off-by: NRobert Richter <robert.richter@amd.com>

dd3c4670

oprofile: Fix oprofile_timer_exit() breakage · f8c85203

由 Robert Richter 提交于 12月 07, 2011

Removing remainings of oprofile_timer_exit() completly.
Signed-off-by: NRobert Richter <robert.richter@amd.com>

f8c85203

perf, x86: Expose perf capability to other modules · b3d9468a

由 Gleb Natapov 提交于 11月 10, 2011

KVM needs to know perf capability to decide which PMU it can expose to a
guest.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1320929850-10480-8-git-send-email-gleb@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

b3d9468a

perf, x86: Implement arch event mask as quirk · c1d6f42f

由 Peter Zijlstra 提交于 12月 06, 2011

Implement the disabling of arch events as a quirk so that we can print
a message along with it. This creates some visibility into the problem
space and could allow us to work on adding more work-around like the
AAJ80 one.
Requested-by: NIngo Molnar <mingo@elte.hu>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-wcja2z48wklzu1b0nkz0a5y7@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

c1d6f42f

x86, perf: Disable non available architectural events · ffb871bc

由 Gleb Natapov 提交于 11月 10, 2011

Intel CPUs report non-available architectural events in cpuid leaf
0AH.EBX. Use it to disable events that are not available according
to CPU.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1320929850-10480-7-git-send-email-gleb@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

ffb871bc

jump_label: Provide jump_label_key initializers · ac99b862

由 Peter Zijlstra 提交于 7月 06, 2011

Provide two initializers for jump_label_key that initialize it enabled
or disabled. Also modify all jump_label code to allow for jump_labels to be
initialized enabled.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Baron <jbaron@redhat.com>
Link: http://lkml.kernel.org/n/tip-p40e3yj21b68y03z1yv825e7@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

ac99b862

jump_label, x86: Fix section mismatch · 9cdbe1cb

由 Peter Zijlstra 提交于 12月 06, 2011

WARNING: arch/x86/kernel/built-in.o(.text+0x4c71): Section mismatch in
reference from the function arch_jump_label_transform_static() to the
function .init.text:text_poke_early()
The function arch_jump_label_transform_static() references
the function __init text_poke_early().
This is often because arch_jump_label_transform_static lacks a __init
annotation or the annotation of text_poke_early is wrong.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Baron <jbaron@redhat.com>
Link: http://lkml.kernel.org/n/tip-9lefe89mrvurrwpqw5h8xm8z@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

9cdbe1cb

Merge branch 'tip/perf/core' of... · cc991b83

由 Ingo Molnar 提交于 12月 06, 2011

Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core

cc991b83

06 12月, 2011 14 次提交

perf, core: Rate limit perf_sched_events jump_label patching · b2029520

由 Gleb Natapov 提交于 11月 27, 2011

jump_lable patching is very expensive operation that involves pausing all
cpus. The patching of perf_sched_events jump_label is easily controllable
from userspace by unprivileged user.

When te user runs a loop like this:

  "while true; do perf stat -e cycles true; done"

... the performance of my test application that just increments a counter
for one second drops by 4%.

This is on a 16 cpu box with my test application using only one of
them. An impact on a real server doing real work will be worse.

Performance of KVM PMU drops nearly 50% due to jump_lable for "perf
record" since KVM PMU implementation creates and destroys perf event
frequently.

This patch introduces a way to rate limit jump_label patching and uses
it to fix the above problem.

I believe that as jump_label use will spread the problem will become more
common and thus solving it in a generic code is appropriate. Also fixing
it in the perf code would result in moving jump_label accounting logic to
perf code with all the ifdefs in case of JUMP_LABEL=n kernel. With this
patch all details are nicely hidden inside jump_label code.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Acked-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111127155909.GO2557@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

b2029520

perf: Fix enable_on_exec for sibling events · b79387ef

由 Peter Zijlstra 提交于 11月 22, 2011

Deng-Cheng Zhu reported that sibling events that were created disabled
with enable_on_exec would never get enabled. Iterate all events
instead of the group lists.
Reported-by: NDeng-Cheng Zhu <dczhu@mips.com>
Tested-by: NDeng-Cheng Zhu <dczhu@mips.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322048382.14799.41.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

b79387ef

perf: Remove superfluous arguments · 1d9b482e

由 Peter Zijlstra 提交于 11月 23, 2011

Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-yv4o74vh90suyghccgykbnry@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

1d9b482e

perf, x86: Prefer fixed-purpose counters when scheduling · 4defea85

由 Peter Zijlstra 提交于 11月 10, 2011

This avoids a scheduling failure for cases like:

  cycles, cycles, instructions, instructions (on Core2)

Which would end up being programmed like:

  PMC0, PMC1, FP-instructions, fail

Because all events will have the same weight.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

4defea85

perf, x86: Fix event scheduler for constraints with overlapping counters · bc1738f6

由 Robert Richter 提交于 11月 18, 2011

The current x86 event scheduler fails to resolve scheduling problems
of certain combinations of events and constraints. This happens if the
counter mask of such an event is not a subset of any other counter
mask of a constraint with an equal or higher weight, e.g. constraints
of the AMD family 15h pmu:

                        counter mask    weight

 amd_f15_PMC30          0x09            2  <--- overlapping counters
 amd_f15_PMC20          0x07            3
 amd_f15_PMC53          0x38            3

The scheduler does not find then an existing solution. Here is an
example:

 event code     counter         failure         possible solution

 0x02E          PMC[3,0]        0               3
 0x043          PMC[2:0]        1               0
 0x045          PMC[2:0]        2               1
 0x046          PMC[2:0]        FAIL            2

The event scheduler may not select the correct counter in the first
cycle because it needs to know which subsequent events will be
scheduled. It may fail to schedule the events then.

To solve this, we now save the scheduler state of events with
overlapping counter counstraints.  If we fail to schedule the events
we rollback to those states and try to use another free counter.

Constraints with overlapping counters are marked with a new introduced
overlap flag. We set the overlap flag for such constraints to give the
scheduler a hint which events to select for counter rescheduling. The
EVENT_CONSTRAINT_OVERLAP() macro can be used for this.

Care must be taken as the rescheduling algorithm is O(n!) which will
increase scheduling cycles for an over-commited system dramatically.
The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter
masks must be kept at a minimum. Thus, the current stack is limited to
2 states to limit the number of loops the algorithm takes in the worst
case.

On systems with no overlapping-counter constraints, this
implementation does not increase the loop count compared to the
previous algorithm.

V2:
* Renamed redo -> overlap.
* Reimplementation using perf scheduling helper functions.

V3:
* Added WARN_ON_ONCE() if out of save states.
* Changed function interface of perf_sched_restore_state() to use bool
  as return value.
Signed-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

bc1738f6

perf, x86: Implement event scheduler helper functions · 1e2ad28f

由 Robert Richter 提交于 11月 18, 2011

This patch introduces x86 perf scheduler code helper functions. We
need this to later add more complex functionality to support
overlapping counter constraints (next patch).

The algorithm is modified so that the range of weight values is now
generated from the constraints. There shouldn't be other functional
changes.

With the helper functions the scheduler is controlled. There are
functions to initialize, traverse the event list, find unused counters
etc. The scheduler keeps its own state.

V3:
* Added macro for_each_set_bit_cont().
* Changed functions interfaces of perf_sched_find_counter() and
  perf_sched_next_event() to use bool as return value.
* Added some comments to make code better understandable.

V4:
* Fix broken event assignment if weight of the first event is not
  wmin (perf_sched_init()).
Signed-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

1e2ad28f

perf: Avoid a useless pmu_disable() in the perf-tick · 0f5a2601

由 Peter Zijlstra 提交于 11月 16, 2011

Gleb writes:

 > Currently pmu is disabled and re-enabled on each timer interrupt even
 > when no rotation or frequency adjustment is needed. On Intel CPU this
 > results in two writes into PERF_GLOBAL_CTRL MSR per tick. On bare metal
 > it does not cause significant slowdown, but when running perf in a virtual
 > machine it leads to 20% slowdown on my machine.

Cure this by keeping a perf_event_context::nr_freq counter that counts the
number of active events that require frequency adjustments and use this in a
similar fashion to the already existing nr_events != nr_active test in
perf_rotate_context().

By being able to exclude both rotation and frequency adjustments a-priory for
the common case we can avoid the otherwise superfluous PMU disable.
Suggested-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-515yhoatehd3gza7we9fapaa@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

0f5a2601

Merge branch 'perf/urgent' into perf/core · d6c1c49d

由 Ingo Molnar 提交于 12月 06, 2011

Merge reason: Add these cherry-picked commits so that future changes
              on perf/core don't conflict.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d6c1c49d

ftrace: Fix hash record accounting bug · ddf6e0e5

由 Steven Rostedt 提交于 11月 04, 2011

If the set_ftrace_filter is cleared by writing just whitespace to
it, then the filter hash refcounts will be decremented but not
updated. This causes two bugs:

1) No functions will be enabled for tracing when they all should be

2) If the users clears the set_ftrace_filter twice, it will crash ftrace:

------------[ cut here ]------------
WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7()
Modules linked in:
Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32
Call Trace:
 [<ffffffff81051828>] warn_slowpath_common+0x83/0x9b
 [<ffffffff8105185a>] warn_slowpath_null+0x1a/0x1c
 [<ffffffff810ba362>] __ftrace_hash_rec_update.part.27+0x157/0x1a7
 [<ffffffff810ba6e8>] ? ftrace_regex_release+0xa7/0x10f
 [<ffffffff8111bdfe>] ? kfree+0xe5/0x115
 [<ffffffff810ba51e>] ftrace_hash_move+0x2e/0x151
 [<ffffffff810ba6fb>] ftrace_regex_release+0xba/0x10f
 [<ffffffff8112e49a>] fput+0xfd/0x1c2
 [<ffffffff8112b54c>] filp_close+0x6d/0x78
 [<ffffffff8113a92d>] sys_dup3+0x197/0x1c1
 [<ffffffff8113a9a6>] sys_dup2+0x4f/0x54
 [<ffffffff8150cac2>] system_call_fastpath+0x16/0x1b
---[ end trace 77a3a7ee73794a02 ]---

Link: http://lkml.kernel.org/r/20111101141420.GA4918@debianReported-by: NRabin Vincent <rabin@rab.in>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ddf6e0e5

perf: Fix parsing of __print_flags() in TP_printk() · d06c27b2

由 Steven Rostedt 提交于 11月 04, 2011

A update is made to the sched:sched_switch event that adds some
logic to the first parameter of the __print_flags() that shows the
state of tasks. This change cause perf to fail parsing the flags.

A simple fix is needed to have the parser be able to process ops
within the argument.

Cc: stable@vger.kernel.org
Reported-by: NAndrew Vagin <avagin@openvz.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

d06c27b2

jump_label: jump_label_inc may return before the code is patched · bbbf7af4

由 Gleb Natapov 提交于 10月 18, 2011

If cpu A calls jump_label_inc() just after atomic_add_return() is
called by cpu B, atomic_inc_not_zero() will return value greater then
zero and jump_label_inc() will return to a caller before jump_label_update()
finishes its job on cpu B.

Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com

Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bbbf7af4

ftrace: Remove force undef config value left for testing · c7c6ec8b

由 Steven Rostedt 提交于 11月 04, 2011

A forced undef of a config value was used for testing and was
accidently left in during the final commit. This causes x86 to
run slower than needed while running function tracing as well
as causes the function graph selftest to fail when DYNMAIC_FTRACE
is not set. This is because the code in MCOUNT expects the ftrace
code to be processed with the config value set that happened to
be forced not set.

The forced config option was left in by:
    commit 6331c28c
    ftrace: Fix dynamic selftest failure on some archs

Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian

Cc: stable@vger.kernel.org
Reported-by: NRabin Vincent <rabin@rab.in>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

c7c6ec8b

tracing: Restore system filter behavior · 27b14b56

由 Li Zefan 提交于 11月 01, 2011

Though not all events have field 'prev_pid', it was allowed to do this:

  # echo 'prev_pid == 100' > events/sched/filter

but commit 75b8e982 (tracing/filter: Swap
entire filter of events) broke it without any reason.

Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.comSigned-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

27b14b56

tracing: fix event_subsystem ref counting · cb599747

由 Ilya Dryomov 提交于 10月 31, 2011

Fix a bug introduced by e9dbfae5, which prevents event_subsystem from
ever being released.

Ref_count was added to keep track of subsystem users, not for counting
events.  Subsystem is created with ref_count = 1, so there is no need to
increment it for every event, we have nr_events for that.  Fix this by
touching ref_count only when we actually have a new user -
subsystem_open().

Cc: stable@vger.kernel.org
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>

cb599747

05 12月, 2011 11 次提交

x86/tools: Add decoded instruction dump mode · 9dde9dc0

由 Masami Hiramatsu 提交于 12月 05, 2011

Add instruction dump mode to insn_sanity tool for
checking decoder really decoded instructions.

This mode is enabled when passing double -v (-vv) to
insn_sanity. It is useful for who wants to check whether
the decoder can decode some instructions correctly.
e.g.
 $ echo 0f 73 10 11 | ./insn_sanity -y -vv -i -
 Instruction = {
        .prefixes = {
                .value = 0, bytes[] = {0, 0, 0, 0},
                .got = 1, .nbytes = 0},
        .rex_prefix = {
                .value = 0, bytes[] = {0, 0, 0, 0},
                .got = 1, .nbytes = 0},
        .vex_prefix = {
                .value = 0, bytes[] = {0, 0, 0, 0},
                .got = 1, .nbytes = 0},
        .opcode = {
                .value = 29455, bytes[] = {f, 73, 0, 0},
                .got = 1, .nbytes = 2},
        .modrm = {
                .value = 16, bytes[] = {10, 0, 0, 0},
                .got = 1, .nbytes = 1},
        .sib = {
                .value = 0, bytes[] = {0, 0, 0, 0},
                .got = 1, .nbytes = 0},
        .displacement = {
                .value = 0, bytes[] = {0, 0, 0, 0},
                .got = 1, .nbytes = 0},
        .immediate1 = {
                .value = 17, bytes[] = {11, 0, 0, 0},
                .got = 1, .nbytes = 1},
        .immediate2 = {
                .value = 0, bytes[] = {0, 0, 0, 0},
                .got = 0, .nbytes = 0},
        .attr = 44800, .opnd_bytes = 4, .addr_bytes = 8,
        .length = 4, .x86_64 = 1, .kaddr = 0x7fff0f7d9430}
 Success: decoded and checked 1 given instructions with 0 errors (seed:0x0)
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20111205120603.15475.91192.stgit@cloudSigned-off-by: NIngo Molnar <mingo@elte.hu>

9dde9dc0

x86: Update instruction decoder to support new AVX formats · a9c373d0

由 Masami Hiramatsu 提交于 12月 05, 2011

Since new Intel software developers manual introduces
new format for AVX instruction set (including AVX2),
it is important to update x86-opcode-map.txt to fit
those changes.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20111205120557.15475.13236.stgit@cloudSigned-off-by: NIngo Molnar <mingo@elte.hu>

a9c373d0

x86/tools: Fix insn_sanity message outputs · e70825fc

由 Masami Hiramatsu 提交于 12月 05, 2011

Fix x86 instruction decoder test to dump all error messages
to stderr and others to stdout.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20111205120550.15475.70149.stgit@cloudSigned-off-by: NIngo Molnar <mingo@elte.hu>

e70825fc

x86/tools: Fix instruction decoder message output · bfbe9015

由 Masami Hiramatsu 提交于 12月 05, 2011

Fix instruction decoder test (insn_sanity), so that
it doesn't show both info and error messages twice on
same instruction. (In that case, show only error message)
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20111205120545.15475.7928.stgit@cloudSigned-off-by: NIngo Molnar <mingo@elte.hu>

bfbe9015

x86: Fix instruction decoder to handle grouped AVX instructions · 130b78b2

由 Masami Hiramatsu 提交于 12月 05, 2011

For reducing memory usage of attribute table, x86 instruction
decoder puts "Group" attribute only on "no-last-prefix"
attribute table (same as vex_p == 0 case).

Thus, the decoder should look no-last-prefix table first, and
then only if it is not a group, move on to "with-last-prefix"
table (vex_p != 0).

However, current implementation, inat_get_avx_attribute()
looks with-last-prefix directly. So, when decoding
a grouped AVX instruction, the decoder fails to find correct
group because there is no "Group" attribute on the table.
This ends up with the mis-decoding of instructions, as Ingo
reported in http://thread.gmane.org/gmane.linux.kernel/1214103

This patch fixes it to check no-last-prefix table first
even if that is an AVX instruction, and get an attribute from
"with last-prefix" table only if that is not a group.
Reported-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20111205120539.15475.91428.stgit@cloudSigned-off-by: NIngo Molnar <mingo@elte.hu>

130b78b2

x86/tools: Fix Makefile to build all test tools · 1056c3e9

由 Masami Hiramatsu 提交于 12月 05, 2011

Fix arch/x86/tools/Makefile to compile both test tools
correctly. This bug leads build error.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20111205120533.15475.62047.stgit@cloudSigned-off-by: NIngo Molnar <mingo@elte.hu>

1056c3e9

Merge branch 'tip/perf/urgent' of... · dc440d10

由 Ingo Molnar 提交于 12月 05, 2011

Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent

dc440d10

I

Merge branch 'perf/urgent' of git://github.com/acmel/linux into perf/urgent · 2c3757e5
由 Ingo Molnar 提交于 12月 05, 2011

2c3757e5

perf: Fix loss of notification with multi-event · 10c6db11

由 Peter Zijlstra 提交于 11月 26, 2011

When you do:
$ perf record -e cycles,cycles,cycles noploop 10

You expect about 10,000 samples for each event, i.e., 10s at
1000samples/sec. However, this is not what's happening. You
get much fewer samples, maybe 3700 samples/event:

$ perf report -D | tail -15
Aggregated stats:
TOTAL events: 10998
MMAP events: 66
COMM events: 2
SAMPLE events: 10930
cycles stats:
TOTAL events: 3644
SAMPLE events: 3644
cycles stats:
TOTAL events: 3642
SAMPLE events: 3642
cycles stats:
TOTAL events: 3644
SAMPLE events: 3644

On a Intel Nehalem or even AMD64, there are 4 counters capable
of measuring cycles, so there is plenty of space to measure those
events without multiplexing (even with the NMI watchdog active).
And even with multiplexing, we'd expect roughly the same number
of samples per event.

The root of the problem was that when the event that caused the buffer
to become full was not the first event passed on the cmdline, the user
notification would get lost. The notification was sent to the file
descriptor of the overflowed event but the perf tool was not polling
on it. The perf tool aggregates all samples into a single buffer,
i.e., the buffer of the first event. Consequently, it assumes
notifications for any event will come via that descriptor.

The seemingly straight forward solution of moving the waitq into the
ringbuffer object doesn't work because of life-time issues. One could
perf_event_set_output() on a fd that you're also blocking on and cause
the old rb object to be freed while its waitq would still be
referenced by the blocked thread -> FAIL.

Therefore link all events to the ringbuffer and broadcast the wakeup
from the ringbuffer object to all possible events that could be waited
upon. This is rather ugly, and we're open to better solutions but it
works for now.
Reported-by: NStephane Eranian <eranian@google.com>
Finished-by: NStephane Eranian <eranian@google.com>
Reviewed-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111126014731.GA7030@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>

10c6db11

perf, x86: Force IBS LVT offset assignment for family 10h · 16e5294e

由 Robert Richter 提交于 11月 08, 2011

On AMD family 10h we see firmware bug messages like the following:

 [Firmware Bug]: cpu 6, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
 [Firmware Bug]: cpu 6, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
 [Firmware Bug]: using offset 1 for IBS interrupts
 [Firmware Bug]: workaround enabled for IBS LVT offset
 perf: AMD IBS detected (0x00000007)

We always see this, since the offsets are not assigned by the BIOS for
this family. Force LVT offset assignment in this case. If the OS
assignment fails, fallback to BIOS settings and try to setup this.

The fallback to BIOS settings weakens the family check since
force_ibs_eilvt_setup() may fail e.g. in case of virtual machines.
But setup may still succeed if BIOS offsets are correct.

Other families don't have a workaround implemented that assigns LVT
offsets. It's ok, to drop calling force_ibs_eilvt_setup() for that
families.

With the patch the [Firmware Bug] messages vanish. We see now:

 IBS: LVT offset 1 assigned
 perf: AMD IBS detected (0x00000007)
Signed-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111109162225.GO12451@erda.amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

16e5294e

perf, x86: Disable PEBS on SandyBridge chips · 6a600a8b

由 Peter Zijlstra 提交于 11月 15, 2011

Cc: Stephane Eranian <eranian@google.com>
Cc: stable@kernel.org
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6a600a8b

03 12月, 2011 1 次提交

perf test: Soft errors shouldn't stop the "Validate PERF_RECORD_" test · f71c49e5

由 Arnaldo Carvalho de Melo 提交于 12月 02, 2011

For errors that don't preclude checking for further errors, aka "soft"
errors, just  continue testing for other errors.

Better coverage in verbose mode.
Suggested-by: NDavid Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-jafcokbj26m845dsgm2hx6az@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

f71c49e5

02 12月, 2011 1 次提交

perf test: Validate PERF_RECORD_ events and perf_sample fields · 3e7c439a

由 Arnaldo Carvalho de Melo 提交于 12月 02, 2011

This new test will validate these new routines extracted from 'perf
record':

 - perf_evlist__config_attrs
 - perf_evlist__prepare_workload
 - perf_evlist__start_workload

In addition to several other perf_evlist methods.

It consists of starting a simple workload, setting up just one event to
monitor ("cycles") requesting that several PERF_SAMPLE_ fields be
present in all events.

It then will check that the expected PERF_RECORD_ events are produced
and will sanity check all its fields.

Some checks performed:

. PERF_SAMPLE_TIME monotonically increases.

. PERF_SAMPLE_CPU is the one requested with sched_setaffinity

. PERF_SAMPLE_TID and PERF_SAMPLE_PID matches the one we forked
  in perf_evlist__prepare_workload and that is stored in
  evlist->workload.pid

. For the events where these fields are also present in its
  pre-sample_id_all fields (e.g. event->mmap.pid), that they are what
  is expected too.

. That we get a bunch of mmaps:

  PATH/libcSUFFIX
  PATH/ldSUFFIX
  [vdso]
  PATH/sleep

Example:

  [root@emilia ~]# taskset -c 3,4 perf test -v1 perf_sample
   6: Validate PERF_RECORD_* events & perf_sample fields:
  --- start ---
  7159480799825 3 PERF_RECORD_SAMPLE
  7159480805584 3 PERF_RECORD_SAMPLE
  7159480807814 3 PERF_RECORD_SAMPLE
  7159480810430 3 PERF_RECORD_SAMPLE
  7159480861511 3 PERF_RECORD_MMAP 8086/8086: [0x7fffffffd000(0x2000) @ 0x7fffffffd000]: //anon
  7159481052516 3 PERF_RECORD_COMM: sleep:8086
  7159481070188 3 PERF_RECORD_MMAP 8086/8086: [0x400000(0x6000) @ 0]: /bin/sleep
  7159481077104 3 PERF_RECORD_MMAP 8086/8086: [0x3d06400000(0x221000) @ 0]: /lib64/ld-2.12.so
  7159481092912 3 PERF_RECORD_MMAP 8086/8086: [0x7fff1adff000(0x1000) @ 0x7fff1adff000]: [vdso]
  7159481196779 3 PERF_RECORD_MMAP 8086/8086: [0x3d06800000(0x37f000) @ 0]: /lib64/libc-2.12.so
  7160481558435 3 PERF_RECORD_EXIT(8086:8086):(8086:8086)
  ---- end ----
  Validate PERF_RECORD_* events & perf_sample fields: Ok
  [root@emilia ~]#

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-svag18v2z4idas0dyz3umjpq@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3e7c439a