1. 09 2月, 2014 1 次提交
  2. 13 1月, 2014 2 次提交
    • P
      sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Peter Zijlstra 提交于
      In order to avoid the runtime condition and variable load turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35af99e6
    • P
      sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs · 20d1c86a
      Peter Zijlstra 提交于
      Use a ring-buffer like multi-version object structure which allows
      always having a coherent object; we use this to avoid having to
      disable IRQs while reading sched_clock() and avoids a problem when
      getting an NMI while changing the cyc2ns data.
      
                              MAINLINE   PRE        POST
      
          sched_clock_stable: 1          1          1
          (cold) sched_clock: 329841     331312     257223
          (cold) local_clock: 301773     310296     309889
          (warm) sched_clock: 38375      38247      25280
          (warm) local_clock: 100371     102713     85268
          (warm) rdtsc:       27340      27289      24247
          sched_clock_stable: 0          0          0
          (cold) sched_clock: 382634     372706     301224
          (cold) local_clock: 396890     399275     399870
          (warm) sched_clock: 38194      38124      25630
          (warm) local_clock: 143452     148698     129629
          (warm) rdtsc:       27345      27365      24307
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      20d1c86a
  3. 06 11月, 2013 1 次提交
  4. 29 10月, 2013 1 次提交
    • P
      perf/x86: Fix NMI measurements · e8a923cc
      Peter Zijlstra 提交于
      OK, so what I'm actually seeing on my WSM is that sched/clock.c is
      'broken' for the purpose we're using it for.
      
      What triggered it is that my WSM-EP is broken :-(
      
        [    0.001000] tsc: Fast TSC calibration using PIT
        [    0.002000] tsc: Detected 2533.715 MHz processor
        [    0.500180] TSC synchronization [CPU#0 -> CPU#6]:
        [    0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock.
        [    0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
      
      For some reason it consistently detects TSC skew, even though NHM+
      should have a single clock domain for 'reasonable' systems.
      
      This marks sched_clock_stable=0, which means that we do fancy stuff to
      try and get a 'sane' clock. Part of this fancy stuff relies on the tick,
      clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until
      it either wakes up or gets kicked by another cpu.
      
      While this is perfectly fine for the scheduler -- it only cares about
      actually running stuff, and when we're running stuff we're obviously not
      idle. This does somewhat break down for perf which can trigger events
      just fine on an otherwise idle cpu.
      
      So I've got NMIs get get 'measured' as taking ~1ms, which actually
      don't last nearly that long:
      
                <idle>-0     [013] d.h.   886.311970: rcu_nmi_enter <-do_nmi
        ...
                <idle>-0     [013] d.h.   886.311997: perf_sample_event_took: HERE!!! : 1040990
      
      So ftrace (which uses sched_clock(), not the fancy bits) only sees
      ~27us, but we measure ~1ms !!
      
      Now since all this measurement stuff lives in x86 code, we can actually
      fix it.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: mingo@kernel.org
      Cc: dave.hansen@linux.intel.com
      Cc: eranian@google.com
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: jmario@redhat.com
      Cc: acme@infradead.org
      Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e8a923cc
  5. 04 10月, 2013 1 次提交
  6. 28 9月, 2013 1 次提交
  7. 20 9月, 2013 1 次提交
    • P
      perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page' · fa731587
      Peter Zijlstra 提交于
      Solve the problems around the broken definition of perf_event_mmap_page::
      cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
      fixed by:
      
        860f085b ("perf: Fix broken union in 'struct perf_event_mmap_page'")
      
      The problem with the fix (merged in v3.12-rc1 and not yet released
      officially), noticed by Vince Weaver is that the new behavior is
      not detectable by new user-space, and that due to the reuse of the
      field names it's easy to mis-compile a binary if old headers are used
      on a new kernel or new headers are used on an old kernel.
      
      To solve all that make this change explicit, detectable and self-contained,
      by iterating the ABI the following way:
      
       - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
         confuse old user-space binaries. RDPMC will be marked as unavailable
         to old binaries but that's within the ABI, this is a capability bit.
      
       - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
         libraries can reliably detect that bit 0 is deprecated and perma-zero
         without having to check the kernel version.
      
       - Use bits 2, 3, 4 for the newly defined, correct functionality:
      
      	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
      	cap_user_time		: 1, /* The time_* fields are used */
      	cap_user_time_zero	: 1, /* The time_zero field is used */
      
       - Rename all the bitfield names in perf_event.h to be different from the
         old names, to make sure it's not possible to mis-compile it
         accidentally with old assumptions.
      
      The 'size' field can then be used in the future to add new fields and it
      will act as a natural ABI version indicator as well.
      
      Also adjust tools/perf/ userspace for the new definitions, noticed by
      Adrian Hunter.
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Also-Fixed-by: NAdrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fa731587
  8. 23 7月, 2013 1 次提交
  9. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  10. 27 6月, 2013 1 次提交
    • S
      perf/x86: Fix shared register mutual exclusion enforcement · 2f7f73a5
      Stephane Eranian 提交于
      This patch fixes a problem with the shared registers mutual
      exclusion code and incremental event scheduling by the
      generic perf_event code.
      
      There was a bug whereby the mutual exclusion on the shared
      registers was not enforced because of incremental scheduling
      abort due to event constraints. As an example on Intel
      Nehalem, consider the following events:
      
      group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE
      group2= L1D_CACHE_LD:I_STATE
      
      The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there
      are 3 instances here. The first group can be scheduled and is committed.
      Then, the generic code tries to schedule group2 and this fails (because
      there is no more counter to support the 3rd instance of L1D_CACHE_LD).
      But in x86_schedule_events() error path, put_event_contraints() is invoked
      on ALL the events and not just the ones that just failed. That causes the
      "lock" on the shared offcore_response MSR to be released. Yet the first group
      is actually scheduled and is exposed to reprogramming of that shared msr by
      the sibling HT thread. In other words, there is no guarantee on what is
      measured.
      
      This patch fixes the problem by tagging committed events with the
      PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(),
      only the events NOT tagged have their constraint released. The tag
      is eventually removed when the event in descheduled.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20130620164254.GA3556@quadSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2f7f73a5
  11. 23 6月, 2013 1 次提交
    • D
      perf: Drop sample rate when sampling is too slow · 14c63f17
      Dave Hansen 提交于
      This patch keeps track of how long perf's NMI handler is taking,
      and also calculates how many samples perf can take a second.  If
      the sample length times the expected max number of samples
      exceeds a configurable threshold, it drops the sample rate.
      
      This way, we don't have a runaway sampling process eating up the
      CPU.
      
      This patch can tend to drop the sample rate down to level where
      perf doesn't work very well.  *BUT* the alternative is that my
      system hangs because it spends all of its time handling NMIs.
      
      I'll take a busted performance tool over an entire system that's
      busted and undebuggable any day.
      
      BTW, my suspicion is that there's still an underlying bug here.
      Using the HPET instead of the TSC is definitely a contributing
      factor, but I suspect there are some other things going on.
      But, I can't go dig down on a bug like that with my machine
      hanging all the time.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      [ Prettified it a bit. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      14c63f17
  12. 19 6月, 2013 2 次提交
  13. 21 4月, 2013 1 次提交
    • G
      perf/x86: Check all MSRs before passing hw check · a5ebe0ba
      George Dunlap 提交于
      check_hw_exists() has a number of checks which go to two exit
      paths: msr_fail and bios_fail.  Checks classified as msr_fail
      will cause check_hw_exists() to return false, causing the PMU
      not to be used; bios_fail checks will only cause a warning to be
      printed, but will return true.
      
      The problem is that if there are both msr failures and bios
      failures, and the routine hits a bios_fail check first, it will
      exit early and return true, not finishing the rest of the msr
      checks.  If those msrs are in fact broken, it will cause them to
      be used erroneously.
      
      In the case of a Xen PV VM, the guest OS has read access to all
      the MSRs, but write access is white-listed to supported
      features.  Writes to unsupported MSRs have no effect.  The PMU
      MSRs are not (typically) supported, because they are expensive
      to save and restore on a VM context switch.  One of the
      "msr_fail" checks is supposed to detect this circumstance (ether
      for Xen or KVM) and disable the harware PMU.
      
      However, on one of my AMD boxen, there is (apparently) a broken
      BIOS which triggers one of the bios_fail checks.  In particular,
      MSR_K7_EVNTSEL0 has the ARCH_PERFMON_EVENTSEL_ENABLE bit set.
      The guest kernel detects this because it has read access to all
      MSRs, and causes it to skip the rest of the checks and try to
      use the non-existent hardware PMU.  This minimally causes a lot
      of useless instruction emulation and Xen console spam; it may
      cause other issues with the watchdog as well.
      
      This changset causes check_hw_exists() to go through all of the
      msr checks, failing and returning false if any of them fail.
      This makes sure that a guest running under Xen without a virtual
      PMU will detect that there is no functioning PMU and not attempt
      to use it.
      
      This problem affects kernels as far back as 3.2, and should thus
      be considered for backport.
      Signed-off-by: NGeorge Dunlap <george.dunlap@eu.citrix.com>
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Link: http://lkml.kernel.org/r/1365000388-32448-1-git-send-email-george.dunlap@eu.citrix.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a5ebe0ba
  14. 01 4月, 2013 2 次提交
  15. 27 3月, 2013 2 次提交
  16. 07 2月, 2013 1 次提交
  17. 01 2月, 2013 1 次提交
  18. 10 1月, 2013 1 次提交
    • D
      perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side · a706d965
      David Ahern 提交于
      This patch is brought to you by the letter 'H'.
      
      Commit 20b279 breaks compatiblity with older perf binaries when run with
      precise modifier (:p or :pp) by requiring the exclude_guest attribute to be
      set. Older binaries default exclude_guest to 0 (ie., wanting guest-based
      samples) unless host only profiling is requested (:H modifier). The workaround
      for older binaries is to add H to the modifier list (e.g., -e cycles:ppH -
      toggles exclude_guest to 1). This was deemed unacceptable by Linus:
      
      https://lkml.org/lkml/2012/12/12/570
      
      Between family in town and the fresh snow in Breckenridge there is no time left
      to be working on the proper fix for this over the holidays. In the New Year I
      have more pressing problems to resolve -- like some memory leaks in perf which
      are proving to be elusive -- although the aforementioned snow is probably why
      they are proving to be elusive. Either way I do not have any spare time to work
      on this and from the time I have managed to spend on it the solution is more
      difficult than just moving to a new exclude_guest flag (does not work) or
      flipping the logic to include_guest (which is not as trivial as one would
      think).
      
      So, two options: silently force exclude_guest on as suggested by Gleb which
      means no impact to older perf binaries or revert the original patch which
      caused the breakage.
      
      This patch does the latter -- reverts the original patch that introduced the
      regression. The problem can be revisited in the future as time allows.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Link: http://lkml.kernel.org/r/1356749767-17322-1-git-send-email-dsahern@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a706d965
  19. 30 10月, 2012 1 次提交
  20. 24 10月, 2012 5 次提交
    • J
      perf/x86: Add hardware events translations for AMD cpus · 0bf79d44
      Jiri Olsa 提交于
      Add support for AMD processors to display 'events' sysfs
      directory (/sys/devices/cpu/events/) with hw event translations:
      
        # ls  /sys/devices/cpu/events/
        branch-instructions
        branch-misses
        bus-cycles
        cache-misses
        cache-references
        cpu-cycles
        instructions
        ref-cycles
        stalled-cycles-backend
        stalled-cycles-frontend
      Suggested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1349873598-12583-5-git-send-email-jolsa@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0bf79d44
    • J
      perf/x86: Add hardware events translations for Intel cpus · 43c032fe
      Jiri Olsa 提交于
      Add support for Intel processors to display 'events' sysfs
      directory (/sys/devices/cpu/events/) with hw event translations:
      
        # ls  /sys/devices/cpu/events/
        branch-instructions
        branch-misses
        bus-cycles
        cache-misses
        cache-references
        cpu-cycles
        instructions
        ref-cycles
        stalled-cycles-backend
        stalled-cycles-frontend
      Suggested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1349873598-12583-4-git-send-email-jolsa@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      43c032fe
    • J
      perf/x86: Filter out undefined events from sysfs events attribute · 8300daa2
      Jiri Olsa 提交于
      The sysfs events group attribute currently shows all hw events,
      including also undefined ones.
      
      This patch filters out all undefined events out of the sysfs events
      group attribute, so they don't even show up.
      Suggested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1349873598-12583-3-git-send-email-jolsa@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8300daa2
    • J
      perf/x86: Make hardware event translations available in sysfs · a4747393
      Jiri Olsa 提交于
      Add support to display hardware events translations available
      through the sysfs. Add 'events' group attribute under the sysfs
      x86 PMU record with attribute/file for each hardware event.
      
      This patch adds only backbone for PMUs to display config under
      'events' directory. The specific PMU support itself will come
      in next patches, however this is how the sysfs group will look
      like:
      
        # ls  /sys/devices/cpu/events/
        branch-instructions
        branch-misses
        bus-cycles
        cache-misses
        cache-references
        cpu-cycles
        instructions
        ref-cycles
        stalled-cycles-backend
        stalled-cycles-frontend
      
      The file - hw event ID mapping is:
      
        file                      hw event ID
        ---------------------------------------------------------------
        cpu-cycles                PERF_COUNT_HW_CPU_CYCLES
        instructions              PERF_COUNT_HW_INSTRUCTIONS
        cache-references          PERF_COUNT_HW_CACHE_REFERENCES
        cache-misses              PERF_COUNT_HW_CACHE_MISSES
        branch-instructions       PERF_COUNT_HW_BRANCH_INSTRUCTIONS
        branch-misses             PERF_COUNT_HW_BRANCH_MISSES
        bus-cycles                PERF_COUNT_HW_BUS_CYCLES
        stalled-cycles-frontend   PERF_COUNT_HW_STALLED_CYCLES_FRONTEND
        stalled-cycles-backend    PERF_COUNT_HW_STALLED_CYCLES_BACKEND
        ref-cycles                PERF_COUNT_HW_REF_CPU_CYCLES
      
      Each file in the 'events' directory contains the term translation
      for the symbolic hw event for the currently running cpu model.
      
        # cat /sys/devices/cpu/events/stalled-cycles-backend
        event=0xb1,umask=0x01,inv,cmask=0x01
      Suggested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1349873598-12583-2-git-send-email-jolsa@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a4747393
    • A
      x86/perf: Fix virtualization sanity check · bffd5fc2
      Andre Przywara 提交于
      In check_hw_exists() we try to detect non-emulated MSR accesses
      by writing an arbitrary value into one of the PMU registers
      and check if it's value after a readout is still the same.
      This algorithm silently assumes that the register does not contain
      the magic value already, which is wrong in at least one situation.
      
      Fix the algorithm to really do a read-modify-write cycle. This fixes
      a warning under Xen under some circumstances on AMD family 10h CPUs.
      
      The reasons in more details actually sound like a story from
      Believe It or Not!:
      
      First you need an AMD family 10h/12h CPU. These do not reset the
      PERF_CTR registers on a reboot.
      Now you boot bare metal Linux, which goes successfully through this
      check, but leaves the magic value of 0xabcd in the register. You
      don't use the performance counters, but do a reboot (warm reset).
      Then you choose to boot Xen. The check will be triggered with a
      recent Linux kernel as Dom0 again, trying to write 0xabcd into the
      MSR. Xen silently drops the write (expected), but the subsequent read
      will return the value in the register, which just happens to be the
      expected magic value. Thus the test misleadingly succeeds, leaving
      the kernel in the belief that the PMU is available. This will trigger
      the following message:
      
      [    0.020294] ------------[ cut here ]------------
      [    0.020311] WARNING: at arch/x86/xen/enlighten.c:730 xen_apic_write+0x15/0x17()
      [    0.020318] Hardware name: empty
      [    0.020323] Modules linked in:
      [    0.020334] Pid: 1, comm: swapper/0 Not tainted 3.3.8 #7
      [    0.020340] Call Trace:
      [    0.020354]  [<ffffffff81050379>] warn_slowpath_common+0x80/0x98
      [    0.020369]  [<ffffffff810503a6>] warn_slowpath_null+0x15/0x17
      [    0.020378]  [<ffffffff810034df>] xen_apic_write+0x15/0x17
      [    0.020392]  [<ffffffff8101cb2b>] perf_events_lapic_init+0x2e/0x30
      [    0.020410]  [<ffffffff81ee4dd0>] init_hw_perf_events+0x250/0x407
      [    0.020419]  [<ffffffff81ee4b80>] ? check_bugs+0x2d/0x2d
      [    0.020430]  [<ffffffff81002181>] do_one_initcall+0x7a/0x131
      [    0.020444]  [<ffffffff81edbbf9>] kernel_init+0x91/0x15d
      [    0.020456]  [<ffffffff817caaa4>] kernel_thread_helper+0x4/0x10
      [    0.020471]  [<ffffffff817c347c>] ? retint_restore_args+0x5/0x6
      [    0.020481]  [<ffffffff817caaa0>] ? gs_change+0x13/0x13
      [    0.020500] ---[ end trace a7919e7f17c0a725 ]---
      
      The new code will change every of the 16 low bits read from the
      register and tries to write and read-back that modified number
      from the MSR.
      Signed-off-by: NAndre Przywara <andre.przywara@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Link: http://lkml.kernel.org/r/1349797115-28346-2-git-send-email-andre.przywara@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bffd5fc2
  21. 16 10月, 2012 1 次提交
  22. 31 7月, 2012 1 次提交
    • P
      perf/x86: Fix USER/KERNEL tagging of samples properly · d07bdfd3
      Peter Zijlstra 提交于
      Some PMUs don't provide a full register set for their sample,
      specifically 'advanced' PMUs like AMD IBS and Intel PEBS which provide
      'better' than regular interrupt accuracy.
      
      In this case we use the interrupt regs as basis and over-write some
      fields (typically IP) with different information.
      
      The perf core however uses user_mode() to distinguish user/kernel
      samples, user_mode() relies on regs->cs. If the interrupt skid pushed
      us over a boundary the new IP might not be in the same domain as the
      interrupt.
      
      Commit ce5c1fe9 ("perf/x86: Fix USER/KERNEL tagging of samples")
      tried to fix this by making the perf core use kernel_ip(). This
      however is wrong (TM), as pointed out by Linus, since it doesn't allow
      for VM86 and non-zero based segments in IA32 mode.
      
      Therefore, provide a new helper to set the regs->ip field,
      set_linear_ip(), which massages the regs into a suitable state
      assuming the provided IP is in fact a linear address.
      
      Also modify perf_instruction_pointer() and perf_callchain_user() to
      deal with segments base offsets.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1341910954.3462.102.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d07bdfd3
  23. 06 7月, 2012 5 次提交
  24. 18 6月, 2012 2 次提交
  25. 11 6月, 2012 1 次提交
  26. 08 6月, 2012 1 次提交
  27. 06 6月, 2012 1 次提交