1. 25 August 2017, 2 commits
    • perf/x86: Fix data source decoding for Skylake · 6ae5fa61
      Authored by Andi Kleen
      Skylake changed the encoding of the PEBS data source field.
      Some combinations are not available anymore, but some new cases
      e.g. for L4 cache hit are added.
      
      Fix up the conversion table for Skylake, similar to what was done
      for Nehalem.
      
      On Skylake server the encoding for L4 actually means persistent
      memory. Handle this case too.
      
      To properly describe it in the abstracted perf format I had to add
      some new fields. Since a hit can only be at a single level, add a new
      field that is an enumeration, not a bit field, to describe
      the level. It can describe any level. Some values are also
      used to describe PMEM and LFB.
      
      Also add a new generic remote flag that can be combined with
      the generic level to signify a remote cache.
      
      And there is an extension field for the snoop indication to handle
      the Forward state.
      
      I didn't add a generic flag for hops because it's not needed
      for Skylake.
      
      I changed the existing encodings for older CPUs to also fill in the
      new level and remote fields.
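
      A minimal sketch of how the new abstracted fields might be filled when
      decoding one data source entry; the helper name and the chosen values
      are illustrative, not the actual Skylake conversion table:

        /* Illustrative only: fill the extended perf_mem_data_src fields. */
        static u64 skl_fill_data_src(void)
        {
                union perf_mem_data_src dse = { .val = 0 };

                dse.mem_op      = PERF_MEM_OP_LOAD;
                /* New enumerated level field: exactly one level per hit. */
                dse.mem_lvl_num = PERF_MEM_LVLNUM_L3;
                /* New generic remote flag, combined with the level. */
                dse.mem_remote  = PERF_MEM_REMOTE_REMOTE;
                /* Snoop extension used to report the Forward state. */
                dse.mem_snoopx  = PERF_MEM_SNOOPX_FWD;
                return dse.val;
        }
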
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/20170816222156.19953-3-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6ae5fa61
    • perf/x86: Move Nehalem PEBS code to flag · 95298355
      Authored by Andi Kleen
      Minor cleanup: use an explicit x86_pmu flag to handle the
      missing Lock / TLB information on Nehalem, instead of always
      checking the model number for each PEBS sample.
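
      A minimal sketch of the pattern, assuming the flag is a boolean field
      on x86_pmu named pebs_no_tlb (the field and helper names here are
      assumptions for illustration):

        /* Set once at PMU init time for the affected models (sketch): */
        x86_pmu.pebs_no_tlb = 1;

        /* ...then tested per PEBS sample instead of the model number: */
        if (!x86_pmu.pebs_no_tlb)
                data_src |= decode_lock_and_tlb_bits(dse); /* hypothetical helper */
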
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/20170816222156.19953-2-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      95298355
  2. 18 August 2017, 1 commit
  3. 11 August 2017, 1 commit
    • x86: Mark various structures and functions as 'static' · b45e4c45
      Authored by Colin Ian King
      Mark a couple of structures and functions as 'static', pointed out by Sparse:
      
        warning: symbol 'bts_pmu' was not declared. Should it be static?
        warning: symbol 'p4_event_aliases' was not declared. Should it be static?
        warning: symbol 'rapl_attr_groups' was not declared. Should it be static?
        symbol 'process_uv2_message' was not declared. Should it be static?
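
      The fix itself is mechanical; a minimal illustration using bts_pmu
      from the warning above (context simplified):

        /* Before: externally visible with no declaration, so Sparse warns. */
        struct pmu bts_pmu;

        /* After: file-local, which is all these symbols need to be. */
        static struct pmu bts_pmu;
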
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Andrew Banman <abanman@hpe.com> # for the UV change
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: kernel-janitors@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170810155709.7094-1-colin.king@canonical.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b45e4c45
  4. 24 July 2017, 6 commits
  5. 21 July 2017, 1 commit
    • perf/x86/intel: Add proper condition to run sched_task callbacks · df6c3db8
      Authored by Jiri Olsa
      We have 2 functions using the same sched_task callback:
      
        - PEBS drain for free running counters
        - LBR save/store
      
      Both of them are called from intel_pmu_sched_task(), and
      either of them can be unintentionally triggered when only the
      other one is configured to run.
      
      Let's say PEBS drain is configured as the sched_task work for an
      event; the callback itself (intel_pmu_sched_task()) will also run
      the LBR save/restore code, which we did not ask for, because it
      does not check what work is actually configured.
      
      This can lead to extra cycles in some perf monitoring,
      like when we monitor a PEBS event without LBR data.
      
        # perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l 1000000
      
        (We need a PEBS, non-freq/non-timestamp event to enable
         the sched_task callback)
      
      The perf stat of cycles and msr:write_msr for the above
      command before the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,519,557,441      cycles:k
              91,195,527      msr:write_msr
      
            29.334476406 seconds time elapsed
      
      And after the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,704,973,540      cycles:k
              27,184,720      msr:write_msr
      
            16.977875900 seconds time elapsed
      
      There's no effect on cycles:k because the sched_task happens
      with events switched off; however, the msr:write_msr tracepoint
      count, together with the almost 50% speedup in elapsed time,
      shows the improvement.
      
      Monitoring an LBR event with the extra PEBS drain processing
      in the sched_task callback showed only a small speedup, because
      the drain function does not do much extra work when there
      is no PEBS data.
      
      Add conditions to recognize which of the configured work actually needs
      to be done in the x86_pmu's sched_task callback.
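
      A minimal sketch of the idea, assuming the per-CPU cpu_hw_events state
      is used to gate each piece of work (simplified, not the exact upstream
      diff):

        static void intel_pmu_sched_task(struct perf_event_context *ctx,
                                         bool sched_in)
        {
                struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

                /* Drain PEBS only if PEBS counters are actually in use. */
                if (cpuc->pebs_enabled)
                        intel_pmu_pebs_sched_task(ctx, sched_in);
                /* Save/restore LBRs only if some event uses them. */
                if (cpuc->lbr_users)
                        intel_pmu_lbr_sched_task(ctx, sched_in);
        }
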
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170719075247.GA27506@krava
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      df6c3db8
  6. 19 July 2017, 1 commit
    • perf/x86/intel: Record branch type · d5c7f9dc
      Authored by Jin Yao
      Perf already has support for disassembling the branch instruction
      and using the branch type for filtering. The patch just records
      the branch type in perf_branch_entry.
      
      Before recording, the patch converts the x86 branch type to the
      common branch type (a sketch of this conversion follows the change log below).
      
      Change log:
      
      v10: Make the branch_map array static. The previous version
           had it on the stack, which made the compiler create it every
           time the function was called.
      
      v9: Use __ffs() to find the first set bit of the type in common_branch_type().
          It keeps the code clear.
      
      v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
      
      v7: Just convert the following x86 branch types to common branch types.
      
      X86_BR_CALL      -> PERF_BR_CALL
      X86_BR_RET       -> PERF_BR_RET
      X86_BR_JCC       -> PERF_BR_COND
      X86_BR_JMP       -> PERF_BR_UNCOND
      X86_BR_IND_CALL  -> PERF_BR_IND_CALL
      X86_BR_ZERO_CALL -> PERF_BR_CALL
      X86_BR_IND_JMP   -> PERF_BR_IND
      X86_BR_SYSCALL   -> PERF_BR_SYSCALL
      X86_BR_SYSRET    -> PERF_BR_SYSRET
      
      Others are set to PERF_BR_NONE
      
      v6: Not changed.
      
      v5: Just fix the merge error. No other update.
      
      v4: Compared to the previous version, the major changes are:
      
      1. Uses a lookup table to convert x86 branch type to common branch
         type.
      
      2. Move the JCC forward/JCC backward and cross page computing to
         user space.
      
      3. Initialize branch type to 0 in intel_pmu_lbr_read_32 and
         intel_pmu_lbr_read_64
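
      A hedged sketch of the v4/v9 lookup-table conversion; the table order
      and the shift that skips the user/kernel bits are assumptions about the
      X86_BR_* bit layout:

        static const int branch_map[] = {
                PERF_BR_CALL,           /* X86_BR_CALL */
                PERF_BR_RET,            /* X86_BR_RET */
                PERF_BR_SYSCALL,        /* X86_BR_SYSCALL */
                PERF_BR_SYSRET,         /* X86_BR_SYSRET */
                PERF_BR_COND,           /* X86_BR_JCC */
                PERF_BR_UNCOND,         /* X86_BR_JMP */
                PERF_BR_IND_CALL,       /* X86_BR_IND_CALL */
                PERF_BR_CALL,           /* X86_BR_ZERO_CALL */
                PERF_BR_IND,            /* X86_BR_IND_JMP */
        };

        static int common_branch_type(int type)
        {
                int i;

                type >>= 2;     /* assumed: skip X86_BR_USER/X86_BR_KERNEL */
                if (type) {
                        i = __ffs(type);
                        if (i < ARRAY_SIZE(branch_map))
                                return branch_map[i];
                }
                return PERF_BR_UNKNOWN;
        }
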
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Link: http://lkml.kernel.org/r/1500379995-6449-3-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d5c7f9dc
  7. 18 July 2017, 3 commits
    • perf/x86/intel: Fix debug_store reset field for freq events · dc853e26
      Authored by Jiri Olsa
      There's a bug in the PEBS event enabling code that prevents PEBS
      freq events from working properly after a non-freq PEBS event has run.
      
      freq events - perf_event_attr::freq set
                    -F <freq> option of perf record
      
      PEBS events - perf_event_attr::precise_ip > 0
                    default for perf record
      
      For example, with CPU 0 busy we expect ~10000 samples from the
      following perf tool run:
      
        # perf record -F 10000 -C 0 sleep 1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.640 MB perf.data (10031 samples) ]
      
      Everything's fine, but once we run a non-freq PEBS event like:
      
        # perf record -c 10000 -C 0 sleep 1
        [ perf record: Woken up 4 times to write data ]
        [ perf record: Captured and wrote 1.053 MB perf.data (20061 samples) ]
      
      the freq events start to fail like this:
      
        # perf record -F 10000 -C 0 sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.185 MB perf.data (40 samples) ]
      
      The issue is in the non-freq PEBS event initialization of the
      debug_store reset field, whose value is used to auto-reload the counter
      after a PEBS event drain. This value is not used for PEBS freq events,
      but once we run a non-freq event it stays in the debug_store data and
      breaks the sample_freq counting for PEBS freq events.
      
      Set the reset field to 0 for freq events.
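
      A minimal sketch of the intent, assuming the reset value lives in the
      debug_store pebs_event_reset[] slot for the counter (simplified from
      the PEBS enable path):

        if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
                /* Fixed-period PEBS: auto-reload the counter after a drain. */
                ds->pebs_event_reset[hwc->idx] =
                        (u64)(-hwc->sample_period) & x86_pmu.cntval_mask;
        } else {
                /* Freq events: make sure no stale reset value is reused. */
                ds->pebs_event_reset[hwc->idx] = 0;
        }
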
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170714163551.19459-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dc853e26
    • perf/x86/intel: Add Goldmont Plus CPU PMU support · dd0b06b5
      Authored by Kan Liang
      Add perf core PMU support for Intel Goldmont Plus CPU cores:
      
       - The init code is based on Goldmont.
       - There is a new cache event list, based on the Goldmont cache event
         list.
       - All four general-purpose performance counters support PEBS.
       - The first general-purpose performance counter supports the
         reduced-skid PEBS mechanism; use :ppp on an event to request
         reduced-skid PEBS.
       - Goldmont Plus has a 4-wide pipeline for Topdown.
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/20170712134423.17766-1-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dd0b06b5
    • perf/x86/intel: Enable C-state residency events for Apollo Lake · 5c10b048
      Authored by Harry Pan
      The Goldmont microarchitecture supports C1/C3/C6 and PC2/PC3/PC6/PC10
      state residency counters; this patch enables them for the Apollo Lake
      platform.
      
      The MSR information is based on the Intel Software Developer's Manual,
      Vol. 4, Order No. 335592, Tables 2-6 and 2-12.
      Signed-off-by: Harry Pan <harry.pan@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: bp@suse.de
      Cc: davidcc@google.com
      Cc: gs0622@gmail.com
      Cc: lukasz.odzioba@intel.com
      Cc: piotr.luc@intel.com
      Cc: srinivas.pandruvada@linux.intel.com
      Link: http://lkml.kernel.org/r/20170717103749.24337-1-harry.pan@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5c10b048
  8. 30 June 2017, 2 commits
  9. 22 June 2017, 1 commit
  10. 26 May 2017, 2 commits
  11. 23 May 2017, 1 commit
    • perf/x86: Add sysfs entry to freeze counters on SMI · 6089327f
      Authored by Kan Liang
      Currently, the SMIs are visible to all performance counters, because
      many users want to measure everything including SMIs. But in some
      cases, the SMI cycles should not be counted - for example, to calculate
      the cost of an SMI itself. So a knob is needed.
      
      When the FREEZE_WHILE_SMM bit is set in IA32_DEBUGCTL, all performance
      counters are affected; there is no way to do a per-counter freeze
      on SMI. So the per-event interface (e.g. an ioctl or event attribute)
      should not be used to set the FREEZE_WHILE_SMM bit.
      
      Add a sysfs entry, /sys/device/cpu/freeze_on_smi, to set the
      FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. When set, it freezes perfmon and
      trace messages while in SMM.
      
      The value has to be 0 or 1 and is applied to all processors.
      
      Also serialize the entire setting so we don't get multiple concurrent
      threads trying to update to different values.
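
      A minimal sketch of the per-CPU MSR update, assuming the bit is exposed
      as DEBUGCTLMSR_FREEZE_IN_SMM_BIT (the real code applies this to every
      online CPU and serializes updates with a mutex):

        static void flip_freeze_on_smi(void *data)
        {
                unsigned long val = *(unsigned long *)data;

                if (val)
                        msr_set_bit(MSR_IA32_DEBUGCTLMSR,
                                    DEBUGCTLMSR_FREEZE_IN_SMM_BIT);
                else
                        msr_clear_bit(MSR_IA32_DEBUGCTLMSR,
                                      DEBUGCTLMSR_FREEZE_IN_SMM_BIT);
        }
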
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: bp@alien8.de
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/1494600673-244667-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6089327f
  12. 03 May 2017, 1 commit
    • perf/x86: Fix Broadwell-EP DRAM RAPL events · 33b88e70
      Authored by Vince Weaver
      It appears as though the Broadwell-EP DRAM units share the special
      units quirk with Haswell-EP/KNL.
      
      Without this patch, you get really high results (a single DRAM using 20W
      of power).
      
      The powercap driver in drivers/powercap/intel_rapl.c already has this
      change.
      Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      33b88e70
  13. 14 April 2017, 2 commits
    • perf/x86: Fix spurious NMI with PEBS Load Latency event · fd583ad1
      Authored by Kan Liang
      Spurious NMIs will be observed with the following command:
      
        while :; do
          perf record -bae "cpu/umask=0x01,event=0xcd,ldlat=0x80/pp"
                        -e "cpu/umask=0x03,event=0x0/"
                        -e "cpu/umask=0x02,event=0x0/"
                        -e cycles,branches,cache-misses
                        -e cache-references -- sleep 10
        done
      
      The bug was introduced by commit:
      
        8077eca0 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+")
      
      That commit clears the status bits for the counters used by PEBS
      events by masking with the whole 64 bits of pebs_enabled. However,
      only the low 32 bits of both status and pebs_enabled are reserved
      for PEBS-capable counters.
      
      In status, bits 32-34 are the fixed-counter overflow bits; in
      pebs_enabled, bits 32-34 select PEBS Load Latency.
      
      In the test case, the PEBS Load Latency event and a fixed-counter
      event can overflow at the same time. The fixed-counter overflow bit
      is then cleared by mistake, and once it is cleared the fixed-counter
      overflow is never processed, which finally triggers the spurious NMI.
      
      Correct the PEBS enabled mask by ignoring the non-PEBS bits.
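
      A minimal sketch of the fix, assuming a mask that covers only the
      PEBS-capable counter bits (the macro name is illustrative):

        #define PEBS_COUNTER_MASK       ((1ULL << MAX_PEBS_EVENTS) - 1)

        /*
         * Clear only status bits that belong to real PEBS counters; bits
         * 32-34 of pebs_enabled are Load Latency selects, not counters.
         */
        status &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
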
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 8077eca0 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+")
      Link: http://lkml.kernel.org/r/1491333246-3965-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fd583ad1
    • perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32() · f2200ac3
      Authored by Peter Zijlstra
      When the perf_branch_entry::{in_tx,abort,cycles} fields were added,
      intel_pmu_lbr_read_32() wasn't updated to initialize them.
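
      A minimal sketch of the fix: zero the newer perf_branch_entry fields
      when filling each entry in the 32-bit read path (surrounding loop
      omitted):

        cpuc->lbr_entries[i].in_tx      = 0;
        cpuc->lbr_entries[i].abort      = 0;
        cpuc->lbr_entries[i].cycles     = 0;
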
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Fixes: 135c5612 ("perf/x86/intel: Support Haswell/v4 LBR format")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f2200ac3
  14. 30 March 2017, 1 commit
  15. 16 March 2017, 3 commits
  16. 01 March 2017, 1 commit
  17. 12 February 2017, 1 commit
  18. 01 February 2017, 4 commits
    • perf/x86/intel/pt: Add format strings for PTWRITE and power event tracing · 5443624b
      Authored by Alexander Shishkin
      Commit:
      
        8ee83b2a ("perf/x86/intel/pt: Add support for PTWRITE and power event tracing")
      
      forgot to add format strings to the PT driver, so one could enable these features
      by setting the corresponding bits in the event config, but not by their mnemonic names.
      
      This patch adds the format strings.
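
      A hedged sketch of what such format strings look like; the config bit
      positions shown follow the RTIT_CTL layout and are illustrative:

        /* Expose the config bits by mnemonic name via sysfs 'format' entries. */
        PMU_FORMAT_ATTR(pwr_evt, "config:4");
        PMU_FORMAT_ATTR(ptw,     "config:12");
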
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Fixes: 8ee83b2a ("perf/x86/intel/pt: Add support for PTWRITE...")
      Link: http://lkml.kernel.org/r/20170127151644.8585-2-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5443624b
    • perf/x86/intel/uncore: Make package handling more robust · fff4b87e
      Authored by Thomas Gleixner
      The package management code in uncore relies on package mapping being
      available before a CPU is started. This changed with:
      
        9d85eb91 ("x86/smpboot: Make logical package management more robust")
      
      because the ACPI/BIOS information turned out to be unreliable, but that
      left uncore in a broken state. This was not noticed because on a regular
      boot all CPUs are online before uncore is initialized.
      
      Move the allocation to the CPU online callback and simplify the hotplug
      handling. At this point the package mapping is established and correct.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Fixes: 9d85eb91 ("x86/smpboot: Make logical package management more robust")
      Link: http://lkml.kernel.org/r/20170131230141.377156255@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fff4b87e
    • perf/x86/intel/uncore: Clean up hotplug conversion fallout · 1aa6cfd3
      Authored by Thomas Gleixner
      The recent conversion to the hotplug state machine kept two mechanisms from
      the original code:
      
       1) The first_init logic which adds the number of online CPUs in a package
          to the refcount. That's wrong because the callbacks are executed for
          all online CPUs.
      
          Remove it so the refcounting is correct.
      
       2) The on_each_cpu() call to undo box->init() in the error handling
          path. That's bogus because when the prepare callback fails no box has
          been initialized yet.
      
          Remove it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Fixes: 1a246b9f ("perf/x86/intel/uncore: Convert to hotplug state machine")
      Link: http://lkml.kernel.org/r/20170131230141.298032324@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1aa6cfd3
    • perf/x86/intel/rapl: Make package handling more robust · dd86e373
      Authored by Thomas Gleixner
      The package management code in RAPL relies on package mapping being
      available before a CPU is started. This changed with:
      
        9d85eb91 ("x86/smpboot: Make logical package management more robust")
      
      because the ACPI/BIOS information turned out to be unreliable, but that
      left RAPL in a broken state. This was not noticed because on a regular
      boot all CPUs are online before RAPL is initialized.
      
      A possible fix would be to reintroduce the mess which allocates a package
      data structure in the CPU prepare callback and, when it turns out to
      already exist in the starting callback, throws it away later in the CPU
      online callback. But that's a horrible hack and not required at all,
      because RAPL becomes functional for perf only in the CPU online callback.
      That's correct because user space is not yet informed about the CPU being
      onlined, so nothing can rely on RAPL being available on that particular
      CPU.
      
      Move the allocation to the CPU online callback and simplify the hotplug
      handling. At this point the package mapping is established and correct.
      
      This also adds a missing check for available package data in the
      event_init() function.
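
      A minimal sketch of that check, assuming a helper that maps a CPU to
      its package's RAPL PMU data (the helper name is an assumption):

        /* In event_init(): reject events until the package data exists. */
        pmu = cpu_to_rapl_pmu(event->cpu);      /* assumed helper name */
        if (!pmu)
                return -EINVAL;
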
      Reported-by: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 9d85eb91 ("x86/smpboot: Make logical package management more robust")
      Link: http://lkml.kernel.org/r/20170131230141.212593966@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dd86e373
  19. 17 January 2017, 1 commit
    • perf/x86/intel: Handle exclusive threadid correctly on CPU hotplug · 4e71de79
      Authored by Zhou Chengming
      The CPU hotplug function intel_pmu_cpu_starting() sets
      cpu_hw_events.excl_thread_id unconditionally to 1 when the shared exclusive
      counters data structure is already available for the sibling thread.
      
      This works during the boot process because the first sibling gets threadid
      0 assigned and the second sibling which shares the data structure gets 1.
      
      But when the first thread of the core is offlined and onlined again it
      shares the data structure with the second thread and gets exclusive thread
      id 1 assigned as well.
      
      Prevent this by checking the threadid of the already online thread.
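
      A minimal sketch of the check, assuming 'sibling' is the cpu_hw_events
      of the already-online thread (simplified from the hotplug path):

        /* Take exclusive thread id 1 only if the online sibling holds id 0. */
        if (!sibling->excl_thread_id)
                cpuc->excl_thread_id = 1;
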
      
      [ tglx: Rewrote changelog ]
      Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
      Cc: NuoHan Qiao <qiaonuohan@huawei.com>
      Cc: ak@linux.intel.com
      Cc: peterz@infradead.org
      Cc: kan.liang@intel.com
      Cc: dave.hansen@linux.intel.com
      Cc: eranian@google.com
      Cc: qiaonuohan@huawei.com
      Cc: davidcc@google.com
      Cc: guohanjun@huawei.com
      Link: http://lkml.kernel.org/r/1484536871-3131-1-git-send-email-zhouchengming1@huawei.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ---
       arch/x86/events/intel/core.c |    7 +++++--
       1 file changed, 5 insertions(+), 2 deletions(-)
      4e71de79
  20. 14 January 2017, 1 commit
    • perf/x86/intel: Account interrupts for PEBS errors · 475113d9
      Authored by Jiri Olsa
      It's possible to set up PEBS events to get only errors and not
      any data, like on SNB-X (model 45) and IVB-EP (model 62)
      via 2 perf commands running simultaneously:
      
          taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10
      
      This leads to a soft lockup, because the error path of
      intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
      for error PEBS interrupts, so if you are getting ONLY errors
      there is no way to stop the event when it goes over the
      max_samples_per_tick limit:
      
        NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
        ...
        RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
        ...
        Call Trace:
         ? trace_hardirqs_on_caller+0xf5/0x1b0
         ? perf_cgroup_attach+0x70/0x70
         perf_install_in_context+0x199/0x1b0
         ? ctx_resched+0x90/0x90
         SYSC_perf_event_open+0x641/0xf90
         SyS_perf_event_open+0x9/0x10
         do_syscall_64+0x6c/0x1f0
         entry_SYSCALL64_slow_path+0x25/0x25
      
      Add perf_event_account_interrupt(), which does the interrupt
      and frequency checks, and call it from intel_pmu_drain_pebs_nhm()'s
      error path.
      
      We keep the pending_kill and pending_wakeup logic only in the
      __perf_event_overflow() path, because they make sense only if
      there's any data to deliver.
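
      A minimal sketch of the error path after the change (simplified; the
      surrounding drain loop is omitted):

        /*
         * A PEBS record was dropped for this event: log it, and still
         * account the interrupt so throttling can stop a runaway event.
         */
        if (error[bit]) {
                perf_log_lost_samples(event, 1);
                if (perf_event_account_interrupt(event))
                        x86_pmu_stop(event, 0);
        }
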
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      475113d9
  21. 11 January 2017, 2 commits
  22. 05 January 2017, 1 commit
    • perf/x86: Set pmu->module in Intel PMU modules · 74545f63
      Authored by David Carrillo-Cisneros
      The conversion of Intel PMU drivers into modules did not include reference
      counting. The machine will crash when attempting to access deleted code
      if an event from a module PMU is started and the module is removed before
      the event is destroyed.
      
      i.e. this crashes the machine:
      
      	$ insmod intel-rapl-perf.ko
      	$ perf stat -e power/energy-cores/ -C 0 &
      	$ rmmod intel-rapl-perf.ko
      
      Set pmu->module to THIS_MODULE in the Intel module PMUs so that generic code
      can handle reference counting and deny rmmod while an event still exists.
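
      A minimal sketch of the change for one such driver (the RAPL example is
      illustrative; each module PMU sets the field on its own struct pmu at
      init time):

        /* Let the perf core pin this module while any of its events exist. */
        rapl_pmus->pmu.module = THIS_MODULE;
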
      Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1482455860-116269-1-git-send-email-davidcc@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      74545f63
  23. 25 December 2016, 1 commit