1. 27 May, 2015 10 commits
    • perf/x86/intel: Simplify put_exclusive_constraints() · ba040653
      Committed by Peter Zijlstra
      Don't bother with taking locks if we're not actually going to do
      anything. Also, drop the _irqsave(), this is very much only called
      from IRQ-disabled context.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Remove intel_excl_states::init_state · 43ef205b
      Committed by Peter Zijlstra
      For some obscure reason intel_{start,stop}_scheduling() copy the HT
      state to an intermediate array. This would make sense if we ever were
      to make changes to it which we'd have to discard.
      
      Except we don't. By the time we call intel_commit_scheduling() we're;
      as the name implies; committed to them. We'll never back out.
      
      A further hint it's pointless is that stop_scheduling() unconditionally
      publishes the state.
      
      So the intermediate array is pointless, modify the state in place and
      kill the extra array.
      
      And remove the pointless array initialization: INTEL_EXCL_UNUSED == 0.
      
      Note; all is serialized by intel_excl_cntr::lock.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
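The remark that the explicit initialization is redundant because the "unused" state equals 0 can be illustrated in plain userspace C. This is only a minimal sketch, not the kernel code: the enum and helper names (`sketch_excl_state`, `sketch_alloc_states`, `sketch_all_unused`) are hypothetical, and it assumes the state array comes from zeroed memory, with `calloc()` standing in for that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mirror of the state values; only "UNUSED == 0" is taken
 * from the commit message, the other members are illustrative. */
enum sketch_excl_state {
	SKETCH_EXCL_UNUSED = 0,
	SKETCH_EXCL_SHARED,
	SKETCH_EXCL_EXCLUSIVE,
};

/* Allocate n states from zeroed memory; no explicit init loop follows. */
static enum sketch_excl_state *sketch_alloc_states(size_t n)
{
	return calloc(n, sizeof(enum sketch_excl_state));
}

/* Return 1 if every state is SKETCH_EXCL_UNUSED, 0 otherwise. */
static int sketch_all_unused(const enum sketch_excl_state *states, size_t n)
{
	assert(states != NULL);
	for (size_t i = 0; i < n; i++)
		if (states[i] != SKETCH_EXCL_UNUSED)
			return 0;
	return 1;
}
```

Because zeroed memory already reads back as the "unused" value, a loop that assigns it again would do no useful work.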
    • perf/x86/intel: Remove pointless tests · 1fe684e3
      Committed by Peter Zijlstra
      Both intel_commit_scheduling() and intel_get_excl_constraints() test
      for cntr < 0.
      
      The only way that can happen (aside from a bug) is through
      validate_event(), however that is already captured by the
      cpuc->is_fake test.
      
      So remove these tests and simplify the code.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Clean up intel_commit_scheduling() placement · 0c41e756
      Committed by Peter Zijlstra
      Move the code of intel_commit_scheduling() to the right place, which is
      in between start() and stop().
      
      No change in functionality.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Make WARN()ings consistent · 17186ccd
      Committed by Peter Zijlstra
      The intel_commit_scheduling() callback is pointlessly different from
      the start and stop scheduling callback.
      
      Furthermore, the constraint should never be NULL, so remove that test.
      
      Even though we'll never get called (because we NULL the callbacks)
      when !is_ht_workaround_enabled() put that test in.
      
      Collapse the (pointless) WARN_ON_ONCE() and bail on !cpuc->excl_cntrs --
      this is doubly pointless, because it's the same condition as
      is_ht_workaround_enabled() which was already pointless because the
      whole method won't ever be called.
      
      Furthermore, make all the !excl_cntrs tests WARN_ON_ONCE(); they're all
      pointless, because of the above: either the function
      ({get,put}_excl_constraint) is already predicated on it existing or
      the is_ht_workaround_enabled() thing is the same test.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Simplify the dynamic constraint code somewhat · aaf932e8
      Committed by Peter Zijlstra
      We have two 'struct event_constraint' local variables in
      intel_get_excl_constraints(): 'cx' and 'c'.
      
      Instead of using 'cx' after the dynamic allocation, put all 'cx' inside
      the dynamic allocation block and use 'c' outside of it.
      
      Also use direct assignment to copy the structure; let the compiler
      figure it out.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
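The "direct assignment" point can be shown with a small userspace sketch. The commit message does not say what the earlier copy looked like, so `memcpy()` appears here only as a contrast. The struct layout and function names (`sketch_constraint`, `sketch_copy_memcpy`, `sketch_copy_assign`) are hypothetical and are not the kernel's `struct event_constraint`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a constraint structure. */
struct sketch_constraint {
	unsigned long idxmsk;
	int weight;
	int flags;
};

/* Explicit byte copy: the size has to be spelled out by hand. */
static void sketch_copy_memcpy(struct sketch_constraint *dst,
			       const struct sketch_constraint *src)
{
	assert(dst && src);
	memcpy(dst, src, sizeof(*dst));
}

/* Structure assignment: the compiler chooses how to copy the members. */
static void sketch_copy_assign(struct sketch_constraint *dst,
			       const struct sketch_constraint *src)
{
	assert(dst && src);
	*dst = *src;
}
```

Both helpers leave `dst` with the same member values. The assignment form is type-checked and cannot get the size wrong.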
    • perf/x86/intel: Add lockdep assert · b32ed7f5
      Committed by Peter Zijlstra
      Lockdep is very good at finding incorrect IRQ state while locking and
      is far better at telling us if we hold a lock than the _is_locked()
      API. It also generates less code for !DEBUG kernels.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Correct local vs remote sibling state · 1c565833
      Committed by Peter Zijlstra
      For some obscure reason the current code accounts the current SMT
      thread's state on the remote thread and reads the remote's state on
      the local SMT thread.
      
      While internally consistent, and 'correct', it's pointless confusion we
      can do without.
      
      Flip them the right way around.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Improve HT workaround GP counter constraint · cc1790cf
      Committed by Peter Zijlstra
      The (SNB/IVB/HSW) HT bug only affects events that can be programmed
      onto GP counters, therefore we should only limit the number of GP
      counters that can be used per cpu -- iow we should not constrain the
      FP counters.
      
      Furthermore, we should only enforce such a limit when there are in fact
      exclusive events being scheduled on either sibling.
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [ Fixed build fail for the !CONFIG_CPU_SUP_INTEL case. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Fix event/group validation · b371b594
      Committed by Peter Zijlstra
      Commit 43b45780 ("perf/x86: Reduce stack usage of
      x86_schedule_events()") violated the rule that 'fake' scheduling; as
      used for event/group validation; should not change the event state.
      
      This went mostly un-noticed because repeated calls of
      x86_pmu::get_event_constraints() would give the same result. And
      x86_pmu::put_event_constraints() would mostly not do anything.
      
      Commit e979121b ("perf/x86/intel: Implement cross-HT corruption
      bug workaround") made the situation much worse by actually setting the
      event->hw.constraint value to NULL, so when validation and actual
      scheduling interact we get NULL ptr derefs.
      
      Fix it by removing the constraint pointer from the event and move it
      back to an array, this time in cpuc instead of on the stack.
      
      validate_group()
        x86_schedule_events()
          event->hw.constraint = c; # store
      
            <context switch>
              perf_task_event_sched_in()
                ...
                  x86_schedule_events();
                    event->hw.constraint = c2; # store
      
                    ...
      
                    put_event_constraints(event); # assume failure to schedule
                      intel_put_event_constraints()
                        event->hw.constraint = NULL;
      
            <context switch end>
      
          c = event->hw.constraint; # read -> NULL
      
          if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref
      
      This in particular is possible when the event in question is a
      cpu-wide event and group-leader, where the validate_group() tries to
      add an event to the group.
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 43b45780 ("perf/x86: Reduce stack usage of x86_schedule_events()")
      Fixes: e979121b ("perf/x86/intel: Implement cross-HT corruption bug workaround")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
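The idea behind the fix above, keeping the chosen constraint in a per-context array instead of in the shared event, can be sketched in userspace C. This is an illustration under assumptions, not the kernel implementation: all names (`sketch_ctx`, `sketch_store`, `sketch_put`, `sketch_counter_allowed`) are hypothetical, and a context stands in for either the real per-CPU state or the 'fake' one used for validation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_EVENTS 8

/* Hypothetical constraint: a bitmask of counters the event may use. */
struct sketch_constraint {
	unsigned long idxmsk;
};

/* Per-context storage; validation and real scheduling each own one. */
struct sketch_ctx {
	const struct sketch_constraint *event_constraint[SKETCH_MAX_EVENTS];
};

/* Record the constraint for a slot in this context only. */
static void sketch_store(struct sketch_ctx *ctx, size_t slot,
			 const struct sketch_constraint *c)
{
	assert(slot < SKETCH_MAX_EVENTS);
	ctx->event_constraint[slot] = c;
}

/* Release the constraint for a slot; other contexts are untouched. */
static void sketch_put(struct sketch_ctx *ctx, size_t slot)
{
	assert(slot < SKETCH_MAX_EVENTS);
	ctx->event_constraint[slot] = NULL;
}

/* Return 1 if counter idx is permitted for the slot, 0 otherwise. */
static int sketch_counter_allowed(const struct sketch_ctx *ctx, size_t slot,
				  unsigned int idx)
{
	const struct sketch_constraint *c;

	assert(slot < SKETCH_MAX_EVENTS);
	c = ctx->event_constraint[slot];
	return c != NULL && (c->idxmsk & (1UL << idx)) != 0;
}
```

With the pointer stored per context, a put in the real scheduling context in the middle of validation no longer clears the value that the validation path reads back.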
  2. 08 May, 2015 1 commit
  3. 22 Apr, 2015 1 commit
    • perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu · 3b6e0421
      Committed by Jiri Olsa
      The core_pmu does not define cpu_* callbacks, which handle
      allocation of 'struct cpu_hw_events::shared_regs' data,
      initialization of debug store and PMU_FL_EXCL_CNTRS counters.
      
      While this probably won't happen on bare metal, a virtual CPU can
      define x86_pmu.extra_regs together with PMU version 1 and thus
      be using core_pmu -> using shared_regs data without it being
      allocated. That could lead to the following panic:
      
      	BUG: unable to handle kernel NULL pointer dereference at (null)
      	IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40
      
      	SNIP
      
      	 [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0
      	 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180
      	 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0
      	 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90
      	 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0
      	 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
      	 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40
      	 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
      	 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150
      	 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230
      	 [<ffffffff811a993a>] ? dput+0x9a/0x150
      	 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60
      	 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000
      	 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170
      	 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0
      	 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
      	 [<ffffffff8119c731>] ? path_put+0x31/0x40
      	 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30
      	 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
      	 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170
      	 [<ffffffff81014a29>] ? sched_clock+0x9/0x10
      	 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330
      	 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0
      	 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0
      	 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0
      	 [<ffffffff81195ee9>] set_task_comm+0x69/0x80
      	 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0
      	 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0
      
      Adding cpu_(prepare|starting|dying) for core_pmu to have
      shared_regs data allocated for core_pmu. AFAICS there's no harm
      to initialize debug store and PMU_FL_EXCL_CNTRS either for
      core_pmu.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 17 Apr, 2015 1 commit
  5. 02 Apr, 2015 16 commits
  6. 27 Mar, 2015 3 commits
    • perf/x86/intel: Add INST_RETIRED.ALL workarounds · 294fe0f5
      Committed by Andi Kleen
      On Broadwell INST_RETIRED.ALL cannot be used with any period
      that doesn't have the lowest 6 bits cleared. And the period
      should not be smaller than 128.
      
      This is erratum BDM11 and BDM55:
      
        http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf
      
      BDM11: When using a period < 100; we may get incorrect PEBS/PMI
      interrupts and/or an invalid counter state.
      BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
      records on overflow.
      
      Add a new callback to enforce this, and set it for Broadwell.
      
      How does this handle the case when an app requests a specific
      period with some of the bottom bits set?
      
      Short answer:
      
      Any useful instruction sampling period needs to be 4-6 orders
      of magnitude larger than 128, as a PMI every 128 instructions
      would instantly overwhelm the system and be throttled.
      So the +-64 error from this is really small compared to the
      period, much smaller than normal system jitter.
      
      Long answer (by Peterz):
      
      IFF we guarantee perf_event_attr::sample_period >= 128.
      
      Suppose we start out with sample_period=192; then we'll set period_left
      to 192, we'll end up with left = 128 (we truncate the lower bits). We
      get an interrupt, find that period_left = 64 (>0 so we return 0 and
      don't get an overflow handler), up that to 128. Then we trigger again,
      at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
      an overflow). We increment with sample_period so we get left = 128. We
      fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
      overflow). And on and on.
      
      So while the individual interrupts are 'wrong' we get them with
      interval=256,128 in exactly the right ratio to average out at 192. And
      this works for everything >=128.
      
      So the num_samples*fixed_period thing is still entirely correct +- 127,
      which is good enough I'd say, as you already have that error anyhow.
      
      So no need to 'fix' the tools, all we need to do is refuse to create
      INST_RETIRED:ALL events with sample_period < 128.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      [ Updated comments and changelog a bit. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
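The "long answer" arithmetic in the commit above can be replayed with a small userspace simulation. This is a sketch under stated assumptions and not the kernel code: `sketch_limit_period` and `sketch_simulate` are hypothetical names, and the minimum period and rounding granularity are parameters. The walk-through's numbers (a requested 192 programmed as 128) correspond to rounding down to a multiple of 128; 192 itself already has its lowest 6 bits clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit step: enforce a minimum, then round down to a
 * multiple of `granularity`, which must be a power of two. */
static int64_t sketch_limit_period(int64_t left, int64_t min_period,
				   int64_t granularity)
{
	if (left < min_period)
		left = min_period;
	return left & ~(granularity - 1);
}

/* Count events up to total_events. Each counter overflow subtracts the
 * programmed value from period_left; a sample is reported only once
 * period_left drops to <= 0, and sample_period is then added back.
 * Stores the event count of each sample in fired_at (up to max_samples)
 * and returns the number of samples. */
static size_t sketch_simulate(int64_t sample_period, int64_t min_period,
			      int64_t granularity, int64_t total_events,
			      int64_t *fired_at, size_t max_samples)
{
	int64_t period_left = sample_period;
	int64_t n = 0;
	size_t samples = 0;

	assert(sample_period >= min_period);
	assert(min_period >= granularity);

	for (;;) {
		int64_t programmed = sketch_limit_period(period_left,
							 min_period,
							 granularity);
		if (n + programmed > total_events)
			break;
		n += programmed;
		period_left -= programmed;
		if (period_left <= 0) {
			period_left += sample_period;
			if (samples < max_samples)
				fired_at[samples] = n;
			samples++;
		}
	}
	return samples;
}
```

With `sample_period = 192`, `min_period = 128` and `granularity = 128`, the samples land at n = 256, 384, 640 and 768. That is the 256,128 interval pattern from the walk-through, averaging 192 events per sample.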
    • perf/x86/intel: Add Broadwell core support · 91f1b705
      Committed by Andi Kleen
      Add Broadwell support to perf.
      
      The basic support is very similar to Haswell. We use the new cache
      event list added for Haswell earlier. The only differences
      are a few bits related to remote nodes. To avoid an extra,
      mostly identical, table these are patched up in the initialization code.
      
      The constraint list has one new event that needs to be handled over Haswell.
      
      Includes code and testing from Kan Liang.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add new cache events table for Haswell · 0f1b5ca2
      Committed by Andi Kleen
      Haswell offcore events are quite different from Sandy Bridge.
      Add a new table to handle Haswell properly.
      
      Note that the offcore bits listed in the SDM are not quite correct
      (this is currently being fixed). An up-to-date list of bits is
      in the patch.
      
      The basic setup is similar to Sandy Bridge. The prefetch columns
      have been removed, as prefetch counting is not very reliable
      on Haswell. One L1 event that is not in the event list anymore
      has been also removed.
      
      - data reads do not include code reads (comparable to earlier Sandy Bridge tables)
      - data counts include speculative execution (except L1 write, dtlb, bpu)
      - remote node access includes both remote memory, remote cache, remote mmio.
      - prefetches are not included in the counts for consistency
        (different from Sandy Bridge, which includes prefetches in the remote node)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      [ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 19 Feb, 2015 3 commits
  8. 28 Jan, 2015 1 commit
  9. 29 Oct, 2014 1 commit
    • perf/x86/intel: Revert incomplete and undocumented Broadwell client support · 1776b106
      Committed by Ingo Molnar
      These patches:
      
        86a349a2 ("perf/x86/intel: Add Broadwell core support")
        c46e665f ("perf/x86: Add INST_RETIRED.ALL workarounds")
        fdda3c4a ("perf/x86/intel: Use Broadwell cache event list for Haswell")
      
      introduced magic constants and unexplained changes:
      
        https://lkml.org/lkml/2014/10/28/1128
        https://lkml.org/lkml/2014/10/27/325
        https://lkml.org/lkml/2014/8/27/546
        https://lkml.org/lkml/2014/10/28/546
      
      Peter Zijlstra has attempted to help out, to clean up the mess:
      
        https://lkml.org/lkml/2014/10/28/543
      
      But has not received helpful and constructive replies which makes
      me doubt whether it can all be finished in time until v3.18 is
      released.
      
      Despite various review feedback the author (Andi Kleen) has answered
      only a few of the review questions and has generally been uncooperative,
      only giving replies when prompted repeatedly, and only giving minimal
      answers instead of constructively explaining and helping along the effort.
      
      That kind of behavior is not acceptable.
      
      There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
      commit from Andi Kleen:
      
        e735b9db ("perf/x86/intel/uncore: Add Haswell-EP uncore support")
      
        https://lkml.org/lkml/2014/10/22/730
      
      Which is not yet resolved. The uncore driver is independent in theory,
      but the crash makes me worry about how well all these patches were
      tested and makes me uneasy about the level of intermingling that the
      Broadwell and Haswell code has received by the commits above.
      
      As a first step to resolve the mess revert the Broadwell client commits
      back to the v3.17 version, before we run out of time and problematic
      code hits a stable upstream kernel.
      
      ( If the Haswell-EP crash is not resolved via a simple fix then we'll have
        to revert the Haswell-EP uncore driver as well. )
      
      The Broadwell client series has to be submitted in a clean fashion, with
      single, well documented changes per patch. If they are submitted in time
      and are accepted during review then they can possibly go into v3.19 but
      will need additional scrutiny due to the rocky history of this patch set.
      
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 24 Sep, 2014 3 commits
    • perf/x86/intel: Use Broadwell cache event list for Haswell · fdda3c4a
      Committed by Andi Kleen
      Use the newly added Broadwell cache event list for Haswell too.
      All Haswell and Broadwell events and offcore masks used in these lists
      are identical.
      
      However Haswell is very different from the Sandy Bridge
      list that was used previously. That fixes a wide range of mis-counting
      cache events.
      
      The node events are now only for retired memory events, so prefetching
      and speculative memory accesses are not included. They are PEBS
      capable now, which makes it much easier to sample for them, plus it's
      possible to create address maps with -d.
      
      The prefetch events are gone now. The way the hardware counts
      them is very misleading (some prefetches included, others not), so
      it seemed best to leave them out.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1409683455-29168-5-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Add INST_RETIRED.ALL workarounds · c46e665f
      Committed by Andi Kleen
      On Broadwell INST_RETIRED.ALL cannot be used with any period
      that doesn't have the lowest 6 bits cleared. And the period
      should not be smaller than 128.
      
      Add a new callback to enforce this, and set it for Broadwell.
      
      This is erratum BDM57 and BDM11.
      
      How does this handle the case when an app requests a specific
      period with some of the bottom bits set?
      
      The app thinks it is sampling at X occurrences per sample, when it is
      in fact at X - 63 (worst case).
      
      Short answer:
      
      Any useful instruction sampling period needs to be 4-6 orders
      of magnitude larger than 128, as a PMI every 128 instructions
      would instantly overwhelm the system and be throttled.
      So the +-64 error from this is really small compared to the
      period, much smaller than normal system jitter.
      
      Long answer:
      
      <write up by Peter:>
      
      IFF we guarantee perf_event_attr::sample_period >= 128.
      
      Suppose we start out with sample_period=192; then we'll set period_left
      to 192, we'll end up with left = 128 (we truncate the lower bits). We
      get an interrupt, find that period_left = 64 (>0 so we return 0 and
      don't get an overflow handler), up that to 128. Then we trigger again,
      at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
      an overflow). We increment with sample_period so we get left = 128. We
      fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
      overflow). And on and on.
      
      So while the individual interrupts are 'wrong' we get them with
      interval=256,128 in exactly the right ratio to average out at 192. And
      this works for everything >=128.
      
      So the num_samples*fixed_period thing is still entirely correct +- 127,
      which is good enough I'd say, as you already have that error anyhow.
      
      So no need to 'fix' the tools, all we need to do is refuse to create
      INST_RETIRED:ALL events with sample_period < 128.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      Cc: Mark Davies <junk@eslaf.co.uk>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1409683455-29168-4-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add Broadwell core support · 86a349a2
      Committed by Andi Kleen
      Add support for Broadwell Client to perf.  This is very
      similar to Haswell.  It uses a new cache event table, because there
      were various changes there.
      
      The constraint list has one new event that needs to be handled over
      Haswell.
      
      The PEBS event list is the same, so we reuse Haswell's.
      
      [fengguang.wu: make intel_bdw_event_constraints[] static]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>