1. 05 9月, 2015 1 次提交
  2. 12 8月, 2015 1 次提交
  3. 04 8月, 2015 6 次提交
  4. 19 6月, 2015 2 次提交
  5. 07 6月, 2015 3 次提交
    • Y
      perf/x86/intel: Drain the PEBS buffer during context switches · 9c964efa
      Yan, Zheng 提交于
      Flush the PEBS buffer during context switches if PEBS interrupt threshold
      is larger than one. This allows perf to supply TID for sample outputs.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1430940834-8964-6-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9c964efa
    • Y
      perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold) · 3569c0d7
      Yan, Zheng 提交于
      PEBS always had the capability to log samples to its buffers without
      an interrupt. Traditionally perf has not used this but always set the
      PEBS threshold to one.
      
      For frequently occurring events (like cycles or branches or load/store)
      this in term requires using a relatively high sampling period to avoid
      overloading the system, by only processing PMIs. This in term increases
      sampling error.
      
      For the common cases we still need to use the PMI because the PEBS
      hardware has various limitations. The biggest one is that it can not
      supply a callgraph. It also requires setting a fixed period, as the
      hardware does not support adaptive period. Another issue is that it
      cannot supply a time stamp and some other options. To supply a TID it
      requires flushing on context switch. It can however supply the IP, the
      load/store address, TSX information, registers, and some other things.
      
      So we can make PEBS work for some specific cases, basically as long as
      you can do without a callgraph and can set the period you can use this
      new PEBS mode.
      
      The main benefit is the ability to support much lower sampling period
      (down to -c 1000) without extensive overhead.
      
      One use cases is for example to increase the resolution of the c2c tool.
      Another is double checking when you suspect the standard sampling has
      too much sampling error.
      
      Some numbers on the overhead, using cycle soak, comparing the elapsed
      time from "kernbench -M -H" between plain (threshold set to one) and
      multi (large threshold).
      
      The test command for plain:
        "perf record --time -e cycles:p -c $period -- kernbench -M -H"
      
      The test command for multi:
        "perf record --no-time -e cycles:p -c $period -- kernbench -M -H"
      
      ( The only difference of test command between multi and plain is time
        stamp options. Since time stamp is not supported by large PEBS
        threshold, it can be used as a flag to indicate if large threshold is
        enabled during the test. )
      
      	period    plain(Sec)  multi(Sec)  Delta
      	10003     32.7        16.5        16.2
      	20003     30.2        16.2        14.0
      	40003     18.6        14.1        4.5
      	80003     16.8        14.6        2.2
      	100003    16.9        14.1        2.8
      	800003    15.4        15.7        -0.3
      	1000003   15.3        15.2        0.2
      	2000003   15.3        15.1        0.1
      
      With periods below 100003, plain (threshold one) cause much more
      overhead. With 10003 sampling period, the Elapsed Time for multi is
      even 2X faster than plain.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1430940834-8964-5-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3569c0d7
    • Y
      perf/x86/intel: Use the PEBS auto reload mechanism when possible · 851559e3
      Yan, Zheng 提交于
      When a fixed period is specified, this patch makes perf use the PEBS
      auto reload mechanism. This makes normal profiling faster, because
      it avoids one costly MSR write in the PMI handler.
      
      However, the reset value will be loaded by hardware assist. There is a
      small delay compared to the previous non-auto-reload mechanism. The
      delay time is arbitrary, but very small. The assist cost is 400-800
      cycles, assuming common cases with everything cached. The minimum period
      the patch currently uses is 10000. In that extreme case it can be ~10%
      if cycles are used.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1430940834-8964-2-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      851559e3
  6. 27 5月, 2015 11 次提交
    • B
      sched/topology: Rename topology_thread_cpumask() to topology_sibling_cpumask() · 06931e62
      Bartosz Golaszewski 提交于
      Rename topology_thread_cpumask() to topology_sibling_cpumask()
      for more consistency with scheduler code.
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Benoit Cousson <bcousson@baylibre.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jean Delvare <jdelvare@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      06931e62
    • P
      perf/x86/intel: Simplify put_exclusive_constraints() · ba040653
      Peter Zijlstra 提交于
      Don't bother with taking locks if we're not actually going to do
      anything. Also, drop the _irqsave(), this is very much only called
      from IRQ-disabled context.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ba040653
    • P
      perf/x86/intel: Remove intel_excl_states::init_state · 43ef205b
      Peter Zijlstra 提交于
      For some obscure reason intel_{start,stop}_scheduling() copy the HT
      state to an intermediate array. This would make sense if we ever were
      to make changes to it which we'd have to discard.
      
      Except we don't. By the time we call intel_commit_scheduling() we're;
      as the name implies; committed to them. We'll never back out.
      
      A further hint its pointless is that stop_scheduling() unconditionally
      publishes the state.
      
      So the intermediate array is pointless, modify the state in place and
      kill the extra array.
      
      And remove the pointless array initialization: INTEL_EXCL_UNUSED == 0.
      
      Note; all is serialized by intel_excl_cntr::lock.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      43ef205b
    • P
      perf/x86/intel: Remove pointless tests · 1fe684e3
      Peter Zijlstra 提交于
      Both intel_commit_scheduling() and intel_get_excl_contraints() test
      for cntr < 0.
      
      The only way that can happen (aside from a bug) is through
      validate_event(), however that is already captured by the
      cpuc->is_fake test.
      
      So remove these test and simplify the code.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1fe684e3
    • P
      perf/x86/intel: Clean up intel_commit_scheduling() placement · 0c41e756
      Peter Zijlstra 提交于
      Move the code of intel_commit_scheduling() to the right place, which is
      in between start() and stop().
      
      No change in functionality.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0c41e756
    • P
      perf/x86/intel: Make WARN()ings consistent · 17186ccd
      Peter Zijlstra 提交于
      The intel_commit_scheduling() callback is pointlessly different from
      the start and stop scheduling callback.
      
      Furthermore, the constraint should never be NULL, so remove that test.
      
      Even though we'll never get called (because we NULL the callbacks)
      when !is_ht_workaround_enabled() put that test in.
      
      Collapse the (pointless) WARN_ON_ONCE() and bail on !cpuc->excl_cntrs --
      this is doubly pointless, because its the same condition as
      is_ht_workaround_enabled() which was already pointless because the
      whole method won't ever be called.
      
      Furthremore, make all the !excl_cntrs test WARN_ON_ONCE(); they're all
      pointless, because the above, either the function
      ({get,put}_excl_constraint) are already predicated on it existing or
      the is_ht_workaround_enabled() thing is the same test.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      17186ccd
    • P
      perf/x86/intel: Simplify the dynamic constraint code somewhat · aaf932e8
      Peter Zijlstra 提交于
      We have two 'struct event_constraint' local variables in
      intel_get_excl_constraints(): 'cx' and 'c'.
      
      Instead of using 'cx' after the dynamic allocation, put all 'cx' inside
      the dynamic allocation block and use 'c' outside of it.
      
      Also use direct assignment to copy the structure; let the compiler
      figure it out.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      aaf932e8
    • P
      perf/x86/intel: Add lockdep assert · b32ed7f5
      Peter Zijlstra 提交于
      Lockdep is very good at finding incorrect IRQ state while locking and
      is far better at telling us if we hold a lock than the _is_locked()
      API. It also generates less code for !DEBUG kernels.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b32ed7f5
    • P
      perf/x86/intel: Correct local vs remote sibling state · 1c565833
      Peter Zijlstra 提交于
      For some obscure reason the current code accounts the current SMT
      thread's state on the remote thread and reads the remote's state on
      the local SMT thread.
      
      While internally consistent, and 'correct' its pointless confusion we
      can do without.
      
      Flip them the right way around.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1c565833
    • P
      perf/x86: Improve HT workaround GP counter constraint · cc1790cf
      Peter Zijlstra 提交于
      The (SNB/IVB/HSW) HT bug only affects events that can be programmed
      onto GP counters, therefore we should only limit the number of GP
      counters that can be used per cpu -- iow we should not constrain the
      FP counters.
      
      Furthermore, we should only enfore such a limit when there are in fact
      exclusive events being scheduled on either sibling.
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [ Fixed build fail for the !CONFIG_CPU_SUP_INTEL case. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      cc1790cf
    • P
      perf/x86: Fix event/group validation · b371b594
      Peter Zijlstra 提交于
      Commit 43b45780 ("perf/x86: Reduce stack usage of
      x86_schedule_events()") violated the rule that 'fake' scheduling; as
      used for event/group validation; should not change the event state.
      
      This went mostly un-noticed because repeated calls of
      x86_pmu::get_event_constraints() would give the same result. And
      x86_pmu::put_event_constraints() would mostly not do anything.
      
      Commit e979121b ("perf/x86/intel: Implement cross-HT corruption
      bug workaround") made the situation much worse by actually setting the
      event->hw.constraint value to NULL, so when validation and actual
      scheduling interact we get NULL ptr derefs.
      
      Fix it by removing the constraint pointer from the event and move it
      back to an array, this time in cpuc instead of on the stack.
      
      validate_group()
        x86_schedule_events()
          event->hw.constraint = c; # store
      
            <context switch>
              perf_task_event_sched_in()
                ...
                  x86_schedule_events();
                    event->hw.constraint = c2; # store
      
                    ...
      
                    put_event_constraints(event); # assume failure to schedule
                      intel_put_event_constraints()
                        event->hw.constraint = NULL;
      
            <context switch end>
      
          c = event->hw.constraint; # read -> NULL
      
          if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref
      
      This in particular is possible when the event in question is a
      cpu-wide event and group-leader, where the validate_group() tries to
      add an event to the group.
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 43b45780 ("perf/x86: Reduce stack usage of x86_schedule_events()")
      Fixes: e979121b ("perf/x86/intel: Implement cross-HT corruption bug workaround")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b371b594
  7. 08 5月, 2015 1 次提交
  8. 22 4月, 2015 1 次提交
    • J
      perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu · 3b6e0421
      Jiri Olsa 提交于
      The core_pmu does not define cpu_* callbacks, which handles
      allocation of 'struct cpu_hw_events::shared_regs' data,
      initialization of debug store and PMU_FL_EXCL_CNTRS counters.
      
      While this probably won't happen on bare metal, virtual CPU can
      define x86_pmu.extra_regs together with PMU version 1 and thus
      be using core_pmu -> using shared_regs data without it being
      allocated. That could could leave to following panic:
      
      	BUG: unable to handle kernel NULL pointer dereference at (null)
      	IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40
      
      	SNIP
      
      	 [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0
      	 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180
      	 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0
      	 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90
      	 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0
      	 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
      	 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40
      	 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
      	 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150
      	 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230
      	 [<ffffffff811a993a>] ? dput+0x9a/0x150
      	 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60
      	 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000
      	 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170
      	 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0
      	 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
      	 [<ffffffff8119c731>] ? path_put+0x31/0x40
      	 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30
      	 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
      	 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170
      	 [<ffffffff81014a29>] ? sched_clock+0x9/0x10
      	 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330
      	 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0
      	 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0
      	 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0
      	 [<ffffffff81195ee9>] set_task_comm+0x69/0x80
      	 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0
      	 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0
      
      Adding cpu_(prepare|starting|dying) for core_pmu to have
      shared_regs data allocated for core_pmu. AFAICS there's no harm
      to initialize debug store and PMU_FL_EXCL_CNTRS either for
      core_pmu.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3b6e0421
  9. 17 4月, 2015 1 次提交
  10. 02 4月, 2015 13 次提交