1. 17 Feb, 2016 1 commit
  2. 09 Feb, 2016 1 commit
  3. 06 Jan, 2016 1 commit
  4. 19 Dec, 2015 1 commit
  5. 02 Apr, 2015 2 commits
  6. 27 Aug, 2014 1 commit
    •
      x86: Replace __get_cpu_var uses · 89cbc767
      Authored by Christoph Lameter
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases store data to and retrieve data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right-hand side of an assignment.
      
      __get_cpu_var() is defined as:
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() only ever performs an address determination. However,
      store and retrieve operations could use a segment prefix (or a global
      register on other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations
      that use the offset.  This avoids the address calculations, and fewer
      registers are used in the generated code.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
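The address-vs-operation distinction behind these transformations can be modeled in user space. Below is a minimal sketch, not kernel code: the per-cpu offset is modeled as a plain array index and smp_processor_id() as a variable, and all `model_*` names are hypothetical stand-ins for the kernel accessors.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of per-cpu data: one slot per CPU.  In the kernel,
 * this_cpu_ptr() adds the current CPU's percpu offset to the variable's
 * base address; this_cpu_read()/this_cpu_write() instead let the arch
 * access the current CPU's slot directly (on x86 via a %gs segment
 * prefix), skipping the explicit address calculation. */

#define NR_CPUS 4

static int y_slots[NR_CPUS];   /* models DEFINE_PER_CPU(int, y) */
static int current_cpu = 2;    /* models smp_processor_id() */

/* models this_cpu_ptr(&y): base address + current CPU's offset */
static int *model_this_cpu_ptr(int *base)
{
    return base + current_cpu;
}

/* models __this_cpu_read(y): direct access, no pointer materialized */
static int model_this_cpu_read(const int *base)
{
    return base[current_cpu];
}

/* models __this_cpu_write(y, v) */
static void model_this_cpu_write(int *base, int v)
{
    base[current_cpu] = v;
}
```

Transformations 1 and 2 above keep the explicit address (model_this_cpu_ptr), while 3 to 6 become direct read/write/increment operations that never materialize the pointer.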
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  7. 02 Sep, 2013 1 commit
  8. 28 May, 2013 1 commit
  9. 21 Apr, 2013 1 commit
  10. 16 Feb, 2013 1 commit
  11. 07 Feb, 2013 5 commits
  12. 24 Oct, 2012 1 commit
  13. 06 Jul, 2012 1 commit
  14. 18 May, 2012 1 commit
  15. 09 May, 2012 1 commit
    •
      perf/x86-ibs: Precise event sampling with IBS for AMD CPUs · 450bbd49
      Authored by Robert Richter
      This patch adds support for precise event sampling with IBS. There are
      two counting modes, counting either cycles or micro-ops. If the
      corresponding performance counter events (hw events) are set up with
      the precise flag, the request is redirected to the ibs pmu:
      
       perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
       perf record -a -e r076:p ...          # same as -e cpu-cycles:p
       perf record -a -e r0C1:p ...          # use ibs op counting micro-ops
      
      Each ibs sample contains a linear address that points to the
      instruction that caused the sample to trigger. With ibs we have
      skid 0, thus ibs supports precise levels 1 and 2. Samples are marked
      with the PERF_EFLAGS_EXACT flag set. In rare cases the rip is invalid
      because IBS was not able to record it correctly; then the
      PERF_EFLAGS_EXACT flag is cleared and the rip is taken from pt_regs.
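How the :p modifiers map onto precise levels, and which of them IBS can satisfy given its zero skid, can be sketched as follows. This is a model of the selection logic described above, not the kernel's code; the level values match the perf_event_attr.precise_ip ABI.

```c
#include <assert.h>

/* perf_event_attr.precise_ip values (per the perf ABI):
 *   0  event      arbitrary skid
 *   1  event:p    constant skid
 *   2  event:pp   requested to have 0 skid
 *   3  event:ppp  must have 0 skid
 */
enum {
    PRECISE_ARBITRARY = 0,
    PRECISE_CONSTANT  = 1,
    PRECISE_ZERO_REQ  = 2,
    PRECISE_ZERO      = 3,
};

/* IBS always samples with skid 0, so it can satisfy levels 1 and 2;
 * level 3 is out because a rare invalid rip falls back to pt_regs
 * (clearing PERF_EFLAGS_EXACT) rather than guaranteeing 0 skid. */
static int ibs_can_handle(int precise_ip)
{
    return precise_ip == PRECISE_CONSTANT ||
           precise_ip == PRECISE_ZERO_REQ;
}
```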
      
      V2:
      * don't drop samples in precise level 2 if rip is invalid, instead
        support the PERF_EFLAGS_EXACT flag
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20120502103309.GP18810@erda.amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 26 Apr, 2012 1 commit
  17. 17 Mar, 2012 1 commit
  18. 05 Mar, 2012 1 commit
  19. 02 Mar, 2012 1 commit
  20. 06 Dec, 2011 1 commit
    •
      perf, x86: Fix event scheduler for constraints with overlapping counters · bc1738f6
      Authored by Robert Richter
      The current x86 event scheduler fails to resolve scheduling problems
      of certain combinations of events and constraints. This happens if the
      counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight, e.g. constraints
      of the AMD family 15h pmu:
      
                              counter mask    weight
      
       amd_f15_PMC30          0x09            2  <--- overlapping counters
       amd_f15_PMC20          0x07            3
       amd_f15_PMC53          0x38            3
      
      The scheduler then fails to find an existing solution. Here is an
      example:
      
       event code     counter         failure         possible solution
      
       0x02E          PMC[3,0]        0               3
       0x043          PMC[2:0]        1               0
       0x045          PMC[2:0]        2               1
       0x046          PMC[2:0]        FAIL            2
      
      The event scheduler may not select the correct counter in the first
      cycle because it cannot know which events will be scheduled
      subsequently. It may then fail to schedule the events at all.
      
      To solve this, we now save the scheduler state of events with
      overlapping counter constraints.  If we fail to schedule the events,
      we roll back to those states and try to use another free counter.
      
      Constraints with overlapping counters are marked with a newly
      introduced overlap flag. We set the overlap flag for such constraints
      to give the scheduler a hint about which events to select for counter
      rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this.
      
      Care must be taken, as the rescheduling algorithm is O(n!), which will
      dramatically increase scheduling cycles on an over-committed system.
      The number of such EVENT_CONSTRAINT_OVERLAP() macros and their counter
      masks must be kept to a minimum. Thus, the current stack is limited to
      2 saved states, limiting the number of loops the algorithm takes in
      the worst case.
      
      On systems with no overlapping-counter constraints, this
      implementation does not increase the loop count compared to the
      previous algorithm.
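The failure mode and the rollback fix can be sketched as a plain backtracking search over counter masks. This is a user-space model of the problem, not the kernel's implementation (which bounds the rollback with a fixed number of saved states); the masks in the usage below are the family 15h example from the tables above.

```c
#include <assert.h>

/* Assign each event a distinct counter allowed by its mask.
 * A greedy lowest-free-bit pass has no rollback and can fail on
 * overlapping masks; the fix in this commit saves scheduler state and
 * rolls back to try another counter, modeled here as plain recursion. */

/* greedy: take the lowest free counter in each mask, never undo */
static int greedy_schedule(const unsigned *mask, int n, int *assign)
{
    unsigned used = 0;
    for (int i = 0; i < n; i++) {
        unsigned free = mask[i] & ~used;
        if (!free)
            return 0;                  /* FAIL, as in the table above */
        int c = __builtin_ctz(free);   /* lowest free counter */
        assign[i] = c;
        used |= 1u << c;
    }
    return 1;
}

/* backtracking: on downstream failure, undo and try the next counter */
static int backtrack_schedule(const unsigned *mask, int n, int i,
                              unsigned used, int *assign)
{
    if (i == n)
        return 1;
    for (unsigned free = mask[i] & ~used; free; free &= free - 1) {
        int c = __builtin_ctz(free);
        assign[i] = c;
        if (backtrack_schedule(mask, n, i + 1, used | (1u << c), assign))
            return 1;
    }
    return 0;
}
```

On the example events 0x02E (mask 0x09) and 0x043/0x045/0x046 (mask 0x07 each), the greedy pass fails while the backtracking pass finds the assignment 3, 0, 1, 2 from the table's "possible solution" column.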
      
      V2:
      * Renamed redo -> overlap.
      * Reimplementation using perf scheduling helper functions.
      
      V3:
      * Added WARN_ON_ONCE() if out of save states.
      * Changed function interface of perf_sched_restore_state() to use bool
        as return value.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  21. 10 Oct, 2011 1 commit
  22. 06 Oct, 2011 1 commit
  23. 28 Sep, 2011 1 commit
  24. 26 Sep, 2011 1 commit
  25. 14 Aug, 2011 1 commit
  26. 01 Jul, 2011 1 commit
    •
      perf, arch: Add generic NODE cache events · 89d6c0b5
      Authored by Peter Zijlstra
      Add a NODE level to the generic cache events, which is used to measure
      local vs remote memory accesses. Like all other cache events, an
      ACCESS is HIT+MISS; if there is no way to distinguish between reads
      and writes, provide reads only, etc.
      
      The below needs filling out for !x86 (which I filled out with
      unsupported events).
      
      I'm fairly sure ARM can leave it like that since it doesn't strike me as
      an architecture that even has NUMA support. SH might have something since
      it does appear to have some NUMA bits.
      
      Sparc64, PowerPC and MIPS certainly want a good look there since they
      clearly are NUMA capable.
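For reference, a generic cache event is encoded into attr.config from three fields (cache id, operation, result). A sketch with the new NODE id in place; the enum values follow the perf ABI (NODE is id 6), and the helper name is illustrative.

```c
#include <assert.h>

/* perf_hw_cache_id values, with the NODE level this commit adds */
enum { CACHE_L1D = 0, CACHE_L1I, CACHE_LL, CACHE_DTLB, CACHE_ITLB,
       CACHE_BPU, CACHE_NODE };

enum { OP_READ = 0, OP_WRITE, OP_PREFETCH };    /* perf_hw_cache_op_id */
enum { RESULT_ACCESS = 0, RESULT_MISS };        /* ..._op_result_id    */

/* PERF_TYPE_HW_CACHE config: (id) | (op << 8) | (result << 16) */
static unsigned long cache_config(int id, int op, int result)
{
    return (unsigned long)id | (unsigned long)op << 8 |
           (unsigned long)result << 16;
}
```

So a node-level read miss (remote access, on architectures that fill the table in) is config 0x10006, and a node-level read access is config 0x6.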
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Miller <davem@davemloft.net>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  27. 29 Apr, 2011 1 commit
  28. 19 Apr, 2011 2 commits
  29. 16 Feb, 2011 1 commit
    •
      perf, x86: Add support for AMD family 15h core counters · 4979d272
      Authored by Robert Richter
      This patch adds support for AMD family 15h core counters. There are
      major changes compared to family 10h. First, there is a new perfctr
      MSR range for up to 6 counters, and northbridge counters are now
      separate. This patch adds support for core counters only. Second,
      certain events may only be scheduled on certain counters; for this we
      need to extend the event scheduling and constraints.
      
      We use CPU feature flags to calculate the family 15h MSR address
      offsets. This way we can later implement a faster ALTERNATIVE()
      version of this.
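The offset calculation this enables can be sketched as below. The doubling for family 15h reflects its interleaved control/counter MSR layout (versus the contiguous legacy layout); the CPU feature flag is modeled as a plain boolean and the function name is illustrative.

```c
#include <assert.h>

/* Offset from the base MSR to the MSR for counter `index`.
 * Legacy family 10h: event-select and counter MSRs each form a
 * contiguous range, so the offset equals the index.
 * Family 15h core PMU: control and counter MSRs are interleaved in a
 * new range, so the stride per counter is 2. */
static unsigned int model_addr_offset(int index, int has_perfctr_core)
{
    if (has_perfctr_core)           /* models the CPU feature flag */
        return (unsigned int)index << 1;
    return (unsigned int)index;     /* legacy layout */
}
```

Keying the choice off a feature flag rather than the family number is what lets a later ALTERNATIVE() patch the branch away at boot.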
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110215135210.GB5874@erda.amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  30. 09 Dec, 2010 1 commit
  31. 11 Nov, 2010 1 commit
  32. 19 Oct, 2010 1 commit
  33. 03 Jul, 2010 1 commit
  34. 03 Apr, 2010 1 commit