1. 13 3月, 2012 1 次提交
    • P
      perf/x86: Fix local vs remote memory events for NHM/WSM · 87e24f4b
      Peter Zijlstra 提交于
      Verified using the below proglet.. before:
      
      [root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 0
      remote write
      
       Performance counter stats for './numa 0':
      
               2,101,554 node-stores
               2,096,931 node-store-misses
      
             5.021546079 seconds time elapsed
      
      [root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 1
      local write
      
       Performance counter stats for './numa 1':
      
                 501,137 node-stores
                     199 node-store-misses
      
             5.124451068 seconds time elapsed
      
      After:
      
      [root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 0
      remote write
      
       Performance counter stats for './numa 0':
      
               2,107,516 node-stores
               2,097,187 node-store-misses
      
             5.012755149 seconds time elapsed
      
      [root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 1
      local write
      
       Performance counter stats for './numa 1':
      
               2,063,355 node-stores
                     165 node-store-misses
      
             5.082091494 seconds time elapsed
      
      #define _GNU_SOURCE
      
      #include <sched.h>
      #include <stdio.h>
      #include <errno.h>
      #include <sys/mman.h>
      #include <sys/types.h>
      #include <dirent.h>
      #include <signal.h>
      #include <unistd.h>
      #include <numaif.h>
      #include <stdlib.h>
      
      #define SIZE (32*1024*1024)
      
      volatile int done;
      
      void sig_done(int sig)
      {
      	done = 1;
      }
      
      int main(int argc, char **argv)
      {
      	cpu_set_t *mask, *mask2;
      	size_t size;
      	int i, err, t;
      	int nrcpus = 1024;
      	char *mem;
      	unsigned long nodemask = 0x01; /* node 0 */
      	DIR *node;
      	struct dirent *de;
      	int read = 0;
      	int local = 0;
      
      	if (argc < 2) {
      		printf("usage: %s [0-3]\n", argv[0]);
      		printf("  bit0 - local/remote\n");
      		printf("  bit1 - read/write\n");
      		exit(0);
      	}
      
      	switch (atoi(argv[1])) {
      	case 0:
      		printf("remote write\n");
      		break;
      	case 1:
      		printf("local write\n");
      		local = 1;
      		break;
      	case 2:
      		printf("remote read\n");
      		read = 1;
      		break;
      	case 3:
      		printf("local read\n");
      		local = 1;
      		read = 1;
      		break;
      	}
      
      	mask = CPU_ALLOC(nrcpus);
      	size = CPU_ALLOC_SIZE(nrcpus);
      	CPU_ZERO_S(size, mask);
      
      	node = opendir("/sys/devices/system/node/node0/");
      	if (!node)
      		perror("opendir");
      	while ((de = readdir(node))) {
      		int cpu;
      
      		if (sscanf(de->d_name, "cpu%d", &cpu) == 1)
      			CPU_SET_S(cpu, size, mask);
      	}
      	closedir(node);
      
      	mask2 = CPU_ALLOC(nrcpus);
      	CPU_ZERO_S(size, mask2);
      	for (i = 0; i < size; i++)
      		CPU_SET_S(i, size, mask2);
      	CPU_XOR_S(size, mask2, mask2, mask); // invert
      
      	if (!local)
      		mask = mask2;
      
      	err = sched_setaffinity(0, size, mask);
      	if (err)
      		perror("sched_setaffinity");
      
      	mem = mmap(0, SIZE, PROT_READ|PROT_WRITE,
      			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	err = mbind(mem, SIZE, MPOL_BIND, &nodemask, 8*sizeof(nodemask), MPOL_MF_MOVE);
      	if (err)
      		perror("mbind");
      
      	signal(SIGALRM, sig_done);
      	alarm(5);
      
      	if (!read) {
      		while (!done) {
      			for (i = 0; i < SIZE; i++)
      				mem[i] = 0x01;
      		}
      	} else {
      		while (!done) {
      			for (i = 0; i < SIZE; i++)
      				t += *(volatile char *)(mem + i);
      		}
      	}
      
      	return 0;
      }
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/n/tip-tq73sxus35xmqpojf7ootxgs@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      87e24f4b
  2. 24 12月, 2011 1 次提交
  3. 21 12月, 2011 2 次提交
  4. 07 12月, 2011 2 次提交
  5. 05 12月, 2011 1 次提交
  6. 01 11月, 2011 1 次提交
    • P
      x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULE · 69c60c88
      Paul Gortmaker 提交于
      These files were implicitly getting EXPORT_SYMBOL via device.h
      which was including module.h, but that will be fixed up shortly.
      
      By fixing these now, we can avoid seeing things like:
      
      arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’
      
      [ with input from Randy Dunlap <rdunlap@xenotime.net> and also
        from Stephen Rothwell <sfr@canb.auug.org.au> ]
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      69c60c88
  7. 10 10月, 2011 1 次提交
  8. 26 9月, 2011 1 次提交
  9. 14 8月, 2011 1 次提交
  10. 09 8月, 2011 1 次提交
  11. 01 7月, 2011 7 次提交
  12. 06 5月, 2011 2 次提交
  13. 29 4月, 2011 3 次提交
  14. 28 4月, 2011 1 次提交
  15. 27 4月, 2011 4 次提交
  16. 26 4月, 2011 1 次提交
  17. 22 4月, 2011 2 次提交
    • P
      perf, x86: Update/fix Intel Nehalem cache events · f4929bd3
      Peter Zijlstra 提交于
      Change the Nehalem cache events to use retired memory instruction counters
      (similar to Westmere), this greatly improves the provided stats.
      
      Using:
      
      main ()
      {
              int i;
      
              for (i = 0; i < 1000000000; i++) {
                      asm("mov (%%rsp), %%rbx;"
                          "mov %%rbx, (%%rsp);" : : : "rbx");
              }
      }
      
      We find:
      
       $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
        Performance counter stats for './loop_1b_loads+stores' (10 runs):
            4,000,081,056 instructions:u           #      0.000 IPC ( +-   0.000% )
            4,999,502,846 l1-dcache-loads:u          ( +-   0.008% )
            1,000,034,832 l1-dcache-stores:u         ( +-   0.000% )
               1.565184942  seconds time elapsed   ( +-   0.005% )
      
      The 5b is surprising - we'd expect 1b:
      
       $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
        Performance counter stats for './loop_1b_loads+stores' (10 runs):
            4,000,081,054 instructions:u           #      0.000 IPC ( +-   0.000% )
            1,000,021,961 r10b:u                     ( +-   0.000% )
            1,000,030,951 l1-dcache-stores:u         ( +-   0.000% )
               1.565055422  seconds time elapsed   ( +-   0.003% )
      
      Which this patch thus fixes.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      f4929bd3
    • A
      perf: Support Xeon E7's via the Westmere PMU driver · b2508e82
      Andi Kleen 提交于
      There's a new model number public, 47, for Xeon E7 (aka Westmere EX).
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: a.p.zijlstra@chello.nl
      Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      b2508e82
  18. 05 3月, 2011 1 次提交
  19. 04 3月, 2011 3 次提交
    • A
      perf: Fix LLC-* events on Intel Nehalem/Westmere · e994d7d2
      Andi Kleen 提交于
      On Intel Nehalem and Westmere CPUs the generic perf LLC-* events count the
      L2 caches, not the real L3 LLC - this was inconsistent with behavior on
      other CPUs.
      
      Fixing this requires the use of the special OFFCORE_RESPONSE
      events which need a separate mask register.
      
      This has been implemented by the previous patch, now use this infrastructure
      to set correct events for the LLC-* on Nehalem and Westmere.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1299119690-13991-3-git-send-email-ming.m.lin@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e994d7d2
    • A
      perf: Add support for supplementary event registers · a7e3ed1e
      Andi Kleen 提交于
      Change logs against Andi's original version:
      
      - Extends perf_event_attr:config to config{,1,2} (Peter Zijlstra)
      - Fixed a major event scheduling issue. There cannot be a ref++ on an
        event that has already done ref++ once and without calling
        put_constraint() in between. (Stephane Eranian)
      - Use thread_cpumask for percore allocation. (Lin Ming)
      - Use MSR names in the extra reg lists. (Lin Ming)
      - Remove redundant "c = NULL" in intel_percore_constraints
      - Fix comment of perf_event_attr::config1
      
      Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event
      that can be used to monitor any offcore accesses from a core.
      This is a very useful event for various tunings, and it's
      also needed to implement the generic LLC-* events correctly.
      
      Unfortunately this event requires programming a mask in a separate
      register. And worse this separate register is per core, not per
      CPU thread.
      
      This patch:
      
      - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters.
        The extra parameters are passed by user space in the
        perf_event_attr::config1 field.
      
      - Adds support to the Intel perf_event core to schedule per
        core resources. This adds fairly generic infrastructure that
        can be also used for other per core resources.
        The basic code has is patterned after the similar AMD northbridge
        constraints code.
      
      Thanks to Stephane Eranian who pointed out some problems
      in the original version and suggested improvements.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1299119690-13991-2-git-send-email-ming.m.lin@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a7e3ed1e
    • S
      perf_events: Update PEBS event constraints · 17e31629
      Stephane Eranian 提交于
      This patch updates PEBS event constraints for Intel Atom, Nehalem, Westmere.
      
      This patch also reorganizes the PEBS format/constraint detection code. It is
      now based on processor model and not PEBS format. Two processors may use the
      same PEBS format without have the same list of PEBS events.
      
      In this second version, we simplified the initialization of the PEBS
      constraints by leveraging the existing switch() statement in perf_event_intel.c.
      We also renamed the constraint tables to be more consistent with regular
      constraints.
      
      In this 3rd version, we drop BR_INST_RETIRED.MISPRED from Intel Atom as it does
      not seem to work. Use MISPREDICTED_BRANCH_RETIRED instead. Also add FP_ASSIST.*
      o both Intel Nehalem and Westmere. I misssed those in the earlier patches.
      Events were tested using libpfm4 perf_examples.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d6e6b02.815bdf0a.637b.07a7@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      17e31629
  20. 02 3月, 2011 1 次提交
    • L
      perf, x86: Add Intel SandyBridge CPU support · b06b3d49
      Lin Ming 提交于
      This patch adds basic SandyBridge support, including hardware
      cache events and PEBS events support.
      
      It has been tested on SandyBridge CPUs with perf stat and also
      with PEBS based profiling - both work fine.
      
      The patch does not affect other models.
      
      v2 -> v3:
       - fix PEBS event 0xd0 with right umask combinations
       - move snb pebs constraint assignment to intel_pmu_init
      
      v1 -> v2:
       - add more raw and PEBS events constraints
       - use offcore events for LLC-* cache events
       - remove the call to Nehalem workaround enable_all function
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      LKML-Reference: <1299072424.2175.24.camel@localhost>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b06b3d49
  21. 16 2月, 2011 1 次提交
  22. 30 12月, 2010 1 次提交
  23. 16 12月, 2010 1 次提交