Commit 27151f17 authored by Linus Torvalds

Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Improvements for the flamegraph python script, including:
       - Display perf.data header
       - Display PIDs of user stacks
       - Added option to change color scheme
       - Default to blue/green color scheme to improve accessibility
       - Correctly identify kernel stacks when debuginfo is available

   - Improvements for 'perf bench futex':
       - Add --mlockall parameter
       - Add --broadcast and --pi to the 'requeue' sub benchmark

   - Add support for PMU aliases.

   - Introduce an ARM CoreSight ETE decoder.

   - Add a 'perf bench' entry for evlist open/close operations, to help
     quantify improvements with multithreading 'perf record'.

   - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf
     script's python scripting.

   - Add a 'perf test' entry for PMU aliases.

   - Add a 'perf test' entry for 'perf record/perf report/perf script'
     pipe mode.

  Fixes:

   - perf script dlfilter (API for filtering via dynamically loaded
     shared object introduced in v5.14) fixes and a 'perf test' entry
     for it.

   - Fix get_current_dir_name() compilation on Android.

   - Fix issues with asciidoc and double-dash usage.

   - Fix memory leaks in the BTF handling code.

   - Fix leftover problems in the Documentation from the infrastructure
     originally lifted from the git codebase.

   - Fix *probe_vfs_getname.sh 'perf test' failures.

   - Handle fd gaps in 'perf test's test__dso_data_reopen().

   - Make sure to show disassembly warnings for 'perf annotate --stdio'.

   - Fix output from pipe to file and vice-versa in 'perf
     record/report/script'.

   - Correct 'perf data -h' output.

   - Fix wrong comm in system-wide mode with 'perf record --delay'.

   - Do not allow --for-each-cgroup without cpu in 'perf stat'.

   - Make 'perf test --skip' work on shell tests.

   - Fix libperf's verbose printing.

  Misc improvements:

   - Preparatory patches for multithreading various 'perf record' phases
     (synthesizing, opening, recording, etc).

   - Add sparse context/locking annotations in compiler_types.h, also to
     help with the multithreading effort.

   - Optimize the generation of the arch-specific errno tables used in
     'perf trace'.

   - Optimize libperf's perf_cpu_map__max().

   - Improve ARM's CoreSight warnings.

   - Report collisions in AUX records.

   - Improve warnings for the LLVM 'perf test' entry.

   - Improve the PMU events 'perf test' codebase.

   - Do not compare overheads in the zstd compression 'perf test' entry.

   - Better support annotation on ARM.

   - Update 'perf trace's cmd string table to decode sys_bpf() first
     arg.

  Vendor events:

   - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and
     Elkhart Lake.

   - Update JSON events and metrics for Intel's Cascade Lake and Skylake
     servers.

  Hardware tracing:

   - Improvements for the ARM hardware tracing auxtrace support"

* tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (130 commits)
  perf tests: Add test for PMU aliases
  perf pmu: Add PMU alias support
  perf session: Report collisions in AUX records
  perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event
  perf build: Report failure for testing feature libopencsd
  perf cs-etm: Show a warning for an unknown magic number
  perf cs-etm: Print the decoder name
  perf cs-etm: Create ETE decoder
  perf cs-etm: Update OpenCSD decoder for ETE
  perf cs-etm: Fix typo
  perf cs-etm: Save TRCDEVARCH register
  perf cs-etm: Refactor out ETMv4 header saving
  perf cs-etm: Initialise architecture based on TRCIDR1
  perf cs-etm: Refactor initialisation of decoder params.
  tools build: Fix feature detect clean for out of source builds
  perf evlist: Add evlist__for_each_entry_from() macro
  perf evsel: Handle precise_ip fallback in evsel__open_cpu()
  perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu()
  perf evsel: Move test_attr__open() to success path in evsel__open_cpu()
  perf evsel: Move ignore_missing_thread() to fallback code
  ...
...@@ -32,7 +32,7 @@ all: $(OUTPUT)fixdep ...@@ -32,7 +32,7 @@ all: $(OUTPUT)fixdep
# Make sure there's anything to clean, # Make sure there's anything to clean,
# feature contains check for existing OUTPUT # feature contains check for existing OUTPUT
TMP_O := $(if $(OUTPUT),$(OUTPUT)/feature,./) TMP_O := $(if $(OUTPUT),$(OUTPUT)feature/,./)
clean: clean:
$(call QUIET_CLEAN, fixdep) $(call QUIET_CLEAN, fixdep)
......
...@@ -34,7 +34,6 @@ FEATURE_TESTS_BASIC := \ ...@@ -34,7 +34,6 @@ FEATURE_TESTS_BASIC := \
dwarf_getlocations \ dwarf_getlocations \
eventfd \ eventfd \
fortify-source \ fortify-source \
sync-compare-and-swap \
get_current_dir_name \ get_current_dir_name \
gettid \ gettid \
glibc \ glibc \
......
...@@ -9,7 +9,6 @@ FILES= \ ...@@ -9,7 +9,6 @@ FILES= \
test-dwarf_getlocations.bin \ test-dwarf_getlocations.bin \
test-eventfd.bin \ test-eventfd.bin \
test-fortify-source.bin \ test-fortify-source.bin \
test-sync-compare-and-swap.bin \
test-get_current_dir_name.bin \ test-get_current_dir_name.bin \
test-glibc.bin \ test-glibc.bin \
test-gtk2.bin \ test-gtk2.bin \
...@@ -260,9 +259,6 @@ $(OUTPUT)test-libdw-dwarf-unwind.bin: ...@@ -260,9 +259,6 @@ $(OUTPUT)test-libdw-dwarf-unwind.bin:
$(OUTPUT)test-libbabeltrace.bin: $(OUTPUT)test-libbabeltrace.bin:
$(BUILD) # -lbabeltrace provided by $(FEATURE_CHECK_LDFLAGS-libbabeltrace) $(BUILD) # -lbabeltrace provided by $(FEATURE_CHECK_LDFLAGS-libbabeltrace)
$(OUTPUT)test-sync-compare-and-swap.bin:
$(BUILD)
$(OUTPUT)test-compile-32.bin: $(OUTPUT)test-compile-32.bin:
$(CC) -m32 -o $@ test-compile.c $(CC) -m32 -o $@ test-compile.c
......
...@@ -106,10 +106,6 @@ ...@@ -106,10 +106,6 @@
# include "test-libdw-dwarf-unwind.c" # include "test-libdw-dwarf-unwind.c"
#undef main #undef main
#define main main_test_sync_compare_and_swap
# include "test-sync-compare-and-swap.c"
#undef main
#define main main_test_zlib #define main main_test_zlib
# include "test-zlib.c" # include "test-zlib.c"
#undef main #undef main
......
...@@ -4,9 +4,9 @@ ...@@ -4,9 +4,9 @@
/* /*
* Check OpenCSD library version is sufficient to provide required features * Check OpenCSD library version is sufficient to provide required features
*/ */
#define OCSD_MIN_VER ((1 << 16) | (0 << 8) | (0)) #define OCSD_MIN_VER ((1 << 16) | (1 << 8) | (1))
#if !defined(OCSD_VER_NUM) || (OCSD_VER_NUM < OCSD_MIN_VER) #if !defined(OCSD_VER_NUM) || (OCSD_VER_NUM < OCSD_MIN_VER)
#error "OpenCSD >= 1.0.0 is required" #error "OpenCSD >= 1.1.1 is required"
#endif #endif
int main(void) int main(void)
......
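The feature test above packs the OpenCSD version as ((major << 16) | (minor << 8) | patch). A small illustration of that encoding (the helper macro below is hypothetical, shown only to spell out the arithmetic):

	/* Hypothetical helper mirroring the packing used by OCSD_MIN_VER above. */
	#define MK_OCSD_VER(maj, min, pat)	(((maj) << 16) | ((min) << 8) | (pat))

	/* 1.1.1 packs to 0x010101; the old minimum 1.0.0 packs to 0x010000,
	 * so an installed OpenCSD older than 1.1.1 now fails this build-time check. */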
// SPDX-License-Identifier: GPL-2.0
#include <stdint.h>
volatile uint64_t x;
int main(int argc, char *argv[])
{
uint64_t old, new = argc;
(void)argv;
do {
old = __sync_val_compare_and_swap(&x, 0, 0);
} while (!__sync_bool_compare_and_swap(&x, old, new));
return old == new;
}
...@@ -13,6 +13,24 @@ ...@@ -13,6 +13,24 @@
#define __has_builtin(x) (0) #define __has_builtin(x) (0)
#endif #endif
#ifdef __CHECKER__
/* context/locking */
# define __must_hold(x) __attribute__((context(x,1,1)))
# define __acquires(x) __attribute__((context(x,0,1)))
# define __releases(x) __attribute__((context(x,1,0)))
# define __acquire(x) __context__(x,1)
# define __release(x) __context__(x,-1)
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
#else /* __CHECKER__ */
/* context/locking */
# define __must_hold(x)
# define __acquires(x)
# define __releases(x)
# define __acquire(x) (void)0
# define __release(x) (void)0
# define __cond_lock(x,c) (c)
#endif /* __CHECKER__ */
/* Compiler specific macros. */ /* Compiler specific macros. */
#ifdef __GNUC__ #ifdef __GNUC__
#include <linux/compiler-gcc.h> #include <linux/compiler-gcc.h>
......
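A hedged sketch of how the context/locking annotations added above are typically applied; the function and lock names are hypothetical, and the annotation macros are assumed to be in scope via the patched header:

	#include <pthread.h>

	static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;

	/* Sparse (__CHECKER__) is told this function returns with the lock held. */
	static void queue_lock(void) __acquires(&queue_mutex)
	{
		pthread_mutex_lock(&queue_mutex);
		__acquire(&queue_mutex);	/* bookkeeping only; expands to (void)0 for the compiler */
	}

	/* ...and that this one releases it. */
	static void queue_unlock(void) __releases(&queue_mutex)
	{
		__release(&queue_mutex);
		pthread_mutex_unlock(&queue_mutex);
	}

	/* Documents (for sparse and for readers) that callers must hold the lock. */
	static int queue_pop(void) __must_hold(&queue_mutex)
	{
		return 0;
	}

When built normally the annotations compile away; under sparse they let context imbalances in the multithreaded 'perf record' code be flagged at build time.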
...@@ -68,6 +68,11 @@ static struct perf_cpu_map *cpu_map__default_new(void) ...@@ -68,6 +68,11 @@ static struct perf_cpu_map *cpu_map__default_new(void)
return cpus; return cpus;
} }
struct perf_cpu_map *perf_cpu_map__default_new(void)
{
return cpu_map__default_new();
}
static int cmp_int(const void *a, const void *b) static int cmp_int(const void *a, const void *b)
{ {
return *(const int *)a - *(const int*)b; return *(const int *)a - *(const int*)b;
...@@ -277,14 +282,8 @@ int perf_cpu_map__idx(struct perf_cpu_map *cpus, int cpu) ...@@ -277,14 +282,8 @@ int perf_cpu_map__idx(struct perf_cpu_map *cpus, int cpu)
int perf_cpu_map__max(struct perf_cpu_map *map) int perf_cpu_map__max(struct perf_cpu_map *map)
{ {
int i, max = -1; // cpu_map__trim_new() qsort()s it, cpu_map__default_new() sorts it as well.
return map->nr > 0 ? map->map[map->nr - 1] : -1;
for (i = 0; i < map->nr; i++) {
if (map->map[i] > max)
max = map->map[i];
}
return max;
} }
/* /*
......
...@@ -23,6 +23,8 @@ static inline int get_verbose(char **argv, int argc) ...@@ -23,6 +23,8 @@ static inline int get_verbose(char **argv, int argc)
break; break;
} }
} }
optind = 1;
return verbose; return verbose;
} }
......
...@@ -9,6 +9,7 @@ ...@@ -9,6 +9,7 @@
struct perf_cpu_map; struct perf_cpu_map;
LIBPERF_API struct perf_cpu_map *perf_cpu_map__dummy_new(void); LIBPERF_API struct perf_cpu_map *perf_cpu_map__dummy_new(void);
LIBPERF_API struct perf_cpu_map *perf_cpu_map__default_new(void);
LIBPERF_API struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list); LIBPERF_API struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list);
LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file); LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file);
LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map); LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map);
......
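perf_cpu_map__default_new() becomes part of libperf's public API above. A minimal, hedged sketch of calling it from an out-of-tree program (assumes linking with -lperf):

	#include <stdio.h>
	#include <perf/cpumap.h>

	int main(void)
	{
		/* Build the default CPU map (the CPUs libperf sees as available). */
		struct perf_cpu_map *cpus = perf_cpu_map__default_new();

		if (!cpus)
			return 1;
		printf("%d CPUs in the default map\n", perf_cpu_map__nr(cpus));
		perf_cpu_map__put(cpus);
		return 0;
	}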
...@@ -133,6 +133,7 @@ struct option { ...@@ -133,6 +133,7 @@ struct option {
#define OPT_SET_PTR(s, l, v, h, p) { .type = OPTION_SET_PTR, .short_name = (s), .long_name = (l), .value = (v), .help = (h), .defval = (p) } #define OPT_SET_PTR(s, l, v, h, p) { .type = OPTION_SET_PTR, .short_name = (s), .long_name = (l), .value = (v), .help = (h), .defval = (p) }
#define OPT_INTEGER(s, l, v, h) { .type = OPTION_INTEGER, .short_name = (s), .long_name = (l), .value = check_vtype(v, int *), .help = (h) } #define OPT_INTEGER(s, l, v, h) { .type = OPTION_INTEGER, .short_name = (s), .long_name = (l), .value = check_vtype(v, int *), .help = (h) }
#define OPT_UINTEGER(s, l, v, h) { .type = OPTION_UINTEGER, .short_name = (s), .long_name = (l), .value = check_vtype(v, unsigned int *), .help = (h) } #define OPT_UINTEGER(s, l, v, h) { .type = OPTION_UINTEGER, .short_name = (s), .long_name = (l), .value = check_vtype(v, unsigned int *), .help = (h) }
#define OPT_UINTEGER_OPTARG(s, l, v, d, h) { .type = OPTION_UINTEGER, .short_name = (s), .long_name = (l), .value = check_vtype(v, unsigned int *), .help = (h), .flags = PARSE_OPT_OPTARG, .defval = (intptr_t)(d) }
#define OPT_LONG(s, l, v, h) { .type = OPTION_LONG, .short_name = (s), .long_name = (l), .value = check_vtype(v, long *), .help = (h) } #define OPT_LONG(s, l, v, h) { .type = OPTION_LONG, .short_name = (s), .long_name = (l), .value = check_vtype(v, long *), .help = (h) }
#define OPT_ULONG(s, l, v, h) { .type = OPTION_ULONG, .short_name = (s), .long_name = (l), .value = check_vtype(v, unsigned long *), .help = (h) } #define OPT_ULONG(s, l, v, h) { .type = OPTION_ULONG, .short_name = (s), .long_name = (l), .value = check_vtype(v, unsigned long *), .help = (h) }
#define OPT_U64(s, l, v, h) { .type = OPTION_U64, .short_name = (s), .long_name = (l), .value = check_vtype(v, u64 *), .help = (h) } #define OPT_U64(s, l, v, h) { .type = OPTION_U64, .short_name = (s), .long_name = (l), .value = check_vtype(v, u64 *), .help = (h) }
......
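A hedged sketch of how the new OPT_UINTEGER_OPTARG macro might be used; the option and variable names here are hypothetical. Because the macro sets PARSE_OPT_OPTARG, the argument is optional: giving the bare option uses the default passed to the macro, while '--flush=N' supplies an explicit value:

	#include <subcmd/parse-options.h>

	static unsigned int flush_bytes = 1;

	static const struct option example_options[] = {
		/* '--flush' alone sets flush_bytes to 16; '--flush=N' sets it to N. */
		OPT_UINTEGER_OPTARG(0, "flush", &flush_bytes, 16,
				    "bytes to accumulate before flushing"),
		OPT_END()
	};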
...@@ -2,6 +2,10 @@ ...@@ -2,6 +2,10 @@
include ../../scripts/Makefile.include include ../../scripts/Makefile.include
include ../../scripts/utilities.mak include ../../scripts/utilities.mak
ARTICLES =
# with their own formatting rules.
SP_ARTICLES =
MAN1_TXT= \ MAN1_TXT= \
$(filter-out $(addsuffix .txt, $(ARTICLES) $(SP_ARTICLES)), \ $(filter-out $(addsuffix .txt, $(ARTICLES) $(SP_ARTICLES)), \
$(wildcard perf-*.txt)) \ $(wildcard perf-*.txt)) \
...@@ -16,13 +20,6 @@ _MAN_HTML=$(patsubst %.txt,%.html,$(MAN_TXT)) ...@@ -16,13 +20,6 @@ _MAN_HTML=$(patsubst %.txt,%.html,$(MAN_TXT))
MAN_XML=$(addprefix $(OUTPUT),$(_MAN_XML)) MAN_XML=$(addprefix $(OUTPUT),$(_MAN_XML))
MAN_HTML=$(addprefix $(OUTPUT),$(_MAN_HTML)) MAN_HTML=$(addprefix $(OUTPUT),$(_MAN_HTML))
ARTICLES =
# with their own formatting rules.
SP_ARTICLES =
API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technical/api-index.txt, $(wildcard technical/api-*.txt)))
SP_ARTICLES += $(API_DOCS)
SP_ARTICLES += technical/api-index
_DOC_HTML = $(_MAN_HTML) _DOC_HTML = $(_MAN_HTML)
_DOC_HTML+=$(patsubst %,%.html,$(ARTICLES) $(SP_ARTICLES)) _DOC_HTML+=$(patsubst %,%.html,$(ARTICLES) $(SP_ARTICLES))
DOC_HTML=$(addprefix $(OUTPUT),$(_DOC_HTML)) DOC_HTML=$(addprefix $(OUTPUT),$(_DOC_HTML))
...@@ -173,7 +170,7 @@ ifneq ($(V),1) ...@@ -173,7 +170,7 @@ ifneq ($(V),1)
endif endif
endif endif
all: html man all: html man info
html: $(DOC_HTML) html: $(DOC_HTML)
...@@ -186,8 +183,6 @@ man7: $(DOC_MAN7) ...@@ -186,8 +183,6 @@ man7: $(DOC_MAN7)
info: $(OUTPUT)perf.info $(OUTPUT)perfman.info info: $(OUTPUT)perf.info $(OUTPUT)perfman.info
pdf: $(OUTPUT)user-manual.pdf
install: install-man install: install-man
check-man-tools: check-man-tools:
...@@ -225,11 +220,6 @@ install-info: info ...@@ -225,11 +220,6 @@ install-info: info
echo "No directory found in $(DESTDIR)$(infodir)" >&2 ; \ echo "No directory found in $(DESTDIR)$(infodir)" >&2 ; \
fi fi
install-pdf: pdf
$(call QUIET_INSTALL, Documentation-pdf) \
$(INSTALL) -d -m 755 $(DESTDIR)$(pdfdir); \
$(INSTALL) -m 644 $(OUTPUT)user-manual.pdf $(DESTDIR)$(pdfdir)
#install-html: html #install-html: html
# '$(SHELL_PATH_SQ)' ./install-webdoc.sh $(DESTDIR)$(htmldir) # '$(SHELL_PATH_SQ)' ./install-webdoc.sh $(DESTDIR)$(htmldir)
...@@ -244,33 +234,13 @@ $(OUTPUT)doc.dep : $(wildcard *.txt) build-docdep.perl ...@@ -244,33 +234,13 @@ $(OUTPUT)doc.dep : $(wildcard *.txt) build-docdep.perl
-include $(OUTPUT)doc.dep -include $(OUTPUT)doc.dep
_cmds_txt = cmds-ancillaryinterrogators.txt \
cmds-ancillarymanipulators.txt \
cmds-mainporcelain.txt \
cmds-plumbinginterrogators.txt \
cmds-plumbingmanipulators.txt \
cmds-synchingrepositories.txt \
cmds-synchelpers.txt \
cmds-purehelpers.txt \
cmds-foreignscminterface.txt
cmds_txt=$(addprefix $(OUTPUT),$(_cmds_txt))
$(cmds_txt): $(OUTPUT)cmd-list.made
$(OUTPUT)cmd-list.made: cmd-list.perl ../command-list.txt $(MAN1_TXT)
$(QUIET_GEN)$(RM) $@ && \
$(PERL_PATH) ./cmd-list.perl ../command-list.txt $(QUIET_STDERR) && \
date >$@
CLEAN_FILES = \ CLEAN_FILES = \
$(MAN_XML) $(addsuffix +,$(MAN_XML)) \ $(MAN_XML) $(addsuffix +,$(MAN_XML)) \
$(MAN_HTML) $(addsuffix +,$(MAN_HTML)) \ $(MAN_HTML) $(addsuffix +,$(MAN_HTML)) \
$(DOC_HTML) $(DOC_MAN1) $(DOC_MAN5) $(DOC_MAN7) \ $(DOC_HTML) $(DOC_MAN1) $(DOC_MAN5) $(DOC_MAN7) \
$(OUTPUT)*.texi $(OUTPUT)*.texi+ $(OUTPUT)*.texi++ \ $(OUTPUT)*.texi $(OUTPUT)*.texi+ $(OUTPUT)*.texi++ \
$(OUTPUT)perf.info $(OUTPUT)perfman.info \ $(OUTPUT)perf.info $(OUTPUT)perfman.info $(OUTPUT)doc.dep \
$(OUTPUT)howto-index.txt $(OUTPUT)howto/*.html $(OUTPUT)doc.dep \ $(OUTPUT)technical/api-*.html $(OUTPUT)technical/api-index.txt
$(OUTPUT)technical/api-*.html $(OUTPUT)technical/api-index.txt \
$(cmds_txt) $(OUTPUT)*.made
clean: clean:
$(call QUIET_CLEAN, Documentation) $(RM) $(CLEAN_FILES) $(call QUIET_CLEAN, Documentation) $(RM) $(CLEAN_FILES)
...@@ -304,24 +274,6 @@ $(OUTPUT)%.xml : %.txt ...@@ -304,24 +274,6 @@ $(OUTPUT)%.xml : %.txt
XSLT = docbook.xsl XSLT = docbook.xsl
XSLTOPTS = --xinclude --stringparam html.stylesheet docbook-xsl.css XSLTOPTS = --xinclude --stringparam html.stylesheet docbook-xsl.css
$(OUTPUT)user-manual.html: $(OUTPUT)user-manual.xml
$(QUIET_XSLTPROC)xsltproc $(XSLTOPTS) -o $@ $(XSLT) $<
$(OUTPUT)perf.info: $(OUTPUT)user-manual.texi
$(QUIET_MAKEINFO)$(MAKEINFO) --no-split -o $@ $(OUTPUT)user-manual.texi
$(OUTPUT)user-manual.texi: $(OUTPUT)user-manual.xml
$(QUIET_DB2TEXI)$(RM) $@+ $@ && \
$(DOCBOOK2X_TEXI) $(OUTPUT)user-manual.xml --encoding=UTF-8 --to-stdout >$@++ && \
$(PERL_PATH) fix-texi.perl <$@++ >$@+ && \
rm $@++ && \
mv $@+ $@
$(OUTPUT)user-manual.pdf: $(OUTPUT)user-manual.xml
$(QUIET_DBLATEX)$(RM) $@+ $@ && \
$(DBLATEX) -o $@+ -p /etc/asciidoc/dblatex/asciidoc-dblatex.xsl -s /etc/asciidoc/dblatex/asciidoc-dblatex.sty $< && \
mv $@+ $@
$(OUTPUT)perfman.texi: $(MAN_XML) cat-texi.perl $(OUTPUT)perfman.texi: $(MAN_XML) cat-texi.perl
$(QUIET_DB2TEXI)$(RM) $@+ $@ && \ $(QUIET_DB2TEXI)$(RM) $@+ $@ && \
($(foreach xml,$(MAN_XML),$(DOCBOOK2X_TEXI) --encoding=UTF-8 \ ($(foreach xml,$(MAN_XML),$(DOCBOOK2X_TEXI) --encoding=UTF-8 \
...@@ -331,28 +283,18 @@ $(OUTPUT)perfman.texi: $(MAN_XML) cat-texi.perl ...@@ -331,28 +283,18 @@ $(OUTPUT)perfman.texi: $(MAN_XML) cat-texi.perl
mv $@+ $@ mv $@+ $@
$(OUTPUT)perfman.info: $(OUTPUT)perfman.texi $(OUTPUT)perfman.info: $(OUTPUT)perfman.texi
$(QUIET_MAKEINFO)$(MAKEINFO) --no-split --no-validate $*.texi $(QUIET_MAKEINFO)$(MAKEINFO) --no-split --no-validate -o $@ $*.texi
$(patsubst %.txt,%.texi,$(MAN_TXT)): %.texi : %.xml $(patsubst %.txt,%.texi,$(MAN_TXT)): %.texi : %.xml
$(QUIET_DB2TEXI)$(RM) $@+ $@ && \ $(QUIET_DB2TEXI)$(RM) $@+ $@ && \
$(DOCBOOK2X_TEXI) --to-stdout $*.xml >$@+ && \ $(DOCBOOK2X_TEXI) --to-stdout $*.xml >$@+ && \
mv $@+ $@ mv $@+ $@
howto-index.txt: howto-index.sh $(wildcard howto/*.txt)
$(QUIET_GEN)$(RM) $@+ $@ && \
'$(SHELL_PATH_SQ)' ./howto-index.sh $(wildcard howto/*.txt) >$@+ && \
mv $@+ $@
$(patsubst %,%.html,$(ARTICLES)) : %.html : %.txt $(patsubst %,%.html,$(ARTICLES)) : %.html : %.txt
$(QUIET_ASCIIDOC)$(ASCIIDOC) -b $(ASCIIDOC_HTML) $*.txt $(QUIET_ASCIIDOC)$(ASCIIDOC) -b $(ASCIIDOC_HTML) $*.txt
WEBDOC_DEST = /pub/software/tools/perf/docs WEBDOC_DEST = /pub/software/tools/perf/docs
$(patsubst %.txt,%.html,$(wildcard howto/*.txt)): %.html : %.txt
$(QUIET_ASCIIDOC)$(RM) $@+ $@ && \
sed -e '1,/^$$/d' $< | $(ASCIIDOC) -b $(ASCIIDOC_HTML) - >$@+ && \
mv $@+ $@
# UNIMPLEMENTED # UNIMPLEMENTED
#install-webdoc : html #install-webdoc : html
# '$(SHELL_PATH_SQ)' ./install-webdoc.sh $(WEBDOC_DEST) # '$(SHELL_PATH_SQ)' ./install-webdoc.sh $(WEBDOC_DEST)
......
#!/usr/bin/perl
my %include = ();
my %included = ();
for my $text (<*.txt>) {
open I, '<', $text || die "cannot read: $text";
while (<I>) {
if (/^include::/) {
chomp;
s/^include::\s*//;
s/\[\]//;
$include{$text}{$_} = 1;
$included{$_} = 1;
}
}
close I;
}
# Do we care about chained includes???
my $changed = 1;
while ($changed) {
$changed = 0;
while (my ($text, $included) = each %include) {
for my $i (keys %$included) {
# $text has include::$i; if $i includes $j
# $text indirectly includes $j.
if (exists $include{$i}) {
for my $j (keys %{$include{$i}}) {
if (!exists $include{$text}{$j}) {
$include{$text}{$j} = 1;
$included{$j} = 1;
$changed = 1;
}
}
}
}
}
}
while (my ($text, $included) = each %include) {
if (! exists $included{$text} &&
(my $base = $text) =~ s/\.txt$//) {
print "$base.html $base.xml : ", join(" ", keys %$included), "\n";
}
}
#!/usr/bin/perl -w
use strict;
use warnings;
my @menu = ();
my $output = $ARGV[0];
open my $tmp, '>', "$output.tmp";
while (<STDIN>) {
next if (/^\\input texinfo/../\@node Top/);
next if (/^\@bye/ || /^\.ft/);
if (s/^\@top (.*)/\@node $1,,,Top/) {
push @menu, $1;
}
s/\(\@pxref\{\[(URLS|REMOTES)\]}\)//;
s/\@anchor\{[^{}]*\}//g;
print $tmp $_;
}
close $tmp;
print '\input texinfo
@setfilename gitman.info
@documentencoding UTF-8
@dircategory Development
@direntry
* Git Man Pages: (gitman). Manual pages for Git revision control system
@end direntry
@node Top,,, (dir)
@top Git Manual Pages
@documentlanguage en
@menu
';
for (@menu) {
print "* ${_}::\n";
}
print "\@end menu\n";
open $tmp, '<', "$output.tmp";
while (<$tmp>) {
print;
}
close $tmp;
print "\@bye\n";
unlink "$output.tmp";
...@@ -140,7 +140,7 @@ displayed. The percentage is the event's running time/enabling time. ...@@ -140,7 +140,7 @@ displayed. The percentage is the event's running time/enabling time.
One example, 'triad_loop' runs on cpu16 (atom core), while we can see the One example, 'triad_loop' runs on cpu16 (atom core), while we can see the
scaled value for core cycles is 160,444,092 and the percentage is 0.47%. scaled value for core cycles is 160,444,092 and the percentage is 0.47%.
perf stat -e cycles -- taskset -c 16 ./triad_loop perf stat -e cycles \-- taskset -c 16 ./triad_loop
As previous, two events are created. As previous, two events are created.
......
...@@ -9,7 +9,7 @@ SYNOPSIS ...@@ -9,7 +9,7 @@ SYNOPSIS
-------- --------
[verse] [verse]
'perf c2c record' [<options>] <command> 'perf c2c record' [<options>] <command>
'perf c2c record' [<options>] -- [<record command options>] <command> 'perf c2c record' [<options>] \-- [<record command options>] <command>
'perf c2c report' [<options>] 'perf c2c report' [<options>]
DESCRIPTION DESCRIPTION
......
...@@ -32,7 +32,7 @@ The API for filtering consists of the following: ...@@ -32,7 +32,7 @@ The API for filtering consists of the following:
---- ----
#include <perf/perf_dlfilter.h> #include <perf/perf_dlfilter.h>
const struct perf_dlfilter_fns perf_dlfilter_fns; struct perf_dlfilter_fns perf_dlfilter_fns;
int start(void **data, void *ctx); int start(void **data, void *ctx);
int stop(void *data, void *ctx); int stop(void *data, void *ctx);
...@@ -214,7 +214,7 @@ Filter out everything except branches from "foo" to "bar": ...@@ -214,7 +214,7 @@ Filter out everything except branches from "foo" to "bar":
#include <perf/perf_dlfilter.h> #include <perf/perf_dlfilter.h>
#include <string.h> #include <string.h>
const struct perf_dlfilter_fns perf_dlfilter_fns; struct perf_dlfilter_fns perf_dlfilter_fns;
int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx) int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{ {
...@@ -246,6 +246,14 @@ To use the filter with perf script: ...@@ -246,6 +246,14 @@ To use the filter with perf script:
perf script --dlfilter dlfilter-example.so perf script --dlfilter dlfilter-example.so
NOTES
-----
The dlfilter .so file will be dependent on shared libraries. If those change,
it may be necessary to rebuild the .so. Also there may be unexpected results
if the .so uses different versions of the shared libraries that perf uses.
Versions can be checked using the ldd command.
SEE ALSO SEE ALSO
-------- --------
linkperf:perf-script[1] linkperf:perf-script[1]
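To complement the documentation changes above, a minimal dlfilter skeleton; hedged: the 'pid' field and the keep/drop return convention should be verified against perf_dlfilter.h and perf-dlfilter(1), and the PID value is purely illustrative:

	#include <perf/perf_dlfilter.h>

	/* Filled in by perf when the filter is loaded, as described above. */
	struct perf_dlfilter_fns perf_dlfilter_fns;

	int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
	{
		/* Keep (return 0) only samples from PID 1234; drop everything else. */
		return sample->pid == 1234 ? 0 : 1;
	}

Built like the example in the text (compile with -fpic against the installed perf_dlfilter.h, link with -shared), it would be passed to perf with 'perf script --dlfilter ./my-filter.so'.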
...@@ -9,7 +9,7 @@ SYNOPSIS ...@@ -9,7 +9,7 @@ SYNOPSIS
-------- --------
[verse] [verse]
'perf iostat' list 'perf iostat' list
'perf iostat' <ports> -- <command> [<options>] 'perf iostat' <ports> \-- <command> [<options>]
DESCRIPTION DESCRIPTION
----------- -----------
...@@ -85,4 +85,4 @@ EXAMPLES ...@@ -85,4 +85,4 @@ EXAMPLES
SEE ALSO SEE ALSO
-------- --------
linkperf:perf-stat[1] linkperf:perf-stat[1]
\ No newline at end of file
...@@ -9,7 +9,7 @@ SYNOPSIS ...@@ -9,7 +9,7 @@ SYNOPSIS
-------- --------
[verse] [verse]
'perf record' [-e <EVENT> | --event=EVENT] [-a] <command> 'perf record' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf record' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>] 'perf record' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
DESCRIPTION DESCRIPTION
----------- -----------
......
...@@ -167,7 +167,7 @@ below). ...@@ -167,7 +167,7 @@ below).
Following those are the 'event handler' functions generated one for Following those are the 'event handler' functions generated one for
every event in the 'perf record' output. The handler functions take every event in the 'perf record' output. The handler functions take
the form subsystem__event_name, and contain named parameters, one for the form subsystem\__event_name, and contain named parameters, one for
each field in the event; in this case, there's only one event, each field in the event; in this case, there's only one event,
raw_syscalls__sys_enter(). (see the EVENT HANDLERS section below for raw_syscalls__sys_enter(). (see the EVENT HANDLERS section below for
more info on event handlers). more info on event handlers).
......
...@@ -106,7 +106,7 @@ OPTIONS ...@@ -106,7 +106,7 @@ OPTIONS
Pass 'arg' as an argument to the dlfilter. --dlarg may be repeated Pass 'arg' as an argument to the dlfilter. --dlarg may be repeated
to add more arguments. to add more arguments.
--list-dlfilters=:: --list-dlfilters::
Display a list of available dlfilters. Use with option -v (must come Display a list of available dlfilters. Use with option -v (must come
before option --list-dlfilters) to show long descriptions. before option --list-dlfilters) to show long descriptions.
......
...@@ -9,8 +9,8 @@ SYNOPSIS ...@@ -9,8 +9,8 @@ SYNOPSIS
-------- --------
[verse] [verse]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command> 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>] 'perf stat' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>] 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] \-- <command> [<options>]
'perf stat' report [-i file] 'perf stat' report [-i file]
DESCRIPTION DESCRIPTION
...@@ -217,8 +217,8 @@ Append to the output file designated with the -o option. Ignored if -o is not sp ...@@ -217,8 +217,8 @@ Append to the output file designated with the -o option. Ignored if -o is not sp
Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
with it. --append may be used here. Examples: with it. --append may be used here. Examples:
3>results perf stat --log-fd 3 -- $cmd 3>results perf stat --log-fd 3 \-- $cmd
3>>results perf stat --log-fd 3 --append -- $cmd 3>>results perf stat --log-fd 3 --append \-- $cmd
--control=fifo:ctl-fifo[,ack-fifo]:: --control=fifo:ctl-fifo[,ack-fifo]::
--control=fd:ctl-fd[,ack-fd]:: --control=fd:ctl-fd[,ack-fd]::
...@@ -245,7 +245,7 @@ disable events during measurements: ...@@ -245,7 +245,7 @@ disable events during measurements:
perf stat -D -1 -e cpu-cycles -a -I 1000 \ perf stat -D -1 -e cpu-cycles -a -I 1000 \
--control fd:${ctl_fd},${ctl_fd_ack} \ --control fd:${ctl_fd},${ctl_fd_ack} \
-- sleep 30 & \-- sleep 30 &
perf_pid=$! perf_pid=$!
sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})" sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
...@@ -265,7 +265,7 @@ disable events during measurements: ...@@ -265,7 +265,7 @@ disable events during measurements:
--post:: --post::
Pre and post measurement hooks, e.g.: Pre and post measurement hooks, e.g.:
perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \-- make -s -j64 O=defconfig-build/ bzImage
-I msecs:: -I msecs::
--interval-print msecs:: --interval-print msecs::
...@@ -496,7 +496,7 @@ $ perf config stat.no-csv-summary=true ...@@ -496,7 +496,7 @@ $ perf config stat.no-csv-summary=true
EXAMPLES EXAMPLES
-------- --------
$ perf stat -- make $ perf stat \-- make
Performance counter stats for 'make': Performance counter stats for 'make':
......
...@@ -133,10 +133,10 @@ FEATURE_CHECK_LDFLAGS-libunwind = $(LIBUNWIND_LDFLAGS) $(LIBUNWIND_LIBS) ...@@ -133,10 +133,10 @@ FEATURE_CHECK_LDFLAGS-libunwind = $(LIBUNWIND_LDFLAGS) $(LIBUNWIND_LIBS)
FEATURE_CHECK_CFLAGS-libunwind-debug-frame = $(LIBUNWIND_CFLAGS) FEATURE_CHECK_CFLAGS-libunwind-debug-frame = $(LIBUNWIND_CFLAGS)
FEATURE_CHECK_LDFLAGS-libunwind-debug-frame = $(LIBUNWIND_LDFLAGS) $(LIBUNWIND_LIBS) FEATURE_CHECK_LDFLAGS-libunwind-debug-frame = $(LIBUNWIND_LDFLAGS) $(LIBUNWIND_LIBS)
FEATURE_CHECK_LDFLAGS-libunwind-arm = -lunwind -lunwind-arm FEATURE_CHECK_LDFLAGS-libunwind-arm += -lunwind -lunwind-arm
FEATURE_CHECK_LDFLAGS-libunwind-aarch64 = -lunwind -lunwind-aarch64 FEATURE_CHECK_LDFLAGS-libunwind-aarch64 += -lunwind -lunwind-aarch64
FEATURE_CHECK_LDFLAGS-libunwind-x86 = -lunwind -llzma -lunwind-x86 FEATURE_CHECK_LDFLAGS-libunwind-x86 += -lunwind -llzma -lunwind-x86
FEATURE_CHECK_LDFLAGS-libunwind-x86_64 = -lunwind -llzma -lunwind-x86_64 FEATURE_CHECK_LDFLAGS-libunwind-x86_64 += -lunwind -llzma -lunwind-x86_64
FEATURE_CHECK_LDFLAGS-libcrypto = -lcrypto FEATURE_CHECK_LDFLAGS-libcrypto = -lcrypto
...@@ -349,10 +349,6 @@ CXXFLAGS += $(INC_FLAGS) ...@@ -349,10 +349,6 @@ CXXFLAGS += $(INC_FLAGS)
LIBPERF_CFLAGS := $(CORE_CFLAGS) $(EXTRA_CFLAGS) LIBPERF_CFLAGS := $(CORE_CFLAGS) $(EXTRA_CFLAGS)
ifeq ($(feature-sync-compare-and-swap), 1)
CFLAGS += -DHAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
endif
ifeq ($(feature-pthread-attr-setaffinity-np), 1) ifeq ($(feature-pthread-attr-setaffinity-np), 1)
CFLAGS += -DHAVE_PTHREAD_ATTR_SETAFFINITY_NP CFLAGS += -DHAVE_PTHREAD_ATTR_SETAFFINITY_NP
endif endif
...@@ -493,6 +489,8 @@ ifdef CORESIGHT ...@@ -493,6 +489,8 @@ ifdef CORESIGHT
CFLAGS += -DCS_RAW_PACKED CFLAGS += -DCS_RAW_PACKED
endif endif
endif endif
else
dummy := $(error Error: No libopencsd library found or the version is not up-to-date. Please install recent libopencsd to build with CORESIGHT=1)
endif endif
endif endif
......
...@@ -360,8 +360,11 @@ ifndef NO_JVMTI ...@@ -360,8 +360,11 @@ ifndef NO_JVMTI
PROGRAMS += $(OUTPUT)$(LIBJVMTI) PROGRAMS += $(OUTPUT)$(LIBJVMTI)
endif endif
DLFILTERS := dlfilter-test-api-v0.so
DLFILTERS := $(patsubst %,$(OUTPUT)dlfilters/%,$(DLFILTERS))
# what 'all' will build and 'install' will install, in perfexecdir # what 'all' will build and 'install' will install, in perfexecdir
ALL_PROGRAMS = $(PROGRAMS) $(SCRIPTS) ALL_PROGRAMS = $(PROGRAMS) $(SCRIPTS) $(DLFILTERS)
# what 'all' will build but not install in perfexecdir # what 'all' will build but not install in perfexecdir
OTHER_PROGRAMS = $(OUTPUT)perf OTHER_PROGRAMS = $(OUTPUT)perf
...@@ -780,6 +783,13 @@ $(OUTPUT)perf-read-vdsox32: perf-read-vdso.c util/find-map.c ...@@ -780,6 +783,13 @@ $(OUTPUT)perf-read-vdsox32: perf-read-vdso.c util/find-map.c
$(QUIET_CC)$(CC) -mx32 $(filter -static,$(LDFLAGS)) -Wall -Werror -o $@ perf-read-vdso.c $(QUIET_CC)$(CC) -mx32 $(filter -static,$(LDFLAGS)) -Wall -Werror -o $@ perf-read-vdso.c
endif endif
$(OUTPUT)dlfilters/%.o: dlfilters/%.c include/perf/perf_dlfilter.h
$(Q)$(MKDIR) -p $(OUTPUT)dlfilters
$(QUIET_CC)$(CC) -c -Iinclude $(EXTRA_CFLAGS) -o $@ -fpic $<
$(OUTPUT)dlfilters/%.so: $(OUTPUT)dlfilters/%.o
$(QUIET_LINK)$(CC) $(EXTRA_CFLAGS) -shared -o $@ $<
ifndef NO_JVMTI ifndef NO_JVMTI
LIBJVMTI_IN := $(OUTPUT)jvmti/jvmti-in.o LIBJVMTI_IN := $(OUTPUT)jvmti/jvmti-in.o
...@@ -925,7 +935,7 @@ install-tools: all install-gtk ...@@ -925,7 +935,7 @@ install-tools: all install-gtk
$(INSTALL) $(OUTPUT)perf '$(DESTDIR_SQ)$(bindir_SQ)'; \ $(INSTALL) $(OUTPUT)perf '$(DESTDIR_SQ)$(bindir_SQ)'; \
$(LN) '$(DESTDIR_SQ)$(bindir_SQ)/perf' '$(DESTDIR_SQ)$(bindir_SQ)/trace'; \ $(LN) '$(DESTDIR_SQ)$(bindir_SQ)/perf' '$(DESTDIR_SQ)$(bindir_SQ)/trace'; \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(includedir_SQ)/perf'; \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(includedir_SQ)/perf'; \
$(INSTALL) util/perf_dlfilter.h -t '$(DESTDIR_SQ)$(includedir_SQ)/perf' $(INSTALL) -m 644 include/perf/perf_dlfilter.h -t '$(DESTDIR_SQ)$(includedir_SQ)/perf'
ifndef NO_PERF_READ_VDSO32 ifndef NO_PERF_READ_VDSO32
$(call QUIET_INSTALL, perf-read-vdso32) \ $(call QUIET_INSTALL, perf-read-vdso32) \
$(INSTALL) $(OUTPUT)perf-read-vdso32 '$(DESTDIR_SQ)$(bindir_SQ)'; $(INSTALL) $(OUTPUT)perf-read-vdso32 '$(DESTDIR_SQ)$(bindir_SQ)';
...@@ -978,6 +988,9 @@ ifndef NO_LIBPYTHON ...@@ -978,6 +988,9 @@ ifndef NO_LIBPYTHON
$(INSTALL) scripts/python/*.py -m 644 -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'; \ $(INSTALL) scripts/python/*.py -m 644 -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'; \
$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin' $(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
endif endif
$(call QUIET_INSTALL, dlfilters) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/dlfilters'; \
$(INSTALL) $(DLFILTERS) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/dlfilters';
$(call QUIET_INSTALL, perf_completion-script) \ $(call QUIET_INSTALL, perf_completion-script) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d'; \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d'; \
$(INSTALL) perf-completion.sh '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf' $(INSTALL) perf-completion.sh '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf'
......
...@@ -107,3 +107,35 @@ struct auxtrace_record ...@@ -107,3 +107,35 @@ struct auxtrace_record
*err = 0; *err = 0;
return NULL; return NULL;
} }
#if defined(__arm__)
u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
{
struct perf_event_mmap_page *pc = mm->userpg;
u64 result;
__asm__ __volatile__(
" ldrd %0, %H0, [%1]"
: "=&r" (result)
: "r" (&pc->aux_head), "Qo" (pc->aux_head)
);
return result;
}
int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
{
struct perf_event_mmap_page *pc = mm->userpg;
/* Ensure all reads are done before we write the tail out */
smp_mb();
__asm__ __volatile__(
" strd %2, %H2, [%1]"
: "=Qo" (pc->aux_tail)
: "r" (&pc->aux_tail), "r" (tail)
);
return 0;
}
#endif
...@@ -47,15 +47,17 @@ static const char *metadata_etmv3_ro[CS_ETM_PRIV_MAX] = { ...@@ -47,15 +47,17 @@ static const char *metadata_etmv3_ro[CS_ETM_PRIV_MAX] = {
[CS_ETM_ETMIDR] = "mgmt/etmidr", [CS_ETM_ETMIDR] = "mgmt/etmidr",
}; };
static const char *metadata_etmv4_ro[CS_ETMV4_PRIV_MAX] = { static const char * const metadata_etmv4_ro[] = {
[CS_ETMV4_TRCIDR0] = "trcidr/trcidr0", [CS_ETMV4_TRCIDR0] = "trcidr/trcidr0",
[CS_ETMV4_TRCIDR1] = "trcidr/trcidr1", [CS_ETMV4_TRCIDR1] = "trcidr/trcidr1",
[CS_ETMV4_TRCIDR2] = "trcidr/trcidr2", [CS_ETMV4_TRCIDR2] = "trcidr/trcidr2",
[CS_ETMV4_TRCIDR8] = "trcidr/trcidr8", [CS_ETMV4_TRCIDR8] = "trcidr/trcidr8",
[CS_ETMV4_TRCAUTHSTATUS] = "mgmt/trcauthstatus", [CS_ETMV4_TRCAUTHSTATUS] = "mgmt/trcauthstatus",
[CS_ETE_TRCDEVARCH] = "mgmt/trcdevarch"
}; };
static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu); static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu);
static bool cs_etm_is_ete(struct auxtrace_record *itr, int cpu);
static int cs_etm_set_context_id(struct auxtrace_record *itr, static int cs_etm_set_context_id(struct auxtrace_record *itr,
struct evsel *evsel, int cpu) struct evsel *evsel, int cpu)
...@@ -73,7 +75,7 @@ static int cs_etm_set_context_id(struct auxtrace_record *itr, ...@@ -73,7 +75,7 @@ static int cs_etm_set_context_id(struct auxtrace_record *itr,
if (!cs_etm_is_etmv4(itr, cpu)) if (!cs_etm_is_etmv4(itr, cpu))
goto out; goto out;
/* Get a handle on TRCIRD2 */ /* Get a handle on TRCIDR2 */
snprintf(path, PATH_MAX, "cpu%d/%s", snprintf(path, PATH_MAX, "cpu%d/%s",
cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR2]); cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR2]);
err = perf_pmu__scan_file(cs_etm_pmu, path, "%x", &val); err = perf_pmu__scan_file(cs_etm_pmu, path, "%x", &val);
...@@ -533,7 +535,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused, ...@@ -533,7 +535,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
struct evlist *evlist __maybe_unused) struct evlist *evlist __maybe_unused)
{ {
int i; int i;
int etmv3 = 0, etmv4 = 0; int etmv3 = 0, etmv4 = 0, ete = 0;
struct perf_cpu_map *event_cpus = evlist->core.cpus; struct perf_cpu_map *event_cpus = evlist->core.cpus;
struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL); struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
...@@ -544,7 +546,9 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused, ...@@ -544,7 +546,9 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
!cpu_map__has(online_cpus, i)) !cpu_map__has(online_cpus, i))
continue; continue;
if (cs_etm_is_etmv4(itr, i)) if (cs_etm_is_ete(itr, i))
ete++;
else if (cs_etm_is_etmv4(itr, i))
etmv4++; etmv4++;
else else
etmv3++; etmv3++;
...@@ -555,7 +559,9 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused, ...@@ -555,7 +559,9 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
if (!cpu_map__has(online_cpus, i)) if (!cpu_map__has(online_cpus, i))
continue; continue;
if (cs_etm_is_etmv4(itr, i)) if (cs_etm_is_ete(itr, i))
ete++;
else if (cs_etm_is_etmv4(itr, i))
etmv4++; etmv4++;
else else
etmv3++; etmv3++;
...@@ -565,6 +571,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused, ...@@ -565,6 +571,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
perf_cpu_map__put(online_cpus); perf_cpu_map__put(online_cpus);
return (CS_ETM_HEADER_SIZE + return (CS_ETM_HEADER_SIZE +
(ete * CS_ETE_PRIV_SIZE) +
(etmv4 * CS_ETMV4_PRIV_SIZE) + (etmv4 * CS_ETMV4_PRIV_SIZE) +
(etmv3 * CS_ETMV3_PRIV_SIZE)); (etmv3 * CS_ETMV3_PRIV_SIZE));
} }
...@@ -607,6 +614,49 @@ static int cs_etm_get_ro(struct perf_pmu *pmu, int cpu, const char *path) ...@@ -607,6 +614,49 @@ static int cs_etm_get_ro(struct perf_pmu *pmu, int cpu, const char *path)
return val; return val;
} }
#define TRCDEVARCH_ARCHPART_SHIFT 0
#define TRCDEVARCH_ARCHPART_MASK GENMASK(11, 0)
#define TRCDEVARCH_ARCHPART(x) (((x) & TRCDEVARCH_ARCHPART_MASK) >> TRCDEVARCH_ARCHPART_SHIFT)
#define TRCDEVARCH_ARCHVER_SHIFT 12
#define TRCDEVARCH_ARCHVER_MASK GENMASK(15, 12)
#define TRCDEVARCH_ARCHVER(x) (((x) & TRCDEVARCH_ARCHVER_MASK) >> TRCDEVARCH_ARCHVER_SHIFT)
static bool cs_etm_is_ete(struct auxtrace_record *itr, int cpu)
{
struct cs_etm_recording *ptr = container_of(itr, struct cs_etm_recording, itr);
struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
int trcdevarch = cs_etm_get_ro(cs_etm_pmu, cpu, metadata_etmv4_ro[CS_ETE_TRCDEVARCH]);
/*
* ETE if ARCHVER is 5 (ARCHVER is 4 for ETM) and ARCHPART is 0xA13.
* See ETM_DEVARCH_ETE_ARCH in coresight-etm4x.h
*/
return TRCDEVARCH_ARCHVER(trcdevarch) == 5 && TRCDEVARCH_ARCHPART(trcdevarch) == 0xA13;
}
static void cs_etm_save_etmv4_header(__u64 data[], struct auxtrace_record *itr, int cpu)
{
struct cs_etm_recording *ptr = container_of(itr, struct cs_etm_recording, itr);
struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
/* Get trace configuration register */
data[CS_ETMV4_TRCCONFIGR] = cs_etmv4_get_config(itr);
/* Get traceID from the framework */
data[CS_ETMV4_TRCTRACEIDR] = coresight_get_trace_id(cpu);
/* Get read-only information from sysFS */
data[CS_ETMV4_TRCIDR0] = cs_etm_get_ro(cs_etm_pmu, cpu,
metadata_etmv4_ro[CS_ETMV4_TRCIDR0]);
data[CS_ETMV4_TRCIDR1] = cs_etm_get_ro(cs_etm_pmu, cpu,
metadata_etmv4_ro[CS_ETMV4_TRCIDR1]);
data[CS_ETMV4_TRCIDR2] = cs_etm_get_ro(cs_etm_pmu, cpu,
metadata_etmv4_ro[CS_ETMV4_TRCIDR2]);
data[CS_ETMV4_TRCIDR8] = cs_etm_get_ro(cs_etm_pmu, cpu,
metadata_etmv4_ro[CS_ETMV4_TRCIDR8]);
data[CS_ETMV4_TRCAUTHSTATUS] = cs_etm_get_ro(cs_etm_pmu, cpu,
metadata_etmv4_ro[CS_ETMV4_TRCAUTHSTATUS]);
}
static void cs_etm_get_metadata(int cpu, u32 *offset, static void cs_etm_get_metadata(int cpu, u32 *offset,
struct auxtrace_record *itr, struct auxtrace_record *itr,
struct perf_record_auxtrace_info *info) struct perf_record_auxtrace_info *info)
...@@ -618,31 +668,20 @@ static void cs_etm_get_metadata(int cpu, u32 *offset, ...@@ -618,31 +668,20 @@ static void cs_etm_get_metadata(int cpu, u32 *offset,
struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu; struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
/* first see what kind of tracer this cpu is affined to */ /* first see what kind of tracer this cpu is affined to */
if (cs_etm_is_etmv4(itr, cpu)) { if (cs_etm_is_ete(itr, cpu)) {
magic = __perf_cs_etmv4_magic; magic = __perf_cs_ete_magic;
/* Get trace configuration register */ /* ETE uses the same registers as ETMv4 plus TRCDEVARCH */
info->priv[*offset + CS_ETMV4_TRCCONFIGR] = cs_etm_save_etmv4_header(&info->priv[*offset], itr, cpu);
cs_etmv4_get_config(itr); info->priv[*offset + CS_ETE_TRCDEVARCH] =
/* Get traceID from the framework */
info->priv[*offset + CS_ETMV4_TRCTRACEIDR] =
coresight_get_trace_id(cpu);
/* Get read-only information from sysFS */
info->priv[*offset + CS_ETMV4_TRCIDR0] =
cs_etm_get_ro(cs_etm_pmu, cpu,
metadata_etmv4_ro[CS_ETMV4_TRCIDR0]);
info->priv[*offset + CS_ETMV4_TRCIDR1] =
cs_etm_get_ro(cs_etm_pmu, cpu, cs_etm_get_ro(cs_etm_pmu, cpu,
metadata_etmv4_ro[CS_ETMV4_TRCIDR1]); metadata_etmv4_ro[CS_ETE_TRCDEVARCH]);
info->priv[*offset + CS_ETMV4_TRCIDR2] =
cs_etm_get_ro(cs_etm_pmu, cpu, /* How much space was used */
metadata_etmv4_ro[CS_ETMV4_TRCIDR2]); increment = CS_ETE_PRIV_MAX;
info->priv[*offset + CS_ETMV4_TRCIDR8] = nr_trc_params = CS_ETE_PRIV_MAX - CS_ETM_COMMON_BLK_MAX_V1;
cs_etm_get_ro(cs_etm_pmu, cpu, } else if (cs_etm_is_etmv4(itr, cpu)) {
metadata_etmv4_ro[CS_ETMV4_TRCIDR8]); magic = __perf_cs_etmv4_magic;
info->priv[*offset + CS_ETMV4_TRCAUTHSTATUS] = cs_etm_save_etmv4_header(&info->priv[*offset], itr, cpu);
cs_etm_get_ro(cs_etm_pmu, cpu,
metadata_etmv4_ro
[CS_ETMV4_TRCAUTHSTATUS]);
/* How much space was used */ /* How much space was used */
increment = CS_ETMV4_PRIV_MAX; increment = CS_ETMV4_PRIV_MAX;
......
// SPDX-License-Identifier: GPL-2.0 // SPDX-License-Identifier: GPL-2.0
#include <string.h> #include <string.h>
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>
#include <fcntl.h>
#include <linux/stddef.h> #include <linux/stddef.h>
#include <linux/perf_event.h> #include <linux/perf_event.h>
#include <linux/zalloc.h>
#include <api/fs/fs.h>
#include <errno.h>
#include "../../../util/intel-pt.h" #include "../../../util/intel-pt.h"
#include "../../../util/intel-bts.h" #include "../../../util/intel-bts.h"
#include "../../../util/pmu.h" #include "../../../util/pmu.h"
#include "../../../util/fncache.h"
#define TEMPLATE_ALIAS "%s/bus/event_source/devices/%s/alias"
struct pmu_alias {
char *name;
char *alias;
struct list_head list;
};
static LIST_HEAD(pmu_alias_name_list);
static bool cached_list;
struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused) struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
{ {
...@@ -18,3 +36,138 @@ struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __mayb ...@@ -18,3 +36,138 @@ struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __mayb
#endif #endif
return NULL; return NULL;
} }
static void pmu_alias__delete(struct pmu_alias *pmu_alias)
{
if (!pmu_alias)
return;
zfree(&pmu_alias->name);
zfree(&pmu_alias->alias);
free(pmu_alias);
}
static struct pmu_alias *pmu_alias__new(char *name, char *alias)
{
struct pmu_alias *pmu_alias = zalloc(sizeof(*pmu_alias));
if (pmu_alias) {
pmu_alias->name = strdup(name);
if (!pmu_alias->name)
goto out_delete;
pmu_alias->alias = strdup(alias);
if (!pmu_alias->alias)
goto out_delete;
}
return pmu_alias;
out_delete:
pmu_alias__delete(pmu_alias);
return NULL;
}
static int setup_pmu_alias_list(void)
{
char path[PATH_MAX];
DIR *dir;
struct dirent *dent;
const char *sysfs = sysfs__mountpoint();
struct pmu_alias *pmu_alias;
char buf[MAX_PMU_NAME_LEN];
FILE *file;
int ret = -ENOMEM;
if (!sysfs)
return -1;
snprintf(path, PATH_MAX,
"%s" EVENT_SOURCE_DEVICE_PATH, sysfs);
dir = opendir(path);
if (!dir)
return -errno;
while ((dent = readdir(dir))) {
if (!strcmp(dent->d_name, ".") ||
!strcmp(dent->d_name, ".."))
continue;
snprintf(path, PATH_MAX,
TEMPLATE_ALIAS, sysfs, dent->d_name);
if (!file_available(path))
continue;
file = fopen(path, "r");
if (!file)
continue;
if (!fgets(buf, sizeof(buf), file)) {
fclose(file);
continue;
}
fclose(file);
/* Remove the last '\n' */
buf[strlen(buf) - 1] = 0;
pmu_alias = pmu_alias__new(dent->d_name, buf);
if (!pmu_alias)
goto close_dir;
list_add_tail(&pmu_alias->list, &pmu_alias_name_list);
}
ret = 0;
close_dir:
closedir(dir);
return ret;
}
static char *__pmu_find_real_name(const char *name)
{
struct pmu_alias *pmu_alias;
list_for_each_entry(pmu_alias, &pmu_alias_name_list, list) {
if (!strcmp(name, pmu_alias->alias))
return pmu_alias->name;
}
return (char *)name;
}
char *pmu_find_real_name(const char *name)
{
if (cached_list)
return __pmu_find_real_name(name);
setup_pmu_alias_list();
cached_list = true;
return __pmu_find_real_name(name);
}
static char *__pmu_find_alias_name(const char *name)
{
struct pmu_alias *pmu_alias;
list_for_each_entry(pmu_alias, &pmu_alias_name_list, list) {
if (!strcmp(name, pmu_alias->name))
return pmu_alias->alias;
}
return NULL;
}
char *pmu_find_alias_name(const char *name)
{
if (cached_list)
return __pmu_find_alias_name(name);
setup_pmu_alias_list();
cached_list = true;
return __pmu_find_alias_name(name);
}
...@@ -13,6 +13,7 @@ perf-y += synthesize.o ...@@ -13,6 +13,7 @@ perf-y += synthesize.o
perf-y += kallsyms-parse.o perf-y += kallsyms-parse.o
perf-y += find-bit-bench.o perf-y += find-bit-bench.o
perf-y += inject-buildid.o perf-y += inject-buildid.o
perf-y += evlist-open-close.o
perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
perf-$(CONFIG_X86_64) += mem-memset-x86-64-asm.o perf-$(CONFIG_X86_64) += mem-memset-x86-64-asm.o
......
...@@ -48,6 +48,7 @@ int bench_epoll_ctl(int argc, const char **argv); ...@@ -48,6 +48,7 @@ int bench_epoll_ctl(int argc, const char **argv);
int bench_synthesize(int argc, const char **argv); int bench_synthesize(int argc, const char **argv);
int bench_kallsyms_parse(int argc, const char **argv); int bench_kallsyms_parse(int argc, const char **argv);
int bench_inject_build_id(int argc, const char **argv); int bench_inject_build_id(int argc, const char **argv);
int bench_evlist_open_close(int argc, const char **argv);
#define BENCH_FORMAT_DEFAULT_STR "default" #define BENCH_FORMAT_DEFAULT_STR "default"
#define BENCH_FORMAT_DEFAULT 0 #define BENCH_FORMAT_DEFAULT 0
......
// SPDX-License-Identifier: GPL-2.0
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include "bench.h"
#include "../util/debug.h"
#include "../util/stat.h"
#include "../util/evlist.h"
#include "../util/evsel.h"
#include "../util/strbuf.h"
#include "../util/record.h"
#include "../util/parse-events.h"
#include "internal/threadmap.h"
#include "internal/cpumap.h"
#include <linux/perf_event.h>
#include <linux/kernel.h>
#include <linux/time64.h>
#include <linux/string.h>
#include <subcmd/parse-options.h>
#define MMAP_FLUSH_DEFAULT 1
static int iterations = 100;
static int nr_events = 1;
static const char *event_string = "dummy";
static struct record_opts opts = {
.sample_time = true,
.mmap_pages = UINT_MAX,
.user_freq = UINT_MAX,
.user_interval = ULLONG_MAX,
.freq = 4000,
.target = {
.uses_mmap = true,
.default_per_cpu = true,
},
.mmap_flush = MMAP_FLUSH_DEFAULT,
.nr_threads_synthesize = 1,
.ctl_fd = -1,
.ctl_fd_ack = -1,
};
static const struct option options[] = {
OPT_STRING('e', "event", &event_string, "event", "event selector. use 'perf list' to list available events"),
OPT_INTEGER('n', "nr-events", &nr_events,
"number of dummy events to create (default 1). If used with -e, it clones those events n times (1 = no change)"),
OPT_INTEGER('i', "iterations", &iterations, "Number of iterations used to compute average (default=100)"),
OPT_BOOLEAN('a', "all-cpus", &opts.target.system_wide, "system-wide collection from all CPUs"),
OPT_STRING('C', "cpu", &opts.target.cpu_list, "cpu", "list of cpus where to open events"),
OPT_STRING('p', "pid", &opts.target.pid, "pid", "record events on existing process id"),
OPT_STRING('t', "tid", &opts.target.tid, "tid", "record events on existing thread id"),
OPT_STRING('u', "uid", &opts.target.uid_str, "user", "user to profile"),
OPT_BOOLEAN(0, "per-thread", &opts.target.per_thread, "use per-thread mmaps"),
OPT_END()
};
static const char *const bench_usage[] = {
"perf bench internals evlist-open-close <options>",
NULL
};
static int evlist__count_evsel_fds(struct evlist *evlist)
{
struct evsel *evsel;
int cnt = 0;
evlist__for_each_entry(evlist, evsel)
cnt += evsel->core.threads->nr * evsel->core.cpus->nr;
return cnt;
}
static struct evlist *bench__create_evlist(char *evstr)
{
struct parse_events_error err = { .idx = 0, };
struct evlist *evlist = evlist__new();
int ret;
if (!evlist) {
pr_err("Not enough memory to create evlist\n");
return NULL;
}
ret = parse_events(evlist, evstr, &err);
if (ret) {
parse_events_print_error(&err, evstr);
pr_err("Run 'perf list' for a list of valid events\n");
ret = 1;
goto out_delete_evlist;
}
ret = evlist__create_maps(evlist, &opts.target);
if (ret < 0) {
pr_err("Not enough memory to create thread/cpu maps\n");
goto out_delete_evlist;
}
evlist__config(evlist, &opts, NULL);
return evlist;
out_delete_evlist:
evlist__delete(evlist);
return NULL;
}
static int bench__do_evlist_open_close(struct evlist *evlist)
{
char sbuf[STRERR_BUFSIZE];
int err = evlist__open(evlist);
if (err < 0) {
pr_err("evlist__open: %s\n", str_error_r(errno, sbuf, sizeof(sbuf)));
return err;
}
err = evlist__mmap(evlist, opts.mmap_pages);
if (err < 0) {
pr_err("evlist__mmap: %s\n", str_error_r(errno, sbuf, sizeof(sbuf)));
return err;
}
evlist__enable(evlist);
evlist__disable(evlist);
evlist__munmap(evlist);
evlist__close(evlist);
return 0;
}
static int bench_evlist_open_close__run(char *evstr)
{
// used to print statistics only
struct evlist *evlist = bench__create_evlist(evstr);
double time_average, time_stddev;
struct timeval start, end, diff;
struct stats time_stats;
u64 runtime_us;
int i, err;
if (!evlist)
return -ENOMEM;
init_stats(&time_stats);
printf(" Number of cpus:\t%d\n", evlist->core.cpus->nr);
printf(" Number of threads:\t%d\n", evlist->core.threads->nr);
printf(" Number of events:\t%d (%d fds)\n",
evlist->core.nr_entries, evlist__count_evsel_fds(evlist));
printf(" Number of iterations:\t%d\n", iterations);
evlist__delete(evlist);
for (i = 0; i < iterations; i++) {
pr_debug("Started iteration %d\n", i);
evlist = bench__create_evlist(evstr);
if (!evlist)
return -ENOMEM;
gettimeofday(&start, NULL);
err = bench__do_evlist_open_close(evlist);
if (err) {
evlist__delete(evlist);
return err;
}
gettimeofday(&end, NULL);
timersub(&end, &start, &diff);
runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
update_stats(&time_stats, runtime_us);
evlist__delete(evlist);
pr_debug("Iteration %d took:\t%" PRIu64 "us\n", i, runtime_us);
}
time_average = avg_stats(&time_stats);
time_stddev = stddev_stats(&time_stats);
printf(" Average open-close took: %.3f usec (+- %.3f usec)\n", time_average, time_stddev);
return 0;
}
static char *bench__repeat_event_string(const char *evstr, int n)
{
char sbuf[STRERR_BUFSIZE];
struct strbuf buf;
int i, str_size = strlen(evstr),
final_size = str_size * n + n,
err = strbuf_init(&buf, final_size);
if (err) {
pr_err("strbuf_init: %s\n", str_error_r(err, sbuf, sizeof(sbuf)));
goto out_error;
}
for (i = 0; i < n; i++) {
err = strbuf_add(&buf, evstr, str_size);
if (err) {
pr_err("strbuf_add: %s\n", str_error_r(err, sbuf, sizeof(sbuf)));
goto out_error;
}
err = strbuf_addch(&buf, i == n-1 ? '\0' : ',');
if (err) {
pr_err("strbuf_addch: %s\n", str_error_r(err, sbuf, sizeof(sbuf)));
goto out_error;
}
}
return strbuf_detach(&buf, NULL);
out_error:
strbuf_release(&buf);
return NULL;
}
int bench_evlist_open_close(int argc, const char **argv)
{
char *evstr, errbuf[BUFSIZ];
int err;
argc = parse_options(argc, argv, options, bench_usage, 0);
if (argc) {
usage_with_options(bench_usage, options);
exit(EXIT_FAILURE);
}
err = target__validate(&opts.target);
if (err) {
target__strerror(&opts.target, err, errbuf, sizeof(errbuf));
pr_err("%s\n", errbuf);
goto out;
}
err = target__parse_uid(&opts.target);
if (err) {
target__strerror(&opts.target, err, errbuf, sizeof(errbuf));
pr_err("%s", errbuf);
goto out;
}
/* Enable ignoring missing threads when -u/-p option is defined. */
opts.ignore_missing_thread = opts.target.uid != UINT_MAX || opts.target.pid;
evstr = bench__repeat_event_string(event_string, nr_events);
if (!evstr) {
err = -ENOMEM;
goto out;
}
err = bench_evlist_open_close__run(evstr);
free(evstr);
out:
return err;
}
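A hedged usage note for the new entry, based on the option table and usage string above (the values are illustrative): 'perf bench internals evlist-open-close -e cycles -n 2 -i 50' would open and close an evlist containing the cycles event cloned twice, repeat that fifty times, and print the average open-close time and standard deviation computed above.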
...@@ -20,6 +20,7 @@ ...@@ -20,6 +20,7 @@
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/zalloc.h> #include <linux/zalloc.h>
#include <sys/time.h> #include <sys/time.h>
#include <sys/mman.h>
#include <perf/cpumap.h> #include <perf/cpumap.h>
#include "../util/stat.h" #include "../util/stat.h"
...@@ -29,11 +30,7 @@ ...@@ -29,11 +30,7 @@
#include <err.h> #include <err.h>
static unsigned int nthreads = 0; static bool done = false;
static unsigned int nsecs = 10;
/* amount of futexes per thread */
static unsigned int nfutexes = 1024;
static bool fshared = false, done = false, silent = false;
static int futex_flag = 0; static int futex_flag = 0;
struct timeval bench__start, bench__end, bench__runtime; struct timeval bench__start, bench__end, bench__runtime;
...@@ -49,12 +46,18 @@ struct worker { ...@@ -49,12 +46,18 @@ struct worker {
unsigned long ops; unsigned long ops;
}; };
static struct bench_futex_parameters params = {
.nfutexes = 1024,
.runtime = 10,
};
static const struct option options[] = { static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"), OPT_UINTEGER('t', "threads", &params.nthreads, "Specify amount of threads"),
OPT_UINTEGER('r', "runtime", &nsecs, "Specify runtime (in seconds)"), OPT_UINTEGER('r', "runtime", &params.runtime, "Specify runtime (in seconds)"),
OPT_UINTEGER('f', "futexes", &nfutexes, "Specify amount of futexes per threads"), OPT_UINTEGER('f', "futexes", &params.nfutexes, "Specify amount of futexes per threads"),
OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"), OPT_BOOLEAN( 's', "silent", &params.silent, "Silent mode: do not display data/details"),
OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"), OPT_BOOLEAN( 'S', "shared", &params.fshared, "Use shared futexes instead of private ones"),
OPT_BOOLEAN( 'm', "mlockall", &params.mlockall, "Lock all current and future memory"),
OPT_END() OPT_END()
}; };
...@@ -78,7 +81,7 @@ static void *workerfn(void *arg) ...@@ -78,7 +81,7 @@ static void *workerfn(void *arg)
pthread_mutex_unlock(&thread_lock); pthread_mutex_unlock(&thread_lock);
do { do {
for (i = 0; i < nfutexes; i++, ops++) { for (i = 0; i < params.nfutexes; i++, ops++) {
/* /*
* We want the futex calls to fail in order to stress * We want the futex calls to fail in order to stress
* the hashing of uaddr and not measure other steps, * the hashing of uaddr and not measure other steps,
...@@ -86,7 +89,7 @@ static void *workerfn(void *arg) ...@@ -86,7 +89,7 @@ static void *workerfn(void *arg)
* the critical region protected by hb->lock. * the critical region protected by hb->lock.
*/ */
ret = futex_wait(&w->futex[i], 1234, NULL, futex_flag); ret = futex_wait(&w->futex[i], 1234, NULL, futex_flag);
if (!silent && if (!params.silent &&
(!ret || errno != EAGAIN || errno != EWOULDBLOCK)) (!ret || errno != EAGAIN || errno != EWOULDBLOCK))
warn("Non-expected futex return call"); warn("Non-expected futex return call");
} }
...@@ -112,7 +115,7 @@ static void print_summary(void) ...@@ -112,7 +115,7 @@ static void print_summary(void)
double stddev = stddev_stats(&throughput_stats); double stddev = stddev_stats(&throughput_stats);
printf("%sAveraged %ld operations/sec (+- %.2f%%), total secs = %d\n", printf("%sAveraged %ld operations/sec (+- %.2f%%), total secs = %d\n",
!silent ? "\n" : "", avg, rel_stddev_stats(stddev, avg), !params.silent ? "\n" : "", avg, rel_stddev_stats(stddev, avg),
(int)bench__runtime.tv_sec); (int)bench__runtime.tv_sec);
} }
@@ -141,30 +144,35 @@ int bench_futex_hash(int argc, const char **argv)
 	act.sa_sigaction = toggle_done;
 	sigaction(SIGINT, &act, NULL);
 
-	if (!nthreads) /* default to the number of CPUs */
-		nthreads = cpu->nr;
+	if (params.mlockall) {
+		if (mlockall(MCL_CURRENT | MCL_FUTURE))
+			err(EXIT_FAILURE, "mlockall");
+	}
+
+	if (!params.nthreads) /* default to the number of CPUs */
+		params.nthreads = cpu->nr;
 
-	worker = calloc(nthreads, sizeof(*worker));
+	worker = calloc(params.nthreads, sizeof(*worker));
 	if (!worker)
 		goto errmem;
 
-	if (!fshared)
+	if (!params.fshared)
 		futex_flag = FUTEX_PRIVATE_FLAG;
 
 	printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futexes for %d secs.\n\n",
-	       getpid(), nthreads, nfutexes, fshared ? "shared":"private", nsecs);
+	       getpid(), params.nthreads, params.nfutexes, params.fshared ? "shared":"private", params.runtime);
 
 	init_stats(&throughput_stats);
 	pthread_mutex_init(&thread_lock, NULL);
 	pthread_cond_init(&thread_parent, NULL);
 	pthread_cond_init(&thread_worker, NULL);
 
-	threads_starting = nthreads;
+	threads_starting = params.nthreads;
 	pthread_attr_init(&thread_attr);
 	gettimeofday(&bench__start, NULL);
-	for (i = 0; i < nthreads; i++) {
+	for (i = 0; i < params.nthreads; i++) {
 		worker[i].tid = i;
-		worker[i].futex = calloc(nfutexes, sizeof(*worker[i].futex));
+		worker[i].futex = calloc(params.nfutexes, sizeof(*worker[i].futex));
 		if (!worker[i].futex)
 			goto errmem;
@@ -189,10 +197,10 @@ int bench_futex_hash(int argc, const char **argv)
 	pthread_cond_broadcast(&thread_worker);
 	pthread_mutex_unlock(&thread_lock);
 
-	sleep(nsecs);
+	sleep(params.runtime);
 	toggle_done(0, NULL, NULL);
 
-	for (i = 0; i < nthreads; i++) {
+	for (i = 0; i < params.nthreads; i++) {
 		ret = pthread_join(worker[i].thread, NULL);
 		if (ret)
 			err(EXIT_FAILURE, "pthread_join");
@@ -203,18 +211,18 @@ int bench_futex_hash(int argc, const char **argv)
 	pthread_cond_destroy(&thread_worker);
 	pthread_mutex_destroy(&thread_lock);
 
-	for (i = 0; i < nthreads; i++) {
+	for (i = 0; i < params.nthreads; i++) {
 		unsigned long t = bench__runtime.tv_sec > 0 ?
 			worker[i].ops / bench__runtime.tv_sec : 0;
 		update_stats(&throughput_stats, t);
-		if (!silent) {
-			if (nfutexes == 1)
+		if (!params.silent) {
+			if (params.nfutexes == 1)
 				printf("[thread %2d] futex: %p [ %ld ops/sec ]\n",
 				       worker[i].tid, &worker[i].futex[0], t);
 			else
 				printf("[thread %2d] futexes: %p ... %p [ %ld ops/sec ]\n",
 				       worker[i].tid, &worker[i].futex[0],
-				       &worker[i].futex[nfutexes-1], t);
+				       &worker[i].futex[params.nfutexes-1], t);
 		}
 		zfree(&worker[i].futex);
......
...@@ -21,6 +21,7 @@ ...@@ -21,6 +21,7 @@
#include <err.h> #include <err.h>
#include <stdlib.h> #include <stdlib.h>
#include <sys/time.h> #include <sys/time.h>
#include <sys/mman.h>
struct worker { struct worker {
int tid; int tid;
...@@ -31,22 +32,24 @@ struct worker { ...@@ -31,22 +32,24 @@ struct worker {
static u_int32_t global_futex = 0; static u_int32_t global_futex = 0;
static struct worker *worker; static struct worker *worker;
static unsigned int nsecs = 10; static bool done = false;
static bool silent = false, multi = false;
static bool done = false, fshared = false;
static unsigned int nthreads = 0;
static int futex_flag = 0; static int futex_flag = 0;
static pthread_mutex_t thread_lock; static pthread_mutex_t thread_lock;
static unsigned int threads_starting; static unsigned int threads_starting;
static struct stats throughput_stats; static struct stats throughput_stats;
static pthread_cond_t thread_parent, thread_worker; static pthread_cond_t thread_parent, thread_worker;
static struct bench_futex_parameters params = {
.runtime = 10,
};
static const struct option options[] = { static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"), OPT_UINTEGER('t', "threads", &params.nthreads, "Specify amount of threads"),
OPT_UINTEGER('r', "runtime", &nsecs, "Specify runtime (in seconds)"), OPT_UINTEGER('r', "runtime", &params.runtime, "Specify runtime (in seconds)"),
OPT_BOOLEAN( 'M', "multi", &multi, "Use multiple futexes"), OPT_BOOLEAN( 'M', "multi", &params.multi, "Use multiple futexes"),
OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"), OPT_BOOLEAN( 's', "silent", &params.silent, "Silent mode: do not display data/details"),
OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"), OPT_BOOLEAN( 'S', "shared", &params.fshared, "Use shared futexes instead of private ones"),
OPT_BOOLEAN( 'm', "mlockall", &params.mlockall, "Lock all current and future memory"),
OPT_END() OPT_END()
}; };
...@@ -61,7 +64,7 @@ static void print_summary(void) ...@@ -61,7 +64,7 @@ static void print_summary(void)
double stddev = stddev_stats(&throughput_stats); double stddev = stddev_stats(&throughput_stats);
printf("%sAveraged %ld operations/sec (+- %.2f%%), total secs = %d\n", printf("%sAveraged %ld operations/sec (+- %.2f%%), total secs = %d\n",
!silent ? "\n" : "", avg, rel_stddev_stats(stddev, avg), !params.silent ? "\n" : "", avg, rel_stddev_stats(stddev, avg),
(int)bench__runtime.tv_sec); (int)bench__runtime.tv_sec);
} }
...@@ -93,7 +96,7 @@ static void *workerfn(void *arg) ...@@ -93,7 +96,7 @@ static void *workerfn(void *arg)
ret = futex_lock_pi(w->futex, NULL, futex_flag); ret = futex_lock_pi(w->futex, NULL, futex_flag);
if (ret) { /* handle lock acquisition */ if (ret) { /* handle lock acquisition */
if (!silent) if (!params.silent)
warn("thread %d: Could not lock pi-lock for %p (%d)", warn("thread %d: Could not lock pi-lock for %p (%d)",
w->tid, w->futex, ret); w->tid, w->futex, ret);
if (done) if (done)
...@@ -104,7 +107,7 @@ static void *workerfn(void *arg) ...@@ -104,7 +107,7 @@ static void *workerfn(void *arg)
usleep(1); usleep(1);
ret = futex_unlock_pi(w->futex, futex_flag); ret = futex_unlock_pi(w->futex, futex_flag);
if (ret && !silent) if (ret && !params.silent)
warn("thread %d: Could not unlock pi-lock for %p (%d)", warn("thread %d: Could not unlock pi-lock for %p (%d)",
w->tid, w->futex, ret); w->tid, w->futex, ret);
ops++; /* account for thread's share of work */ ops++; /* account for thread's share of work */
...@@ -120,12 +123,12 @@ static void create_threads(struct worker *w, pthread_attr_t thread_attr, ...@@ -120,12 +123,12 @@ static void create_threads(struct worker *w, pthread_attr_t thread_attr,
cpu_set_t cpuset; cpu_set_t cpuset;
unsigned int i; unsigned int i;
threads_starting = nthreads; threads_starting = params.nthreads;
for (i = 0; i < nthreads; i++) { for (i = 0; i < params.nthreads; i++) {
worker[i].tid = i; worker[i].tid = i;
if (multi) { if (params.multi) {
worker[i].futex = calloc(1, sizeof(u_int32_t)); worker[i].futex = calloc(1, sizeof(u_int32_t));
if (!worker[i].futex) if (!worker[i].futex)
err(EXIT_FAILURE, "calloc"); err(EXIT_FAILURE, "calloc");
...@@ -164,25 +167,30 @@ int bench_futex_lock_pi(int argc, const char **argv) ...@@ -164,25 +167,30 @@ int bench_futex_lock_pi(int argc, const char **argv)
act.sa_sigaction = toggle_done; act.sa_sigaction = toggle_done;
sigaction(SIGINT, &act, NULL); sigaction(SIGINT, &act, NULL);
if (!nthreads) if (params.mlockall) {
nthreads = cpu->nr; if (mlockall(MCL_CURRENT | MCL_FUTURE))
err(EXIT_FAILURE, "mlockall");
}
if (!params.nthreads)
params.nthreads = cpu->nr;
worker = calloc(nthreads, sizeof(*worker)); worker = calloc(params.nthreads, sizeof(*worker));
if (!worker) if (!worker)
err(EXIT_FAILURE, "calloc"); err(EXIT_FAILURE, "calloc");
if (!fshared) if (!params.fshared)
futex_flag = FUTEX_PRIVATE_FLAG; futex_flag = FUTEX_PRIVATE_FLAG;
printf("Run summary [PID %d]: %d threads doing pi lock/unlock pairing for %d secs.\n\n", printf("Run summary [PID %d]: %d threads doing pi lock/unlock pairing for %d secs.\n\n",
getpid(), nthreads, nsecs); getpid(), params.nthreads, params.runtime);
init_stats(&throughput_stats); init_stats(&throughput_stats);
pthread_mutex_init(&thread_lock, NULL); pthread_mutex_init(&thread_lock, NULL);
pthread_cond_init(&thread_parent, NULL); pthread_cond_init(&thread_parent, NULL);
pthread_cond_init(&thread_worker, NULL); pthread_cond_init(&thread_worker, NULL);
threads_starting = nthreads; threads_starting = params.nthreads;
pthread_attr_init(&thread_attr); pthread_attr_init(&thread_attr);
gettimeofday(&bench__start, NULL); gettimeofday(&bench__start, NULL);
...@@ -195,10 +203,10 @@ int bench_futex_lock_pi(int argc, const char **argv) ...@@ -195,10 +203,10 @@ int bench_futex_lock_pi(int argc, const char **argv)
pthread_cond_broadcast(&thread_worker); pthread_cond_broadcast(&thread_worker);
pthread_mutex_unlock(&thread_lock); pthread_mutex_unlock(&thread_lock);
sleep(nsecs); sleep(params.runtime);
toggle_done(0, NULL, NULL); toggle_done(0, NULL, NULL);
for (i = 0; i < nthreads; i++) { for (i = 0; i < params.nthreads; i++) {
ret = pthread_join(worker[i].thread, NULL); ret = pthread_join(worker[i].thread, NULL);
if (ret) if (ret)
err(EXIT_FAILURE, "pthread_join"); err(EXIT_FAILURE, "pthread_join");
...@@ -209,16 +217,16 @@ int bench_futex_lock_pi(int argc, const char **argv) ...@@ -209,16 +217,16 @@ int bench_futex_lock_pi(int argc, const char **argv)
pthread_cond_destroy(&thread_worker); pthread_cond_destroy(&thread_worker);
pthread_mutex_destroy(&thread_lock); pthread_mutex_destroy(&thread_lock);
for (i = 0; i < nthreads; i++) { for (i = 0; i < params.nthreads; i++) {
unsigned long t = bench__runtime.tv_sec > 0 ? unsigned long t = bench__runtime.tv_sec > 0 ?
worker[i].ops / bench__runtime.tv_sec : 0; worker[i].ops / bench__runtime.tv_sec : 0;
update_stats(&throughput_stats, t); update_stats(&throughput_stats, t);
if (!silent) if (!params.silent)
printf("[thread %3d] futex: %p [ %ld ops/sec ]\n", printf("[thread %3d] futex: %p [ %ld ops/sec ]\n",
worker[i].tid, worker[i].futex, t); worker[i].tid, worker[i].futex, t);
if (multi) if (params.multi)
zfree(&worker[i].futex); zfree(&worker[i].futex);
} }
......
@@ -6,7 +6,8 @@
  * on futex2, N at a time.
  *
  * This program is particularly useful to measure the latency of nthread
- * requeues without waking up any tasks -- thus mimicking a regular futex_wait.
+ * requeues without waking up any tasks (in the non-pi case) -- thus
+ * mimicking a regular futex_wait.
  */
/* For the CLR_() macros */ /* For the CLR_() macros */
@@ -27,28 +28,35 @@
 #include <err.h>
 #include <stdlib.h>
 #include <sys/time.h>
+#include <sys/mman.h>
 
 static u_int32_t futex1 = 0, futex2 = 0;
 
-/*
- * How many tasks to requeue at a time.
- * Default to 1 in order to make the kernel work more.
- */
-static unsigned int nrequeue = 1;
-
 static pthread_t *worker;
-static bool done = false, silent = false, fshared = false;
+static bool done = false;
 static pthread_mutex_t thread_lock;
 static pthread_cond_t thread_parent, thread_worker;
 static struct stats requeuetime_stats, requeued_stats;
-static unsigned int threads_starting, nthreads = 0;
+static unsigned int threads_starting;
 static int futex_flag = 0;
 
+static struct bench_futex_parameters params = {
+	/*
+	 * How many tasks to requeue at a time.
+	 * Default to 1 in order to make the kernel work more.
+	 */
+	.nrequeue = 1,
+};
+
 static const struct option options[] = {
-	OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
-	OPT_UINTEGER('q', "nrequeue", &nrequeue, "Specify amount of threads to requeue at once"),
-	OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"),
-	OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"),
+	OPT_UINTEGER('t', "threads", &params.nthreads, "Specify amount of threads"),
+	OPT_UINTEGER('q', "nrequeue", &params.nrequeue, "Specify amount of threads to requeue at once"),
+	OPT_BOOLEAN( 's', "silent", &params.silent, "Silent mode: do not display data/details"),
+	OPT_BOOLEAN( 'S', "shared", &params.fshared, "Use shared futexes instead of private ones"),
+	OPT_BOOLEAN( 'm', "mlockall", &params.mlockall, "Lock all current and future memory"),
+	OPT_BOOLEAN( 'B', "broadcast", &params.broadcast, "Requeue all threads at once"),
+	OPT_BOOLEAN( 'p', "pi", &params.pi, "Use PI-aware variants of FUTEX_CMP_REQUEUE"),
 	OPT_END()
 };
@@ -65,13 +73,15 @@ static void print_summary(void)
 	printf("Requeued %d of %d threads in %.4f ms (+-%.2f%%)\n",
 	       requeued_avg,
-	       nthreads,
+	       params.nthreads,
 	       requeuetime_avg / USEC_PER_MSEC,
 	       rel_stddev_stats(requeuetime_stddev, requeuetime_avg));
 }
 
 static void *workerfn(void *arg __maybe_unused)
 {
+	int ret;
+
 	pthread_mutex_lock(&thread_lock);
 	threads_starting--;
 	if (!threads_starting)
@@ -79,7 +89,34 @@ static void *workerfn(void *arg __maybe_unused)
 		pthread_cond_wait(&thread_worker, &thread_lock);
 	pthread_mutex_unlock(&thread_lock);
 
-	futex_wait(&futex1, 0, NULL, futex_flag);
+	while (1) {
+		if (!params.pi) {
+			ret = futex_wait(&futex1, 0, NULL, futex_flag);
+			if (!ret)
+				break;
+			if (ret && errno != EAGAIN) {
+				if (!params.silent)
+					warnx("futex_wait");
+				break;
+			}
+		} else {
+			ret = futex_wait_requeue_pi(&futex1, 0, &futex2,
+						    NULL, futex_flag);
+			if (!ret) {
+				/* got the lock at futex2 */
+				futex_unlock_pi(&futex2, futex_flag);
+				break;
+			}
+			if (ret && errno != EAGAIN) {
+				if (!params.silent)
+					warnx("futex_wait_requeue_pi");
+				break;
+			}
+		}
+	}
+
 	return NULL;
 }
...@@ -89,10 +126,10 @@ static void block_threads(pthread_t *w, ...@@ -89,10 +126,10 @@ static void block_threads(pthread_t *w,
cpu_set_t cpuset; cpu_set_t cpuset;
unsigned int i; unsigned int i;
threads_starting = nthreads; threads_starting = params.nthreads;
/* create and block all threads */ /* create and block all threads */
for (i = 0; i < nthreads; i++) { for (i = 0; i < params.nthreads; i++) {
CPU_ZERO(&cpuset); CPU_ZERO(&cpuset);
CPU_SET(cpu->map[i % cpu->nr], &cpuset); CPU_SET(cpu->map[i % cpu->nr], &cpuset);
...@@ -132,22 +169,31 @@ int bench_futex_requeue(int argc, const char **argv) ...@@ -132,22 +169,31 @@ int bench_futex_requeue(int argc, const char **argv)
act.sa_sigaction = toggle_done; act.sa_sigaction = toggle_done;
sigaction(SIGINT, &act, NULL); sigaction(SIGINT, &act, NULL);
if (!nthreads) if (params.mlockall) {
nthreads = cpu->nr; if (mlockall(MCL_CURRENT | MCL_FUTURE))
err(EXIT_FAILURE, "mlockall");
}
if (!params.nthreads)
params.nthreads = cpu->nr;
worker = calloc(nthreads, sizeof(*worker)); worker = calloc(params.nthreads, sizeof(*worker));
if (!worker) if (!worker)
err(EXIT_FAILURE, "calloc"); err(EXIT_FAILURE, "calloc");
if (!fshared) if (!params.fshared)
futex_flag = FUTEX_PRIVATE_FLAG; futex_flag = FUTEX_PRIVATE_FLAG;
if (nrequeue > nthreads) if (params.nrequeue > params.nthreads)
nrequeue = nthreads; params.nrequeue = params.nthreads;
printf("Run summary [PID %d]: Requeuing %d threads (from [%s] %p to %p), " if (params.broadcast)
"%d at a time.\n\n", getpid(), nthreads, params.nrequeue = params.nthreads;
fshared ? "shared":"private", &futex1, &futex2, nrequeue);
printf("Run summary [PID %d]: Requeuing %d threads (from [%s] %p to %s%p), "
"%d at a time.\n\n", getpid(), params.nthreads,
params.fshared ? "shared":"private", &futex1,
params.pi ? "PI ": "", &futex2, params.nrequeue);
init_stats(&requeued_stats); init_stats(&requeued_stats);
init_stats(&requeuetime_stats); init_stats(&requeuetime_stats);
...@@ -157,7 +203,7 @@ int bench_futex_requeue(int argc, const char **argv) ...@@ -157,7 +203,7 @@ int bench_futex_requeue(int argc, const char **argv)
pthread_cond_init(&thread_worker, NULL); pthread_cond_init(&thread_worker, NULL);
for (j = 0; j < bench_repeat && !done; j++) { for (j = 0; j < bench_repeat && !done; j++) {
unsigned int nrequeued = 0; unsigned int nrequeued = 0, wakeups = 0;
struct timeval start, end, runtime; struct timeval start, end, runtime;
/* create, launch & block all threads */ /* create, launch & block all threads */
...@@ -174,13 +220,31 @@ int bench_futex_requeue(int argc, const char **argv) ...@@ -174,13 +220,31 @@ int bench_futex_requeue(int argc, const char **argv)
/* Ok, all threads are patiently blocked, start requeueing */ /* Ok, all threads are patiently blocked, start requeueing */
gettimeofday(&start, NULL); gettimeofday(&start, NULL);
while (nrequeued < nthreads) { while (nrequeued < params.nthreads) {
int r;
/* /*
* Do not wakeup any tasks blocked on futex1, allowing * For the regular non-pi case, do not wakeup any tasks
* us to really measure futex_wait functionality. * blocked on futex1, allowing us to really measure
* futex_wait functionality. For the PI case the first
* waiter is always awoken.
*/ */
nrequeued += futex_cmp_requeue(&futex1, 0, &futex2, 0, if (!params.pi) {
nrequeue, futex_flag); r = futex_cmp_requeue(&futex1, 0, &futex2, 0,
params.nrequeue,
futex_flag);
} else {
r = futex_cmp_requeue_pi(&futex1, 0, &futex2,
params.nrequeue,
futex_flag);
wakeups++; /* assume no error */
}
if (r < 0)
err(EXIT_FAILURE, "couldn't requeue from %p to %p",
&futex1, &futex2);
nrequeued += r;
} }
gettimeofday(&end, NULL); gettimeofday(&end, NULL);
...@@ -189,17 +253,32 @@ int bench_futex_requeue(int argc, const char **argv) ...@@ -189,17 +253,32 @@ int bench_futex_requeue(int argc, const char **argv)
update_stats(&requeued_stats, nrequeued); update_stats(&requeued_stats, nrequeued);
update_stats(&requeuetime_stats, runtime.tv_usec); update_stats(&requeuetime_stats, runtime.tv_usec);
if (!silent) { if (!params.silent) {
printf("[Run %d]: Requeued %d of %d threads in %.4f ms\n", if (!params.pi)
j + 1, nrequeued, nthreads, runtime.tv_usec / (double)USEC_PER_MSEC); printf("[Run %d]: Requeued %d of %d threads in "
"%.4f ms\n", j + 1, nrequeued,
params.nthreads,
runtime.tv_usec / (double)USEC_PER_MSEC);
else {
nrequeued -= wakeups;
printf("[Run %d]: Awoke and Requeued (%d+%d) of "
"%d threads in %.4f ms\n",
j + 1, wakeups, nrequeued,
params.nthreads,
runtime.tv_usec / (double)USEC_PER_MSEC);
}
} }
/* everybody should be blocked on futex2, wake'em up */ if (!params.pi) {
nrequeued = futex_wake(&futex2, nrequeued, futex_flag); /* everybody should be blocked on futex2, wake'em up */
if (nthreads != nrequeued) nrequeued = futex_wake(&futex2, nrequeued, futex_flag);
warnx("couldn't wakeup all tasks (%d/%d)", nrequeued, nthreads); if (params.nthreads != nrequeued)
warnx("couldn't wakeup all tasks (%d/%d)",
nrequeued, params.nthreads);
}
for (i = 0; i < nthreads; i++) { for (i = 0; i < params.nthreads; i++) {
ret = pthread_join(worker[i], NULL); ret = pthread_join(worker[i], NULL);
if (ret) if (ret)
err(EXIT_FAILURE, "pthread_join"); err(EXIT_FAILURE, "pthread_join");
......
...@@ -34,6 +34,7 @@ int bench_futex_wake_parallel(int argc __maybe_unused, const char **argv __maybe ...@@ -34,6 +34,7 @@ int bench_futex_wake_parallel(int argc __maybe_unused, const char **argv __maybe
#include <err.h> #include <err.h>
#include <stdlib.h> #include <stdlib.h>
#include <sys/time.h> #include <sys/time.h>
#include <sys/mman.h>
struct thread_data { struct thread_data {
pthread_t worker; pthread_t worker;
...@@ -47,8 +48,7 @@ static unsigned int nwakes = 1; ...@@ -47,8 +48,7 @@ static unsigned int nwakes = 1;
static u_int32_t futex = 0; static u_int32_t futex = 0;
static pthread_t *blocked_worker; static pthread_t *blocked_worker;
static bool done = false, silent = false, fshared = false; static bool done = false;
static unsigned int nblocked_threads = 0, nwaking_threads = 0;
static pthread_mutex_t thread_lock; static pthread_mutex_t thread_lock;
static pthread_cond_t thread_parent, thread_worker; static pthread_cond_t thread_parent, thread_worker;
static pthread_barrier_t barrier; static pthread_barrier_t barrier;
...@@ -56,11 +56,15 @@ static struct stats waketime_stats, wakeup_stats; ...@@ -56,11 +56,15 @@ static struct stats waketime_stats, wakeup_stats;
static unsigned int threads_starting; static unsigned int threads_starting;
static int futex_flag = 0; static int futex_flag = 0;
static struct bench_futex_parameters params;
static const struct option options[] = { static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nblocked_threads, "Specify amount of threads"), OPT_UINTEGER('t', "threads", &params.nthreads, "Specify amount of threads"),
OPT_UINTEGER('w', "nwakers", &nwaking_threads, "Specify amount of waking threads"), OPT_UINTEGER('w', "nwakers", &params.nwakes, "Specify amount of waking threads"),
OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"), OPT_BOOLEAN( 's', "silent", &params.silent, "Silent mode: do not display data/details"),
OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"), OPT_BOOLEAN( 'S', "shared", &params.fshared, "Use shared futexes instead of private ones"),
OPT_BOOLEAN( 'm', "mlockall", &params.mlockall, "Lock all current and future memory"),
OPT_END() OPT_END()
}; };
...@@ -96,10 +100,10 @@ static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr) ...@@ -96,10 +100,10 @@ static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr)
pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_JOINABLE); pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_JOINABLE);
pthread_barrier_init(&barrier, NULL, nwaking_threads + 1); pthread_barrier_init(&barrier, NULL, params.nwakes + 1);
/* create and block all threads */ /* create and block all threads */
for (i = 0; i < nwaking_threads; i++) { for (i = 0; i < params.nwakes; i++) {
/* /*
* Thread creation order will impact per-thread latency * Thread creation order will impact per-thread latency
* as it will affect the order to acquire the hb spinlock. * as it will affect the order to acquire the hb spinlock.
...@@ -112,7 +116,7 @@ static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr) ...@@ -112,7 +116,7 @@ static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr)
pthread_barrier_wait(&barrier); pthread_barrier_wait(&barrier);
for (i = 0; i < nwaking_threads; i++) for (i = 0; i < params.nwakes; i++)
if (pthread_join(td[i].worker, NULL)) if (pthread_join(td[i].worker, NULL))
err(EXIT_FAILURE, "pthread_join"); err(EXIT_FAILURE, "pthread_join");
...@@ -143,10 +147,10 @@ static void block_threads(pthread_t *w, pthread_attr_t thread_attr, ...@@ -143,10 +147,10 @@ static void block_threads(pthread_t *w, pthread_attr_t thread_attr,
cpu_set_t cpuset; cpu_set_t cpuset;
unsigned int i; unsigned int i;
threads_starting = nblocked_threads; threads_starting = params.nthreads;
/* create and block all threads */ /* create and block all threads */
for (i = 0; i < nblocked_threads; i++) { for (i = 0; i < params.nthreads; i++) {
CPU_ZERO(&cpuset); CPU_ZERO(&cpuset);
CPU_SET(cpu->map[i % cpu->nr], &cpuset); CPU_SET(cpu->map[i % cpu->nr], &cpuset);
...@@ -167,7 +171,7 @@ static void print_run(struct thread_data *waking_worker, unsigned int run_num) ...@@ -167,7 +171,7 @@ static void print_run(struct thread_data *waking_worker, unsigned int run_num)
init_stats(&__wakeup_stats); init_stats(&__wakeup_stats);
init_stats(&__waketime_stats); init_stats(&__waketime_stats);
for (i = 0; i < nwaking_threads; i++) { for (i = 0; i < params.nwakes; i++) {
update_stats(&__waketime_stats, waking_worker[i].runtime.tv_usec); update_stats(&__waketime_stats, waking_worker[i].runtime.tv_usec);
update_stats(&__wakeup_stats, waking_worker[i].nwoken); update_stats(&__wakeup_stats, waking_worker[i].nwoken);
} }
...@@ -178,7 +182,7 @@ static void print_run(struct thread_data *waking_worker, unsigned int run_num) ...@@ -178,7 +182,7 @@ static void print_run(struct thread_data *waking_worker, unsigned int run_num)
printf("[Run %d]: Avg per-thread latency (waking %d/%d threads) " printf("[Run %d]: Avg per-thread latency (waking %d/%d threads) "
"in %.4f ms (+-%.2f%%)\n", run_num + 1, wakeup_avg, "in %.4f ms (+-%.2f%%)\n", run_num + 1, wakeup_avg,
nblocked_threads, waketime_avg / USEC_PER_MSEC, params.nthreads, waketime_avg / USEC_PER_MSEC,
rel_stddev_stats(waketime_stddev, waketime_avg)); rel_stddev_stats(waketime_stddev, waketime_avg));
} }
...@@ -193,7 +197,7 @@ static void print_summary(void) ...@@ -193,7 +197,7 @@ static void print_summary(void)
printf("Avg per-thread latency (waking %d/%d threads) in %.4f ms (+-%.2f%%)\n", printf("Avg per-thread latency (waking %d/%d threads) in %.4f ms (+-%.2f%%)\n",
wakeup_avg, wakeup_avg,
nblocked_threads, params.nthreads,
waketime_avg / USEC_PER_MSEC, waketime_avg / USEC_PER_MSEC,
rel_stddev_stats(waketime_stddev, waketime_avg)); rel_stddev_stats(waketime_stddev, waketime_avg));
} }
...@@ -203,7 +207,7 @@ static void do_run_stats(struct thread_data *waking_worker) ...@@ -203,7 +207,7 @@ static void do_run_stats(struct thread_data *waking_worker)
{ {
unsigned int i; unsigned int i;
for (i = 0; i < nwaking_threads; i++) { for (i = 0; i < params.nwakes; i++) {
update_stats(&waketime_stats, waking_worker[i].runtime.tv_usec); update_stats(&waketime_stats, waking_worker[i].runtime.tv_usec);
update_stats(&wakeup_stats, waking_worker[i].nwoken); update_stats(&wakeup_stats, waking_worker[i].nwoken);
} }
...@@ -238,36 +242,42 @@ int bench_futex_wake_parallel(int argc, const char **argv) ...@@ -238,36 +242,42 @@ int bench_futex_wake_parallel(int argc, const char **argv)
act.sa_sigaction = toggle_done; act.sa_sigaction = toggle_done;
sigaction(SIGINT, &act, NULL); sigaction(SIGINT, &act, NULL);
if (params.mlockall) {
if (mlockall(MCL_CURRENT | MCL_FUTURE))
err(EXIT_FAILURE, "mlockall");
}
cpu = perf_cpu_map__new(NULL); cpu = perf_cpu_map__new(NULL);
if (!cpu) if (!cpu)
err(EXIT_FAILURE, "calloc"); err(EXIT_FAILURE, "calloc");
if (!nblocked_threads) if (!params.nthreads)
nblocked_threads = cpu->nr; params.nthreads = cpu->nr;
/* some sanity checks */ /* some sanity checks */
if (nwaking_threads > nblocked_threads || !nwaking_threads) if (params.nwakes > params.nthreads ||
nwaking_threads = nblocked_threads; !params.nwakes)
params.nwakes = params.nthreads;
if (nblocked_threads % nwaking_threads) if (params.nthreads % params.nwakes)
errx(EXIT_FAILURE, "Must be perfectly divisible"); errx(EXIT_FAILURE, "Must be perfectly divisible");
/* /*
* Each thread will wakeup nwakes tasks in * Each thread will wakeup nwakes tasks in
* a single futex_wait call. * a single futex_wait call.
*/ */
nwakes = nblocked_threads/nwaking_threads; nwakes = params.nthreads/params.nwakes;
blocked_worker = calloc(nblocked_threads, sizeof(*blocked_worker)); blocked_worker = calloc(params.nthreads, sizeof(*blocked_worker));
if (!blocked_worker) if (!blocked_worker)
err(EXIT_FAILURE, "calloc"); err(EXIT_FAILURE, "calloc");
if (!fshared) if (!params.fshared)
futex_flag = FUTEX_PRIVATE_FLAG; futex_flag = FUTEX_PRIVATE_FLAG;
printf("Run summary [PID %d]: blocking on %d threads (at [%s] " printf("Run summary [PID %d]: blocking on %d threads (at [%s] "
"futex %p), %d threads waking up %d at a time.\n\n", "futex %p), %d threads waking up %d at a time.\n\n",
getpid(), nblocked_threads, fshared ? "shared":"private", getpid(), params.nthreads, params.fshared ? "shared":"private",
&futex, nwaking_threads, nwakes); &futex, params.nwakes, nwakes);
init_stats(&wakeup_stats); init_stats(&wakeup_stats);
init_stats(&waketime_stats); init_stats(&waketime_stats);
...@@ -278,7 +288,7 @@ int bench_futex_wake_parallel(int argc, const char **argv) ...@@ -278,7 +288,7 @@ int bench_futex_wake_parallel(int argc, const char **argv)
pthread_cond_init(&thread_worker, NULL); pthread_cond_init(&thread_worker, NULL);
for (j = 0; j < bench_repeat && !done; j++) { for (j = 0; j < bench_repeat && !done; j++) {
waking_worker = calloc(nwaking_threads, sizeof(*waking_worker)); waking_worker = calloc(params.nwakes, sizeof(*waking_worker));
if (!waking_worker) if (!waking_worker)
err(EXIT_FAILURE, "calloc"); err(EXIT_FAILURE, "calloc");
...@@ -297,14 +307,14 @@ int bench_futex_wake_parallel(int argc, const char **argv) ...@@ -297,14 +307,14 @@ int bench_futex_wake_parallel(int argc, const char **argv)
/* Ok, all threads are patiently blocked, start waking folks up */ /* Ok, all threads are patiently blocked, start waking folks up */
wakeup_threads(waking_worker, thread_attr); wakeup_threads(waking_worker, thread_attr);
for (i = 0; i < nblocked_threads; i++) { for (i = 0; i < params.nthreads; i++) {
ret = pthread_join(blocked_worker[i], NULL); ret = pthread_join(blocked_worker[i], NULL);
if (ret) if (ret)
err(EXIT_FAILURE, "pthread_join"); err(EXIT_FAILURE, "pthread_join");
} }
do_run_stats(waking_worker); do_run_stats(waking_worker);
if (!silent) if (!params.silent)
print_run(waking_worker, j); print_run(waking_worker, j);
free(waking_worker); free(waking_worker);
......
...@@ -27,29 +27,34 @@ ...@@ -27,29 +27,34 @@
#include <err.h> #include <err.h>
#include <stdlib.h> #include <stdlib.h>
#include <sys/time.h> #include <sys/time.h>
#include <sys/mman.h>
/* all threads will block on the same futex */ /* all threads will block on the same futex */
static u_int32_t futex1 = 0; static u_int32_t futex1 = 0;
/* static pthread_t *worker;
* How many wakeups to do at a time. static bool done = false;
* Default to 1 in order to make the kernel work more.
*/
static unsigned int nwakes = 1;
pthread_t *worker;
static bool done = false, silent = false, fshared = false;
static pthread_mutex_t thread_lock; static pthread_mutex_t thread_lock;
static pthread_cond_t thread_parent, thread_worker; static pthread_cond_t thread_parent, thread_worker;
static struct stats waketime_stats, wakeup_stats; static struct stats waketime_stats, wakeup_stats;
static unsigned int threads_starting, nthreads = 0; static unsigned int threads_starting;
static int futex_flag = 0; static int futex_flag = 0;
static struct bench_futex_parameters params = {
/*
* How many wakeups to do at a time.
* Default to 1 in order to make the kernel work more.
*/
.nwakes = 1,
};
static const struct option options[] = { static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"), OPT_UINTEGER('t', "threads", &params.nthreads, "Specify amount of threads"),
OPT_UINTEGER('w', "nwakes", &nwakes, "Specify amount of threads to wake at once"), OPT_UINTEGER('w', "nwakes", &params.nwakes, "Specify amount of threads to wake at once"),
OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"), OPT_BOOLEAN( 's', "silent", &params.silent, "Silent mode: do not display data/details"),
OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"), OPT_BOOLEAN( 'S', "shared", &params.fshared, "Use shared futexes instead of private ones"),
OPT_BOOLEAN( 'm', "mlockall", &params.mlockall, "Lock all current and future memory"),
OPT_END() OPT_END()
}; };
...@@ -84,7 +89,7 @@ static void print_summary(void) ...@@ -84,7 +89,7 @@ static void print_summary(void)
printf("Wokeup %d of %d threads in %.4f ms (+-%.2f%%)\n", printf("Wokeup %d of %d threads in %.4f ms (+-%.2f%%)\n",
wakeup_avg, wakeup_avg,
nthreads, params.nthreads,
waketime_avg / USEC_PER_MSEC, waketime_avg / USEC_PER_MSEC,
rel_stddev_stats(waketime_stddev, waketime_avg)); rel_stddev_stats(waketime_stddev, waketime_avg));
} }
...@@ -95,10 +100,10 @@ static void block_threads(pthread_t *w, ...@@ -95,10 +100,10 @@ static void block_threads(pthread_t *w,
cpu_set_t cpuset; cpu_set_t cpuset;
unsigned int i; unsigned int i;
threads_starting = nthreads; threads_starting = params.nthreads;
/* create and block all threads */ /* create and block all threads */
for (i = 0; i < nthreads; i++) { for (i = 0; i < params.nthreads; i++) {
CPU_ZERO(&cpuset); CPU_ZERO(&cpuset);
CPU_SET(cpu->map[i % cpu->nr], &cpuset); CPU_SET(cpu->map[i % cpu->nr], &cpuset);
...@@ -140,19 +145,25 @@ int bench_futex_wake(int argc, const char **argv) ...@@ -140,19 +145,25 @@ int bench_futex_wake(int argc, const char **argv)
act.sa_sigaction = toggle_done; act.sa_sigaction = toggle_done;
sigaction(SIGINT, &act, NULL); sigaction(SIGINT, &act, NULL);
if (!nthreads) if (params.mlockall) {
nthreads = cpu->nr; if (mlockall(MCL_CURRENT | MCL_FUTURE))
err(EXIT_FAILURE, "mlockall");
}
if (!params.nthreads)
params.nthreads = cpu->nr;
worker = calloc(nthreads, sizeof(*worker)); worker = calloc(params.nthreads, sizeof(*worker));
if (!worker) if (!worker)
err(EXIT_FAILURE, "calloc"); err(EXIT_FAILURE, "calloc");
if (!fshared) if (!params.fshared)
futex_flag = FUTEX_PRIVATE_FLAG; futex_flag = FUTEX_PRIVATE_FLAG;
printf("Run summary [PID %d]: blocking on %d threads (at [%s] futex %p), " printf("Run summary [PID %d]: blocking on %d threads (at [%s] futex %p), "
"waking up %d at a time.\n\n", "waking up %d at a time.\n\n",
getpid(), nthreads, fshared ? "shared":"private", &futex1, nwakes); getpid(), params.nthreads, params.fshared ? "shared":"private",
&futex1, params.nwakes);
init_stats(&wakeup_stats); init_stats(&wakeup_stats);
init_stats(&waketime_stats); init_stats(&waketime_stats);
...@@ -179,20 +190,22 @@ int bench_futex_wake(int argc, const char **argv) ...@@ -179,20 +190,22 @@ int bench_futex_wake(int argc, const char **argv)
/* Ok, all threads are patiently blocked, start waking folks up */ /* Ok, all threads are patiently blocked, start waking folks up */
gettimeofday(&start, NULL); gettimeofday(&start, NULL);
while (nwoken != nthreads) while (nwoken != params.nthreads)
nwoken += futex_wake(&futex1, nwakes, futex_flag); nwoken += futex_wake(&futex1,
params.nwakes, futex_flag);
gettimeofday(&end, NULL); gettimeofday(&end, NULL);
timersub(&end, &start, &runtime); timersub(&end, &start, &runtime);
update_stats(&wakeup_stats, nwoken); update_stats(&wakeup_stats, nwoken);
update_stats(&waketime_stats, runtime.tv_usec); update_stats(&waketime_stats, runtime.tv_usec);
if (!silent) { if (!params.silent) {
printf("[Run %d]: Wokeup %d of %d threads in %.4f ms\n", printf("[Run %d]: Wokeup %d of %d threads in %.4f ms\n",
j + 1, nwoken, nthreads, runtime.tv_usec / (double)USEC_PER_MSEC); j + 1, nwoken, params.nthreads,
runtime.tv_usec / (double)USEC_PER_MSEC);
} }
for (i = 0; i < nthreads; i++) { for (i = 0; i < params.nthreads; i++) {
ret = pthread_join(worker[i], NULL); ret = pthread_join(worker[i], NULL);
if (ret) if (ret)
err(EXIT_FAILURE, "pthread_join"); err(EXIT_FAILURE, "pthread_join");
......
...@@ -13,6 +13,20 @@ ...@@ -13,6 +13,20 @@
#include <sys/types.h> #include <sys/types.h>
#include <linux/futex.h> #include <linux/futex.h>
struct bench_futex_parameters {
bool silent;
bool fshared;
bool mlockall;
bool multi; /* lock-pi */
bool pi; /* requeue-pi */
bool broadcast; /* requeue */
unsigned int runtime; /* seconds*/
unsigned int nthreads;
unsigned int nfutexes;
unsigned int nwakes;
unsigned int nrequeue;
};
 /**
  * futex() - SYS_futex syscall wrapper
  * @uaddr:	address of first futex
@@ -20,7 +34,7 @@
  * @val:	typically expected value of uaddr, but varies by op
  * @timeout:	typically an absolute struct timespec (except where noted
  *		otherwise). Overloaded by some ops
- * @uaddr2:	address of second futex for some ops\
+ * @uaddr2:	address of second futex for some ops
  * @val3:	varies by op
  * @opflags:	flags to be bitwise OR'd with op, such as FUTEX_PRIVATE_FLAG
  *
@@ -77,7 +91,7 @@ futex_unlock_pi(u_int32_t *uaddr, int opflags)
 /**
  * futex_cmp_requeue() - requeue tasks from uaddr to uaddr2
  * @nr_wake:	wake up to this many tasks
- * @nr_requeue: requeue up to this many tasks
+ * @nr_requeue:	requeue up to this many tasks
  */
 static inline int
 futex_cmp_requeue(u_int32_t *uaddr, u_int32_t val, u_int32_t *uaddr2, int nr_wake,
@@ -86,4 +100,38 @@ futex_cmp_requeue(u_int32_t *uaddr, u_int32_t val, u_int32_t *uaddr2, int nr_wak
 	return futex(uaddr, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, uaddr2,
 		     val, opflags);
 }
/**
* futex_wait_requeue_pi() - block on uaddr and prepare to requeue to uaddr2
* @uaddr: non-PI futex source
* @uaddr2: PI futex target
*
* This is the first half of the requeue_pi mechanism. It shall always be
* paired with futex_cmp_requeue_pi().
*/
static inline int
futex_wait_requeue_pi(u_int32_t *uaddr, u_int32_t val, u_int32_t *uaddr2,
struct timespec *timeout, int opflags)
{
return futex(uaddr, FUTEX_WAIT_REQUEUE_PI, val, timeout, uaddr2, 0,
opflags);
}
/**
* futex_cmp_requeue_pi() - requeue tasks from uaddr to uaddr2
* @uaddr: non-PI futex source
* @uaddr2: PI futex target
* @nr_requeue: requeue up to this many tasks
*
* This is the second half of the requeue_pi mechanism. It shall always be
* paired with futex_wait_requeue_pi(). The first waker is always awoken.
*/
static inline int
futex_cmp_requeue_pi(u_int32_t *uaddr, u_int32_t val, u_int32_t *uaddr2,
int nr_requeue, int opflags)
{
return futex(uaddr, FUTEX_CMP_REQUEUE_PI, 1, nr_requeue, uaddr2,
val, opflags);
}
#endif /* _FUTEX_H */ #endif /* _FUTEX_H */
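
For orientation, the two wrappers added above are only meaningful as a pair: a waiter parks on the non-PI futex with futex_wait_requeue_pi() and, once requeued and handed the PI lock, must drop it with futex_unlock_pi(); the other side moves waiters over with futex_cmp_requeue_pi(), which always wakes the first waiter. The condensed sketch below is not part of the patch; the include path, the waiter()/requeuer() helpers and the zero opflags are illustrative only, and thread setup/statistics are omitted. It mirrors what bench/futex-requeue.c does in --pi mode.

/* sketch only: pairing of the requeue-pi wrappers, assuming futex.h is on the include path */
#include <err.h>
#include <errno.h>
#include "futex.h"

static u_int32_t futex1, futex2;	/* futex2 is the PI futex */

/* waiter side: block on futex1; on success we own the PI lock at futex2 */
static void waiter(void)
{
	while (futex_wait_requeue_pi(&futex1, 0, &futex2, NULL, 0)) {
		if (errno != EAGAIN)
			return;			/* unexpected failure */
	}
	futex_unlock_pi(&futex2, 0);		/* release the PI lock we were granted */
}

/* requeuer side: wake the first waiter, requeue up to nr_requeue others onto futex2 */
static void requeuer(unsigned int nr_requeue)
{
	if (futex_cmp_requeue_pi(&futex1, 0, &futex2, nr_requeue, 0) < 0)
		err(1, "futex_cmp_requeue_pi");
}

In the benchmark itself these same calls are driven from workerfn() and the main requeue loop shown earlier, with FUTEX_PRIVATE_FLAG passed as opflags unless --shared is given.
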
...@@ -133,7 +133,7 @@ static u64 dso_map_addr(struct bench_dso *dso) ...@@ -133,7 +133,7 @@ static u64 dso_map_addr(struct bench_dso *dso)
return 0x400000ULL + dso->ino * 8192ULL; return 0x400000ULL + dso->ino * 8192ULL;
} }
static u32 synthesize_attr(struct bench_data *data) static ssize_t synthesize_attr(struct bench_data *data)
{ {
union perf_event event; union perf_event event;
...@@ -151,7 +151,7 @@ static u32 synthesize_attr(struct bench_data *data) ...@@ -151,7 +151,7 @@ static u32 synthesize_attr(struct bench_data *data)
return writen(data->input_pipe[1], &event, event.header.size); return writen(data->input_pipe[1], &event, event.header.size);
} }
static u32 synthesize_fork(struct bench_data *data) static ssize_t synthesize_fork(struct bench_data *data)
{ {
union perf_event event; union perf_event event;
...@@ -169,8 +169,7 @@ static u32 synthesize_fork(struct bench_data *data) ...@@ -169,8 +169,7 @@ static u32 synthesize_fork(struct bench_data *data)
return writen(data->input_pipe[1], &event, event.header.size); return writen(data->input_pipe[1], &event, event.header.size);
} }
static u32 synthesize_mmap(struct bench_data *data, struct bench_dso *dso, static ssize_t synthesize_mmap(struct bench_data *data, struct bench_dso *dso, u64 timestamp)
u64 timestamp)
{ {
union perf_event event; union perf_event event;
size_t len = offsetof(struct perf_record_mmap2, filename); size_t len = offsetof(struct perf_record_mmap2, filename);
...@@ -198,23 +197,25 @@ static u32 synthesize_mmap(struct bench_data *data, struct bench_dso *dso, ...@@ -198,23 +197,25 @@ static u32 synthesize_mmap(struct bench_data *data, struct bench_dso *dso,
if (len > sizeof(event.mmap2)) { if (len > sizeof(event.mmap2)) {
/* write mmap2 event first */ /* write mmap2 event first */
writen(data->input_pipe[1], &event, len - bench_id_hdr_size); if (writen(data->input_pipe[1], &event, len - bench_id_hdr_size) < 0)
return -1;
/* zero-fill sample id header */ /* zero-fill sample id header */
memset(id_hdr_ptr, 0, bench_id_hdr_size); memset(id_hdr_ptr, 0, bench_id_hdr_size);
/* put timestamp in the right position */ /* put timestamp in the right position */
ts_idx = (bench_id_hdr_size / sizeof(u64)) - 2; ts_idx = (bench_id_hdr_size / sizeof(u64)) - 2;
id_hdr_ptr[ts_idx] = timestamp; id_hdr_ptr[ts_idx] = timestamp;
writen(data->input_pipe[1], id_hdr_ptr, bench_id_hdr_size); if (writen(data->input_pipe[1], id_hdr_ptr, bench_id_hdr_size) < 0)
} else { return -1;
ts_idx = (len / sizeof(u64)) - 2;
id_hdr_ptr[ts_idx] = timestamp; return len;
writen(data->input_pipe[1], &event, len);
} }
return len;
ts_idx = (len / sizeof(u64)) - 2;
id_hdr_ptr[ts_idx] = timestamp;
return writen(data->input_pipe[1], &event, len);
} }
static u32 synthesize_sample(struct bench_data *data, struct bench_dso *dso, static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso, u64 timestamp)
u64 timestamp)
{ {
union perf_event event; union perf_event event;
struct perf_sample sample = { struct perf_sample sample = {
...@@ -233,7 +234,7 @@ static u32 synthesize_sample(struct bench_data *data, struct bench_dso *dso, ...@@ -233,7 +234,7 @@ static u32 synthesize_sample(struct bench_data *data, struct bench_dso *dso,
return writen(data->input_pipe[1], &event, event.header.size); return writen(data->input_pipe[1], &event, event.header.size);
} }
static u32 synthesize_flush(struct bench_data *data) static ssize_t synthesize_flush(struct bench_data *data)
{ {
struct perf_event_header header = { struct perf_event_header header = {
.size = sizeof(header), .size = sizeof(header),
...@@ -348,14 +349,16 @@ static int inject_build_id(struct bench_data *data, u64 *max_rss) ...@@ -348,14 +349,16 @@ static int inject_build_id(struct bench_data *data, u64 *max_rss)
int status; int status;
unsigned int i, k; unsigned int i, k;
struct rusage rusage; struct rusage rusage;
u64 len = 0;
/* this makes the child to run */ /* this makes the child to run */
if (perf_header__write_pipe(data->input_pipe[1]) < 0) if (perf_header__write_pipe(data->input_pipe[1]) < 0)
return -1; return -1;
len += synthesize_attr(data); if (synthesize_attr(data) < 0)
len += synthesize_fork(data); return -1;
if (synthesize_fork(data) < 0)
return -1;
for (i = 0; i < nr_mmaps; i++) { for (i = 0; i < nr_mmaps; i++) {
int idx = rand() % (nr_dsos - 1); int idx = rand() % (nr_dsos - 1);
...@@ -363,13 +366,18 @@ static int inject_build_id(struct bench_data *data, u64 *max_rss) ...@@ -363,13 +366,18 @@ static int inject_build_id(struct bench_data *data, u64 *max_rss)
u64 timestamp = rand() % 1000000; u64 timestamp = rand() % 1000000;
pr_debug2(" [%d] injecting: %s\n", i+1, dso->name); pr_debug2(" [%d] injecting: %s\n", i+1, dso->name);
len += synthesize_mmap(data, dso, timestamp); if (synthesize_mmap(data, dso, timestamp) < 0)
return -1;
for (k = 0; k < nr_samples; k++) for (k = 0; k < nr_samples; k++) {
len += synthesize_sample(data, dso, timestamp + k * 1000); if (synthesize_sample(data, dso, timestamp + k * 1000) < 0)
return -1;
}
if ((i + 1) % 10 == 0) if ((i + 1) % 10 == 0) {
len += synthesize_flush(data); if (synthesize_flush(data) < 0)
return -1;
}
} }
/* this makes the child to finish */ /* this makes the child to finish */
......
@@ -117,7 +117,7 @@ static int run_single_threaded(void)
 	int err;
 
 	perf_set_singlethreaded();
-	session = perf_session__new(NULL, false, NULL);
+	session = perf_session__new(NULL, NULL);
 	if (IS_ERR(session)) {
 		pr_err("Session creation failed.\n");
 		return PTR_ERR(session);
@@ -161,7 +161,7 @@ static int do_run_multi_threaded(struct target *target,
 	init_stats(&time_stats);
 	init_stats(&event_stats);
 	for (i = 0; i < multi_iterations; i++) {
-		session = perf_session__new(NULL, false, NULL);
+		session = perf_session__new(NULL, NULL);
 		if (IS_ERR(session))
 			return PTR_ERR(session);
......
@@ -596,7 +596,7 @@ int cmd_annotate(int argc, const char **argv)
 
 	data.path = input_name;
 
-	annotate.session = perf_session__new(&data, false, &annotate.tool);
+	annotate.session = perf_session__new(&data, &annotate.tool);
 	if (IS_ERR(annotate.session))
 		return PTR_ERR(annotate.session);
......
@@ -88,6 +88,7 @@ static struct bench internals_benchmarks[] = {
 	{ "synthesize",		"Benchmark perf event synthesis",	bench_synthesize	},
 	{ "kallsyms-parse",	"Benchmark kallsyms parsing",		bench_kallsyms_parse	},
 	{ "inject-build-id",	"Benchmark build-id injection",		bench_inject_build_id	},
+	{ "evlist-open-close",	"Benchmark evlist open and close",	bench_evlist_open_close	},
 	{ NULL,			NULL,					NULL			}
 };
......
...@@ -443,7 +443,7 @@ int cmd_buildid_cache(int argc, const char **argv) ...@@ -443,7 +443,7 @@ int cmd_buildid_cache(int argc, const char **argv)
data.path = missing_filename; data.path = missing_filename;
data.force = force; data.force = force;
session = perf_session__new(&data, false, NULL); session = perf_session__new(&data, NULL);
if (IS_ERR(session)) if (IS_ERR(session))
return PTR_ERR(session); return PTR_ERR(session);
} }
......
...@@ -65,7 +65,7 @@ static int perf_session__list_build_ids(bool force, bool with_hits) ...@@ -65,7 +65,7 @@ static int perf_session__list_build_ids(bool force, bool with_hits)
if (filename__fprintf_build_id(input_name, stdout) > 0) if (filename__fprintf_build_id(input_name, stdout) > 0)
goto out; goto out;
session = perf_session__new(&data, false, &build_id__mark_dso_hit_ops); session = perf_session__new(&data, &build_id__mark_dso_hit_ops);
if (IS_ERR(session)) if (IS_ERR(session))
return PTR_ERR(session); return PTR_ERR(session);
......
...@@ -2790,7 +2790,7 @@ static int perf_c2c__report(int argc, const char **argv) ...@@ -2790,7 +2790,7 @@ static int perf_c2c__report(int argc, const char **argv)
goto out; goto out;
} }
session = perf_session__new(&data, 0, &c2c.tool); session = perf_session__new(&data, &c2c.tool);
if (IS_ERR(session)) { if (IS_ERR(session)) {
err = PTR_ERR(session); err = PTR_ERR(session);
pr_debug("Error creating perf session\n"); pr_debug("Error creating perf session\n");
......
...@@ -21,46 +21,21 @@ static struct data_cmd data_cmds[]; ...@@ -21,46 +21,21 @@ static struct data_cmd data_cmds[];
#define for_each_cmd(cmd) \ #define for_each_cmd(cmd) \
for (cmd = data_cmds; cmd && cmd->name; cmd++) for (cmd = data_cmds; cmd && cmd->name; cmd++)
static const struct option data_options[] = {
OPT_END()
};
static const char * const data_subcommands[] = { "convert", NULL }; static const char * const data_subcommands[] = { "convert", NULL };
static const char *data_usage[] = { static const char *data_usage[] = {
"perf data [<common options>] <command> [<options>]", "perf data convert [<options>]",
NULL NULL
}; };
static void print_usage(void) const char *to_json;
{ const char *to_ctf;
struct data_cmd *cmd; struct perf_data_convert_opts opts = {
.force = false,
printf("Usage:\n"); .all = false,
printf("\t%s\n\n", data_usage[0]);
printf("\tAvailable commands:\n");
for_each_cmd(cmd) {
printf("\t %s\t- %s\n", cmd->name, cmd->summary);
}
printf("\n");
}
static const char * const data_convert_usage[] = {
"perf data convert [<options>]",
NULL
}; };
static int cmd_data_convert(int argc, const char **argv) const struct option data_options[] = {
{
const char *to_json = NULL;
const char *to_ctf = NULL;
struct perf_data_convert_opts opts = {
.force = false,
.all = false,
};
const struct option options[] = {
OPT_INCR('v', "verbose", &verbose, "be more verbose"), OPT_INCR('v', "verbose", &verbose, "be more verbose"),
OPT_STRING('i', "input", &input_name, "file", "input file name"), OPT_STRING('i', "input", &input_name, "file", "input file name"),
OPT_STRING(0, "to-json", &to_json, NULL, "Convert to JSON format"), OPT_STRING(0, "to-json", &to_json, NULL, "Convert to JSON format"),
...@@ -73,10 +48,13 @@ static int cmd_data_convert(int argc, const char **argv) ...@@ -73,10 +48,13 @@ static int cmd_data_convert(int argc, const char **argv)
OPT_END() OPT_END()
}; };
argc = parse_options(argc, argv, options, static int cmd_data_convert(int argc, const char **argv)
data_convert_usage, 0); {
argc = parse_options(argc, argv, data_options,
data_usage, 0);
if (argc) { if (argc) {
usage_with_options(data_convert_usage, options); usage_with_options(data_usage, data_options);
return -1; return -1;
} }
...@@ -116,14 +94,13 @@ int cmd_data(int argc, const char **argv) ...@@ -116,14 +94,13 @@ int cmd_data(int argc, const char **argv)
struct data_cmd *cmd; struct data_cmd *cmd;
const char *cmdstr; const char *cmdstr;
/* No command specified. */
if (argc < 2)
goto usage;
argc = parse_options_subcommand(argc, argv, data_options, data_subcommands, data_usage, argc = parse_options_subcommand(argc, argv, data_options, data_subcommands, data_usage,
PARSE_OPT_STOP_AT_NON_OPTION); PARSE_OPT_STOP_AT_NON_OPTION);
if (argc < 1)
goto usage; if (!argc) {
usage_with_options(data_usage, data_options);
return -1;
}
cmdstr = argv[0]; cmdstr = argv[0];
...@@ -135,7 +112,6 @@ int cmd_data(int argc, const char **argv) ...@@ -135,7 +112,6 @@ int cmd_data(int argc, const char **argv)
} }
pr_err("Unknown command: %s\n", cmdstr); pr_err("Unknown command: %s\n", cmdstr);
usage: usage_with_options(data_usage, data_options);
print_usage();
return -1; return -1;
} }
...@@ -1156,7 +1156,7 @@ static int check_file_brstack(void) ...@@ -1156,7 +1156,7 @@ static int check_file_brstack(void)
int i; int i;
data__for_each_file(i, d) { data__for_each_file(i, d) {
d->session = perf_session__new(&d->data, false, &pdiff.tool); d->session = perf_session__new(&d->data, &pdiff.tool);
if (IS_ERR(d->session)) { if (IS_ERR(d->session)) {
pr_err("Failed to open %s\n", d->data.path); pr_err("Failed to open %s\n", d->data.path);
return PTR_ERR(d->session); return PTR_ERR(d->session);
...@@ -1188,7 +1188,7 @@ static int __cmd_diff(void) ...@@ -1188,7 +1188,7 @@ static int __cmd_diff(void)
ret = -EINVAL; ret = -EINVAL;
data__for_each_file(i, d) { data__for_each_file(i, d) {
d->session = perf_session__new(&d->data, false, &pdiff.tool); d->session = perf_session__new(&d->data, &pdiff.tool);
if (IS_ERR(d->session)) { if (IS_ERR(d->session)) {
ret = PTR_ERR(d->session); ret = PTR_ERR(d->session);
pr_err("Failed to open %s\n", d->data.path); pr_err("Failed to open %s\n", d->data.path);
......
...@@ -42,7 +42,7 @@ static int __cmd_evlist(const char *file_name, struct perf_attr_details *details ...@@ -42,7 +42,7 @@ static int __cmd_evlist(const char *file_name, struct perf_attr_details *details
}; };
bool has_tracepoint = false; bool has_tracepoint = false;
session = perf_session__new(&data, 0, &tool); session = perf_session__new(&data, &tool);
if (IS_ERR(session)) if (IS_ERR(session))
return PTR_ERR(session); return PTR_ERR(session);
......
...@@ -46,6 +46,7 @@ struct perf_inject { ...@@ -46,6 +46,7 @@ struct perf_inject {
bool jit_mode; bool jit_mode;
bool in_place_update; bool in_place_update;
bool in_place_update_dry_run; bool in_place_update_dry_run;
bool is_pipe;
const char *input_name; const char *input_name;
struct perf_data output; struct perf_data output;
u64 bytes_written; u64 bytes_written;
...@@ -126,7 +127,7 @@ static int perf_event__repipe_attr(struct perf_tool *tool, ...@@ -126,7 +127,7 @@ static int perf_event__repipe_attr(struct perf_tool *tool,
if (ret) if (ret)
return ret; return ret;
if (!inject->output.is_pipe) if (!inject->is_pipe)
return 0; return 0;
return perf_event__repipe_synth(tool, event); return perf_event__repipe_synth(tool, event);
...@@ -826,14 +827,14 @@ static int __cmd_inject(struct perf_inject *inject) ...@@ -826,14 +827,14 @@ static int __cmd_inject(struct perf_inject *inject)
if (!inject->itrace_synth_opts.set) if (!inject->itrace_synth_opts.set)
auxtrace_index__free(&session->auxtrace_index); auxtrace_index__free(&session->auxtrace_index);
if (!data_out->is_pipe && !inject->in_place_update) if (!inject->is_pipe && !inject->in_place_update)
lseek(fd, output_data_offset, SEEK_SET); lseek(fd, output_data_offset, SEEK_SET);
ret = perf_session__process_events(session); ret = perf_session__process_events(session);
if (ret) if (ret)
return ret; return ret;
if (!data_out->is_pipe && !inject->in_place_update) { if (!inject->is_pipe && !inject->in_place_update) {
if (inject->build_ids) if (inject->build_ids)
perf_header__set_feat(&session->header, perf_header__set_feat(&session->header,
HEADER_BUILD_ID); HEADER_BUILD_ID);
...@@ -918,6 +919,7 @@ int cmd_inject(int argc, const char **argv) ...@@ -918,6 +919,7 @@ int cmd_inject(int argc, const char **argv)
.use_stdio = true, .use_stdio = true,
}; };
int ret; int ret;
bool repipe = true;
struct option options[] = { struct option options[] = {
OPT_BOOLEAN('b', "build-ids", &inject.build_ids, OPT_BOOLEAN('b', "build-ids", &inject.build_ids,
...@@ -992,7 +994,20 @@ int cmd_inject(int argc, const char **argv) ...@@ -992,7 +994,20 @@ int cmd_inject(int argc, const char **argv)
} }
data.path = inject.input_name; data.path = inject.input_name;
inject.session = perf_session__new(&data, inject.output.is_pipe, &inject.tool); if (!strcmp(inject.input_name, "-") || inject.output.is_pipe) {
inject.is_pipe = true;
/*
* Do not repipe header when input is a regular file
* since either it can rewrite the header at the end
* or write a new pipe header.
*/
if (strcmp(inject.input_name, "-"))
repipe = false;
}
inject.session = __perf_session__new(&data, repipe,
perf_data__fd(&inject.output),
&inject.tool);
if (IS_ERR(inject.session)) { if (IS_ERR(inject.session)) {
ret = PTR_ERR(inject.session); ret = PTR_ERR(inject.session);
goto out_close_output; goto out_close_output;
...@@ -1001,6 +1016,21 @@ int cmd_inject(int argc, const char **argv) ...@@ -1001,6 +1016,21 @@ int cmd_inject(int argc, const char **argv)
if (zstd_init(&(inject.session->zstd_data), 0) < 0) if (zstd_init(&(inject.session->zstd_data), 0) < 0)
pr_warning("Decompression initialization failed.\n"); pr_warning("Decompression initialization failed.\n");
if (!data.is_pipe && inject.output.is_pipe) {
ret = perf_header__write_pipe(perf_data__fd(&inject.output));
if (ret < 0) {
pr_err("Couldn't write a new pipe header.\n");
goto out_delete;
}
ret = perf_event__synthesize_for_pipe(&inject.tool,
inject.session,
&inject.output,
perf_event__repipe);
if (ret < 0)
goto out_delete;
}
if (inject.build_ids && !inject.build_id_all) { if (inject.build_ids && !inject.build_id_all) {
/* /*
* to make sure the mmap records are ordered correctly * to make sure the mmap records are ordered correctly
......
...@@ -1953,7 +1953,7 @@ int cmd_kmem(int argc, const char **argv) ...@@ -1953,7 +1953,7 @@ int cmd_kmem(int argc, const char **argv)
data.path = input_name; data.path = input_name;
kmem_session = session = perf_session__new(&data, false, &perf_kmem); kmem_session = session = perf_session__new(&data, &perf_kmem);
if (IS_ERR(session)) if (IS_ERR(session))
return PTR_ERR(session); return PTR_ERR(session);
......
@@ -1093,7 +1093,7 @@ static int read_events(struct perf_kvm_stat *kvm)
 	};
 	kvm->tool = eops;
-	kvm->session = perf_session__new(&file, false, &kvm->tool);
+	kvm->session = perf_session__new(&file, &kvm->tool);
 	if (IS_ERR(kvm->session)) {
 		pr_err("Initializing perf session failed\n");
 		return PTR_ERR(kvm->session);
@@ -1447,7 +1447,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	/*
 	 * perf session
 	 */
-	kvm->session = perf_session__new(&data, false, &kvm->tool);
+	kvm->session = perf_session__new(&data, &kvm->tool);
 	if (IS_ERR(kvm->session)) {
 		err = PTR_ERR(kvm->session);
 		goto out;
......
@@ -868,7 +868,7 @@ static int __cmd_report(bool display_info)
 		.force = force,
 	};
-	session = perf_session__new(&data, false, &eops);
+	session = perf_session__new(&data, &eops);
 	if (IS_ERR(session)) {
 		pr_err("Initializing perf session failed\n");
 		return PTR_ERR(session);
......
@@ -271,8 +271,7 @@ static int report_raw_events(struct perf_mem *mem)
 		.force = mem->force,
 	};
 	int ret;
-	struct perf_session *session = perf_session__new(&data, false,
-							 &mem->tool);
+	struct perf_session *session = perf_session__new(&data, &mem->tool);
 	if (IS_ERR(session))
 		return PTR_ERR(session);
......
@@ -910,7 +910,8 @@ static int record__open(struct record *rec)
 		 * Enable the dummy event when the process is forked for
 		 * initial_delay, immediately for system wide.
 		 */
-		if (opts->initial_delay && !pos->immediate)
+		if (opts->initial_delay && !pos->immediate &&
+		    !target__has_cpu(&opts->target))
 			pos->core.attr.enable_on_exec = 1;
 		else
 			pos->immediate = 1;
@@ -1387,7 +1388,6 @@ static int record__synthesize(struct record *rec, bool tail)
 	struct perf_data *data = &rec->data;
 	struct record_opts *opts = &rec->opts;
 	struct perf_tool *tool = &rec->tool;
-	int fd = perf_data__fd(data);
 	int err = 0;
 	event_op f = process_synthesized_event;
@@ -1395,41 +1395,12 @@ static int record__synthesize(struct record *rec, bool tail)
 		return 0;
 	if (data->is_pipe) {
-		/*
-		 * We need to synthesize events first, because some
-		 * features works on top of them (on report side).
-		 */
-		err = perf_event__synthesize_attrs(tool, rec->evlist,
-						   process_synthesized_event);
-		if (err < 0) {
-			pr_err("Couldn't synthesize attrs.\n");
-			goto out;
-		}
-		err = perf_event__synthesize_features(tool, session, rec->evlist,
-						      process_synthesized_event);
-		if (err < 0) {
-			pr_err("Couldn't synthesize features.\n");
-			return err;
-		}
-		if (have_tracepoints(&rec->evlist->core.entries)) {
-			/*
-			 * FIXME err <= 0 here actually means that
-			 * there were no tracepoints so its not really
-			 * an error, just that we don't need to
-			 * synthesize anything. We really have to
-			 * return this more properly and also
-			 * propagate errors that now are calling die()
-			 */
-			err = perf_event__synthesize_tracing_data(tool, fd, rec->evlist,
-								  process_synthesized_event);
-			if (err <= 0) {
-				pr_err("Couldn't record tracing data.\n");
-				goto out;
-			}
-			rec->bytes_written += err;
-		}
+		err = perf_event__synthesize_for_pipe(tool, session, data,
+						      process_synthesized_event);
+		if (err < 0)
+			goto out;
+		rec->bytes_written += err;
 	}
 	err = perf_event__synth_time_conv(record__pick_pc(rec), tool,
@@ -1681,7 +1652,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		signal(SIGUSR2, SIG_IGN);
 	}
-	session = perf_session__new(data, false, tool);
+	session = perf_session__new(data, tool);
 	if (IS_ERR(session)) {
 		pr_err("Perf session creation failed.\n");
 		return PTR_ERR(session);
@@ -2884,6 +2855,13 @@ int cmd_record(int argc, const char **argv)
 	/* Enable ignoring missing threads when -u/-p option is defined. */
 	rec->opts.ignore_missing_thread = rec->opts.target.uid != UINT_MAX || rec->opts.target.pid;
+	if (evlist__fix_hybrid_cpus(rec->evlist, rec->opts.target.cpu_list)) {
+		pr_err("failed to use cpu list %s\n",
+		       rec->opts.target.cpu_list);
+		goto out;
+	}
+	rec->opts.target.hybrid = perf_pmu__has_hybrid();
 	err = -ENOMEM;
 	if (evlist__create_maps(rec->evlist, &rec->opts.target) < 0)
 		usage_with_options(record_usage, record_options);
......
@@ -1411,7 +1411,7 @@ int cmd_report(int argc, const char **argv)
 	data.force = symbol_conf.force;
 repeat:
-	session = perf_session__new(&data, false, &report.tool);
+	session = perf_session__new(&data, &report.tool);
 	if (IS_ERR(session)) {
 		ret = PTR_ERR(session);
 		goto exit;
......
@@ -1804,7 +1804,7 @@ static int perf_sched__read_events(struct perf_sched *sched)
 	};
 	int rc = -1;
-	session = perf_session__new(&data, false, &sched->tool);
+	session = perf_session__new(&data, &sched->tool);
 	if (IS_ERR(session)) {
 		pr_debug("Error creating perf session");
 		return PTR_ERR(session);
@@ -3011,7 +3011,7 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	symbol_conf.use_callchain = sched->show_callchain;
-	session = perf_session__new(&data, false, &sched->tool);
+	session = perf_session__new(&data, &sched->tool);
 	if (IS_ERR(session))
 		return PTR_ERR(session);
......
@@ -2212,7 +2212,7 @@ static int process_sample_event(struct perf_tool *tool,
 	if (filter_cpu(sample))
 		goto out_put;
-	if (machine__resolve(machine, &al, sample) < 0) {
+	if (!al.thread && machine__resolve(machine, &al, sample) < 0) {
 		pr_err("problem processing %d event, skipping it.\n",
 		       event->header.type);
 		ret = -1;
@@ -2492,6 +2492,17 @@ process_lost_event(struct perf_tool *tool,
 			   sample->tid);
 }
+static int
+process_throttle_event(struct perf_tool *tool __maybe_unused,
+		       union perf_event *event,
+		       struct perf_sample *sample,
+		       struct machine *machine)
+{
+	if (scripting_ops && scripting_ops->process_throttle)
+		scripting_ops->process_throttle(event, sample, machine);
+	return 0;
+}
 static int
 process_finished_round_event(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
@@ -3294,7 +3305,7 @@ int find_scripts(char **scripts_array, char **scripts_path_array, int num,
 	char *temp;
 	int i = 0;
-	session = perf_session__new(&data, false, NULL);
+	session = perf_session__new(&data, NULL);
 	if (IS_ERR(session))
 		return PTR_ERR(session);
@@ -3652,6 +3663,8 @@ int cmd_script(int argc, const char **argv)
 			.stat_config	 = process_stat_config_event,
 			.thread_map	 = process_thread_map_event,
 			.cpu_map	 = process_cpu_map_event,
+			.throttle	 = process_throttle_event,
+			.unthrottle	 = process_throttle_event,
 			.ordered_events	 = true,
 			.ordering_requires_timestamps = true,
 		},
@@ -4007,7 +4020,7 @@ int cmd_script(int argc, const char **argv)
 		use_browser = 0;
 	}
-	session = perf_session__new(&data, false, &script.tool);
+	session = perf_session__new(&data, &script.tool);
 	if (IS_ERR(session))
 		return PTR_ERR(session);
......
@@ -1996,7 +1996,7 @@ static int __cmd_record(int argc, const char **argv)
 		return -1;
 	}
-	session = perf_session__new(data, false, NULL);
+	session = perf_session__new(data, NULL);
 	if (IS_ERR(session)) {
 		pr_err("Perf session creation failed\n");
 		return PTR_ERR(session);
@@ -2168,7 +2168,7 @@ static int __cmd_report(int argc, const char **argv)
 	perf_stat.data.path = input_name;
 	perf_stat.data.mode = PERF_DATA_MODE_READ;
-	session = perf_session__new(&perf_stat.data, false, &perf_stat.tool);
+	session = perf_session__new(&perf_stat.data, &perf_stat.tool);
 	if (IS_ERR(session))
 		return PTR_ERR(session);
@@ -2386,7 +2386,8 @@ int cmd_stat(int argc, const char **argv)
 	 * --per-thread is aggregated per thread, we dont mix it with cpu mode
 	 */
 	if (((stat_config.aggr_mode != AGGR_GLOBAL &&
-	      stat_config.aggr_mode != AGGR_THREAD) || nr_cgroups) &&
+	      stat_config.aggr_mode != AGGR_THREAD) ||
+	      (nr_cgroups || stat_config.cgroup_list)) &&
 	    !target__has_cpu(&target)) {
 		fprintf(stderr, "both cgroup and no-aggregation "
 			"modes only available in system-wide mode\n");
@@ -2394,6 +2395,7 @@ int cmd_stat(int argc, const char **argv)
 		parse_options_usage(stat_usage, stat_options, "G", 1);
 		parse_options_usage(NULL, stat_options, "A", 1);
 		parse_options_usage(NULL, stat_options, "a", 1);
+		parse_options_usage(NULL, stat_options, "for-each-cgroup", 0);
 		goto out;
 	}
@@ -2430,6 +2432,12 @@ int cmd_stat(int argc, const char **argv)
 	if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
 		target.per_thread = true;
+	if (evlist__fix_hybrid_cpus(evsel_list, target.cpu_list)) {
+		pr_err("failed to use cpu list %s\n", target.cpu_list);
+		goto out;
+	}
+	target.hybrid = perf_pmu__has_hybrid();
 	if (evlist__create_maps(evsel_list, &target) < 0) {
 		if (target__has_task(&target)) {
 			pr_err("Problems finding threads of monitor\n");
......
@@ -1598,8 +1598,7 @@ static int __cmd_timechart(struct timechart *tchart, const char *output_name)
 		.force = tchart->force,
 	};
-	struct perf_session *session = perf_session__new(&data, false,
-							 &tchart->tool);
+	struct perf_session *session = perf_session__new(&data, &tchart->tool);
 	int ret = -EINVAL;
 	if (IS_ERR(session))
......
@@ -1740,7 +1740,7 @@ int cmd_top(int argc, const char **argv)
 		signal(SIGWINCH, winch_sig);
 	}
-	top.session = perf_session__new(NULL, false, NULL);
+	top.session = perf_session__new(NULL, NULL);
 	if (IS_ERR(top.session)) {
 		status = PTR_ERR(top.session);
 		goto out_delete_evlist;
......
@@ -707,7 +707,15 @@ static size_t syscall_arg__scnprintf_char_array(char *bf, size_t size, struct sy
 static const char *bpf_cmd[] = {
 	"MAP_CREATE", "MAP_LOOKUP_ELEM", "MAP_UPDATE_ELEM", "MAP_DELETE_ELEM",
-	"MAP_GET_NEXT_KEY", "PROG_LOAD",
+	"MAP_GET_NEXT_KEY", "PROG_LOAD", "OBJ_PIN", "OBJ_GET", "PROG_ATTACH",
+	"PROG_DETACH", "PROG_TEST_RUN", "PROG_GET_NEXT_ID", "MAP_GET_NEXT_ID",
+	"PROG_GET_FD_BY_ID", "MAP_GET_FD_BY_ID", "OBJ_GET_INFO_BY_FD",
+	"PROG_QUERY", "RAW_TRACEPOINT_OPEN", "BTF_LOAD", "BTF_GET_FD_BY_ID",
+	"TASK_FD_QUERY", "MAP_LOOKUP_AND_DELETE_ELEM", "MAP_FREEZE",
+	"BTF_GET_NEXT_ID", "MAP_LOOKUP_BATCH", "MAP_LOOKUP_AND_DELETE_BATCH",
+	"MAP_UPDATE_BATCH", "MAP_DELETE_BATCH", "LINK_CREATE", "LINK_UPDATE",
+	"LINK_GET_FD_BY_ID", "LINK_GET_NEXT_ID", "ENABLE_STATS", "ITER_CREATE",
+	"LINK_DETACH", "PROG_BIND_MAP",
 };
 static DEFINE_STRARRAY(bpf_cmd, "BPF_");
@@ -4228,7 +4236,7 @@ static int trace__replay(struct trace *trace)
 	/* add tid to output */
 	trace->multiple_threads = true;
-	session = perf_session__new(&data, false, &trace->tool);
+	session = perf_session__new(&data, &trace->tool);
 	if (IS_ERR(session))
 		return PTR_ERR(session);
......
// SPDX-License-Identifier: GPL-2.0
/*
* dlfilter-test-api-v0.c: test original (v0) API for perf --dlfilter shared object
* Copyright (c) 2021, Intel Corporation.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
/*
* Copy original (v0) API instead of including current API
*/
#include <linux/perf_event.h>
#include <linux/types.h>
/* Definitions for perf_dlfilter_sample flags */
enum {
PERF_DLFILTER_FLAG_BRANCH = 1ULL << 0,
PERF_DLFILTER_FLAG_CALL = 1ULL << 1,
PERF_DLFILTER_FLAG_RETURN = 1ULL << 2,
PERF_DLFILTER_FLAG_CONDITIONAL = 1ULL << 3,
PERF_DLFILTER_FLAG_SYSCALLRET = 1ULL << 4,
PERF_DLFILTER_FLAG_ASYNC = 1ULL << 5,
PERF_DLFILTER_FLAG_INTERRUPT = 1ULL << 6,
PERF_DLFILTER_FLAG_TX_ABORT = 1ULL << 7,
PERF_DLFILTER_FLAG_TRACE_BEGIN = 1ULL << 8,
PERF_DLFILTER_FLAG_TRACE_END = 1ULL << 9,
PERF_DLFILTER_FLAG_IN_TX = 1ULL << 10,
PERF_DLFILTER_FLAG_VMENTRY = 1ULL << 11,
PERF_DLFILTER_FLAG_VMEXIT = 1ULL << 12,
};
/*
* perf sample event information (as per perf script and <linux/perf_event.h>)
*/
struct perf_dlfilter_sample {
__u32 size; /* Size of this structure (for compatibility checking) */
__u16 ins_lat; /* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
__u16 p_stage_cyc; /* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
__u64 ip;
__s32 pid;
__s32 tid;
__u64 time;
__u64 addr;
__u64 id;
__u64 stream_id;
__u64 period;
__u64 weight; /* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
__u64 transaction; /* Refer PERF_SAMPLE_TRANSACTION in <linux/perf_event.h> */
__u64 insn_cnt; /* For instructions-per-cycle (IPC) */
__u64 cyc_cnt; /* For instructions-per-cycle (IPC) */
__s32 cpu;
__u32 flags; /* Refer PERF_DLFILTER_FLAG_* above */
__u64 data_src; /* Refer PERF_SAMPLE_DATA_SRC in <linux/perf_event.h> */
__u64 phys_addr; /* Refer PERF_SAMPLE_PHYS_ADDR in <linux/perf_event.h> */
__u64 data_page_size; /* Refer PERF_SAMPLE_DATA_PAGE_SIZE in <linux/perf_event.h> */
__u64 code_page_size; /* Refer PERF_SAMPLE_CODE_PAGE_SIZE in <linux/perf_event.h> */
__u64 cgroup; /* Refer PERF_SAMPLE_CGROUP in <linux/perf_event.h> */
__u8 cpumode; /* Refer CPUMODE_MASK etc in <linux/perf_event.h> */
__u8 addr_correlates_sym; /* True => resolve_addr() can be called */
__u16 misc; /* Refer perf_event_header in <linux/perf_event.h> */
__u32 raw_size; /* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
const void *raw_data; /* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
__u64 brstack_nr; /* Number of brstack entries */
const struct perf_branch_entry *brstack; /* Refer <linux/perf_event.h> */
__u64 raw_callchain_nr; /* Number of raw_callchain entries */
const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
const char *event;
};
/*
* Address location (as per perf script)
*/
struct perf_dlfilter_al {
__u32 size; /* Size of this structure (for compatibility checking) */
__u32 symoff;
const char *sym;
__u64 addr; /* Mapped address (from dso) */
__u64 sym_start;
__u64 sym_end;
const char *dso;
__u8 sym_binding; /* STB_LOCAL, STB_GLOBAL or STB_WEAK, refer <elf.h> */
__u8 is_64_bit; /* Only valid if dso is not NULL */
__u8 is_kernel_ip; /* True if in kernel space */
__u32 buildid_size;
__u8 *buildid;
/* Below members are only populated by resolve_ip() */
__u8 filtered; /* True if this sample event will be filtered out */
const char *comm;
};
struct perf_dlfilter_fns {
/* Return information about ip */
const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
/* Return information about addr (if addr_correlates_sym) */
const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
/* Return arguments from --dlarg option */
char **(*args)(void *ctx, int *dlargc);
/*
* Return information about address (al->size must be set before
* calling). Returns 0 on success, -1 otherwise.
*/
__s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al);
/* Return instruction bytes and length */
const __u8 *(*insn)(void *ctx, __u32 *length);
/* Return source file name and line number */
const char *(*srcline)(void *ctx, __u32 *line_number);
/* Return perf_event_attr, refer <linux/perf_event.h> */
struct perf_event_attr *(*attr)(void *ctx);
/* Read object code, return numbers of bytes read */
__s32 (*object_code)(void *ctx, __u64 ip, void *buf, __u32 len);
/* Reserved */
void *(*reserved[120])(void *);
};
struct perf_dlfilter_fns perf_dlfilter_fns;
static int verbose;
#define pr_debug(fmt, ...) do { \
if (verbose) \
fprintf(stderr, fmt, ##__VA_ARGS__); \
} while (0)
static int test_fail(const char *msg)
{
pr_debug("%s\n", msg);
return -1;
}
#define CHECK(x) do { \
if (!(x)) \
return test_fail("Check '" #x "' failed\n"); \
} while (0)
struct filter_data {
__u64 ip;
__u64 addr;
int do_early;
int early_filter_cnt;
int filter_cnt;
};
static struct filter_data *filt_dat;
int start(void **data, void *ctx)
{
int dlargc;
char **dlargv;
struct filter_data *d;
static bool called;
verbose = 1;
CHECK(!filt_dat && !called);
called = true;
d = calloc(1, sizeof(*d));
if (!d)
test_fail("Failed to allocate memory");
filt_dat = d;
*data = d;
dlargv = perf_dlfilter_fns.args(ctx, &dlargc);
CHECK(dlargc == 6);
CHECK(!strcmp(dlargv[0], "first"));
verbose = strtol(dlargv[1], NULL, 0);
d->ip = strtoull(dlargv[2], NULL, 0);
d->addr = strtoull(dlargv[3], NULL, 0);
d->do_early = strtol(dlargv[4], NULL, 0);
CHECK(!strcmp(dlargv[5], "last"));
pr_debug("%s API\n", __func__);
return 0;
}
#define CHECK_SAMPLE(x) do { \
if (sample->x != expected.x) \
return test_fail("'" #x "' not expected value\n"); \
} while (0)
static int check_sample(struct filter_data *d, const struct perf_dlfilter_sample *sample)
{
struct perf_dlfilter_sample expected = {
.ip = d->ip,
.pid = 12345,
.tid = 12346,
.time = 1234567890,
.addr = d->addr,
.id = 99,
.stream_id = 101,
.period = 543212345,
.cpu = 31,
.cpumode = PERF_RECORD_MISC_USER,
.addr_correlates_sym = 1,
.misc = PERF_RECORD_MISC_USER,
};
CHECK(sample->size >= sizeof(struct perf_dlfilter_sample));
CHECK_SAMPLE(ip);
CHECK_SAMPLE(pid);
CHECK_SAMPLE(tid);
CHECK_SAMPLE(time);
CHECK_SAMPLE(addr);
CHECK_SAMPLE(id);
CHECK_SAMPLE(stream_id);
CHECK_SAMPLE(period);
CHECK_SAMPLE(cpu);
CHECK_SAMPLE(cpumode);
CHECK_SAMPLE(addr_correlates_sym);
CHECK_SAMPLE(misc);
CHECK(!sample->raw_data);
CHECK_SAMPLE(brstack_nr);
CHECK(!sample->brstack);
CHECK_SAMPLE(raw_callchain_nr);
CHECK(!sample->raw_callchain);
#define EVENT_NAME "branches:"
CHECK(!strncmp(sample->event, EVENT_NAME, strlen(EVENT_NAME)));
return 0;
}
static int check_al(void *ctx)
{
const struct perf_dlfilter_al *al;
al = perf_dlfilter_fns.resolve_ip(ctx);
if (!al)
return test_fail("resolve_ip() failed");
CHECK(al->sym && !strcmp("foo", al->sym));
CHECK(!al->symoff);
return 0;
}
static int check_addr_al(void *ctx)
{
const struct perf_dlfilter_al *addr_al;
addr_al = perf_dlfilter_fns.resolve_addr(ctx);
if (!addr_al)
return test_fail("resolve_addr() failed");
CHECK(addr_al->sym && !strcmp("bar", addr_al->sym));
CHECK(!addr_al->symoff);
return 0;
}
static int check_attr(void *ctx)
{
struct perf_event_attr *attr = perf_dlfilter_fns.attr(ctx);
CHECK(attr);
CHECK(attr->type == PERF_TYPE_HARDWARE);
CHECK(attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
return 0;
}
static int do_checks(void *data, const struct perf_dlfilter_sample *sample, void *ctx, bool early)
{
struct filter_data *d = data;
CHECK(data && filt_dat == data);
if (early) {
CHECK(!d->early_filter_cnt);
d->early_filter_cnt += 1;
} else {
CHECK(!d->filter_cnt);
CHECK(d->early_filter_cnt);
CHECK(d->do_early != 2);
d->filter_cnt += 1;
}
if (check_sample(data, sample))
return -1;
if (check_attr(ctx))
return -1;
if (early && !d->do_early)
return 0;
if (check_al(ctx) || check_addr_al(ctx))
return -1;
if (early)
return d->do_early == 2;
return 1;
}
int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
pr_debug("%s API\n", __func__);
return do_checks(data, sample, ctx, true);
}
int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
struct filter_data *d = data;
pr_debug("%s API\n", __func__);
return do_checks(data, sample, ctx, false);
}
int stop(void *data, void *ctx)
{
static bool called;
pr_debug("%s API\n", __func__);
CHECK(data && filt_dat == data && !called);
called = true;
free(data);
filt_dat = NULL;
return 0;
}
const char *filter_description(const char **long_description)
{
*long_description = "Filter used by the 'dlfilter C API' perf test";
return "dlfilter to test v0 C API";
}
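A dlfilter like the one above is built as a shared object and loaded by 'perf script' via its --dlfilter option, with arguments forwarded through repeated --dlarg options; the start() hook above expects exactly six of them, beginning with "first" and ending with "last". A minimal usage sketch follows; the object name, the perf.data file and the verbose/ip/addr/do_early values are illustrative assumptions only, since the in-tree 'perf test' harness generates a matching perf.data when it exercises this filter:

    # Illustrative sketch only: file names and --dlarg values are assumptions.
    cc -fPIC -shared -o dlfilter-test-api-v0.so dlfilter-test-api-v0.c
    perf script -i perf.data --dlfilter ./dlfilter-test-api-v0.so \
        --dlarg first --dlarg 1 --dlarg 0x1000 --dlarg 0x2000 --dlarg 1 --dlarg last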
@@ -6,10 +6,13 @@ pmu-events-y += pmu-events.o
 JDIR = pmu-events/arch/$(SRCARCH)
 JSON = $(shell [ -d $(JDIR) ] && \
 	find $(JDIR) -name '*.json' -o -name 'mapfile.csv')
+JDIR_TEST = pmu-events/arch/test
+JSON_TEST = $(shell [ -d $(JDIR_TEST) ] && \
+	find $(JDIR_TEST) -name '*.json')
 #
 # Locate/process JSON files in pmu-events/arch/
 # directory and create tables in pmu-events.c.
 #
-$(OUTPUT)pmu-events/pmu-events.c: $(JSON) $(JEVENTS)
+$(OUTPUT)pmu-events/pmu-events.c: $(JSON) $(JSON_TEST) $(JEVENTS)
 	$(Q)$(call echo-cmd,gen)$(JEVENTS) $(SRCARCH) pmu-events/arch $(OUTPUT)pmu-events/pmu-events.c $(V)
@@ -17,5 +17,26 @@
         "CounterMask": "0",
         "Invert": "0",
         "EdgeDetect": "0"
-    }
+    },
+    {
+        "EventCode": "0x7",
+        "EventName": "uncore_hisi_l3c.rd_hit_cpipe",
+        "BriefDescription": "Total read hits",
+        "PublicDescription": "Total read hits",
+        "Unit": "hisi_sccl,l3c"
+    },
+    {
+        "EventCode": "0x12",
+        "EventName": "uncore_imc_free_running.cache_miss",
+        "BriefDescription": "Total cache misses",
+        "PublicDescription": "Total cache misses",
+        "Unit": "imc_free_running"
+    },
+    {
+        "EventCode": "0x34",
+        "EventName": "uncore_imc.cache_hits",
+        "BriefDescription": "Total cache hits",
+        "PublicDescription": "Total cache hits",
+        "Unit": "imc"
+    },
 ]
[
{
"BriefDescription": "ddr write-cycles event",
"EventCode": "0x2b",
"EventName": "sys_ddr_pmu.write_cycles",
"Unit": "sys_ddr_pmu",
"Compat": "v8"
},
]
 [
     {
-        "BriefDescription": "Number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "BriefDescription": "Number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT14 RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0xC7",
-        "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
+        "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE",
         "SampleAfterValue": "2000003",
-        "UMask": "0x2"
+        "UMask": "0x4"
     },
     {
         "BriefDescription": "Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
@@ -18,13 +18,13 @@
         "UMask": "0x8"
     },
     {
-        "BriefDescription": "Number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "BriefDescription": "Number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0xC7",
-        "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE",
+        "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE",
         "SampleAfterValue": "2000003",
-        "UMask": "0x40"
+        "UMask": "0x10"
     },
     {
         "BriefDescription": "Number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
@@ -36,13 +36,13 @@
         "UMask": "0x20"
     },
     {
-        "BriefDescription": "Number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "BriefDescription": "Number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0xC7",
-        "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
+        "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE",
         "SampleAfterValue": "2000003",
-        "UMask": "0x1"
+        "UMask": "0x40"
     },
     {
         "BriefDescription": "Number of SSE/AVX computational 512-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 16 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
@@ -54,32 +54,32 @@
         "UMask": "0x80"
     },
     {
-        "BriefDescription": "Cycles with any input/output SSE or FP assist",
+        "BriefDescription": "Number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3,4,5,6,7",
-        "CounterMask": "1",
-        "EventCode": "0xCA",
-        "EventName": "FP_ASSIST.ANY",
-        "PublicDescription": "Counts cycles with any input and output SSE or x87 FP assist. If an input and output assist are detected on the same cycle the event increments by 1.",
-        "SampleAfterValue": "100003",
-        "UMask": "0x1e"
+        "EventCode": "0xC7",
+        "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x1"
     },
     {
-        "BriefDescription": "Number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT14 RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "BriefDescription": "Number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0xC7",
-        "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE",
+        "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
         "SampleAfterValue": "2000003",
-        "UMask": "0x4"
+        "UMask": "0x2"
     },
     {
-        "BriefDescription": "Number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "BriefDescription": "Cycles with any input/output SSE or FP assist",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3,4,5,6,7",
-        "EventCode": "0xC7",
-        "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE",
-        "SampleAfterValue": "2000003",
-        "UMask": "0x10"
+        "CounterMask": "1",
+        "EventCode": "0xCA",
+        "EventName": "FP_ASSIST.ANY",
+        "PublicDescription": "Counts cycles with any input and output SSE or x87 FP assist. If an input and output assist are detected on the same cycle the event increments by 1.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1e"
     }
 ]
\ No newline at end of file
@@ -64,15 +64,6 @@
         "UMask": "0x4",
         "Unit": "iMC"
     },
-    {
-        "BriefDescription": "Pre-charge for writes",
-        "Counter": "0,1,2,3",
-        "EventCode": "0x2",
-        "EventName": "UNC_M_PRE_COUNT.WR",
-        "PerPkg": "1",
-        "UMask": "0x8",
-        "Unit": "iMC"
-    },
     {
         "BriefDescription": "Write requests allocated in the PMM Write Pending Queue for Intel Optane DC persistent memory",
         "Counter": "0,1,2,3",
@@ -90,32 +81,32 @@
         "Unit": "iMC"
     },
     {
-        "BriefDescription": "Intel Optane DC persistent memory bandwidth read (MB). Derived from unc_m_pmm_rpq_inserts",
+        "BriefDescription": "Intel Optane DC persistent memory bandwidth read (MB/sec). Derived from unc_m_pmm_rpq_inserts",
         "Counter": "0,1,2,3",
         "EventCode": "0xE3",
         "EventName": "UNC_M_PMM_BANDWIDTH.READ",
         "PerPkg": "1",
-        "ScaleUnit": "6.103515625E-5MB",
+        "ScaleUnit": "6.103515625E-5MB/sec",
         "Unit": "iMC"
     },
     {
-        "BriefDescription": "Intel Optane DC persistent memory bandwidth write (MB). Derived from unc_m_pmm_wpq_inserts",
+        "BriefDescription": "Intel Optane DC persistent memory bandwidth write (MB/sec). Derived from unc_m_pmm_wpq_inserts",
        "Counter": "0,1,2,3",
         "EventCode": "0xE7",
         "EventName": "UNC_M_PMM_BANDWIDTH.WRITE",
         "PerPkg": "1",
-        "ScaleUnit": "6.103515625E-5MB",
+        "ScaleUnit": "6.103515625E-5MB/sec",
         "Unit": "iMC"
     },
     {
-        "BriefDescription": "Intel Optane DC persistent memory bandwidth total (MB). Derived from unc_m_pmm_rpq_inserts",
+        "BriefDescription": "Intel Optane DC persistent memory bandwidth total (MB/sec). Derived from unc_m_pmm_rpq_inserts",
         "Counter": "0,1,2,3",
         "EventCode": "0xE3",
         "EventName": "UNC_M_PMM_BANDWIDTH.TOTAL",
         "MetricExpr": "UNC_M_PMM_RPQ_INSERTS + UNC_M_PMM_WPQ_INSERTS",
         "MetricName": "UNC_M_PMM_BANDWIDTH.TOTAL",
         "PerPkg": "1",
-        "ScaleUnit": "6.103515625E-5MB",
+        "ScaleUnit": "6.103515625E-5MB/sec",
         "Unit": "iMC"
     },
     {
......
@@ -103,15 +103,6 @@
         "UMask": "0x04",
         "Unit": "CHA"
     },
-    {
-        "BriefDescription": "write requests from remote home agent",
-        "Counter": "0,1,2,3",
-        "EventCode": "0x50",
-        "EventName": "UNC_CHA_REQUESTS.WRITES_REMOTE",
-        "PerPkg": "1",
-        "UMask": "0x08",
-        "Unit": "CHA"
-    },
     {
         "BriefDescription": "UPI interconnect send bandwidth for payload. Derived from unc_upi_txl_flits.all_data",
         "Counter": "0,1,2,3",
@@ -544,7 +535,7 @@
         "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD",
         "Filter": "config1=0x40433",
         "PerPkg": "1",
-        "PublicDescription": "TOR Inserts : DRds issued by iA Cores that Missed the LLC : Counts the number of entries successfully inserted into the TOR that match qualifications specified by the subevent. Does not include addressless requests such as locks and interrupts.",
+        "PublicDescription": "TOR Inserts : DRds issued by iA Cores that Missed the LLC : Counts the number of entries successfuly inserted into the TOR that match qualifications specified by the subevent. Does not include addressless requests such as locks and interrupts.",
         "UMask": "0x21",
         "Unit": "CHA"
     },
@@ -567,6 +558,98 @@
         "PublicDescription": "Counts clockticks of the 1GHz trafiic controller clock in the IIO unit.",
         "Unit": "IIO"
     },
{
"BriefDescription": "PCIe Completion Buffer Inserts of completions with data: Part 0",
"Counter": "0,1,2,3",
"EventCode": "0xC2",
"EventName": "UNC_IIO_COMP_BUF_INSERTS.CMPD.PART0",
"FCMask": "0x4",
"PerPkg": "1",
"PortMask": "0x01",
"PublicDescription": "PCIe Completion Buffer Inserts of completions with data: Part 0",
"UMask": "0x03",
"Unit": "IIO"
},
{
"BriefDescription": "PCIe Completion Buffer Inserts of completions with data: Part 1",
"Counter": "0,1,2,3",
"EventCode": "0xC2",
"EventName": "UNC_IIO_COMP_BUF_INSERTS.CMPD.PART1",
"FCMask": "0x4",
"PerPkg": "1",
"PortMask": "0x02",
"PublicDescription": "PCIe Completion Buffer Inserts of completions with data: Part 1",
"UMask": "0x03",
"Unit": "IIO"
},
{
"BriefDescription": "PCIe Completion Buffer Inserts of completions with data: Part 2",
"Counter": "0,1,2,3",
"EventCode": "0xC2",
"EventName": "UNC_IIO_COMP_BUF_INSERTS.CMPD.PART2",
"FCMask": "0x4",
"PerPkg": "1",
"PortMask": "0x04",
"PublicDescription": "PCIe Completion Buffer Inserts of completions with data: Part 2",
"UMask": "0x03",
"Unit": "IIO"
},
{
"BriefDescription": "PCIe Completion Buffer Inserts of completions with data: Part 3",
"Counter": "0,1,2,3",
"EventCode": "0xC2",
"EventName": "UNC_IIO_COMP_BUF_INSERTS.CMPD.PART3",
"FCMask": "0x4",
"PerPkg": "1",
"PortMask": "0x08",
"PublicDescription": "PCIe Completion Buffer Inserts of completions with data: Part 3",
"UMask": "0x03",
"Unit": "IIO"
},
{
"BriefDescription": "PCIe Completion Buffer occupancy of completions with data: Part 0",
"Counter": "2,3",
"EventCode": "0xD5",
"EventName": "UNC_IIO_COMP_BUF_OCCUPANCY.CMPD.PART0",
"FCMask": "0x04",
"PerPkg": "1",
"PublicDescription": "PCIe Completion Buffer occupancy of completions with data: Part 0",
"UMask": "0x01",
"Unit": "IIO"
},
{
"BriefDescription": "PCIe Completion Buffer occupancy of completions with data: Part 1",
"Counter": "2,3",
"EventCode": "0xD5",
"EventName": "UNC_IIO_COMP_BUF_OCCUPANCY.CMPD.PART1",
"FCMask": "0x04",
"PerPkg": "1",
"PublicDescription": "PCIe Completion Buffer occupancy of completions with data: Part 1",
"UMask": "0x02",
"Unit": "IIO"
},
{
"BriefDescription": "PCIe Completion Buffer occupancy of completions with data: Part 2",
"Counter": "2,3",
"EventCode": "0xD5",
"EventName": "UNC_IIO_COMP_BUF_OCCUPANCY.CMPD.PART2",
"FCMask": "0x04",
"PerPkg": "1",
"PublicDescription": "PCIe Completion Buffer occupancy of completions with data: Part 2",
"UMask": "0x04",
"Unit": "IIO"
},
{
"BriefDescription": "PCIe Completion Buffer occupancy of completions with data: Part 3",
"Counter": "2,3",
"EventCode": "0xD5",
"EventName": "UNC_IIO_COMP_BUF_OCCUPANCY.CMPD.PART3",
"FCMask": "0x04",
"PerPkg": "1",
"PublicDescription": "PCIe Completion Buffer occupancy of completions with data: Part 3",
"UMask": "0x08",
"Unit": "IIO"
},
     {
         "BriefDescription": "Read request for 4 bytes made by the CPU to IIO Part0",
         "Counter": "2,3",
@@ -1239,6 +1322,64 @@
         "UMask": "0x02",
         "Unit": "IIO"
     },
{
"BriefDescription": "Total IRP occupancy of inbound read and write requests.",
"Counter": "0,1",
"EventCode": "0xF",
"EventName": "UNC_I_CACHE_TOTAL_OCCUPANCY.MEM",
"PerPkg": "1",
"PublicDescription": "Total IRP occupancy of inbound read and write requests. This is effectively the sum of read occupancy and write occupancy.",
"UMask": "0x4",
"Unit": "IRP"
},
{
"BriefDescription": "PCIITOM request issued by the IRP unit to the mesh with the intention of writing a full cacheline.",
"Counter": "0,1",
"EventCode": "0x10",
"EventName": "UNC_I_COHERENT_OPS.PCITOM",
"PerPkg": "1",
"PublicDescription": "PCIITOM request issued by the IRP unit to the mesh with the intention of writing a full cacheline to coherent memory, without a RFO. PCIITOM is a speculative Invalidate to Modified command that requests ownership of the cacheline and does not move data from the mesh to IRP cache.",
"UMask": "0x10",
"Unit": "IRP"
},
{
"BriefDescription": "RFO request issued by the IRP unit to the mesh with the intention of writing a partial cacheline.",
"Counter": "0,1",
"EventCode": "0x10",
"EventName": "UNC_I_COHERENT_OPS.RFO",
"PerPkg": "1",
"PublicDescription": "RFO request issued by the IRP unit to the mesh with the intention of writing a partial cacheline to coherent memory. RFO is a Read For Ownership command that requests ownership of the cacheline and moves data from the mesh to IRP cache.",
"UMask": "0x8",
"Unit": "IRP"
},
{
"BriefDescription": "Inbound read requests received by the IRP and inserted into the FAF queue.",
"Counter": "0,1",
"EventCode": "0x18",
"EventName": "UNC_I_FAF_INSERTS",
"PerPkg": "1",
"PublicDescription": "Inbound read requests to coherent memory, received by the IRP and inserted into the Fire and Forget queue (FAF), a queue used for processing inbound reads in the IRP.",
"Unit": "IRP"
},
{
"BriefDescription": "Occupancy of the IRP FAF queue.",
"Counter": "0,1",
"EventCode": "0x19",
"EventName": "UNC_I_FAF_OCCUPANCY",
"PerPkg": "1",
"PublicDescription": "Occupancy of the IRP Fire and Forget (FAF) queue, a queue used for processing inbound reads in the IRP.",
"Unit": "IRP"
},
{
"BriefDescription": "Inbound write (fast path) requests received by the IRP.",
"Counter": "0,1",
"EventCode": "0x11",
"EventName": "UNC_I_TRANSACTIONS.WR_PREF",
"PerPkg": "1",
"PublicDescription": "Inbound write (fast path) requests to coherent memory, received by the IRP resulting in write ownership requests issued by IRP to the mesh.",
"UMask": "0x8",
"Unit": "IRP"
},
     {
         "BriefDescription": "Traffic in which the M2M to iMC Bypass was not taken",
         "Counter": "0,1,2,3",
......
[
{
"BriefDescription": "Counts the number of first level data cacheline (dirty) evictions caused by misses, stores, and prefetches.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x51",
"EventName": "DL1.DIRTY_EVICTION",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the number of first level data cacheline (dirty) evictions caused by misses, stores, and prefetches. Does not count evictions or dirty writebacks caused by snoops. Does not count a replacement unless a (dirty) line was written back.",
"SampleAfterValue": "200003",
"UMask": "0x1"
},
{
"BriefDescription": "Counts the number of cacheable memory requests that miss in the LLC. Counts on a per core basis.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x2e",
"EventName": "LONGEST_LAT_CACHE.MISS",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the number of cacheable memory requests that miss in the Last Level Cache (LLC). If the platform has an L3 cache, the LLC is the L3 cache, otherwise it is the L2 cache. Counts on a per core basis.",
"SampleAfterValue": "200003",
"UMask": "0x41"
},
{
"BriefDescription": "Counts the number of cacheable memory requests that access the LLC. Counts on a per core basis.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x2e",
"EventName": "LONGEST_LAT_CACHE.REFERENCE",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the number of cacheable memory requests that access the Last Level Cache (LLC). Requests include demand loads, reads for ownership (RFO), instruction fetches and L1 HW prefetches. If the platform has an L3 cache, the LLC is the L3 cache, otherwise it is the L2 cache. Counts on a per core basis.",
"SampleAfterValue": "200003",
"UMask": "0x4f"
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to an instruction cache or TLB miss which hit in DRAM or MMIO (Non-DRAM).",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x34",
"EventName": "MEM_BOUND_STALLS.IFETCH_DRAM_HIT",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the number of cycles a core is stalled due to an instruction cache or translation lookaside buffer (TLB) access which hit in DRAM or MMIO (non-DRAM).",
"SampleAfterValue": "200003",
"UMask": "0x20"
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to an instruction cache or TLB miss which hit in the L2 cache.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x34",
"EventName": "MEM_BOUND_STALLS.IFETCH_L2_HIT",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the number of cycles a core is stalled due to an instruction cache or Translation Lookaside Buffer (TLB) access which hit in the L2 cache.",
"SampleAfterValue": "200003",
"UMask": "0x8"
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to an instruction cache or TLB miss which hit in the LLC or other core with HITE/F/M.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x34",
"EventName": "MEM_BOUND_STALLS.IFETCH_LLC_HIT",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the number of cycles a core is stalled due to an instruction cache or Translation Lookaside Buffer (TLB) access which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
"SampleAfterValue": "200003",
"UMask": "0x10"
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x34",
"EventName": "MEM_BOUND_STALLS.LOAD_DRAM_HIT",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to a demand load which hit in the L2 cache.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x34",
"EventName": "MEM_BOUND_STALLS.LOAD_L2_HIT",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the L2 cache.",
"SampleAfterValue": "200003",
"UMask": "0x1"
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to a demand load which hit in the LLC or other core with HITE/F/M.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x34",
"EventName": "MEM_BOUND_STALLS.LOAD_LLC_HIT",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
"SampleAfterValue": "200003",
"UMask": "0x2"
},
{
"BriefDescription": "Counts the number of cycles a core is stalled due to a store buffer being full.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x34",
"EventName": "MEM_BOUND_STALLS.STORE_BUFFER_FULL",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "200003",
"UMask": "0x40"
},
{
"BriefDescription": "Counts the number of load ops retired that hit in DRAM.",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd1",
"EventName": "MEM_LOAD_UOPS_RETIRED.DRAM_HIT",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "200003",
"UMask": "0x80"
},
{
"BriefDescription": "Counts the number of load uops retired that hit in the L1 data cache.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd1",
"EventName": "MEM_LOAD_UOPS_RETIRED.L1_HIT",
"PEBS": "1",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "200003",
"UMask": "0x1"
},
{
"BriefDescription": "Counts the number of load uops retired that miss in the L1 data cache.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd1",
"EventName": "MEM_LOAD_UOPS_RETIRED.L1_MISS",
"PEBS": "1",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "200003",
"UMask": "0x8"
},
{
"BriefDescription": "Counts the number of load uops retired that hit in the L2 cache.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd1",
"EventName": "MEM_LOAD_UOPS_RETIRED.L2_HIT",
"PEBS": "1",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "200003",
"UMask": "0x2"
},
{
"BriefDescription": "Counts the number of load uops retired that miss in the L2 cache.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd1",
"EventName": "MEM_LOAD_UOPS_RETIRED.L2_MISS",
"PEBS": "1",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "200003",
"UMask": "0x10"
},
{
"BriefDescription": "Counts the number of load uops retired that hit in the L3 cache.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0xd1",
"EventName": "MEM_LOAD_UOPS_RETIRED.L3_HIT",
"PEBS": "1",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
"BriefDescription": "Counts the number of load uops retired.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd0",
"EventName": "MEM_UOPS_RETIRED.ALL_LOADS",
"PEBS": "1",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the total number of load uops retired.",
"SampleAfterValue": "200003",
"UMask": "0x81"
},
{
"BriefDescription": "Counts the number of store uops retired.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd0",
"EventName": "MEM_UOPS_RETIRED.ALL_STORES",
"PEBS": "1",
"PEBScounters": "0,1,2,3",
"PublicDescription": "Counts the total number of store uops retired.",
"SampleAfterValue": "200003",
"UMask": "0x82"
},
{
"BriefDescription": "Counts the number of issue slots every cycle that were not delivered by the frontend due to instruction cache misses.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0x71",
"EventName": "TOPDOWN_FE_BOUND.ICACHE",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "1000003",
"UMask": "0x20"
}
]
\ No newline at end of file
[
{
"MetricExpr": "INST_RETIRED.ANY / cycles",
"BriefDescription": "Instructions Per Cycle (per Logical Processor)",
"MetricName": "IPC"
},
{
"MetricExpr": "1 / IPC",
"BriefDescription": "Cycles Per Instruction (per Logical Processor)",
"MetricName": "CPI"
},
{
"MetricExpr": "cycles",
"BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
"MetricName": "CLKS"
},
{
"MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)",
"MetricName": "IpMispredict"
},
{
"MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES",
"BriefDescription": "Instructions per Branch (lower number means higher occurrence rate)",
"MetricName": "IpBranch"
},
{
"MetricExpr": "INST_RETIRED.ANY",
"BriefDescription": "Total number of retired Instructions",
"MetricName": "Instructions"
},
{
"MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 ",
"BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
"MetricName": "L3_Cache_Fill_BW"
},
{
"MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
"BriefDescription": "Average CPU Utilization",
"MetricName": "CPU_Utilization"
},
{
"MetricExpr": "(cycles / CPU_CLK_UNHALTED.REF_TSC) * msr@tsc@ / 1000000000 ",
"BriefDescription": "Measured Average Frequency for unhalted processors [GHz]",
"MetricName": "Average_Frequency"
},
{
"MetricExpr": "cycles / CPU_CLK_UNHALTED.REF_TSC",
"BriefDescription": "Average Frequency Utilization relative nominal frequency",
"MetricName": "Turbo_Utilization"
},
{
"MetricExpr": "cycles:k / cycles",
"BriefDescription": "Fraction of cycles spent in the Operating System (OS) Kernel mode",
"MetricName": "Kernel_Utilization"
}
]
[
{
"BriefDescription": "Counts the number of cycles the floating point divider is busy. Does not imply a stall waiting for the divider.",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0xcd",
"EventName": "CYCLES_DIV_BUSY.FPDIV",
"PDIR_COUNTER": "na",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "200003",
"UMask": "0x2"
},
{
"BriefDescription": "Counts the number of floating point divide uops retired (x87 and SSE, including x87 sqrt).",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0xc2",
"EventName": "UOPS_RETIRED.FPDIV",
"PEBS": "1",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "2000003",
"UMask": "0x8"
}
]
\ No newline at end of file
[2 file diffs collapsed by the viewer]
@@ -36,11 +36,12 @@ GenuineIntel-6-55-[01234],v1,skylakex,core
 GenuineIntel-6-55-[56789ABCDEF],v1,cascadelakex,core
 GenuineIntel-6-7D,v1,icelake,core
 GenuineIntel-6-7E,v1,icelake,core
-GenuineIntel-6-8[CD],v1,icelake,core
+GenuineIntel-6-8[CD],v1,tigerlake,core
 GenuineIntel-6-A7,v1,icelake,core
 GenuineIntel-6-6A,v1,icelakex,core
 GenuineIntel-6-6C,v1,icelakex,core
 GenuineIntel-6-86,v1,tremontx,core
+GenuineIntel-6-96,v1,elkhartlake,core
 AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
 AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
 AuthenticAMD-25-[[:xdigit:]]+,v1,amdzen3,core
@@ -64,15 +64,6 @@
         "UMask": "0x4",
         "Unit": "iMC"
     },
-    {
-        "BriefDescription": "Pre-charge for writes",
-        "Counter": "0,1,2,3",
-        "EventCode": "0x2",
-        "EventName": "UNC_M_PRE_COUNT.WR",
-        "PerPkg": "1",
-        "UMask": "0x8",
-        "Unit": "iMC"
-    },
     {
         "BriefDescription": "DRAM Page Activate commands sent due to a write request",
         "Counter": "0,1,2,3",
......
[53 more file diffs collapsed by the viewer]