1. 02 4月, 2021 1 次提交
    • K
      perf/x86/intel/uncore: Parse uncore discovery tables · edae1f06
      Kan Liang 提交于
      A self-describing mechanism for the uncore PerfMon hardware has been
      introduced with the latest Intel platforms. By reading through an MMIO
      page worth of information, perf can 'discover' all the standard uncore
      PerfMon registers in a machine.
      
      The discovery mechanism relies on BIOS's support. With a proper BIOS,
      a PCI device with the unique capability ID 0x23 can be found on each
      die. Perf can retrieve the information of all available uncore PerfMons
      from the device via MMIO. The information is composed of one global
      discovery table and several unit discovery tables.
      - The global discovery table includes global uncore information of the
        die, e.g., the address of the global control register, the offset of
        the global status register, the number of uncore units, the offset of
        unit discovery tables, etc.
      - The unit discovery table includes generic uncore unit information,
        e.g., the access type, the counter width, the address of counters,
        the address of the counter control, the unit ID, the unit type, etc.
        The unit is also called "box" in the code.
      Perf can provide basic uncore support based on this information
      with the following patches.
      
      To locate the PCI device with the discovery tables, check the generic
      PCI ID first. If it doesn't match, go through the entire PCI device tree
      and locate the device with the unique capability ID.
      
      The uncore information is similar among dies. To save parsing time and
      space, only completely parse and store the discovery tables on the first
      die and the first box of each die. The parsed information is stored in
      an
      RB tree structure, intel_uncore_discovery_type. The size of the stored
      discovery tables varies among platforms. It's around 4KB for a Sapphire
      Rapids server.
      
      If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
      will exit with -ENODEV. There is nothing changed.
      
      Add a module parameter to disable the discovery feature. If a BIOS gets
      the discovery tables wrong, users can have an option to disable the
      feature. For the current patchset, the uncore driver will exit with
      -ENODEV. In the future, it may fall back to the hardcode uncore driver
      on a known platform.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1616003977-90612-2-git-send-email-kan.liang@linux.intel.com
      edae1f06
  2. 14 1月, 2021 1 次提交
  3. 17 11月, 2020 1 次提交
  4. 29 10月, 2020 1 次提交
  5. 29 9月, 2020 1 次提交
  6. 24 9月, 2020 5 次提交
  7. 15 6月, 2020 4 次提交
  8. 08 4月, 2020 1 次提交
    • K
      perf/x86/intel/uncore: Add Ice Lake server uncore support · 2b3b76b5
      Kan Liang 提交于
      The uncore subsystem in Ice Lake server is similar to previous server.
      There are some differences in config register encoding and pci device
      IDs. The uncore PMON units in Ice Lake server include Ubox, Chabox, IIO,
      IRP, M2PCIE, PCU, M2M, PCIE3 and IMC.
      
       - For CHA, filter 1 register has been removed. The filter 0 register can
         be used by and of CHA events to be filterd by Thread/Core-ID. To do
         so, the control register's tid_en bit must be set to 1.
       - For IIO, there are some changes on event constraints. The MSR address
         and MSR offsets among counters are also changed.
       - For IRP, the MSR address and MSR offsets among counters are changed.
       - For M2PCIE, the counters are accessed by MSR now. Add new MSR address
         and MSR offsets. Change event constraints.
       - To determine the number of CHAs, have to read CAPID6(Low) and CAPID7
         (High) now.
       - For M2M, update the PCICFG address and Device ID.
       - For UPI, update the PCICFG address, Device ID and counter address.
       - For M3UPI, update the PCICFG address, Device ID, counter address and
         event constraints.
       - For IMC, update the formular to calculate MMIO BAR address, which is
         MMIO_BASE + specific MEM_BAR offset.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/1585842411-150452-1-git-send-email-kan.liang@linux.intel.com
      2b3b76b5
  9. 25 3月, 2020 1 次提交
  10. 11 2月, 2020 1 次提交
  11. 28 10月, 2019 1 次提交
    • K
      perf/x86/uncore: Fix event group support · 75be6f70
      Kan Liang 提交于
      The events in the same group don't start or stop simultaneously.
      Here is the ftrace when enabling event group for uncore_iio_0:
      
        # perf stat -e "{uncore_iio_0/event=0x1/,uncore_iio_0/event=0xe/}"
      
                  <idle>-0     [000] d.h.  8959.064832: read_msr: a41, value
        b2b0b030		//Read counter reg of IIO unit0 counter0
                  <idle>-0     [000] d.h.  8959.064835: write_msr: a48, value
        400001			//Write Ctrl reg of IIO unit0 counter0 to enable
        counter0. <------ Although counter0 is enabled, Unit Ctrl is still
        freezed. Nothing will count. We are still good here.
                  <idle>-0     [000] d.h.  8959.064836: read_msr: a40, value
        30100                   //Read Unit Ctrl reg of IIO unit0
                  <idle>-0     [000] d.h.  8959.064838: write_msr: a40, value
        30000			//Write Unit Ctrl reg of IIO unit0 to enable all
        counters in the unit by clear Freeze bit  <------Unit0 is un-freezed.
        Counter0 has been enabled. Now it starts counting. But counter1 has not
        been enabled yet. The issue starts here.
                  <idle>-0     [000] d.h.  8959.064846: read_msr: a42, value 0
      			//Read counter reg of IIO unit0 counter1
                  <idle>-0     [000] d.h.  8959.064847: write_msr: a49, value
        40000e			//Write Ctrl reg of IIO unit0 counter1 to enable
        counter1.   <------ Now, counter1 just starts to count. Counter0 has
        been running for a while.
      
      Current code un-freezes the Unit Ctrl right after the first counter is
      enabled. The subsequent group events always loses some counter values.
      
      Implement pmu_enable and pmu_disable support for uncore, which can help
      to batch hardware accesses.
      
      No one uses uncore_enable_box and uncore_disable_box. Remove them.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-drivers-review@eclists.intel.com
      Cc: linux-perf@eclists.intel.com
      Fixes: 087bfbb0 ("perf/x86: Add generic Intel uncore PMU support")
      Link: https://lkml.kernel.org/r/1572014593-31591-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      75be6f70
  12. 28 8月, 2019 4 次提交
  13. 17 6月, 2019 6 次提交
  14. 14 6月, 2019 1 次提交
  15. 23 5月, 2019 2 次提交
  16. 21 5月, 2019 1 次提交
  17. 16 4月, 2019 1 次提交
  18. 09 3月, 2019 1 次提交
    • K
      perf/x86/intel/uncore: Fix client IMC events return huge result · 8041ffd3
      Kan Liang 提交于
      The client IMC bandwidth events currently return very large values:
      
        $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a
      
        10.000117222 34,788.76 MiB uncore_imc/data_reads/
        10.000117222 8.26 MiB uncore_imc/data_writes/
        20.000374584 34,842.89 MiB uncore_imc/data_reads/
        20.000374584 10.45 MiB uncore_imc/data_writes/
        30.000633299 37,965.29 MiB uncore_imc/data_reads/
        30.000633299 323.62 MiB uncore_imc/data_writes/
        40.000891548 41,012.88 MiB uncore_imc/data_reads/
        40.000891548 6.98 MiB uncore_imc/data_writes/
        50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
        50.001142480 6.97 MiB uncore_imc/data_writes/
      
      The client IMC events are freerunning counters. They still use the
      old event encoding format (0x1 for data_read and 0x2 for data write).
      The counter bit width is calculated by common code, which assume that
      the standard encoding format is used for the freerunning counters.
      Error bit width information is calculated.
      
      The patch intends to convert the old client IMC event encoding to the
      standard encoding format.
      
      Current common code uses event->attr.config which directly copy from
      user space. We should not implicitly modify it for a converted event.
      The event->hw.config is used to replace the event->attr.config in
      common code.
      
      For client IMC events, the event->attr.config is used to calculate a
      converted event with standard encoding format in the custom
      event_init(). The converted event is stored in event->hw.config.
      For other events of freerunning counters, they already use the standard
      encoding format. The same value as event->attr.config is assigned to
      event->hw.config in common event_init().
      Reported-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@kernel.org # v4.18+
      Fixes: 9aae1780 ("perf/x86/intel/uncore: Clean up client IMC uncore")
      Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8041ffd3
  19. 21 1月, 2019 1 次提交
    • A
      perf/core, arch/x86: Strengthen exclusion checks with PERF_PMU_CAP_NO_EXCLUDE · 88dbe3c9
      Andrew Murray 提交于
      For x86 PMUs that do not support context exclusion let's advertise the
      PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
      prevent us from handling events where any exclusion flags are set.
      Let's also remove the now unnecessary check for exclusion flags.
      
      This change means that amd/iommu and amd/uncore will now also
      indicate that they do not support exclude_{hv|idle} and intel/uncore
      that it does not support exclude_{guest|host}.
      Signed-off-by: NAndrew Murray <andrew.murray@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sascha Hauer <s.hauer@pengutronix.de>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: robin.murphy@arm.com
      Cc: suzuki.poulose@arm.com
      Link: https://lkml.kernel.org/r/1547128414-50693-12-git-send-email-andrew.murray@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      88dbe3c9
  20. 13 6月, 2018 2 次提交
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
    • M
      Convert intel uncore to struct_size · 6566f907
      Matthew Wilcox 提交于
      Need to do a bit of rearranging to make this work.
      Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6566f907
  21. 31 5月, 2018 3 次提交