1. 12 8月, 2022 21 次提交
    • L
      perf c2c: Sort on peer snooping for load operations · f37c5d91
      Leo Yan 提交于
      This patch adds a new option 'peer' so can sort on the cache hit for
      peer snooping.
      
      For displaying with option 'peer', the "Shared Data Cache Line Table"
      and "Shared Cache Line Distribution Pareto" both sort with the metrics
      "tot_peer".
      
      As result, we can get the 'peer' display:
      
        # perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio
      
        =================================================
                   Shared Data Cache Line Table
        =================================================
        #
        #        ----------- Cacheline ----------     Peer  ------- Load Peer -------    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
        # Index             Address  Node  PA cnt    Snoop    Total    Local   Remote  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
        # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
        #
              0      0xaaaac17d6000   N/A       0  100.00%       99       99        0    18851    18851        0        0        0        0        0    18752        0        99        0         0        0         0         0
      
        =================================================
              Shared Cache Line Distribution Pareto
        =================================================
        #
        #        -- Peer Snoop --  ------- Store Refs ------  --------- Data address ---------                                                  ---------- cycles ----------    Total       cpu                                    Shared
        #   Num      Rmt      Lcl   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt      Pid                Tid        Code address  rmt peer  lcl peer      load  records       cnt                  Symbol            Object      Source:Line  Node{cpus %peers %stores}
        # .....  .......  .......  .......  .......  .......  ..................  ....  ......  .......  .................  ..................  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
        #
          ----------------------------------------------------------------------
              0        0       99        0        0        0      0xaaaac17d6000
          ----------------------------------------------------------------------
                   0.00%    3.03%    0.00%    0.00%    0.00%                0x20   N/A       0     3603     3603:memstress      0xaaaac17c25ac         0       376        41     9314         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0{ 2 100.0%    n/a}
                   0.00%    3.03%    0.00%    0.00%    0.00%                0x20   N/A       0     3603     3606:memstress      0xaaaac17c25ac         0       375        44     9155         1  [.] 0x00000000000025ac  memstress         memstress[25ac]   0{ 1 100.0%    n/a}
                   0.00%   48.48%    0.00%    0.00%    0.00%                0x29   N/A       0     3603     3606:memstress      0xaaaac17c3e88         0       180       170       65         1  [.] 0x0000000000003e88  memstress         memstress[3e88]   0{ 1 100.0%    n/a}
                   0.00%   45.45%    0.00%    0.00%    0.00%                0x29   N/A       0     3603     3603:memstress      0xaaaac17c3e88         0       180       175       70         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0{ 2 100.0%    n/a}
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-14-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f37c5d91
    • L
      perf c2c: Refactor display string · faa30dfa
      Leo Yan 提交于
      The display type is shown by combination the display string array and a
      suffix string "HITMs", which is not friendly to extend display for other
      sorting type (e.g. extension for peer operations).
      
      This patch moves the suffix string "HITMs" into display string array for
      HITM types, so it can allow us to not necessarily to output string
      "HITMs" for new incoming display type.
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-13-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      faa30dfa
    • L
      perf c2c: Refactor node header · 7c10b65a
      Leo Yan 提交于
      The node header array contains 3 items, each item is used for one of
      the 3 flavors for node accessing info.  To extend sorting on other
      snooping type and not always stick to HITMs, the second header string
      "Node{cpus %hitms %stores}" should be adjusted (e.g. it's changed as
      "Node{cpus %peer %stores}").
      
      For this reason, this patch changes the node header array to three
      flat variables and uses switch-case in function setup_nodes_header(),
      thus it is easier for altering the header string.
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-12-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7c10b65a
    • L
      perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' · 2be0bc75
      Leo Yan 提交于
      Use more general naming for the main sort dimension, this can allow us
      not to sort only on HITM snoop type, so it can be extended to support
      other costly snooping operations.  So rename the dimension to the prefix
      'percent_costly_".
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-11-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2be0bc75
    • L
      perf c2c: Use explicit names for display macros · c82ccc3a
      Leo Yan 提交于
      Perf c2c tool has an assumption that it heavily depends on HITM snoop
      type to detect cache false sharing, unfortunately, HITM is not supported
      on some architectures.
      
      Essentially, perf c2c tool wants to find some very costly snooping
      operations for false cache sharing, this means it's not necessarily
      to stick using HITM tags and we can explore other snooping types
      (e.g. SNOOPX_PEER).
      
      For this reason, this patch renames HITM related display macros with
      suffix '_HITM', so it can be distinct if later add more display types
      for on other snooping type.
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-10-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c82ccc3a
    • L
      perf c2c: Add mean dimensions for peer operations · 682352e5
      Leo Yan 提交于
      This patch adds two dimensions for the mean value of peer operations.
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-9-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      682352e5
    • L
      perf c2c: Add dimensions of peer metrics for cache line view · 9082282f
      Leo Yan 提交于
      This patch adds dimensions of peer ops, which will be used for Shared
      cache line distribution pareto.
      
      It adds the percentage dimensions for local and remote peer operations,
      and the dimensions for accounting operation numbers which is used for
      stdio mode.
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-8-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9082282f
    • L
      perf c2c: Add dimensions for peer load operations · 63e74ab5
      Leo Yan 提交于
      This patch adds three dimensions for peer load operations of 'lcl_peer',
      'rmt_peer' and 'tot_peer'.  These three dimensions will be used in the
      shared data cache line table.
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-7-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      63e74ab5
    • L
      perf c2c: Output statistics for peer snooping · 3ef1fc17
      Leo Yan 提交于
      This patch outputs statistics for peer snooping for whole trace events
      and global shared cache line.
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-6-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3ef1fc17
    • L
      perf mem: Add statistics for peer snooping · e843dec5
      Leo Yan 提交于
      Since the flag PERF_MEM_SNOOPX_PEER is added to support cache snooping
      from peer cache line, it can come from a peer core, a peer cluster, or
      a remote NUMA node.
      
      This patch adds statistics for the flag PERF_MEM_SNOOPX_PEER.  Note, we
      take PERF_MEM_SNOOPX_PEER as an affiliated info, it needs to cooperate
      with cache level statistics.  Therefore, we account the load operations
      for both the cache level's metrics (e.g. ld_l2hit, ld_llchit, etc.) and
      peer related metrics when flag PERF_MEM_SNOOPX_PEER is set.
      
      So three new metrics are introduced: 'lcl_peer' is for local cache
      access, the metric 'rmt_peer' is for remote access (includes remote DRAM
      and any caches in remote node), and the metric 'tot_peer' is accounting
      the sum value of 'lcl_peer' and 'rmt_peer'.
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-5-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e843dec5
    • A
      perf arm-spe: Use SPE data source for neoverse cores · 4e6430cb
      Ali Saidi 提交于
      When synthesizing data from SPE, augment the type with source information
      for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
      the same encoding. I can't find encoding information for any other SPE
      implementations to unify their choices with Arm's thus that is left for
      future work.
      
      This change populates the mem_lvl_num for Neoverse cores as well as the
      deprecated mem_lvl namespace.
      Reviewed-by: NGerman Gomez <german.gomez@arm.com>
      Reviewed-by: NLeo Yan <leo.yan@linaro.org>
      Signed-off-by: NAli Saidi <alisaidi@amazon.com>
      Tested-by: NLeo Yan <leo.yan@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-4-leo.yan@linaro.orgSigned-off-by: NLeo Yan <leo.yan@linaro.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4e6430cb
    • L
      perf mem: Print snoop peer flag · f78d6250
      Leo Yan 提交于
      Since PERF_MEM_SNOOPX_PEER flag is a new snoop type, print this flag if
      it is set.
      
      Before:
             memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
      
      After:
      
             memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
      Reviewed-by: NAli Saidi <alisaidi@amazon.com>
      Reviewed-by: NKajol Jain <kjain@linux.ibm.com>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NAli Saidi <alisaidi@amazon.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-3-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f78d6250
    • A
      perf tools: Sync addition of PERF_MEM_SNOOPX_PEER · 2e21bcf0
      Ali Saidi 提交于
      Add a flag to the 'perf mem' data struct to signal that a request caused
      a cache-to-cache transfer of a line from a peer of the requestor and
      wasn't sourced from a lower cache level.
      
      The line being moved from one peer cache to another has latency and
      performance implications.
      
      On Arm64 Neoverse systems the data source can indicate a cache-to-cache
      transfer but not if the line is dirty or clean, so instead of
      overloading HITM define a new flag that indicates this type of transfer.
      
      Committer notes:
      
      This really is not syncing with the kernel since the patch to the kernel
      wasn't merged.
      
      But we're going ahead of this as it seems trivial and is just a matter
      of the perf kernel maintainers to give their ack or for us to find
      another way of expressing this in the perf records synthesized in
      userspace from the ARM64 hardware traces.
      Reviewed-by: NLeo Yan <leo.yan@linaro.org>
      Signed-off-by: NAli Saidi <alisaidi@amazon.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-2-leo.yan@linaro.orgSigned-off-by: NLeo Yan <leo.yan@linaro.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2e21bcf0
    • L
      perf arm64: Add missing -I for tools/arch/arm64/include/ to find asm/sysreg.h... · 4a88c4ec
      Leo Yan 提交于
      perf arm64: Add missing -I for tools/arch/arm64/include/ to find asm/sysreg.h when building arm_spe.h
      
      This cures a current problem where tools/perf/util/arm-spe.c isn't
      finding a ARM64 specific asm header, so lets add it for now to make
      progress.
      
      Adding a .o specific rule seems clunky, lets try and find if this is
      really the right solution.
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Reported-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Link: https://lore.kernel.org/lkml/20220811124825.GA868014@leoy-huanghe.lanSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4a88c4ec
    • A
      perf tools: Tidy guest option documentation · 53e76d35
      Adrian Hunter 提交于
      Move common guest options into include files. Use attribute substitution to
      customize an example, using "[verse]" to define the block instead of a
      "literal" block which does not permit substitution.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20220811170411.84154-4-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      53e76d35
    • A
      perf inject: Fix missing guestmount option documentation · d9ca43c0
      Adrian Hunter 提交于
      The 'perf inject' documentation is missing the guestmount option. Add it.
      
      Fixes: 97406a7e ("perf inject: Add support for injecting guest sideband events")
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20220811170411.84154-3-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d9ca43c0
    • A
      perf script: Fix missing guest option documentation · 696d0a4c
      Adrian Hunter 提交于
      The 'perf script' documentation is missing several options relating to
      guests.  Add them.
      
      Fixes: 15a108af ("perf script: Allow specifying the files to process guest samples")
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20220811170411.84154-2-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      696d0a4c
    • N
      perf offcpu: Update offcpu test for child process · ade1d030
      Namhyung Kim 提交于
      Record off-cpu data with perf bench sched messaging workload and count
      the number of offcpu-time events.  Also update the test script not to
      run next tests if failed already and revise the error messages.
      
        $ sudo ./perf test offcpu -v
         88: perf record offcpu profiling tests                              :
        --- start ---
        test child forked, pid 344780
        Checking off-cpu privilege
        Basic off-cpu test
        Basic off-cpu test [Success]
        Child task off-cpu test
        Child task off-cpu test [Success]
        test child finished with 0
        ---- end ----
        perf record offcpu profiling tests: Ok
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20220811185456.194721-5-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ade1d030
    • N
      perf offcpu: Track child processes · d2347763
      Namhyung Kim 提交于
      When -p option used or a workload is given, it needs to handle child
      processes.  The perf_event can inherit those task events
      automatically.  We can add a new BPF program in task_newtask
      tracepoint to track child processes.
      
      Before:
        $ sudo perf record --off-cpu -- perf bench sched messaging
        $ sudo perf report --stat | grep -A1 offcpu
        offcpu-time stats:
                  SAMPLE events:        1
      
      After:
        $ sudo perf record -a --off-cpu -- perf bench sched messaging
        $ sudo perf report --stat | grep -A1 offcpu
        offcpu-time stats:
                  SAMPLE events:      856
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20220811185456.194721-4-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d2347763
    • N
      perf offcpu: Parse process id separately · d6f415ca
      Namhyung Kim 提交于
      The current target code uses thread id for tracking tasks because
      perf_events need to be opened for each task.  But we can use tgid in
      BPF maps and check it easily.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20220811185456.194721-3-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d6f415ca
    • N
      perf offcpu: Check process id for the given workload · 07fc958b
      Namhyung Kim 提交于
      Current task filter checks task->pid which is different for each
      thread.  But we want to profile all the threads in the process.  So
      let's compare process id (or thread-group id: tgid) instead.
      
      Before:
        $ sudo perf record --off-cpu -- perf bench sched messaging -t
        $ sudo perf report --stat | grep -A1 offcpu
        offcpu-time stats:
                  SAMPLE events:        2
      
      After:
        $ sudo perf record --off-cpu -- perf bench sched messaging -t
        $ sudo perf report --stat | grep -A1 offcpu
        offcpu-time stats:
                  SAMPLE events:      850
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20220811185456.194721-2-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      07fc958b
  2. 11 8月, 2022 3 次提交
  3. 10 8月, 2022 16 次提交