1. 12 Aug, 2022 9 commits
    • perf c2c: Sort on peer snooping for load operations · f37c5d91
      Leo Yan committed
      This patch adds a new option 'peer' so that the output can be sorted on
      cache hits for peer snooping.

      When displaying with the option 'peer', both the "Shared Data Cache Line
      Table" and the "Shared Cache Line Distribution Pareto" are sorted on the
      metric "tot_peer".

      As a result, we get the 'peer' display:
      
        # perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio
      
        =================================================
                   Shared Data Cache Line Table
        =================================================
        #
        #        ----------- Cacheline ----------     Peer  ------- Load Peer -------    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
        # Index             Address  Node  PA cnt    Snoop    Total    Local   Remote  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
        # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
        #
              0      0xaaaac17d6000   N/A       0  100.00%       99       99        0    18851    18851        0        0        0        0        0    18752        0        99        0         0        0         0         0
      
        =================================================
              Shared Cache Line Distribution Pareto
        =================================================
        #
        #        -- Peer Snoop --  ------- Store Refs ------  --------- Data address ---------                                                  ---------- cycles ----------    Total       cpu                                    Shared
        #   Num      Rmt      Lcl   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt      Pid                Tid        Code address  rmt peer  lcl peer      load  records       cnt                  Symbol            Object      Source:Line  Node{cpus %peers %stores}
        # .....  .......  .......  .......  .......  .......  ..................  ....  ......  .......  .................  ..................  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
        #
          ----------------------------------------------------------------------
              0        0       99        0        0        0      0xaaaac17d6000
          ----------------------------------------------------------------------
                   0.00%    3.03%    0.00%    0.00%    0.00%                0x20   N/A       0     3603     3603:memstress      0xaaaac17c25ac         0       376        41     9314         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0{ 2 100.0%    n/a}
                   0.00%    3.03%    0.00%    0.00%    0.00%                0x20   N/A       0     3603     3606:memstress      0xaaaac17c25ac         0       375        44     9155         1  [.] 0x00000000000025ac  memstress         memstress[25ac]   0{ 1 100.0%    n/a}
                   0.00%   48.48%    0.00%    0.00%    0.00%                0x29   N/A       0     3603     3606:memstress      0xaaaac17c3e88         0       180       170       65         1  [.] 0x0000000000003e88  memstress         memstress[3e88]   0{ 1 100.0%    n/a}
                   0.00%   45.45%    0.00%    0.00%    0.00%                0x29   N/A       0     3603     3603:memstress      0xaaaac17c3e88         0       180       175       70         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0{ 2 100.0%    n/a}
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-14-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf c2c: Refactor display string · faa30dfa
      Leo Yan committed
      The display type is shown by combining the display string array with a
      suffix string "HITMs", which makes it hard to extend the display to
      other sorting types (e.g. an extension for peer operations).

      This patch moves the suffix string "HITMs" into the display string array
      for the HITM types, so that a new display type does not have to output
      the string "HITMs".
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-13-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
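The "Refactor display string" change above can be pictured with a minimal sketch. The enum values, the array name, and the helper `display_name()` are assumptions made for illustration, not copied from the perf sources; the "Peer Snoop" label is borrowed from the report header shown earlier on this page.

```c
#include <stddef.h>

/* Hypothetical display types; the names are illustrative only. */
enum display_type {
	DISPLAY_LCL_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_TOT_HITM,
	DISPLAY_SNP_PEER,
	DISPLAY_MAX,
};

/* Each entry carries its full label, so no shared "HITMs" suffix is appended. */
static const char *const display_str[DISPLAY_MAX] = {
	[DISPLAY_LCL_HITM] = "Local HITMs",
	[DISPLAY_RMT_HITM] = "Remote HITMs",
	[DISPLAY_TOT_HITM] = "Total HITMs",
	[DISPLAY_SNP_PEER] = "Peer Snoop",
};

/* Return the label for a display type, or NULL when it is out of range. */
static const char *display_name(enum display_type type)
{
	if ((unsigned int)type >= DISPLAY_MAX)
		return NULL;
	return display_str[type];
}
```

With the suffix stored in the table, a non-HITM entry can use whatever label suits it without special-casing at the print site.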
    • perf c2c: Refactor node header · 7c10b65a
      Leo Yan committed
      The node header array contains three items, one for each of the three
      flavors of node access info.  To extend sorting to other snooping types
      instead of always sticking to HITMs, the second header string
      "Node{cpus %hitms %stores}" needs to be adjustable (e.g. changed to
      "Node{cpus %peer %stores}").

      For this reason, this patch changes the node header array to three
      flat variables and uses a switch-case in the function
      setup_nodes_header(), which makes it easier to alter the header string.
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-12-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
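The switch-case idea from the "Refactor node header" message can be sketched as follows. This is not the body of setup_nodes_header(); the enum names, the function `nodes_header()`, and the first and third header strings are assumptions for illustration. Only the second header and its peer variant (as printed in the report output earlier on this page) come from the source.

```c
#include <stddef.h>

/* Hypothetical node-display modes and display types, named for illustration. */
enum node_mode { NODE_MODE_COUNT, NODE_MODE_CPUS_PCT, NODE_MODE_CPU_LIST };
enum display_type { DISPLAY_HITM, DISPLAY_PEER };

/* Pick the node column header; only the second flavor depends on the sort type. */
static const char *nodes_header(enum node_mode mode, enum display_type display)
{
	switch (mode) {
	case NODE_MODE_COUNT:
		return "Node";			/* assumed label */
	case NODE_MODE_CPUS_PCT:
		return display == DISPLAY_PEER ? "Node{cpus %peers %stores}"
					       : "Node{cpus %hitms %stores}";
	case NODE_MODE_CPU_LIST:
		return "Node{cpu list}";	/* assumed label */
	}
	return NULL;
}
```

Compared with indexing a fixed array, the switch gives one obvious place to vary the second header by display type.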
    • perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' · 2be0bc75
      Leo Yan committed
      Use a more general name for the main sort dimension, so that sorting is
      not limited to the HITM snoop type and can be extended to support other
      costly snooping operations.  To that end, rename the dimension to use
      the prefix 'percent_costly_'.
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-11-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf c2c: Use explicit names for display macros · c82ccc3a
      Leo Yan committed
      The perf c2c tool relies heavily on the HITM snoop type to detect false
      cache sharing; unfortunately, HITM is not supported on some
      architectures.

      Essentially, the perf c2c tool looks for very costly snooping operations
      that indicate false cache sharing, so it does not have to stick to HITM
      tags and can explore other snooping types (e.g. SNOOPX_PEER).

      For this reason, this patch renames the HITM-related display macros with
      the suffix '_HITM', so that they stay distinct when display types for
      other snooping types are added later.
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-10-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf c2c: Add mean dimensions for peer operations · 682352e5
      Leo Yan committed
      This patch adds two dimensions for the mean value of peer operations.
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-9-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf c2c: Add dimensions of peer metrics for cache line view · 9082282f
      Leo Yan committed
      This patch adds dimensions for peer operations, which will be used for
      the "Shared Cache Line Distribution Pareto".

      It adds the percentage dimensions for local and remote peer operations,
      and the dimensions that account for the number of operations, which are
      used in stdio mode.
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-8-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf c2c: Add dimensions for peer load operations · 63e74ab5
      Leo Yan committed
      This patch adds three dimensions for peer load operations: 'lcl_peer',
      'rmt_peer' and 'tot_peer'.  These three dimensions will be used in the
      shared data cache line table.
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-7-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf c2c: Output statistics for peer snooping · 3ef1fc17
      Leo Yan committed
      This patch outputs statistics for peer snooping for whole trace events
      and global shared cache line.
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-6-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 04 Jun, 2022 1 commit
  3. 26 May, 2022 1 commit
    • perf c2c: Use stdio interface if slang is not supported · c4040212
      Leo Yan committed
      If the slang lib is not installed on the system, the perf c2c tool
      disables TUI mode and falls back to stdio mode; but the flag
      'c2c.use_stdio' is not set to true, so the UI quirks are wrongly applied
      in the function ui_quirks().

      This commit forces the stdio interface if slang is not supported, which
      avoids applying the UI quirks and shows the correct metric header.
      
      Before:
      
      =================================================
            Shared Cache Line Distribution Pareto
      =================================================
        -------------------------------------------------------------------------------
            0        0        0       99        0        0        0      0xaaaac17d6000
        -------------------------------------------------------------------------------
          0.00%    0.00%    6.06%    0.00%    0.00%    0.00%   0x20   N/A       0      0xaaaac17c25ac         0         0        43       375    18469         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0
          0.00%    0.00%   93.94%    0.00%    0.00%    0.00%   0x29   N/A       0      0xaaaac17c3e88         0         0       173       180      135         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0
      
      After:
      
      =================================================
            Shared Cache Line Distribution Pareto
      =================================================
        -------------------------------------------------------------------------------
            0        0        0       99        0        0        0      0xaaaac17d6000
        -------------------------------------------------------------------------------
                 0.00%    0.00%    6.06%    0.00%    0.00%    0.00%                0x20   N/A       0      0xaaaac17c25ac         0         0        43       375    18469         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0
                 0.00%    0.00%   93.94%    0.00%    0.00%    0.00%                0x29   N/A       0      0xaaaac17c3e88         0         0       173       180      135         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0
      
      Fixes: 5a1a99cd ("perf c2c report: Add main TUI browser")
      Reported-by: Joe Mario <jmario@redhat.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20220526145400.611249-1-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
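The fallback rule in the "Use stdio interface if slang is not supported" message reduces to a small decision. The sketch below assumes a hypothetical helper `should_use_stdio()` and a boolean parameter standing in for the build-time slang check; neither name comes from the commit.

```c
#include <stdbool.h>

/*
 * Decide whether the report should use the stdio interface.
 * have_slang stands in for a build-time feature check (hypothetical here).
 */
static bool should_use_stdio(bool have_slang, bool stdio_requested)
{
	/* Without slang there is no TUI, so stdio is forced. */
	if (!have_slang)
		return true;
	return stdio_requested;
}
```

Recording the decision in one flag lets later code, such as header formatting, branch on that flag alone instead of re-deriving whether a TUI is available.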
  4. 23 May, 2022 1 commit
    • perf c2c: Add dimensions for 'N/A' metrics of store operation · 550b4d6f
      Leo Yan committed
      Since now we have the statistics 'st_na' for store operations, add
      dimensions for the 'N/A' (no available memory level) metrics and the
      associated percentage calculation for the single cache line view.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adam Li <adamli@amperemail.onmicrosoft.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ali Saidi <alisaidi@amazon.com>
      Cc: Alyssa Ross <hi@alyssa.is>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Li Huafei <lihuafei1@huawei.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220518055729.1869566-3-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 26 Mar, 2022 1 commit
    • perf tools: Enhance the matching of sub-commands abbreviations · ae0f4eb3
      Wei Li committed
      We support the short commands 'rec*' for 'record' and 'rep*' for 'report'
      in lots of sub-commands, but the matching is currently not strict enough.

      This can be puzzling: if we mistype 'report' as 'recport', perf will in
      fact perform 'record' without any message.

      To fix this, add a check to ensure that the short command is a valid
      prefix of the real command.
      
      Committer testing:
      
        [root@quaco ~]# perf c2c re sleep 1
      
         Usage: perf c2c {record|report}
      
            -v, --verbose         be more verbose (show counter open errors, etc)
      
        # perf c2c rec sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.038 MB perf.data (16 samples) ]
        # perf c2c recport sleep 1
      
         Usage: perf c2c {record|report}
      
            -v, --verbose         be more verbose (show counter open errors, etc)
      
        # perf c2c record sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.038 MB perf.data (15 samples) ]
        # perf c2c records sleep 1
      
         Usage: perf c2c {record|report}
      
            -v, --verbose         be more verbose (show counter open errors, etc)
      
        #
      Signed-off-by: Wei Li <liwei391@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Link: http://lore.kernel.org/lkml/20220325092032.2956161-1-liwei391@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
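The prefix check described in the sub-command abbreviation message can be sketched as below. The helper name `matches_subcommand()` and its `min_len` parameter are assumptions for illustration; a minimum of three characters is inferred from the committer testing, where 're' is rejected and 'rec' is accepted.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true when abbrev is an acceptable short form of full:
 * at least min_len characters long and a prefix of full.
 */
static bool matches_subcommand(const char *abbrev, const char *full, size_t min_len)
{
	size_t len = strlen(abbrev);

	if (len < min_len || len > strlen(full))
		return false;
	return strncmp(full, abbrev, len) == 0;
}
```

Under this rule 'recport' matches neither 'record' nor 'report', so the caller can print the usage message instead of silently recording.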
  6. 16 Feb, 2022 1 commit
    • perf c2c: Replace bitmap_weight() with bitmap_empty() where appropriate · 1006c5c1
      Yury Norov committed
      Some code in builtin-c2c.c calls bitmap_weight() to check if any bit of
      a given bitmap is set.
      
      It's better to use bitmap_empty() in that case because bitmap_empty()
      stops traversing the bitmap as soon as it finds the first set bit, while
      bitmap_weight() counts all bits unconditionally.
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Klimov <aklimov@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: David Laight <david.laight@aculab.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
      Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Link: http://lore.kernel.org/lkml/20220123183925.1052919-13-yury.norov@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
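The difference between the two helpers named in the bitmap message can be shown with simplified stand-ins. These `sketch_` functions are illustrative bit-by-bit versions written for this page, not the kernel or tools/ implementations, which work on whole words.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Count every set bit: always visits the whole bitmap. */
static size_t sketch_bitmap_weight(const unsigned long *map, size_t nbits)
{
	size_t count = 0;

	for (size_t i = 0; i < nbits; i++)
		if (map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			count++;
	return count;
}

/* Report emptiness: returns as soon as the first set bit is found. */
static bool sketch_bitmap_empty(const unsigned long *map, size_t nbits)
{
	for (size_t i = 0; i < nbits; i++)
		if (map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return false;
	return true;
}
```

A caller that only asks "is any bit set?" gets the same answer from `!sketch_bitmap_empty(map, n)` as from `sketch_bitmap_weight(map, n) != 0`, but the former can stop early.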
  7. 13 Jan, 2022 2 commits
    • perf cpumap: Give CPUs their own type · 6d18804b
      Ian Rogers committed
      A common problem is confusing CPU map indices with the CPU; wrapping the
      CPU in a struct avoids this. This approach is similar to atomic_t.
      
      Committer notes:
      
      To make it build with BUILD_BPF_SKEL=1 these files needed the
      conversions to 'struct perf_cpu' usage:
      
        tools/perf/util/bpf_counter.c
        tools/perf/util/bpf_counter_cgroup.c
        tools/perf/util/bpf_ftrace.c
      
      Also perf_env__get_cpu() was removed back in "perf cpumap: Switch
      cpu_map__build_map to cpu function".
      
      Additionally these needed to be fixed for the ARM builds to complete:
      
        tools/perf/arch/arm/util/cs-etm.c
        tools/perf/arch/arm64/util/pmu.c
      Suggested-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
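The idea of giving CPUs their own type can be sketched as follows. Only the name 'struct perf_cpu' appears in the commit message; the member name `cpu`, the `cpu_map_sketch` container, and the lookup helper are assumptions made for illustration.

```c
/* Wrapping the CPU number in a struct makes it a distinct type from a plain index. */
struct perf_cpu {
	int cpu;	/* member name is an assumption */
};

/* A tiny CPU map: entry idx holds the CPU stored at that position. */
struct cpu_map_sketch {
	int nr;
	struct perf_cpu map[4];
};

/* Look up by map index; returns cpu -1 when idx is out of range. */
static struct perf_cpu cpu_map_sketch__cpu(const struct cpu_map_sketch *cpus, int idx)
{
	struct perf_cpu invalid = { .cpu = -1 };

	if (idx < 0 || idx >= cpus->nr)
		return invalid;
	return cpus->map[idx];
}
```

Because the lookup takes an `int` index and returns a `struct perf_cpu`, passing a CPU where an index is expected (or the reverse) fails to compile instead of silently producing wrong results.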
    • perf c2c: Use more intention revealing iterator · 84d2f4f0
      Ian Rogers committed
      Use perf_cpu_map__for_each_cpu() in setup_nodes.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-46-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 07 Nov, 2021 1 commit
  9. 09 Sep, 2021 1 commit
  10. 02 Aug, 2021 1 commit
  11. 01 Jun, 2021 2 commits
  12. 09 Feb, 2021 1 commit
  13. 21 Jan, 2021 6 commits
  14. 11 Nov, 2020 3 commits
  15. 15 Oct, 2020 8 commits
    • perf c2c: Add metrics "RMT Load Hit" · 91d933c2
      Leo Yan committed
      The metrics "LLC Ld Miss" and "Load Dram" overlap with each other in the
      items they account for:
      
        "LLC Ld Miss" = "lcl_dram" + "rmt_dram" + "rmt_hit" + "rmt_hitm"
        "Load Dram"   = "lcl_dram" + "rmt_dram"
      
      Furthermore, the metric "LLC Ld Miss" is not an informative statistic,
      because it is a summary value and gives no breakdown details.

      For this reason, add a new metric "RMT Load Hit" which presents the
      remote cache hits; it contains two items:

        "RMT Load Hit" = remote hit ("rmt_hit") + remote hitm ("rmt_hitm")

      As a result, the metric "LLC Ld Miss" is cleanly divided into the two
      metrics "RMT Load Hit" and "Load Dram".  It's not necessary to keep the
      metric "LLC Ld Miss", so remove it.
      
      Before:
      
        #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --      LLC  --- Load Dram ----
        # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm  Ld Miss       Lcl       Rmt
        # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  .......  ........  ........
        #
              0      0x55f07d580100     0    1499   85.89%      481      481        0     7243     3879     3364     2599      765      548     2615       66       169      481        0         0         0
              1      0x55f07d580080     0       1   13.93%       78       78        0      664      664        0        0        0      187      361       27        11       78        0         0         0
              2      0x55f07d5800c0     0       1    0.18%        1        1        0      405      405        0        0        0      131        0       10       263        1        0         0         0
      
      After:
      
        #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
        # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
        # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
        #
              0      0x55f07d580100     0    1499   85.89%      481      481        0     7243     3879     3364     2599      765      548     2615       66       169      481         0        0         0         0
              1      0x55f07d580080     0       1   13.93%       78       78        0      664      664        0        0        0      187      361       27        11       78         0        0         0         0
              2      0x55f07d5800c0     0       1    0.18%        1        1        0      405      405        0        0        0      131        0       10       263        1         0        0         0         0
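
      As a sanity check on the "After" layout, the three sample rows can be
      summed from the nearest to the farthest memory level. The tuple layout
      and the breakdown_total() helper are hypothetical. That the column
      groups add up to "Total Loads" is an observation about this sample
      only, not a guarantee for every workload.

```python
# Columns copied from the "After" table:
# (loads, FB, L1, L2, LclHit, LclHitm, RmtHit, RmtHitm, DramLcl, DramRmt)
ROWS = [
    (3879, 548, 2615, 66, 169, 481, 0, 0, 0, 0),
    (664, 187, 361, 27, 11, 78, 0, 0, 0, 0),
    (405, 131, 0, 10, 263, 1, 0, 0, 0, 0),
]


def breakdown_total(row):
    """Sum Core Load Hit + LLC Load Hit + RMT Load Hit + Load Dram."""
    _loads, *breakdown = row
    return sum(breakdown)


for row in ROWS:
    # In this sample every load is attributed to exactly one column.
    assert breakdown_total(row) == row[0]
```
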
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Joe Mario <jmario@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lore.kernel.org/r/20201014050921.5591-9-leo.yan@linaro.org
      91d933c2
    • L
      perf c2c: Correct LLC load hit metrics · 77c15869
      Leo Yan committed
      "rmt_hit" is accounted for in two metrics: in "LLC Ld Miss" (see the
      function llc_miss(), which calculates "llcmiss") and in "LLC Load
      Hit".  Taken literally, it is contradictory for "rmt_hit" to be
      counted as both an LLC miss ("LLC Ld Miss") and an LLC hit ("LLC
      Load Hit").
      
      This easily causes confusion: "LLC Load Hit" gives the impression
      that every item under it is an LLC hit, whereas "rmt_hit" is in fact
      an LLC miss that hits in a remote cache.
      
      To give the metric "LLC Load Hit" clear semantics, "rmt_hit" is
      moved out of it, so that "LLC Load Hit" contains two items:
      
        LLC Load Hit = LLC's hit ("ld_llchit") + LLC's hitm ("lcl_hitm")
      
      For output alignment, adjust the header for "LLC Load Hit".
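
      The double accounting can be illustrated with a small sketch. The
      group tables and the double_counted() helper are hypothetical. The
      old "LLC Load Hit" membership is inferred from its earlier "Llc" and
      "Rmt" column headers, and the "LLC Ld Miss" membership is an
      assumption about what llc_miss() sums.

```python
# Assumed membership of "LLC Ld Miss" (what llc_miss() is taken to sum).
LLC_LD_MISS = {"lcl_dram", "rmt_dram", "rmt_hitm", "rmt_hit"}

# Grouping before this patch: "rmt_hit" sits under both headers.
OLD_GROUPS = {
    "LLC Load Hit": {"ld_llchit", "rmt_hit"},
    "LLC Ld Miss": LLC_LD_MISS,
}

# Grouping after this patch: LLC hit + LLC hitm only.
NEW_GROUPS = {
    "LLC Load Hit": {"ld_llchit", "lcl_hitm"},
    "LLC Ld Miss": LLC_LD_MISS,
}


def double_counted(groups):
    """Return the counters that appear under every group at once."""
    return set.intersection(*groups.values())


assert double_counted(OLD_GROUPS) == {"rmt_hit"}
assert double_counted(NEW_GROUPS) == set()
```
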
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Joe Mario <jmario@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201014050921.5591-8-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      77c15869
    • L
      perf c2c: Change header for LLC local hit · ed626a3e
      Leo Yan committed
      Replace the header string "Lcl" with "LclHit", which states more
      explicitly that the event type is an LLC local hit.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Joe Mario <jmario@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201014050921.5591-7-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ed626a3e
    • L
      perf c2c: Use more explicit headers for HITM · 0fbe2fe9
      Leo Yan committed
      Local and remote HITM use the headers 'Lcl' and 'Rmt' respectively.
      If the tool is extended to display these two dimensions under any
      one metric, users cannot understand the semantics from the header
      string 'Lcl' or 'Rmt' alone.
      
      To express the meaning of the HITM items explicitly, this patch
      changes the header strings to "LclHitm" and "RmtHitm".  These strings
      are more readable and allow metrics to be extended with HITM items.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Joe Mario <jmario@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201014050921.5591-6-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0fbe2fe9
    • L
      perf c2c: Change header from "LLC Load Hitm" to "Load Hitm" · fdd32d7e
      Leo Yan committed
      The metric "LLC Load Hitm" contains two items: "local Hitm" and
      "remote Hitm".
      
      "local Hitm" means an L3 hit that was serviced by another processor
      core with a cross-core snoop where modified copies were found; there
      is no doubt that "local Hitm" belongs to LLC access.
      
      For "remote Hitm", however, based on the code in util/mem-events, it
      is the event for a remote cache hit that was serviced by another
      processor core with modified copies.  Thus a remote Hitm is a hit in
      a remote cache, and is actually an LLC load miss.
      
      The current display format gives users the impression that "local
      Hitm" and "remote Hitm" both belong to the LLC load, which, as
      described above, is not the case.
      
      This patch changes the header from "LLC Load Hitm" to "Load Hitm",
      which avoids giving the wrong impression that all Hitm belong to the
      LLC.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Joe Mario <jmario@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201014050921.5591-5-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      fdd32d7e
    • L
      perf c2c: Organize metrics based on memory hierarchy · 6d662d73
      Leo Yan committed
      The metrics are not organized by memory hierarchy: the tool does not
      order them from the nearest memory node (e.g. L1/L2 cache) to the
      farthest (e.g. L3 cache and DRAM).
      
      To output the metrics in a friendlier form, this patch reorders them
      based on the memory hierarchy:
      
        "Core Load Hit" => "LLC Load Hit" => "LLC Ld Miss" => "Load Dram"
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Joe Mario <jmario@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201014050921.5591-4-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6d662d73
    • L
      perf c2c: Display "Total Stores" as a standalone metrics · 4f28641b
      Leo Yan committed
      The total number of stores is displayed under the metric "Store
      Reference".  To match the format used for the total records and all
      loads, extract it as a standalone metric, "Total Stores".
      
      After this patch, the tool shows the summary numbers ("Total records",
      "Total loads", "Total Stores") in a unified form.
      
      Before:
      
        #        ----------- Cacheline ----------      Tot  ----- LLC Load Hitm -----    Total    Total  ---- Store Reference ----  --- Load Dram ----      LLC  ----- Core Load Hit -----  -- LLC Load Hit --
        # Index             Address  Node  PA cnt     Hitm    Total      Lcl      Rmt  records    Loads    Total    L1Hit   L1Miss       Lcl       Rmt  Ld Miss       FB       L1       L2       Llc       Rmt
        # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  ........  .......  .......  .......  .......  ........  ........
        #
              0      0x55f07d580100     0    1499   85.89%      481      481        0     7243     3879     3364     2599      765         0         0        0      548     2615       66       169         0
              1      0x55f07d580080     0       1   13.93%       78       78        0      664      664        0        0        0         0         0        0      187      361       27        11         0
              2      0x55f07d5800c0     0       1    0.18%        1        1        0      405      405        0        0        0         0         0        0      131        0       10       263         0
      
      After:
      
        #        ----------- Cacheline ----------      Tot  ----- LLC Load Hitm -----    Total    Total    Total  ---- Stores ----  --- Load Dram ----      LLC  ----- Core Load Hit -----  -- LLC Load Hit --
        # Index             Address  Node  PA cnt     Hitm    Total      Lcl      Rmt  records    Loads   Stores    L1Hit   L1Miss       Lcl       Rmt  Ld Miss       FB       L1       L2       Llc       Rmt
        # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  ........  .......  .......  .......  .......  ........  ........
        #
              0      0x55f07d580100     0    1499   85.89%      481      481        0     7243     3879     3364     2599      765         0         0        0      548     2615       66       169         0
              1      0x55f07d580080     0       1   13.93%       78       78        0      664      664        0        0        0         0         0        0      187      361       27        11         0
              2      0x55f07d5800c0     0       1    0.18%        1        1        0      405      405        0        0        0         0         0        0      131        0       10       263         0
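
      The relationship between the summary columns can be checked against
      the three sample rows. The tuple layout and the hitm_share() helper
      are hypothetical, and the percentage check assumes the three rows
      shown are the complete table.

```python
# Columns copied from the table above:
# (load_hitm_total, records, loads, stores, l1hit, l1miss)
ROWS = [
    (481, 7243, 3879, 3364, 2599, 765),
    (78, 664, 664, 0, 0, 0),
    (1, 405, 405, 0, 0, 0),
]

TOTAL_HITM = sum(row[0] for row in ROWS)


def hitm_share(row):
    """Format a cacheline's share of all HITM loads like the "Tot Hitm" column."""
    return f"{100 * row[0] / TOTAL_HITM:.2f}%"


for hitm, records, loads, stores, l1hit, l1miss in ROWS:
    assert records == loads + stores   # "Total records"
    assert stores == l1hit + l1miss    # "Total Stores"

assert [hitm_share(r) for r in ROWS] == ["85.89%", "13.93%", "0.18%"]
```
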
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Joe Mario <jmario@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201014050921.5591-3-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4f28641b
    • L
      perf c2c: Display the total numbers continuously · b596e979
      Leo Yan committed
      When viewing the statistics in "breakdown" mode, it is useful to show
      the summary numbers for the total records, all stores and all loads
      first; the subsequent columns can then break these down into more
      detailed items.
      
      To achieve this, this patch displays the summary numbers for
      records/stores/loads contiguously and places them before the
      breakdown items, which allows users to easily read the summarized
      statistics.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Joe Mario <jmario@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201014050921.5591-2-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b596e979
  16. 14 Oct, 2020 1 commit