    coresight: tmc-etr: Speed up for bounce buffer in flat mode · 59f2c51f
    Committed by Leo Yan
    mainline inclusion
    from mainline-v5.15-rc3
    commit 0abd0762
    category: feature
    bugzilla: https://gitee.com/openeuler/kernel/issues/I5YCYK
    CVE: NA
    
    Reference: https://lore.kernel.org/r/20210905032144.966766-1-leo.yan@linaro.org
    
    --------------------------------------------------------------------------
    
    The AUX bounce buffer is allocated with the API dma_alloc_coherent(). In
    the low-level architecture code, e.g. for Arm64, this memory is mapped
    with the attribute "Normal non-cacheable", as can be seen from the
    definition of pgprot_dmacoherent() in arch/arm64/include/asm/pgtable.h.
    
    Later, when the driver reads the AUX bounce buffer, the non-cacheable
    mapping makes the access inefficient, because every load instruction
    must go all the way out to DRAM.
    
    This patch allocates the pages with dma_alloc_noncoherent() instead, so
    the driver can access the memory through a cacheable mapping. Load
    instructions can then fetch data from cache lines rather than always
    reading it from DRAM, which improves memory access performance. Because
    the mapping is now cacheable, the driver calls dma_sync_single_for_cpu()
    to invalidate the cache lines before reading the bounce buffer, so it
    does not read stale trace data.
    
    Measuring the duration of tmc_update_etr_buffer() with the ftrace
    function_graph tracer shows a significant performance improvement when
    copying 4 MiB of data from the bounce buffer:
    
      # echo tmc_etr_get_data_flat_buf > set_graph_notrace  # avoid noise
      # echo tmc_update_etr_buffer > set_graph_function
      # echo function_graph > current_tracer
    
      before:
    
      # CPU  DURATION                  FUNCTION CALLS
      # |     |   |                     |   |   |   |
      2)               |    tmc_update_etr_buffer() {
      ...
      2) # 8148.320 us |    }
    
      after:
    
      # CPU  DURATION                  FUNCTION CALLS
      # |     |   |                     |   |   |   |
      2)               |  tmc_update_etr_buffer() {
      ...
      2) # 2525.420 us |  }

    Signed-off-by: Leo Yan <leo.yan@linaro.org>
    Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
    Link: https://lore.kernel.org/r/20210905032144.966766-1-leo.yan@linaro.org
    Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>