    arm64: lib: improve copy_page to deal with 128 bytes at a time · 223e23e8
    Committed by Will Deacon
    We want to avoid lots of different copy_page implementations, settling
    for something that is "good enough" everywhere and hopefully easy to
    understand and maintain whilst we're at it.
    
    This patch reworks our copy_page implementation based on discussions
    with Cavium on the list and benchmarking on Cortex-A processors so that:
    
      - The loop is unrolled to copy 128 bytes per iteration
    
      - The reads are offset so that we read from the next 128-byte block
        in the same iteration that we store the previous block
    
      - Explicit prefetch instructions are removed for now, since they hurt
        performance on CPUs with hardware prefetching
    
      - The loop exit condition is calculated at the start of the loop
    
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Tested-by: Andrew Pinski <apinski@cavium.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
copy_page.S 1.7 KB
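
The loop structure described in the commit message can be modelled in portable C. This is only an illustrative sketch, not the assembly in copy_page.S: the function name `copy_page_sketch`, the 4 KiB `PAGE_SIZE`, and the `regs` array standing in for the CPU registers are all assumptions made for the example. It shows the three structural points from the message: the end pointer is computed once before the loop, each iteration handles 128 bytes, and the loads run one block ahead of the stores. No prefetch hints are issued.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u                      /* assumed 4 KiB page */
#define BLOCK     128u                       /* bytes handled per iteration */
#define WORDS     (BLOCK / sizeof(uint64_t)) /* 16 x 64-bit words per block */

static_assert(PAGE_SIZE % BLOCK == 0, "page must be a whole number of blocks");

/*
 * Hypothetical C model of the loop shape described in the commit message.
 * dst and src must be 8-byte aligned, PAGE_SIZE bytes long, and must not
 * overlap.
 */
static void copy_page_sketch(void *dst, const void *src)
{
    uint64_t *d = dst;
    const uint64_t *s = src;
    /* Loop exit condition is computed once, up front. */
    const uint64_t *end = s + PAGE_SIZE / sizeof(uint64_t);
    uint64_t regs[WORDS];                    /* stands in for the register file */

    /* Prologue: load the first 128-byte block. */
    memcpy(regs, s, BLOCK);
    s += WORDS;

    while (s != end) {
        /* Store the previous block while loading the next one. */
        for (size_t i = 0; i < WORDS; i++) {
            d[i] = regs[i];
            regs[i] = s[i];
        }
        d += WORDS;
        s += WORDS;
    }

    /* Epilogue: store the final block that is still held in regs. */
    memcpy(d, regs, BLOCK);
}
```

A compiler is free to reorder or vectorise this loop, so the C version illustrates the ordering of loads and stores but gives no performance guarantee. The hand-written assembly exists precisely to pin that ordering down.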