    powerpc: Add 64bit optimised memcmp · 15c2d45d
    Committed by Anton Blanchard
    I noticed ksm spending quite a lot of time in memcmp on a large
    KVM box. The current memcmp loop is very unoptimised: it does
    byte-at-a-time compares with no loop unrolling. We can do much,
    much better.
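
    For reference, a minimal C sketch of the kind of loop described
    above: one byte per iteration, no unrolling. It models the
    behaviour only and is not the previous assembly; the name
    memcmp_bytewise is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the old behaviour: one byte compared per
 * iteration, no unrolling. Returns <0, 0 or >0 like memcmp, comparing
 * the bytes as unsigned char.
 */
int memcmp_bytewise(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1;
	const unsigned char *b = s2;

	while (n--) {
		if (*a != *b)
			return *a - *b;	/* first differing byte decides */
		a++;
		b++;
	}
	return 0;
}
```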
    
    Optimise the loop in a few ways:
    
    - Unroll the byte-at-a-time loop

    - For large (at least 32-byte) comparisons that are also 8-byte
      aligned, use an unrolled, modulo-scheduled loop using 8-byte
      loads. This is similar to our glibc memcmp.
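
    A rough C model of the two changes, under stated assumptions: the
    names memcmp_sketch, short_cmp and CMP_BYTE are hypothetical, the
    4x unroll factor is an illustrative choice, the alignment test ORs
    the two addresses, and the 8-byte loop is neither unrolled nor
    modulo scheduled as the assembly is. On a mismatching word it hands
    the remaining bytes to the byte loop, which finds the first
    differing byte and so gives the correct memcmp sign on either
    endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare byte i and return on the first difference. */
#define CMP_BYTE(i) do { if (a[i] != b[i]) return a[i] - b[i]; } while (0)

/* Byte-at-a-time loop, unrolled four times, plus a tail loop. */
static int short_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
	while (n >= 4) {
		CMP_BYTE(0); CMP_BYTE(1); CMP_BYTE(2); CMP_BYTE(3);
		a += 4;
		b += 4;
		n -= 4;
	}
	while (n--) {
		if (*a != *b)
			return *a - *b;
		a++;
		b++;
	}
	return 0;
}

int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;

	/* Word path only for >= 32 bytes with both pointers 8-byte aligned. */
	if (n >= 32 && (((uintptr_t)a | (uintptr_t)b) & 7) == 0) {
		while (n >= 8) {
			uint64_t x, y;

			memcpy(&x, a, 8);	/* 8-byte loads */
			memcpy(&y, b, 8);
			if (x != y)
				break;	/* byte loop locates the differing byte */
			a += 8;
			b += 8;
			n -= 8;
		}
	}
	return short_cmp(a, b, n);
}
```

    Handing the mismatching word back to the byte loop keeps the sketch
    endian-neutral at the cost of a few extra byte compares; a version
    that loads both words in big-endian byte order could instead compare
    them directly as unsigned integers.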
    
    A simple microbenchmark timing 10,000,000 iterations of an 8192-byte
    memcmp was used to measure the performance:
    
    baseline:	29.93 s
    
    modified:	 1.70 s
    
    Just over 17x faster.
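
    The benchmark source is not part of this commit. A hypothetical
    user-space harness with the same parameters might look like the
    following; bench_memcmp and the 0x55 fill pattern are assumptions.
    It times the C library's memcmp, not the kernel routine, calls it
    through a volatile function pointer so the compiler cannot hoist
    the loop-invariant call, and uses equal buffers so every call scans
    the full length.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static volatile int sink;	/* keeps results observable */

/*
 * Hypothetical harness: time `iters` memcmp calls on two equal buffers
 * of `size` bytes (size > 0). Returns elapsed seconds, or -1.0 if
 * allocation fails.
 */
double bench_memcmp(long iters, size_t size)
{
	/* Volatile pointer: the compiler cannot treat the call as pure. */
	int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
	unsigned char *a = malloc(size);
	unsigned char *b = malloc(size);
	struct timespec t0, t1;

	if (!a || !b) {
		free(a);
		free(b);
		return -1.0;
	}
	memset(a, 0x55, size);
	memset(b, 0x55, size);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iters; i++)
		sink += cmp(a, b, size);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	free(a);
	free(b);
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

    Calling bench_memcmp(10000000, 8192) from a main() matches the
    iteration count and size quoted above; absolute timings depend on
    the machine and the C library.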
    
    v2: Incorporated some suggestions from Segher:
    
    - Use andi. instead of rldicl.
    
    - Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
      and was a relic from a previous version.
    
    - Don't use cr5, we have plans to use that CR field for fast local
      atomics.
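
    The andi./rldicl change concerns an alignment test. As a
    hypothetical illustration (the helper names and the choice to test
    the OR of the two addresses are assumptions, not taken from
    memcmp_64.S), the C below models both instructions: rldicl with a
    rotate of 0 and mb = 61 keeps only the low three bits, exactly like
    andi. with an immediate of 7. One practical difference is that
    andi. is always a record form and sets CR0, so a conditional branch
    can follow it directly.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by sh bits (0 <= sh < 64). */
static uint64_t rotl64(uint64_t x, unsigned sh)
{
	return sh ? (x << sh) | (x >> (64 - sh)) : x;
}

/* Model of `rldicl rD,rS,sh,mb`: rotate left by sh, then clear the
 * mb high-order bits (0 <= mb < 64). */
static uint64_t rldicl(uint64_t rs, unsigned sh, unsigned mb)
{
	return rotl64(rs, sh) & (UINT64_MAX >> mb);
}

/* Model of `andi. rD,rS,imm`: AND with a zero-extended 16-bit
 * immediate; *cr0_eq reports the CR0 eq bit the record form sets. */
static uint64_t andi_dot(uint64_t rs, uint16_t imm, int *cr0_eq)
{
	uint64_t r = rs & imm;

	*cr0_eq = (r == 0);
	return r;
}

/* Hypothetical check: both pointers are 8-byte aligned iff the low
 * three bits of (a | b) are zero. */
static int both_aligned8(const void *a, const void *b)
{
	uint64_t bits = (uint64_t)((uintptr_t)a | (uintptr_t)b);
	int eq;

	andi_dot(bits, 7, &eq);
	return eq;
}
```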

    Signed-off-by: Anton Blanchard <anton@samba.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
memcmp_64.S