• A
    xfrm: branchless addr4_match() on 64-bit · 6c786bcb
    Alexey Dobriyan 提交于
    Current addr4_match() code has special test for /0 prefixes because of
    standard required undefined behaviour. However, it is possible to omit
    it on 64-bit because shifting can be done within a 64-bit register and
    then truncated to the expected value (which is 0 mask).
    
    Implicit truncation by htonl() fits nicely into R32-within-R64 model
    on x86-64.
    
    Space savings: none (coincidence)
    Branch savings: 1
    
    Before:
    
    	movzx  eax,BYTE PTR [rdi+0x2a]		# ->prefixlen_d
    	test   al,al
    	jne    xfrm_selector_match + 0x23f
    		...
    	movzx  eax,BYTE PTR [rbx+0x2b]		# ->prefixlen_s
    	test   al,al
    	je     xfrm_selector_match + 0x1c7
    
    After (no branches):
    
    	mov    r8d,0x20
    	mov    rdx,0xffffffffffffffff
    	mov    esi,DWORD PTR [rsi+0x2c]
    	mov    ecx,r8d
    	sub    cl,BYTE PTR [rdi+0x2a]
    	xor    esi,DWORD PTR [rbx]
    	mov    rdi,rdx
    	xor    eax,eax
    	shl    rdi,cl
    	bswap  edi
    Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
    Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
    6c786bcb
xfrm.h 50.4 KB