    Commit 4c22909e, committed by Andy Polyakov

    Extra i386+gcc bn_div.c tune-up featuring inline division and saving
    the remainder left in %edx. Here is the resulting performance improvement
    matrix (improvement as a result of this *and* previous tune-up committed
    two days ago). The results were obtained by profiling the "div" part of
    crypto/bn/bnspeed.c.
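
The "inline division and saving the remainder left in %edx" idea can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from this commit: the helper name `div_2by1`, its signature and the portable fallback branch are all hypothetical. It relies only on the documented behaviour of the x86 `divl` instruction, which divides %edx:%eax by its operand and leaves the quotient in %eax and the remainder in %edx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not taken from bn_div.c): divide the 64-bit value
 * hi:lo by d, return the quotient and store the remainder in *rem.
 * Precondition: d != 0 and hi < d, so that the quotient fits in 32 bits
 * (divl raises a divide error otherwise). */
static uint32_t div_2by1(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
    assert(d != 0 && hi < d);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;
    /* divl leaves the quotient in %eax and the remainder in %edx, so the
     * remainder comes for free: no "n - q*d" multiplication is needed. */
    __asm__("divl %4"
            : "=a"(q), "=d"(r)
            : "a"(lo), "d"(hi), "r"(d)
            : "cc");
    *rem = r;
    return q;
#else
    /* Portable fallback, assumed here only so the sketch builds elsewhere. */
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
#endif
}
```

A caller would pass the two most significant numerator words as `hi` and `lo` and the top divisor word as `d`, then reuse `*rem` instead of recomputing it with a multiply.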
    
    CPU      BN_div  bn_div_words    overall  comment
    ----------------------------------------------------------------------------
    PII      +16%    accumulated by  +2-3%    PII multiplies damn fast! Taking
                     inlining                 multiplication out of the loop
                                              didn't make too much difference.
                                              Eliminating the multiplication
                                              involved in remainder calculation
                                              is the major factor.

    Pentium  +45%    accumulated by  +7-9%    mull isn't that fast and replacing
                     inlining                 multiplications with additions in
                                              the loop has a more visible effect :-)

    MIPS     +75%    +12%            +20-25%  In addition to taking mults
    R10000                                    out of the loop (giving 12% in
                                              asm/mips3.s), three mults were
                                              eliminated in BN_div.

    Alpha    +30%    +50%            +10-15%  Same as above. But remember that
    EV4                                       bn_div_words is a C implementation.
                                              It takes 4 Alpha mults in C to do
                                              the same thing as 1 MIPS mult in
                                              assembler. So the effect (50%)
                                              is more impressive. But not the
                                              overall one... Well, if Alpha
                                              bn_mul_add were implemented in
                                              assembler, the overall improvement
                                              would be closer to MIPS...
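
The table's remarks about taking multiplications out of the loop and replacing them with additions can be illustrated with a quotient-word estimate in the style of schoolbook long division. This is a sketch under assumptions, not the BN_div source: the function name `estimate_qword`, the 32-bit word size and the argument layout are hypothetical. It assumes a normalized divisor (top bit of `d0` set) and `n0 < d0`.

```c
#include <stdint.h>

/* Hypothetical sketch: estimate one quotient word of the three-word value
 * n0:n1:n2 divided by the two-word value d0:d1 (most significant first).
 * Assumes d0 has its top bit set and n0 < d0. */
static uint32_t estimate_qword(uint32_t n0, uint32_t n1, uint32_t n2,
                               uint32_t d0, uint32_t d1)
{
    uint64_t top = ((uint64_t)n0 << 32) | n1;
    uint32_t q   = (uint32_t)(top / d0);   /* first guess, never too small */
    uint32_t rem = (uint32_t)(top % d0);   /* remainder of that division   */
    uint64_t t2  = (uint64_t)d1 * q;       /* the only multiplication      */

    /* Correct the guess. Each step lowers q by one, so q*d1 drops by d1 and
     * the remainder grows by d0: two additions instead of two multiplies. */
    while (t2 > (((uint64_t)rem << 32) | n2)) {
        q--;
        rem += d0;
        if (rem < d0)      /* rem wrapped past 32 bits: q is now small enough */
            break;
        t2 -= d1;
    }
    return q;
}
```

The loop body keeps `t2` and `rem` up to date incrementally, which is where a CPU with a slow multiplier would be expected to gain the most.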