“7e55055e5bb00085051ca59c570c83a820e1e0ee”上不存在“tools/perf/builtin-script.c”
powerpc32: optimise csum_partial() loop
On the 8xx, load latency is 2 cycles and taking branches also takes 2 cycles. So let's unroll the loop. This patch improves csum_partial() speed by around 10% on both: * 8xx (single issue processor with parallel execution) * 83xx (superscalar 6xx processor with dual instruction fetch and parallel execution) Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NScott Wood <oss@buserror.net>
Showing
想要评论请 注册 或 登录