Commit 0bd8d6e2 authored by Andy Polyakov

Commentary updates to SHA for sparcv9.

Parent 160065c5
@@ -8,13 +8,15 @@
 # ====================================================================
 # Performance improvement is not really impressive on pre-T1 CPU: +8%
-# over Sun C and +25% over gcc [3.3]. While on T1, ... And there
-# is a gimmick. X[16] vector is packed to 8 64-bit registers and as
-# result nothing is spilled on stack. In addition input data is loaded
-# in compact instruction sequence, thus minimizing the window when the
-# code is subject to [inter-thread] cache-thrashing hazard. The goal
-# is to ensure scalability on UltraSPARC T1, or rather to avoid decay
-# when amount of active threads exceeds the number of physical cores.
+# over Sun C and +25% over gcc [3.3]. While on T1, a.k.a. Niagara, it
+# turned to be 40% faster than 64-bit code generated by Sun C 5.8 and
+# >2x than 64-bit code generated by gcc 3.4. And there is a gimmick.
+# X[16] vector is packed to 8 64-bit registers and as result nothing
+# is spilled on stack. In addition input data is loaded in compact
+# instruction sequence, thus minimizing the window when the code is
+# subject to [inter-thread] cache-thrashing hazard. The goal is to
+# ensure scalability on UltraSPARC T1, or rather to avoid decay when
+# amount of active threads exceeds the number of physical cores.
 $bits=32;
 for (@ARGV) { $bits=64 if (/\-m64/ || /\-xarch\=v9/); }
......
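The first hunk's remark that the X[16] vector is packed into eight 64-bit registers can be pictured with a small Python sketch. This is not the author's assembly. It assumes 32-bit big-endian message words (as in SHA-256) and a hypothetical layout with X[2i] in the upper half and X[2i+1] in the lower half of each 64-bit value. The helper names `pack_x16` and `x_word` are illustrative only. Because the words are big-endian, a single big-endian 64-bit read of the block yields two consecutive words already in that order, and the `struct` format `>8Q` stands in for such reads here.

```python
import struct


def pack_x16(block: bytes) -> list[int]:
    """Pack the 16 big-endian 32-bit words of a 64-byte block into 8 64-bit values.

    Hypothetical layout: X[2*i] in the high 32 bits, X[2*i+1] in the low 32 bits.
    """
    if len(block) != 64:
        raise ValueError("expected a 64-byte message block")
    # Eight big-endian 64-bit reads; each one brings in two consecutive
    # 32-bit message words, high word first.
    return list(struct.unpack(">8Q", block))


def x_word(packed: list[int], j: int) -> int:
    """Extract message word X[j] (0 <= j < 16) from the packed form."""
    r = packed[j >> 1]
    # Odd index: low half (mask). Even index: high half (shift).
    return (r & 0xFFFFFFFF) if j & 1 else (r >> 32)
```

The trade-off this sketch shows is that reading any single word costs an extra shift or mask. In exchange, all 16 words fit in eight registers, which matches the comment's point that nothing is spilled on the stack.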
@@ -23,7 +23,16 @@
 #
 # SHA512 on UltraSPARC T1.
 #
-# ...
+# It's not any faster than 64-bit code generated by Sun C 5.8. This is
+# because 64-bit code generator has the advantage of using 64-bit
+# loads to access X[16], which I consciously traded for 32-/64-bit ABI
+# duality [as per above]. But it surpasses 32-bit Sun C generated code
+# by 60%, not to mention that it doesn't suffer from severe decay when
+# running 4 times physical cores threads and that it leaves gcc [3.4]
+# behind by over 4x factor! If compared to SHA256, single thread
+# performance is only 10% better, but overall throughput for maximum
+# amount of threads for given CPU exceeds corresponding one of SHA256
+# by 30% [again, optimal coefficient is 50%].
 $bits=32;
 for (@ARGV) { $bits=64 if (/\-m64/ || /\-xarch\=v9/); }
......
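The 32-/64-bit ABI duality mentioned above is selected by the context lines that close both hunks. `$bits` defaults to 32 and becomes 64 when any command-line argument contains `-m64` (the gcc flag) or `-xarch=v9` (the Sun C flag). The following is a Python rendering of that scan for readers less used to Perl. The name `detect_bits` is hypothetical. Unanchored `re.search` is used to mirror Perl's unanchored `/.../` match.

```python
import re


def detect_bits(argv: list[str]) -> int:
    """Mirror of: $bits=32; for (@ARGV) { $bits=64 if (/\\-m64/ || /\\-xarch\\=v9/); }"""
    bits = 32  # default: 32-bit ABI
    for arg in argv:
        # Unanchored search, like Perl's /.../, so the flag may appear
        # anywhere inside an argument.
        if re.search(r"-m64", arg) or re.search(r"-xarch=v9", arg):
            bits = 64
    return bits
```

Because the match is a substring test, variants such as `-xarch=v9a` also select the 64-bit path. `-xarch=v8plus` leaves `$bits` at 32.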