1. 23 Jul 2012, 1 commit
      block-sha1: avoid pointer conversion that violates alignment constraints · 5f6a1125
      Jonathan Nieder authored
      With 660231aa (block-sha1: support for architectures with memory
      alignment restrictions, 2009-08-12), blk_SHA1_Update was modified to
      access 32-bit chunks of memory one byte at a time on arches that
      prefer that:
      
      	#define get_be32(p)    ( \
      		(*((unsigned char *)(p) + 0) << 24) | \
      		(*((unsigned char *)(p) + 1) << 16) | \
      		(*((unsigned char *)(p) + 2) <<  8) | \
      		(*((unsigned char *)(p) + 3) <<  0) )
      
      The code previously accessed these values by just using htonl(*p).
      
      Unfortunately, Michael noticed on an Alpha machine that git was using
      plain 32-bit reads anyway.  As soon as we convert a pointer to int *,
      the compiler can assume that the object pointed to is correctly
      aligned as an int (C99 section 6.3.2.3 "pointer conversions"
      paragraph 7), and gcc takes full advantage by using a single 32-bit
      load, resulting in a whole bunch of unaligned access traps.
      
      So we need to obey the alignment constraints even when only dealing
      with pointers instead of actual values.  Do so by changing the type
      of 'data' to void *.  This patch renames 'data' to 'block' at the same
      time to make sure all references are updated to reflect the new type.
      Reported-tested-and-explained-by: Michael Cree <mcree@orcon.net.nz>
      Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
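      
      A minimal sketch of the idea (illustrative names, not the actual git code):
      keep the block parameter as a plain void * and only ever form unsigned
      char * views of it, so the compiler never sees an int * it could assume
      is aligned.
      
      	#include <stdint.h>
      
      	/* Byte-at-a-time big-endian load: no aligned pointer type is ever
      	 * formed, so the compiler cannot emit a single (possibly trapping)
      	 * 32-bit load. */
      	static uint32_t get_be32(const void *p)
      	{
      		const unsigned char *b = p;
      		return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
      		       ((uint32_t)b[2] <<  8) |  (uint32_t)b[3];
      	}
      
      	/* Hypothetical consumer: taking const void * (rather than
      	 * const unsigned int *) is the point of the fix -- once an int *
      	 * exists, C99 6.3.2.3p7 lets the compiler assume int alignment. */
      	static void load_block(uint32_t W[16], const void *block)
      	{
      		const unsigned char *p = block;
      		for (int t = 0; t < 16; t++, p += 4)
      			W[t] = get_be32(p);
      	}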
  2. 19 Aug 2009, 4 commits
  3. 15 Aug 2009, 1 commit
  4. 14 Aug 2009, 1 commit
  5. 13 Aug 2009, 3 commits
  6. 11 Aug 2009, 1 commit
      block-sha1: improve code on large-register-set machines · 926172c5
      Linus Torvalds authored
      For x86 performance (especially in 32-bit mode) I added that hack to write
      the SHA1 internal temporary hash using a volatile pointer, in order to get
      gcc to not try to cache the array contents, because gcc will do all the
      wrong things and then spill things in insane random ways.
      
      But on architectures like PPC, where you have 32 registers, it's actually
      perfectly reasonable to put the whole temporary array[] into the register
      set, and gcc can do so.
      
      So make the 'volatile unsigned int *' cast be dependent on a
      SMALL_REGISTER_SET preprocessor symbol, and enable it (currently) on just
      x86 and x86-64.  With that, the routine is fairly reasonable even when
      compared to the hand-scheduled PPC version. Ben Herrenschmidt reports on
      a G5:
      
       * Paulus asm version:       about 3.67s
       * Yours with no change:     about 5.74s
       * Yours without "volatile": about 3.78s
      
      so with this the C version is within about 3% of the asm one.
      
      And add a lot of commentary on what the heck is going on.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
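      
      A paraphrased sketch of the approach (the SMALL_REGISTER_SET symbol is
      from the commit text; the setW macro shape here is illustrative): force
      the volatile stores only on register-starved x86, and let gcc keep the
      work array in registers everywhere else.
      
      	#if defined(__i386__) || defined(__x86_64__)
      	#define SMALL_REGISTER_SET
      	#endif
      
      	#ifdef SMALL_REGISTER_SET
      	/* Few registers: force real, in-order stores so gcc stops trying to
      	 * cache the array and spilling it at random. */
      	#define setW(W, x, val) (*(volatile unsigned int *)&(W)[(x) & 15] = (val))
      	#else
      	/* Plenty of registers (e.g. PPC with 32): the array can live in them. */
      	#define setW(W, x, val) ((W)[(x) & 15] = (val))
      	#endif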
  7. 08 Aug 2009, 2 commits
      block-sha1: improved SHA1 hashing · 66c9c6c0
      Linus Torvalds authored
      I think I have found a way to avoid the gcc craziness.
      
      Lookie here:
      
      	#             TIME[s] SPEED[MB/s]
      	rfc3174         5.094       119.8
      	rfc3174         5.098       119.7
      	linus           1.462       417.5
      	linusas         2.008         304
      	linusas2        1.878         325
      	mozilla         5.566       109.6
      	mozillaas       5.866       104.1
      	openssl         1.609       379.3
      	spelvin         1.675       364.5
      	spelvina        1.601       381.3
      	nettle          1.591       383.6
      
      notice? I outperform all the hand-tuned asm on 32-bit too. By quite a
      margin, in fact.
      
      Now, I didn't try a P4, and it's possible that it won't do that there, but
      the 32-bit code generation sure looks impressive on my Nehalem box. The
      magic? I force the stores to the 512-bit hash bucket to be done in order.
      That seems to help a lot.
      
      The diff is trivial (on top of the "rename registers with cpp" patch), as
      appended. And it does seem to fix the P4 issues too, although I can
      obviously (once again) only test Prescott, and only in 64-bit mode:
      
      	#             TIME[s] SPEED[MB/s]
      	rfc3174         1.662       36.73
      	rfc3174          1.64       37.22
      	linus          0.2523       241.9
      	linusas        0.4367       139.8
      	linusas2       0.4487         136
      	mozilla        0.9704        62.9
      	mozillaas      0.9399       64.94
      
      that's some really impressive improvement. All from just saying "do the
      stores in the order I told you to, dammit!" to the compiler.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
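      
      A minimal sketch of the store-ordering trick described above (names are
      illustrative; assumes a 16-word, 512-bit work buffer): writing through a
      volatile pointer stops gcc from caching or reordering the stores, so they
      reach memory in program order.
      
      	#include <stdint.h>
      
      	/* Every store into the 512-bit (16 x 32-bit) buffer becomes a real
      	 * memory store, issued in the order written. */
      	#define setW(W, t, val) (*(volatile uint32_t *)&(W)[(t) & 15] = (val))
      
      	static void fill_in_order(uint32_t W[16], const uint32_t *src)
      	{
      		int t;
      		for (t = 0; t < 16; t++)
      			setW(W, t, src[t]);	/* emitted in this exact order */
      	}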
      block-sha1: perform register rotation using cpp · 30d12d4c
      Linus Torvalds authored
      Instead of letting the compiler figure out the optimal way to rotate
      register usage, explicitly rotate the register names with cpp.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
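      
      A self-contained sketch of the technique (round structure and names are
      illustrative, not the exact git macros): each call site permutes which
      variable plays the A/B/C/D/E role, so no values ever have to be moved
      between variables after a round.
      
      	#include <stdint.h>
      
      	#define SHA_ROT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
      
      	/* One SHA1-style round; the caller decides which variable acts as
      	 * A, B, C, D or E instead of shuffling values around. */
      	#define ROUND(f, w, A, B, C, D, E) do { \
      		(E) += SHA_ROT(A, 5) + (f) + 0x5a827999 + (w); \
      		(B) = SHA_ROT(B, 30); \
      	} while (0)
      
      	static void five_rounds(uint32_t h[5], const uint32_t W[5])
      	{
      		uint32_t A = h[0], B = h[1], C = h[2], D = h[3], E = h[4];
      
      		/* cpp "rotates the registers": each call shifts the names. */
      		ROUND(((C ^ D) & B) ^ D, W[0], A, B, C, D, E);
      		ROUND(((B ^ C) & A) ^ C, W[1], E, A, B, C, D);
      		ROUND(((A ^ B) & E) ^ B, W[2], D, E, A, B, C);
      		ROUND(((E ^ A) & D) ^ A, W[3], C, D, E, A, B);
      		ROUND(((D ^ E) & C) ^ E, W[4], B, C, D, E, A);
      
      		h[0] += A; h[1] += B; h[2] += C; h[3] += D; h[4] += E;
      	}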
  8. 07 Aug 2009, 8 commits
  9. 06 Aug 2009, 17 commits
  10. 05 Aug 2009, 2 commits