- 23 7月, 2012 1 次提交
-
-
由 Jonathan Nieder 提交于
With 660231aa (block-sha1: support for architectures with memory alignment restrictions, 2009-08-12), blk_SHA1_Update was modified to access 32-bit chunks of memory one byte at a time on arches that prefer that: #define get_be32(p) ( \ (*((unsigned char *)(p) + 0) << 24) | \ (*((unsigned char *)(p) + 1) << 16) | \ (*((unsigned char *)(p) + 2) << 8) | \ (*((unsigned char *)(p) + 3) << 0) ) The code previously accessed these values by just using htonl(*p). Unfortunately, Michael noticed on an Alpha machine that git was using plain 32-bit reads anyway. As soon as we convert a pointer to int *, the compiler can assume that the object pointed to is correctly aligned as an int (C99 section 6.3.2.3 "pointer conversions" paragraph 7), and gcc takes full advantage by using a single 32-bit load, resulting in a whole bunch of unaligned access traps. So we need to obey the alignment constraints even when only dealing with pointers instead of actual values. Do so by changing the type of 'data' to void *. This patch renames 'data' to 'block' at the same time to make sure all references are updated to reflect the new type. Reported-tested-and-explained-by: NMichael Cree <mcree@orcon.net.nz> Signed-off-by: NJonathan Nieder <jrnieder@gmail.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
- 19 8月, 2009 4 次提交
-
-
由 Nicolas Pitre 提交于
They are both slower than the new BLK_SHA1 implementation, so it is pointless to keep them around. Signed-off-by: NNicolas Pitre <nico@cam.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Nicolas Pitre 提交于
With this, the code should now be portable to any C compiler. Signed-off-by: NNicolas Pitre <nico@cam.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Nicolas Pitre 提交于
We rely on ntohl() and htonl() to perform byte swapping in many places. However, some platforms have libraries providing really poor implementations of those which might cause significant performance issues, especially with the block-sha1 code. Signed-off-by: NNicolas Pitre <nico@cam.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Nicolas Pitre 提交于
This is a 64-bit value, hence having it first provides a better alignment. Signed-off-by: NNicolas Pitre <nico@cam.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
- 15 8月, 2009 1 次提交
-
-
由 Brandon Casey 提交于
Some compilers produce errors when arithmetic is attempted on pointers to void. We want computations done on byte addresses, so cast them to char * to work them around. Signed-off-by: NBrandon Casey <casey@nrlssc.navy.mil> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
- 14 8月, 2009 1 次提交
-
-
由 Nicolas Pitre 提交于
In addition to X86, PowerPC and S390 are capable of unaligned memory accesses. Signed-off-by: NNicolas Pitre <nico@cam.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
- 13 8月, 2009 3 次提交
-
-
由 Nicolas Pitre 提交于
This is needed on architectures with poor or non-existent unaligned memory support and/or no fast byte swap instruction (such as ARM) by using byte accesses to memory and shifting the result together. This also makes the code portable, therefore the byte access methods are the defaults. Any architecture that properly supports unaligned word accesses in hardware simply has to enable the alternative methods. Signed-off-by: NNicolas Pitre <nico@cam.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Nicolas Pitre 提交于
This is to make it easier for them to be selected individually depending on the architecture instead of the other way around i.e. having each architecture select a list of hacks up front. That makes for clearer documentation as well. Signed-off-by: NNicolas Pitre <nico@cam.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Nicolas Pitre 提交于
Move the code around so specific architecture hacks are defined first. Also make one line comments actually one line. No code change. Signed-off-by: NNicolas Pitre <nico@cam.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
- 11 8月, 2009 1 次提交
-
-
由 Linus Torvalds 提交于
For x86 performance (especially in 32-bit mode) I added that hack to write the SHA1 internal temporary hash using a volatile pointer, in order to get gcc to not try to cache the array contents. Because gcc will do all the wrong things, and then spill things in insane random ways. But on architectures like PPC, where you have 32 registers, it's actually perfectly reasonable to put the whole temporary array[] into the register set, and gcc can do so. So make the 'volatile unsigned int *' cast be dependent on a SMALL_REGISTER_SET preprocessor symbol, and enable it (currently) on just x86 and x86-64. With that, the routine is fairly reasonable even when compared to the hand-scheduled PPC version. Ben Herrenschmidt reports on a G5: * Paulus asm version: about 3.67s * Yours with no change: about 5.74s * Yours without "volatile": about 3.78s so with this the C version is within about 3% of the asm one. And add a lot of commentary on what the heck is going on. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
- 08 8月, 2009 2 次提交
-
-
由 Linus Torvalds 提交于
I think I have found a way to avoid the gcc crazyness. Lookie here: # TIME[s] SPEED[MB/s] rfc3174 5.094 119.8 rfc3174 5.098 119.7 linus 1.462 417.5 linusas 2.008 304 linusas2 1.878 325 mozilla 5.566 109.6 mozillaas 5.866 104.1 openssl 1.609 379.3 spelvin 1.675 364.5 spelvina 1.601 381.3 nettle 1.591 383.6 notice? I outperform all the hand-tuned asm on 32-bit too. By quite a margin, in fact. Now, I didn't try a P4, and it's possible that it won't do that there, but the 32-bit code generation sure looks impressive on my Nehalem box. The magic? I force the stores to the 512-bit hash bucket to be done in order. That seems to help a lot. The diff is trivial (on top of the "rename registers with cpp" patch), as appended. And it does seem to fix the P4 issues too, although I can obviously (once again) only test Prescott, and only in 64-bit mode: # TIME[s] SPEED[MB/s] rfc3174 1.662 36.73 rfc3174 1.64 37.22 linus 0.2523 241.9 linusas 0.4367 139.8 linusas2 0.4487 136 mozilla 0.9704 62.9 mozillaas 0.9399 64.94 that's some really impressive improvement. All from just saying "do the stores in the order I told you to, dammit!" to the compiler. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Linus Torvalds 提交于
Instead of letting the compiler to figure out the optimal way to rotate register usage, explicitly rotate the register names with cpp. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
- 07 8月, 2009 8 次提交
-
-
由 Linus Torvalds 提交于
.. and simplify the ctx->size logic. We now count the size in bytes, which means that 'lenW' was always just the low 6 bits of the total size, so we don't carry it around separately any more. And we do the 'size in bits' shift at the end. Suggested by Nicolas Pitre and linux@horizon.com. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Linus Torvalds 提交于
It's an equivalent expression, but the '+' gives us some freedom in instruction selection (for example, we can use 'lea' rather than 'add'), and associates with the other additions around it to give some minor scheduling freedom. Suggested-by: linux@horizon.com Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Linus Torvalds 提交于
Avoid repeating the shared parts of the different rounds by adding a macro layer or two. It was already more cpp than C. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Linus Torvalds 提交于
The mozilla-SHA1 code did this 80-word array for the 80 iterations. But the SHA1 state is really just 512 bits, and you can actually keep it in a kind of "circular queue" of just 16 words instead. This requires us to do the xor updates as we go along (rather than as a pre-phase), but that's really what we want to do anyway. This gets me really close to the OpenSSL performance on my Nehalem. Look ma, all C code (ok, there's the rol/ror hack, but that one doesn't strictly even matter on my Nehalem, it's just a local optimization). Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Linus Torvalds 提交于
This helps a teeny bit. But what I -really- want to do is to avoid the whole 80-array loop, and do the xor updates as I go along.. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Junio C Hamano 提交于
Bert Wesarg noticed non-x86 version of SHA_ROT() had a typo. Also spell in-line assembly as __asm__(), otherwise I seem to get error: implicit declaration of function 'asm' from my compiler. Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Linus Torvalds 提交于
Use the one with the smaller constant. It _can_ generate slightly smaller code (a constant of 1 is special), but perhaps more importantly it's possibly faster on any uarch that does a rotate with a loop. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Junio C Hamano 提交于
Undo the change I picked up from the mailing list discussion suggested by Nico, not because it is wrong, but it will be done at the end of the follow-up series. Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
- 06 8月, 2009 17 次提交
-
-
由 Linus Torvalds 提交于
Based on the mozilla SHA1 routine, but doing the input data accesses a word at a time and with 'htonl()' instead of loading bytes and shifting. It requires an architecture that is ok with unaligned 32-bit loads and a fast htonl(). Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Junio C Hamano 提交于
* sb/read-tree: read-tree: migrate to parse-options read-tree: convert unhelpful usage()'s to helpful die()'s
-
由 Junio C Hamano 提交于
* jc/apply-epoch-patch: apply: notice creation/removal patches produced by GNU diff
-
由 Junio C Hamano 提交于
* sb/parse-options: prune-packed: migrate to parse-options verify-pack: migrate to parse-options verify-tag: migrate to parse-options write-tree: migrate to parse-options
-
由 Junio C Hamano 提交于
* ns/init-mkdir: git init: optionally allow a directory argument Conflicts: builtin-init-db.c
-
由 Junio C Hamano 提交于
* mk/init-db-parse-options: init-db: migrate to parse-options
-
由 Junio C Hamano 提交于
* jk/maint-show-tag: show: add space between multiple items show: suppress extra newline when showing annotated tag
-
由 Junio C Hamano 提交于
* sb/maint-pull-rebase: pull: support rebased upstream + fetch + pull --rebase t5520-pull: Test for rebased upstream + fetch + pull --rebase
-
由 Junio C Hamano 提交于
* ne/futz-upload-pack: Shift object enumeration out of upload-pack Conflicts: upload-pack.c
-
由 Junio C Hamano 提交于
* maint: gitweb/README: Document $base_url Documentation: git submodule: add missing options to synopsis Better usage string for reflog. hg-to-git: don't import the unused popen2 module send-email: remove debug trace config: Keep inner whitespace verbatim
-
由 Junio C Hamano 提交于
* maint-1.6.3: Better usage string for reflog. hg-to-git: don't import the unused popen2 module send-email: remove debug trace config: Keep inner whitespace verbatim
-
由 Jakub Narebski 提交于
Signed-off-by: NJakub Narebski <jnareb@gmail.com> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Jens Lehmann 提交于
The option --merge was missing for submodule update and --cached for submodule summary. Signed-off-by: NJens Lehmann <Jens.Lehmann@web.de> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Michael J Gruber 提交于
Currently, the documentation suggests that 'git merge-base -a' and 'git show-branch --merge-base' are equivalent (in fact it claims that the former cannot handle more than two revs). Alas, the handling of more than two revs is very different. Document this by tests and correct the documentation to reflect this. Signed-off-by: NMichael J Gruber <git@drmicha.warpmail.net> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Michael J Gruber 提交于
Make sure that usage strings and documentation coincide with each other and with the actual code. Signed-off-by: NMichael J Gruber <git@drmicha.warpmail.net> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Michael J Gruber 提交于
...so that it is easier to reuse it for other tests. Signed-off-by: NMichael J Gruber <git@drmicha.warpmail.net> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Matthieu Moy 提交于
Signed-off-by: NMatthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
- 05 8月, 2009 2 次提交
-
-
由 Miklos Vajna 提交于
Importing the popen2 module in Python-2.6 results in the "DeprecationWarning: The popen2 module is deprecated. Use the subprocess module." message. The module itself isn't used in fact, so just removing it solves the problem. Signed-off-by: NMiklos Vajna <vmiklos@frugalware.org> Signed-off-by: NJunio C Hamano <gitster@pobox.com>
-
由 Erik Faye-Lund 提交于
Signed-off-by: NErik Faye-Lund <kusmabite@gmail.com>
-