    Create pathname-based hash-table lookup into index · cf558704
    Committed by Linus Torvalds
    This creates a hash index of every single file added to the index.
    Right now that hash index isn't actually used for much: I implemented a
    "cache_name_exists()" function that uses it to efficiently look up a
    filename in the index without having to do the O(log n) binary search,
    but quite frankly, that's not why this patch is interesting.
    
    No, the whole and only reason to create the hash of the filenames in the
    index is that by modifying the hash function, you can fairly easily do
    things like making it always hash equivalent names into the same bucket.
    
    That, in turn, means that suddenly questions like "does this name exist
    in the index under an _equivalent_ name?" become much much cheaper.
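
    As a rough illustration of the idea (not the code in this patch), a
    chained hash table keyed by pathname could look like the sketch below.
    NAME_HASH_SIZE, struct name_entry, hash_name(), name_hash_add() and
    name_hash_exists() are all hypothetical names made up for the example;
    the real patch re-uses the existing hash.c code instead.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_HASH_SIZE 1024	/* hypothetical fixed bucket count */

struct name_entry {
	const char *name;		/* pathname as stored in the index */
	struct name_entry *next;	/* next entry in the same bucket */
};

struct name_entry *name_hash[NAME_HASH_SIZE];

/* Placeholder hash over the raw bytes; the constants are arbitrary. */
unsigned int hash_name(const char *name)
{
	unsigned int hash = 0x123;

	while (*name)
		hash = hash * 101 + (unsigned char)*name++;
	return hash;
}

/* Add a pathname to its bucket. The caller keeps ownership of the string. */
int name_hash_add(const char *name)
{
	unsigned int bucket = hash_name(name) % NAME_HASH_SIZE;
	struct name_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->name = name;
	e->next = name_hash[bucket];
	name_hash[bucket] = e;
	return 0;
}

/* Return 1 if exactly this pathname was added, 0 otherwise. */
int name_hash_exists(const char *name)
{
	unsigned int bucket = hash_name(name) % NAME_HASH_SIZE;
	struct name_entry *e;

	for (e = name_hash[bucket]; e; e = e->next)
		if (!strcmp(e->name, name))
			return 1;
	return 0;
}
```

    Only hash_name() decides which names share a bucket, so swapping in a
    hash that maps equivalent names to the same value is enough to make an
    "equivalent name" check walk a single bucket.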
    
    Guiding principles behind this patch:
    
     - it shouldn't be too costly. In fact, my primary goal here was to
       actually speed up "git commit" with a fully populated kernel tree, by
       being faster at checking whether a file already existed in the index. I
       did succeed, but only barely:
    
    	Best before:
    		[torvalds@woody linux]$ time git commit > /dev/null
    		real    0m0.255s
    		user    0m0.168s
    		sys     0m0.088s
    
    	Best after:
    
    		[torvalds@woody linux]$ time ~/git/git commit > /dev/null
    		real    0m0.233s
    		user    0m0.144s
    		sys     0m0.088s
    
       so some things are actually faster (~8%).
    
       Caveat: that's really the best case. Other things are invariably going
       to be slightly slower, since we populate that index cache, and quite
       frankly, few things really use it to look things up.
    
       That said, the cost is really quite small. The worst case is probably
       doing a "git ls-files", which will do very little except populate the
       index, and never actually looks anything up in it, just lists it.
    
    	Before:
    		[torvalds@woody linux]$ time git ls-files > /dev/null
    		real    0m0.016s
    		user    0m0.016s
    		sys     0m0.000s
    
    	After:
    		[torvalds@woody linux]$ time ~/git/git ls-files > /dev/null
    		real    0m0.021s
    		user    0m0.012s
    		sys     0m0.008s
    
       and while the thing has really gotten relatively much slower, we're
       still talking about something almost unmeasurable (ie 5ms). And that
       really should be pretty much the worst case.
    
       So we lose 5ms on one "benchmark", but win 22ms on another. Pick your
       poison - this patch has the advantage that it will _likely_ speed up
       the cases that are complex and expensive more than it slows down the
       cases that are already so fast that nobody cares. But if you look at
       relative speedups/slowdowns, it doesn't look so good.
    
     - It should be simple and clean
    
       The code may be a bit subtle (the reasons I do hash removal the way I
       do etc), but it re-uses the existing hash.c files, so it really is
       fairly small and straightforward apart from a few odd details.
    
    Now, this patch on its own doesn't really do much, but I think it's worth
    looking at, if only because if done correctly, the name hashing really can
    make an improvement to the whole issue of "do we have a filename that
    looks like this in the index already". And at least it gets real testing
    by being used even by default (ie there is a real use-case for it even
    without any insane filesystems).
    
    NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just
    not ashamed of it enough to really care. I took all the numbers out of my
    nether regions - I'm sure it's good enough that it works in practice, but
    the whole point was that you can make a really much fancier hash that
    hashes characters not directly, but by their upper-case value or something
    like that, and thus you get a case-insensitive hash, while still keeping
    the name and the index itself totally case sensitive.
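
    A minimal sketch of what such a case-folding hash might look like,
    assuming plain ASCII names and the C locale. hash_name_icase() and
    same_name() are hypothetical names, and the constants are as arbitrary
    as the ones in the real hash:

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical case-insensitive hash: every byte is folded to upper
 * case before it is mixed in, so names that differ only in ASCII case
 * end up with the same hash value (and thus in the same bucket).
 */
unsigned int hash_name_icase(const char *name)
{
	unsigned int hash = 0x123;

	while (*name) {
		unsigned char c = (unsigned char)*name++;
		hash = hash * 101 + (unsigned int)toupper(c);
	}
	return hash;
}

/* The stored names stay case sensitive: this is a plain byte compare. */
int same_name(const char *a, const char *b)
{
	return !strcmp(a, b);
}
```

    With this, "Makefile" and "MAKEFILE" hash identically while
    same_name() still tells them apart, so the index itself stays case
    sensitive and only the bucket choice ignores case.
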
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>