    sha1_file: be paranoid when creating loose objects · 748af44c
    Committed by Nicolas Pitre
    We don't want the data being deflated and stored into loose objects
    to be different from what we expect.  While the deflated data is
    protected by a CRC which is good enough for safe data retrieval
    operations, we still want to be doubly sure that the source data used
    at object creation time is still what we expected once that data has
    been deflated and its CRC32 computed.
    
    The most plausible source of corruption is the source file being
    modified while Git is deflating and writing it out in a loose object.
    Or Git itself could have a bug causing memory corruption.  Or even bad
    RAM could cause trouble.  So it is best to make sure everything is
    coherent and checksum protected from beginning to end.
    
    To do so we compute the SHA1 of the data being deflated _after_ the
    deflate operation has consumed that data, and make sure it matches
    with the expected SHA1.  This way we can rely on the CRC32 checked by
    the inflate operation to provide a good indication that the data is still
    coherent with its SHA1 hash.  One pathological case we ignore is when
    the data is modified before (or during) the deflate call, but changed back
    before it is hashed.
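
    The ordering described above (deflate first, hash afterwards, then
    compare) can be sketched as follows. This is a minimal illustration in
    Python, not the C code in sha1_file.c: the function name
    `deflate_and_verify`, the chunk size, and the use of ValueError are all
    assumptions made for the example, and the loose-object header is left out.

```python
import hashlib
import zlib


def deflate_and_verify(data, expected_sha1, chunk_size=4096):
    """Deflate `data` chunk by chunk and hash each chunk only AFTER
    deflate has consumed it, then compare against the expected SHA1.

    Hypothetical helper for illustration; not Git's actual API.
    """
    compressor = zlib.compressobj()
    hasher = hashlib.sha1()
    out = bytearray()
    view = memoryview(data)

    for start in range(0, len(view), chunk_size):
        chunk = view[start:start + chunk_size]
        out += compressor.compress(chunk)  # deflate consumes the bytes first
        hasher.update(chunk)               # then the same bytes are hashed

    out += compressor.flush()

    # If the source changed between the original hash and the deflate,
    # the digest computed here no longer matches and nothing is returned.
    if hasher.hexdigest() != expected_sha1:
        raise ValueError("source data changed while being deflated")
    return bytes(out)
```

    Because the hash is taken after the deflate step has read each chunk, a
    buffer that is modified before or during compression yields a mismatching
    digest, and the compressed output is rejected instead of being stored.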
    
    There is some overhead of course. Using 'git add' on a set of large files:
    
    Before:
    
    	real    0m25.210s
    	user    0m23.783s
    	sys     0m1.408s
    
    After:
    
    	real    0m26.537s
    	user    0m25.175s
    	sys     0m1.358s
    
    The overhead is around 5% (about 5.3% wall-clock and 5.9% user CPU time
    in the run above) for a full data coherency guarantee.

    Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1_file.c 63.1 KB