    Fix random fast-import errors when compiled with NO_MMAP · c9ced051
    Shawn O. Pearce committed
    fast-import was relying on the fact that on most systems mmap() and
    write() are synchronized by the filesystem's buffer cache.  We were
    relying on the ability to mmap() 20 bytes beyond the current end
    of the file, then later fill in those bytes with a future write()
    call, then read them through the previously obtained mmap() address.
    
    This isn't always true with some implementations of NFS, but it is
    especially not true with our NO_MMAP=YesPlease build time option used
    on some platforms.  If fast-import was built with NO_MMAP=YesPlease
    we used the malloc()+pread() emulation and the subsequent write()
    call does not update the trailing 20 bytes of a previously obtained
    "mmap()" (aka malloc'd) address.
    
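    The stale read described above can be reproduced outside of Git with
    a minimal sketch.  The helper emulated_mmap() and the driver
    demo_stale_window() are hypothetical names invented for this
    illustration, not Git's actual NO_MMAP code; they only assume a
    POSIX system with a writable /tmp.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for pread() and mkstemp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a malloc()+pread() mmap emulation: copy up to
 * `len` bytes starting at `offset` into a fresh buffer.  Bytes beyond the
 * current end of file stay zero-filled.
 */
static unsigned char *emulated_mmap(int fd, off_t offset, size_t len)
{
	unsigned char *buf = calloc(1, len);
	size_t done = 0;

	if (!buf)
		return NULL;
	while (done < len) {
		ssize_t n = pread(fd, buf + done, len - done,
				  offset + (off_t)done);
		if (n <= 0)
			break;	/* EOF or error: the rest stays zeroed */
		done += (size_t)n;
	}
	return buf;
}

/*
 * Returns 1 when the old snapshot misses the appended bytes while a fresh
 * snapshot sees them, 0 when it does not, and -1 on setup failure.
 */
static int demo_stale_window(void)
{
	char path[] = "/tmp/stale-window-XXXXXX";
	unsigned char tail[20];
	unsigned char *stale = NULL, *fresh = NULL;
	int fd, result = -1;

	memset(tail, 0xAB, sizeof(tail));
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file disappears once fd is closed */

	if (write(fd, "HEAD", 4) != 4)
		goto out;
	/* "Map" the 4 existing bytes plus 20 bytes past the current end. */
	stale = emulated_mmap(fd, 0, 4 + sizeof(tail));
	/* Fill in those 20 bytes afterwards with a later write(). */
	if (!stale || write(fd, tail, sizeof(tail)) != (ssize_t)sizeof(tail))
		goto out;
	/* A window opened after the write rereads the file. */
	fresh = emulated_mmap(fd, 0, 4 + sizeof(tail));
	if (!fresh)
		goto out;
	result = memcmp(stale + 4, tail, sizeof(tail)) != 0 &&
		 memcmp(fresh + 4, tail, sizeof(tail)) == 0;
out:
	free(stale);
	free(fresh);
	close(fd);
	return result;
}
```

    Calling demo_stale_window() from a small main() should show that
    only the buffer obtained after the write() holds the trailing 20
    bytes.
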
    Under NO_MMAP that behavior causes unpack_entry() in sha1_file.c to
    be unable to read an object header (or data) that has been unlucky
    enough to be written to the packfile at a location such that it
    is in the trailing 20 bytes of a window previously opened on that
    same packfile.
    
    This bug has gone unnoticed for a very long time as it is highly data
    dependent.  Not only does the object have to be placed at the right
    position, but it also needs to be positioned behind some other object
    that has been accessed due to a branch cache invalidation.  In other
    words the stars had to align just right, and if you did run into
    this bug you probably should also have purchased a lottery ticket.
    
    Fortunately the workaround is a lot easier than the bug explanation.
    
    Before we allow unpack_entry() to read data from a pack window
    that has also (possibly) been modified through write() we force
    all existing windows on that packfile to be closed.  By closing
    the windows we ensure that any new access via the emulated mmap()
    will reread the packfile, updating to the current file content.
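
    As a rough sketch of what closing all windows on a packfile amounts
    to, assume a pack that keeps its cached windows in a singly linked
    list.  The types and the functions add_window() and
    close_all_windows() below are simplified, hypothetical stand-ins and
    not the structures used in sha1_file.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, simplified types; these are not Git's real structures. */
struct window {
	struct window *next;
	unsigned char *base;	/* malloc'd copy of part of the packfile */
	off_t offset;		/* where in the packfile the copy starts */
	size_t len;		/* how many bytes were copied */
};

struct pack {
	struct window *windows;	/* cached windows, most recent first */
};

/* Cache a new zero-filled window; returns 0 on success, -1 on failure. */
static int add_window(struct pack *p, off_t offset, size_t len)
{
	struct window *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->base = calloc(1, len);
	if (!w->base) {
		free(w);
		return -1;
	}
	w->offset = offset;
	w->len = len;
	w->next = p->windows;
	p->windows = w;
	return 0;
}

/* Free every cached window of the pack and return how many were closed. */
static int close_all_windows(struct pack *p)
{
	int closed = 0;

	while (p->windows) {
		struct window *w = p->windows;

		p->windows = w->next;
		free(w->base);
		free(w);
		closed++;
	}
	return closed;
}
```

    Once the list is empty, the next access has no cached copy to reuse
    and must read the current file content again.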
    
    This comes with a slight performance degradation as we cannot reuse
    previously cached windows when we update the packfile.  But it
    is a fairly minor difference as the window closes happen at only
    two points:
    
     - When the packfile is finalized and its .idx is generated:
    
       At this stage we are getting ready to update the refs and any
       data access into the packfile is going to be random, and is
       going after only the branch tips (to ensure they are valid).
       Our existing windows (if any) are not likely to be positioned
       at useful locations to access those final tip commits so we
       probably were closing them before anyway.
    
     - When the branch cache missed and we need to reload:
    
       At this point fast-import is getting change commands for the next
       commit and it needs to go re-read a tree object it previously
       had written out to the packfile.  What windows we had (if any)
       are not likely to cover the tree in question so we probably were
       closing them before anyway.
    
    We do try to avoid unnecessarily closing windows in the second case
    by checking to see if the packfile size has increased since the
    last time we called unpack_entry() on that packfile.  If the size
    has not changed then we have not written additional data, and any
    existing window is still valid.  This nicely handles the cases where
    fast-import is going through a branch cache reload and needs to read
    many trees at once.  During such an event we are not likely to be
    updating the packfile so we do not cycle the windows between reads.
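
    The size check can be sketched roughly as follows.  The struct
    pack_state and the function prepare_read() are hypothetical names
    chosen for this illustration, and the windows_closed counter merely
    stands in for actually closing the windows.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical bookkeeping for one packfile being written. */
struct pack_state {
	off_t size;		/* bytes written to the packfile so far */
	off_t last_read_size;	/* packfile size at the previous read */
	int windows_closed;	/* how often the windows were cycled */
};

/*
 * Call before reading an object back from the packfile.  Returns 1 if
 * the packfile grew since the previous read (cached windows may be stale
 * and are dropped), 0 if the existing windows can be reused.
 */
static int prepare_read(struct pack_state *p)
{
	if (p->size <= p->last_read_size)
		return 0;	/* nothing written since the last read */
	p->windows_closed++;	/* stands in for closing all windows */
	p->last_read_size = p->size;
	return 1;
}
```

    With this guard, a run of consecutive reads with no intervening
    write cycles the windows at most once.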
    
    With this change in place t9301-fast-export.sh (which was broken
    by c3b0dec5) finally works again.
    Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>