• J
    pack-objects: protect against disappearing packs · 4c080182
    Jeff King 提交于
    It's possible that while pack-objects is running, a
    simultaneously running prune process might delete a pack
    that we are interested in. Because we load the pack indices
    early on, we know that the pack contains our item, but by
    the time we try to open and map it, it is gone.
    
    Since c715f783, we already protect against this in the normal
    object access code path, but pack-objects accesses the packs
    at a lower level.  In the normal access path, we call
    find_pack_entry, which will call find_pack_entry_one on each
    pack index, which does the actual lookup. If it gets a hit,
    we will actually open and verify the validity of the
    matching packfile (using c715f783's is_pack_valid). If we
    can't open it, we'll issue a warning and pretend that we
    didn't find it, causing us to go on to the next pack (or on
    to loose objects).
    
    Furthermore, we will cache the descriptor to the opened
    packfile. Which means that later, when we actually try to
    access the object, we are likely to still have that packfile
    opened, and won't care if it has been unlinked from the
    filesystem.
    
    Notice the "likely" above. If there is another pack access
    in the interim, and we run out of descriptors, we could
    close the pack. And then a later attempt to access the
    closed pack could fail (we'll try to re-open it, of course,
    but it may have been deleted). In practice, this doesn't
    happen because we tend to look up items and then access them
    immediately.
    
    Pack-objects does not follow this code path. Instead, it
    accesses the packs at a much lower level, using
    find_pack_entry_one directly. This means we skip the
    is_pack_valid check, and may end up with the name of a
    packfile, but no open descriptor.
    
    We can add the same is_pack_valid check here. Unfortunately,
    the access patterns of pack-objects are not quite as nice
    for keeping lookup and object access together. We look up
    each object as we find out about it, and the only later when
    writing the packfile do we necessarily access it. Which
    means that the opened packfile may be closed in the interim.
    
    In practice, however, adding this check still has value, for
    three reasons.
    
      1. If you have a reasonable number of packs and/or a
         reasonable file descriptor limit, you can keep all of
         your packs open simultaneously. If this is the case,
         then the race is impossible to trigger.
    
      2. Even if you can't keep all packs open at once, you
         may end up keeping the deleted one open (i.e., you may
         get lucky).
    
      3. The race window is shortened. You may notice early that
         the pack is gone, and not try to access it. Triggering
         the problem without this check means deleting the pack
         any time after we read the list of index files, but
         before we access the looked-up objects.  Triggering it
         with this check means deleting the pack means deleting
         the pack after we do a lookup (and successfully access
         the packfile), but before we access the object. Which
         is a smaller window.
    Acked-by: NNicolas Pitre <nico@fluxnic.net>
    Signed-off-by: NJeff King <peff@peff.net>
    Signed-off-by: NJunio C Hamano <gitster@pobox.com>
    4c080182
sha1_file.c 71.6 KB