• J
    has_sha1_file: re-check pack directory before giving up · 45e8a748
    Jeff King 提交于
    When we read a sha1 file, we first look for a packed
    version, then a loose version, and then re-check the pack
    directory again before concluding that we cannot find it.
    This lets us handle a process that is writing to the
    repository simultaneously (e.g., receive-pack writing a new
    pack followed by a ref update, or git-repack packing
    existing loose objects into a new pack).
    
    However, we do not do the same trick with has_sha1_file; we
    only check the packed objects once, followed by loose
    objects. This means that we might incorrectly report that we
    do not have an object, even though we could find it if we
    simply re-checked the pack directory.
    
    By itself, this is usually not a big deal. The other process
    is running simultaneously, so we may run has_sha1_file
    before it writes, anyway. It is a race whether we see the
    object or not.  However, we may also see other things
    the writing process has done (like updating refs); and in
    that case, we must be able to also see the new objects.
    
    For example, imagine we are doing a for_each_ref iteration,
    and somebody simultaneously pushes. Receive-pack may write
    the pack and update a ref after we have examined the
    objects/pack directory, but before the iteration gets to the
    updated ref. When we do finally see the updated ref,
    for_each_ref will call has_sha1_file to check whether the
    ref is broken. If has_sha1_file returns the wrong answer, we
    erroneously will think that the ref is broken.
    
    For a normal iteration without DO_FOR_EACH_INCLUDE_BROKEN,
    this means that the caller does not see the ref at all
    (neither the old nor the new value).  So not only will we
    fail to see the new value of the ref (which is acceptable,
    since we are running simultaneously with the writer, and we
    might well read the ref before the writer commits its
    write), but we will not see the old value either. For
    programs that act on reachability like pack-objects or
    prune, this can cause data loss, as we may see the objects
    referenced by the original ref value as dangling (and either
    omit them from the pack, or delete them via prune).
    
    There's no test included here, because the success case is
    two processes running simultaneously forever. But you can
    replicate the issue with:
    
      # base.sh
      # run this in one terminal; it creates and pushes
      # repeatedly to a repository
      git init parent &&
      (cd parent &&
    
        # create a base commit that will trigger us looking at
        # the objects/pack directory before we hit the updated ref
        echo content >file &&
        git add file &&
        git commit -m base &&
    
        # set the unpack limit abnormally low, which
        # lets us simulate full-size pushes using tiny ones
        git config receive.unpackLimit 1
      ) &&
      git clone parent child &&
      cd child &&
      n=0 &&
      while true; do
        echo $n >file && git add file && git commit -m $n &&
        git push origin HEAD:refs/remotes/child/master &&
        n=$(($n + 1))
      done
    
      # fsck.sh
      # now run this simultaneously in another terminal; it
      # repeatedly fscks, looking for us to consider the
      # newly-pushed ref broken. We cannot use for-each-ref
      # here, as it uses DO_FOR_EACH_INCLUDE_BROKEN, which
      # skips the has_sha1_file check (and if it wants
      # more information on the object, it will actually read
      # the object, which does the proper two-step lookup)
      cd parent &&
      while true; do
        broken=`git fsck 2>&1 | grep remotes/child`
        if test -n "$broken"; then
          echo $broken
          exit 1
        fi
      done
    
    Without this patch, the fsck loop fails within a few seconds
    (and almost instantly if the test repository actually has a
    large number of refs). With it, the two can run
    indefinitely.
    Signed-off-by: NJeff King <peff@peff.net>
    Signed-off-by: NJunio C Hamano <gitster@pobox.com>
    45e8a748
sha1_file.c 78.4 KB