1. 25 2月, 2014 1 次提交
  2. 23 1月, 2014 1 次提交
  3. 07 1月, 2014 8 次提交
  4. 27 12月, 2013 1 次提交
    • J
      sha1_object_info_extended: provide delta base sha1s · 5d642e75
      Jeff King 提交于
      A caller of sha1_object_info_extended technically has enough
      information to determine the base sha1 from the results of
      the call. It knows the pack, offset, and delta type of the
      object, which is sufficient to find the base.
      
      However, the functions to do so are not publicly available,
      and the code itself is intimate enough with the pack details
      that it should be abstracted away. We could add a public
      helper to allow callers to query the delta base separately,
      but it is simpler and slightly more efficient to optionally
      grab it along with the rest of the object_info data.
      
      For cases where the object is not stored as a delta, we
      write the null sha1 into the query field. A careful caller
      could check "oi.whence == OI_PACKED && oi.u.packed.is_delta"
      before looking at the base sha1, but using the null sha1
      provides a simple alternative (and gives a better sanity
      check for a non-careful caller than simply returning random
      bytes).
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      5d642e75
  5. 19 12月, 2013 1 次提交
  6. 13 12月, 2013 4 次提交
  7. 22 11月, 2013 1 次提交
    • J
      drop support for "experimental" loose objects · b039718d
      Jeff King 提交于
      In git v1.4.3, we introduced a new loose object format that
      encoded some object information outside of the zlib stream.
      Ultimately the format was dropped in v1.5.3, but we kept the
      reading side around to help people migrate objects. Each
      time we open a loose object, we use a heuristic to check
      whether it is in the normal loose format, or the
      experimental one.
      
      This heuristic is robust in the face of valid data, but it
      tends to treat corrupted or garbage data as an experimental
      object. With the regular format, we would notice quickly
      that zlib's crc does not check out and complain. With the
      experimental object, we are likely to extract a nonsensical
      object size and try to allocate a huge buffer, resulting in
      xmalloc calling "die".
      
      This latter behavior is much worse, for two reasons. One,
      git reports an allocation error when the real error is
      corruption. And two, the program dies unconditionally, so
      you cannot even run fsck (which would otherwise ignore the
      broken object and keep going).
      
      We could try to improve the heuristic to err on the side of
      normal objects in the face of corruption, but there is
      really little point. The experimental format is long-dead,
      and was never enabled by default to begin with. We can
      instead simply remove it. The only affected repository would
      be one that explicitly set core.legacyheaders in 2007, and
      then never repacked in the intervening 6 years.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      b039718d
  8. 07 11月, 2013 1 次提交
  9. 29 10月, 2013 2 次提交
    • J
      sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs · b2476a60
      Johan Herland 提交于
      There are cases (e.g. when running concurrent fetches in a repo) where
      multiple Git processes concurrently attempt to create loose objects
      within the same objects/XX/ dir. The creation of the loose object files
      is (AFAICS) safe from races, but the creation of the objects/XX/ dir in
      which the loose objects reside is unsafe, for example:
      
      Two concurrent fetches - A and B. As part of its fetch, A needs to store
      12aaaaa as a loose object. B, on the other hand, needs to store 12bbbbb
      as a loose object. The objects/12 directory does not already exist.
      Concurrently, both A and B determine that they need to create the
      objects/12 directory (because their first call to git_mkstemp_mode()
      within create_tmpfile() fails witn ENOENT). One of them - let's say A -
      executes the following mkdir() call before the other. This first call
      returns success, and A moves on. When B gets around to calling mkdir(),
      it fails with EEXIST, because A won the race. The mkdir() error causes B
      to return -1 from create_tmpfile(), which propagates all the way,
      resulting in the fetch failing with:
      
        error: unable to create temporary file: File exists
        fatal: failed to write object
        fatal: unpack-objects failed
      
      Although it's hard to add a testcase reproducing this issue, it's easy
      to provoke if we insert a sleep after the
      
        if (mkdir(buffer, 0777) || adjust_shared_perm(buffer))
            return -1;
      
      block, and then run two concurrent "git fetch"es against the same repo.
      
      The fix is to simply handle mkdir() failing with EEXIST as a success.
      If EEXIST is somehow returned for the wrong reasons (because the relevant
      objects/XX is not a directory, or is otherwise unsuitable for object
      storage), the following call to adjust_shared_perm(), or ultimately the
      retried call to git_mkstemp_mode() will fail, and we end up returning
      error from create_tmpfile() in any case.
      
      Note that there are still cases where two users with unsuitable umasks
      in a shared repo can end up in two races where one user first wins the
      mkdir() race to create an objects/XX/ directory, and then the other user
      wins the adjust_shared_perms() race to chmod() that directory, but fails
      because it is (transiently, until the first users completes its chmod())
      unwriteable to the other user. However, (an equivalent of) this race also
      exists before this patch, and is made no worse by this patch.
      Signed-off-by: NJohan Herland <johan@herland.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      b2476a60
    • C
      sha1_file: move comment about return value where it belongs · 3fc0dca9
      Christian Couder 提交于
      Commit 5b086407 (sha1_object_info_extended: make type calculation
      optional, Jul 12 2013) changed the return value of the
      sha1_object_info_extended function to 0/-1 for success/error.
      
      Previously this function returned the object type for success or
      -1 for error. But unfortunately the above commit forgot to change
      or move the comment above this function that says "returns enum
      object_type or negative".
      
      To fix this inconsistency, let's move the comment above the
      sha1_object_info function where it is still true.
      Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      3fc0dca9
  10. 14 9月, 2013 1 次提交
  11. 31 8月, 2013 1 次提交
    • J
      has_sha1_file: re-check pack directory before giving up · 45e8a748
      Jeff King 提交于
      When we read a sha1 file, we first look for a packed
      version, then a loose version, and then re-check the pack
      directory again before concluding that we cannot find it.
      This lets us handle a process that is writing to the
      repository simultaneously (e.g., receive-pack writing a new
      pack followed by a ref update, or git-repack packing
      existing loose objects into a new pack).
      
      However, we do not do the same trick with has_sha1_file; we
      only check the packed objects once, followed by loose
      objects. This means that we might incorrectly report that we
      do not have an object, even though we could find it if we
      simply re-checked the pack directory.
      
      By itself, this is usually not a big deal. The other process
      is running simultaneously, so we may run has_sha1_file
      before it writes, anyway. It is a race whether we see the
      object or not.  However, we may also see other things
      the writing process has done (like updating refs); and in
      that case, we must be able to also see the new objects.
      
      For example, imagine we are doing a for_each_ref iteration,
      and somebody simultaneously pushes. Receive-pack may write
      the pack and update a ref after we have examined the
      objects/pack directory, but before the iteration gets to the
      updated ref. When we do finally see the updated ref,
      for_each_ref will call has_sha1_file to check whether the
      ref is broken. If has_sha1_file returns the wrong answer, we
      erroneously will think that the ref is broken.
      
      For a normal iteration without DO_FOR_EACH_INCLUDE_BROKEN,
      this means that the caller does not see the ref at all
      (neither the old nor the new value).  So not only will we
      fail to see the new value of the ref (which is acceptable,
      since we are running simultaneously with the writer, and we
      might well read the ref before the writer commits its
      write), but we will not see the old value either. For
      programs that act on reachability like pack-objects or
      prune, this can cause data loss, as we may see the objects
      referenced by the original ref value as dangling (and either
      omit them from the pack, or delete them via prune).
      
      There's no test included here, because the success case is
      two processes running simultaneously forever. But you can
      replicate the issue with:
      
        # base.sh
        # run this in one terminal; it creates and pushes
        # repeatedly to a repository
        git init parent &&
        (cd parent &&
      
          # create a base commit that will trigger us looking at
          # the objects/pack directory before we hit the updated ref
          echo content >file &&
          git add file &&
          git commit -m base &&
      
          # set the unpack limit abnormally low, which
          # lets us simulate full-size pushes using tiny ones
          git config receive.unpackLimit 1
        ) &&
        git clone parent child &&
        cd child &&
        n=0 &&
        while true; do
          echo $n >file && git add file && git commit -m $n &&
          git push origin HEAD:refs/remotes/child/master &&
          n=$(($n + 1))
        done
      
        # fsck.sh
        # now run this simultaneously in another terminal; it
        # repeatedly fscks, looking for us to consider the
        # newly-pushed ref broken. We cannot use for-each-ref
        # here, as it uses DO_FOR_EACH_INCLUDE_BROKEN, which
        # skips the has_sha1_file check (and if it wants
        # more information on the object, it will actually read
        # the object, which does the proper two-step lookup)
        cd parent &&
        while true; do
          broken=`git fsck 2>&1 | grep remotes/child`
          if test -n "$broken"; then
            echo $broken
            exit 1
          fi
        done
      
      Without this patch, the fsck loop fails within a few seconds
      (and almost instantly if the test repository actually has a
      large number of refs). With it, the two can run
      indefinitely.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      45e8a748
  12. 03 8月, 2013 1 次提交
    • B
      Don't close pack fd when free'ing pack windows · 7c3ecb32
      Brandon Casey 提交于
      Now that close_one_pack() has been introduced to handle file
      descriptor pressure, it is not strictly necessary to close the
      pack file descriptor in unuse_one_window() when we're under memory
      pressure.
      
      Jeff King provided a justification for leaving the pack file open:
      
         If you close packfile descriptors, you can run into racy situations
         where somebody else is repacking and deleting packs, and they go away
         while you are trying to access them. If you keep a descriptor open,
         you're fine; they last to the end of the process. If you don't, then
         they disappear from under you.
      
         For normal object access, this isn't that big a deal; we just rescan
         the packs and retry. But if you are packing yourself (e.g., because
         you are a pack-objects started by upload-pack for a clone or fetch),
         it's much harder to recover (and we print some warnings).
      
      Let's do so (or uh, not do so).
      Signed-off-by: NBrandon Casey <drafnel@gmail.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      7c3ecb32
  13. 02 8月, 2013 1 次提交
    • B
      sha1_file: introduce close_one_pack() to close packs on fd pressure · 88d0db55
      Brandon Casey 提交于
      When the number of open packs exceeds pack_max_fds, unuse_one_window()
      is called repeatedly to attempt to release the least-recently-used
      pack windows, which, as a side-effect, will also close a pack file
      after closing its last open window.  If a pack file has been opened,
      but no windows have been allocated into it, it will never be selected
      by unuse_one_window() and hence its file descriptor will not be
      closed.  When this happens, git may exceed the number of file
      descriptors permitted by the system.
      
      This latter situation can occur in show-ref or receive-pack during ref
      advertisement.  During ref advertisement, receive-pack will iterate
      over every ref in the repository and advertise it to the client after
      ensuring that the ref exists in the local repository.  If the ref is
      located inside a pack, then the pack is opened to ensure that it
      exists, but since the object is not actually read from the pack, no
      mmap windows are allocated.  When the number of open packs exceeds
      pack_max_fds, unuse_one_window() will not be able to find any windows to
      free and will not be able to close any packs.  Once the per-process
      file descriptor limit is exceeded, receive-pack will produce a warning,
      not an error, for each pack it cannot open, and will then most likely
      fail with an error to spawn rev-list or index-pack like:
      
         error: cannot create standard input pipe for rev-list: Too many open files
         error: Could not run 'git rev-list'
      
      This may also occur during upload-pack when refs are packed (in the
      packed-refs file) and the number of packs that must be opened to
      verify that these packed refs exist exceeds the file descriptor
      limit.  If the refs are loose, then upload-pack will read each ref
      from the object database (if the object is in a pack, allocating one
      or more mmap windows for it) in order to peel tags and advertise the
      underlying object.  But when the refs are packed and peeled,
      upload-pack will use the peeled sha1 in the packed-refs file and
      will not need to read from the pack files, so no mmap windows will
      be allocated and just like with receive-pack, unuse_one_window()
      will never select these opened packs to close.
      
      When we have file descriptor pressure, we just need to find an open
      pack to close.  We can leave the existing mmap windows open.  If
      additional windows need to be mapped into the pack file, it will be
      reopened when necessary.  If the pack file has been rewritten in the
      mean time, open_packed_git_1() should notice when it compares the file
      size or the pack's sha1 checksum to what was previously read from the
      pack index, and reject it.
      
      Let's introduce a new function close_one_pack() designed specifically
      for this purpose to search for and close the least-recently-used pack,
      where LRU is defined as (in order of preference):
      
         * pack with oldest mtime and no allocated mmap windows
         * pack with the least-recently-used windows, i.e. the pack
           with the oldest most-recently-used window, where none of
           the windows are in use
         * pack with the least-recently-used windows
      Signed-off-by: NBrandon Casey <drafnel@gmail.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      88d0db55
  14. 19 7月, 2013 1 次提交
  15. 13 7月, 2013 6 次提交
    • J
      sha1_object_info_extended: pass object_info to helpers · 23c339c0
      Jeff King 提交于
      We take in a "struct object_info" which contains pointers to
      storage for items the caller cares about. But then rather
      than pass the whole object to the low-level loose/packed
      helper functions, we pass the individual pointers.
      
      Let's pass the whole struct instead, which will make adding
      more items later easier.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      23c339c0
    • J
      sha1_object_info_extended: make type calculation optional · 5b086407
      Jeff King 提交于
      Each caller of sha1_object_info_extended sets up an
      object_info struct to tell the function which elements of
      the object it wants to get. Until now, getting the type of
      the object has always been required (and it is returned via
      the return type rather than a pointer in object_info).
      
      This can involve actually opening a loose object file to
      determine its type, or following delta chains to determine a
      packed file's base type. These effects produce a measurable
      slow-down when doing a "cat-file --batch-check" that does
      not include %(objecttype).
      
      This patch adds a "typep" query to struct object_info, so
      that it can be optionally queried just like size and
      disk_size. As a result, the return type of the function is
      no longer the object type, but rather 0/-1 for success/error.
      
      As there are only three callers total, we just fix up each
      caller rather than keep a compatibility wrapper:
      
        1. The simpler sha1_object_info wrapper continues to
           always ask for and return the type field.
      
        2. The istream_source function wants to know the type, and
           so always asks for it.
      
        3. The cat-file batch code asks for the type only when
           %(objecttype) is part of the format string.
      
      On linux.git, the best-of-five for running:
      
        $ git rev-list --objects --all >objects
        $ time git cat-file --batch-check='%(objectsize:disk)'
      
      on a fully packed repository goes from:
      
        real    0m8.680s
        user    0m8.160s
        sys     0m0.512s
      
      to:
      
        real    0m7.205s
        user    0m6.580s
        sys     0m0.608s
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      5b086407
    • J
      packed_object_info: make type lookup optional · 412916ee
      Jeff King 提交于
      Currently, packed_object_info can save some work by not
      calculating the size or disk_size of the object if the
      caller is not interested. However, it always calculates the
      true object type, whether the caller cares or not, and only
      optionally returns the easy-to-get "representation type".
      
      Let's swap these types. The function will now return the
      representation type (or OBJ_BAD on failure), and will only
      optionally fill in the true type.
      
      There should be no behavior change yet, as the only caller,
      sha1_object_info_extended, will always feed it a type
      pointer.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      412916ee
    • J
      packed_object_info: hoist delta type resolution to helper · 90191d37
      Jeff King 提交于
      To calculate the type of a packed object, we must walk down
      its delta chain until we hit a true base object with a real
      type. Most of the code in packed_object_info is for handling
      this case.
      
      Let's hoist it out into a separate helper function, which
      will make it easier to make the type-lookup optional in the
      future (and keep our indentation level sane).
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      90191d37
    • J
      sha1_loose_object_info: make type lookup optional · 052fe5ea
      Jeff King 提交于
      Until recently, the only items to request from
      sha1_object_info_extended were type and size. This meant
      that we always had to open a loose object file to determine
      one or the other.  But with the addition of the disk_size
      query, it's possible that we can fulfill the query without
      even opening the object file at all. However, since the
      function interface always returns the type, we have no way
      of knowing whether the caller cares about it or not.
      
      This patch only modified sha1_loose_object_info to make type
      lookup optional using an out-parameter, similar to the way
      the size is handled (and the return value is "0" or "-1" for
      success or error, respectively).
      
      There should be no functional change yet, though, as
      sha1_object_info_extended, the only caller, will always ask
      for a type.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      052fe5ea
    • J
      sha1_object_info_extended: rename "status" to "type" · f2f57e31
      Jeff King 提交于
      The value we get from each low-level object_info function
      (e.g., loose, packed) is actually the object type (or -1 for
      error). Let's explicitly call it "type", which will make
      further refactorings easier to read.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      f2f57e31
  16. 08 7月, 2013 2 次提交
    • J
      teach sha1_object_info_extended a "disk_size" query · 161f00e7
      Jeff King 提交于
      Using sha1_object_info_extended, a caller can find out the
      type of an object, its size, and information about where it
      is stored. In addition to the object's "true" size, it can
      also be useful to know the size that the object takes on
      disk (e.g., to generate statistics about which refs consume
      space).
      
      This patch adds a "disk_sizep" field to "struct object_info",
      and fills it in during sha1_object_info_extended if it is
      non-NULL.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      161f00e7
    • J
      zero-initialize object_info structs · 7c07385d
      Jeff King 提交于
      The sha1_object_info_extended function expects the caller to
      provide a "struct object_info" which contains pointers to
      "query" items that will be filled in. The purpose of
      providing pointers rather than storing the response directly
      in the struct is so that callers can choose not to incur the
      expense in finding particular fields that they do not care
      about.
      
      Right now the only query item is "sizep", and all callers
      set it explicitly to choose whether or not to query it; they
      can then leave the rest of the struct uninitialized.
      
      However, as we add new query items, each caller will have to
      be updated to explicitly turn off the new ones (by setting
      them to NULL).  Instead, let's teach each caller to
      zero-initialize the struct, so that they do not have to
      learn about each new query item added.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      7c07385d
  17. 15 6月, 2013 1 次提交
    • J
      unpack_entry: do not die when we fail to apply a delta · 1ee886c1
      Jeff King 提交于
      When we try to load an object from disk and fail, our
      general strategy is to see if we can get it from somewhere
      else (e.g., a loose object). That lets users fix corruption
      problems by copying known-good versions of objects into the
      object database.
      
      We already handle the case where we were not able to read
      the delta from disk. However, when we find that the delta we
      read does not apply, we simply die.  This case is harder to
      trigger, as corruption in the delta data itself would
      trigger a crc error from zlib.  However, a corruption that
      pointed us at the wrong delta base might cause it.
      
      We can do the same "fail and try to find the object
      elsewhere" trick instead of dying. This not only gives us a
      chance to recover, but also puts us on code paths that will
      alert the user to the problem (with the current message,
      they do not even know which sha1 caused the problem).
      
      Note that unlike some other pack corruptions, we do not
      recover automatically from this case when doing a repack.
      There is nothing apparently wrong with the delta, as it
      points to a valid, accessible object, and we realize the
      error only when the resulting size does not match up. And in
      theory, one could even have a case where the corrupted size
      is the same, and the problem would only be noticed by
      recomputing the sha1.
      
      We can get around this by recomputing the deltas with
      --no-reuse-delta, which our test does (and this is probably
      good advice for anyone recovering from pack corruption).
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      1ee886c1
  18. 10 6月, 2013 1 次提交
  19. 04 6月, 2013 2 次提交
    • T
      sha1_file: silence sha1_loose_object_info · dbea72a8
      Thomas Rast 提交于
      sha1_object_info() returns -1 (OBJ_BAD) if it cannot find the object
      for some reason, which suggests that it wants the _caller_ to report
      this error.  However, part of its work happens in
      sha1_loose_object_info, which _does_ report errors itself.  This is
      doubly strange because:
      
      * packed_object_info(), which is the other half of the duo, does _not_
        report this.
      
      * In the event that an object is packed and pruned while
        sha1_object_info_extended() goes looking for it, we would
        erroneously show the error -- even though the code of the latter
        function purports to handle this case gracefully.
      
      * A caller might invoke sha1_object_info() to find the type of an
        object even if that object is not known to exist.
      
      Silence this error.  The others remain untouched as a corrupt object
      is a much more grave error than it merely being absent.
      Signed-off-by: NThomas Rast <trast@inf.ethz.ch>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      dbea72a8
    • F
      4b8f772c
  20. 01 5月, 2013 1 次提交
    • T
      unpack_entry: avoid freeing objects in base cache · 756a0426
      Thomas Rast 提交于
      In the !delta_data error path of unpack_entry(), we run free(base).
      This became a window for use-after-free() in abe601bb (sha1_file:
      remove recursion in unpack_entry, 2013-03-27), as follows:
      
      Before abe601bb, we got the 'base' from cache_or_unpack_entry(..., 0);
      keep_cache=0 tells it to also remove that entry.  So the 'base' is at
      this point not cached, and freeing it in the error path is the right
      thing.
      
      After abe601bb, the structure changed: we use a three-phase approach
      where phase 1 finds the innermost base or a base that is already in
      the cache.  In phase 3 we therefore know that all bases we unpack are
      not part of the delta cache yet.  (Observe that we pop from the cache
      in phase 1, so this is also true for the very first base.)  So we make
      no further attempts to look up the bases in the cache, and just call
      add_delta_base_cache() on every base object we have assembled.
      
      But the !delta_data error path remained unchanged, and now calls
      free() on a base that has already been entered in the cache.  This
      means that there is a use-after-free if we later use the same base
      again.
      
      So remove that free(); we are still going to use that data.
      Reported-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: NThomas Rast <trast@inf.ethz.ch>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      756a0426
  21. 28 3月, 2013 2 次提交