1. 08 Jul 2013, 1 commit
    • zero-initialize object_info structs · 7c07385d
      Committed by Jeff King
      The sha1_object_info_extended function expects the caller to
      provide a "struct object_info" which contains pointers to
      "query" items that will be filled in. The purpose of
      providing pointers rather than storing the response directly
      in the struct is so that callers can choose not to incur the
      expense of finding particular fields that they do not care
      about.
      
      Right now the only query item is "sizep", and all callers
      set it explicitly to choose whether or not to query it; they
      can then leave the rest of the struct uninitialized.
      
      However, as we add new query items, each caller will have to
      be updated to explicitly turn off the new ones (by setting
      them to NULL).  Instead, let's teach each caller to
      zero-initialize the struct, so that they do not have to
      learn about each new query item added.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      7c07385d
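      As an illustration of the calling convention described above, here is a
      minimal caller-side sketch (the struct and prototype are paraphrased from
      the description, not copied from git's headers of that era):

        /* query items are pointers: NULL means "don't bother computing this" */
        struct object_info {
                unsigned long *sizep;
                /* future query items get added here */
        };

        int sha1_object_info_extended(const unsigned char *sha1,
                                      struct object_info *oi);

        static int peek_size(const unsigned char *sha1, unsigned long *size)
        {
                struct object_info oi = { NULL };

                /* zero/NULL-initialize, then opt in to just the size query;
                 * query items added later stay switched off automatically */
                oi.sizep = size;
                return sha1_object_info_extended(sha1, &oi);
        }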
  2. 01 May 2013, 1 commit
    • unpack_entry: avoid freeing objects in base cache · 756a0426
      Committed by Thomas Rast
      In the !delta_data error path of unpack_entry(), we run free(base).
      This became a window for a use-after-free in abe601bb (sha1_file:
      remove recursion in unpack_entry, 2013-03-27), as follows:
      
      Before abe601bb, we got the 'base' from cache_or_unpack_entry(..., 0);
      keep_cache=0 tells it to also remove that entry.  So the 'base' is at
      this point not cached, and freeing it in the error path is the right
      thing.
      
      After abe601bb, the structure changed: we use a three-phase approach
      where phase 1 finds the innermost base or a base that is already in
      the cache.  In phase 3 we therefore know that all bases we unpack are
      not part of the delta cache yet.  (Observe that we pop from the cache
      in phase 1, so this is also true for the very first base.)  So we make
      no further attempts to look up the bases in the cache, and just call
      add_delta_base_cache() on every base object we have assembled.
      
      But the !delta_data error path remained unchanged, and now calls
      free() on a base that has already been entered in the cache.  This
      means that there is a use-after-free if we later use the same base
      again.
      
      So remove that free(); we are still going to use that data.
      Reported-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      756a0426
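      A self-contained toy illustrating the ownership rule the fix restores
      (this is not git's code; a one-slot cache stands in for the delta base
      cache):

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        static void *cache_slot;                 /* toy "delta base cache" */
        static void cache_add(void *buf) { cache_slot = buf; }

        int main(void)
        {
                char *base = malloc(16);
                int delta_data_missing = 1;      /* simulate the !delta_data path */

                strcpy(base, "base object");
                cache_add(base);                 /* the cache now holds 'base' */

                if (delta_data_missing) {
                        /* Pre-fix shape: free(base) here would leave the cache
                         * with a dangling pointer, and the next user of the
                         * cached base would read freed memory.  The fix is to
                         * simply not free: the cached copy is still wanted. */
                }

                printf("cached base: %s\n", (char *)cache_slot);
                free(cache_slot);                /* freed once, at end of life */
                return 0;
        }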
  3. 28 Mar 2013, 3 commits
  4. 27 Mar 2013, 1 commit
  5. 26 Mar 2013, 1 commit
    • sha1_file: remove recursion in packed_object_info · 790d96c0
      Committed by Thomas Rast
      packed_object_info() and packed_delta_info() were mutually recursive.
      The former would handle ordinary types and defer deltas to the latter;
      the latter would use the former to resolve the delta base.
      
      This arrangement, however, leads to trouble with threaded index-pack
      and long delta chains on platforms where thread stacks are small, as
      happened on OS X (512kB thread stacks by default) with the chromium
      repo.
      
      The task of the two functions is not all that hard to describe without
      any recursion, however.  It proceeds in three steps:
      
      - determine the representation type and size, based on the outermost
        object (delta or not)
      
      - follow through the delta chain, if any
      
      - determine the object type from what is found at the end of the delta
        chain
      
      The only complication stems from the error recovery.  If parsing fails
      at any step, we want to mark that object (within the pack) as bad and
      try getting the corresponding SHA1 from elsewhere.  If that also
      fails, we want to repeat this process back up the delta chain until we
      find a reasonable solution or conclude that there is no way to
      reconstruct the object.  (This is conveniently checked by t5303.)
      
      To achieve that within the pack, we keep track of the entire delta
      chain in a stack.  When things go sour, we process that stack from the
      top, marking entries as bad and attempting to re-resolve by sha1.  To
      avoid excessive malloc(), the stack starts out with a small
      stack-allocated array.  The choice of 64 is based on the default of
      pack.depth, which is 50, in the hope that it covers "most" delta
      chains without any need for malloc().
      
      It's much harder to make the actual re-resolving by sha1 nonrecursive,
      so we skip that.  If you can't afford *that* recursion, your
      corruption problems are more serious than your stack size problems.
      Reported-by: Stefan Zager <szager@google.com>
      Signed-off-by: Thomas Rast <trast@student.ethz.ch>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      790d96c0
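      A sketch of the small-array-first stack described above (field and
      function names are illustrative, not the ones used in sha1_file.c):

        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>

        struct chain_entry {
                off_t obj_offset;               /* one delta-chain link */
        };

        struct chain_stack {
                struct chain_entry small[64];   /* covers pack.depth=50 chains */
                struct chain_entry *items;      /* points at small[] or heap */
                size_t nr, alloc;
        };

        static void chain_init(struct chain_stack *s)
        {
                s->items = s->small;
                s->nr = 0;
                s->alloc = 64;
        }

        static void chain_push(struct chain_stack *s, off_t obj_offset)
        {
                if (s->nr == s->alloc) {        /* rare: fall back to the heap */
                        size_t new_alloc = s->alloc * 2;
                        if (s->items == s->small) {
                                s->items = malloc(new_alloc * sizeof(*s->items));
                                memcpy(s->items, s->small, sizeof(s->small));
                        } else {
                                s->items = realloc(s->items,
                                                   new_alloc * sizeof(*s->items));
                        }
                        s->alloc = new_alloc;   /* allocation errors omitted */
                }
                s->items[s->nr++].obj_offset = obj_offset;
        }

        static void chain_clear(struct chain_stack *s)
        {
                if (s->items != s->small)
                        free(s->items);
        }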
  6. 16 Feb 2013, 1 commit
  7. 13 Feb 2013, 1 commit
  8. 09 Nov 2012, 2 commits
  9. 25 Aug 2012, 1 commit
  10. 30 Jul 2012, 1 commit
    • link_alt_odb_entry: fix read over array bounds reported by valgrind · cb2912c3
      Committed by Heiko Voigt
      pfxlen can be longer than the path in objdir when relative_base
      contains the path to git's object directory.  Here we are interested
      in checking if ent->base[] (the part that corresponds to .git/objects)
      is the same string as objdir, and the code NUL-terminated ent->base[]
      to
      
      	LEADING PATH\0XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\0
      
      in preparation for this "duplicate check" step (before we return
      from the function, the first NUL is turned into '/' so that we can
      fill XX when probing for loose objects).  All we need to do is
      compare the string with the path to our object directory.
      Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      cb2912c3
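      A hedged sketch of the comparison the fix boils down to (simplified;
      ent->base[] is NUL-terminated at the end of the leading path, as shown
      above):

        #include <string.h>

        /* Buggy shape: memcmp over pfxlen bytes can read past the end of
         * objdir whenever pfxlen > strlen(objdir).  Safe shape: compare the
         * NUL-terminated leading path as a string. */
        static int is_our_objdir(const char *ent_base, const char *objdir)
        {
                return !strcmp(ent_base, objdir);
        }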
  11. 15 May 2012, 1 commit
  12. 01 May 2012, 2 commits
  13. 08 Mar 2012, 1 commit
    • parse_object: avoid putting whole blob in core · 090ea126
      Committed by Nguyễn Thái Ngọc Duy
      Traditionally, all the callers of check_sha1_signature() first
      called read_sha1_file() to prepare the whole object data in core,
      and then called this function.  The function is used to revalidate that
      what we read from the object database actually matches the object name
      we used to ask for the data.
      
      Update the API to allow callers to pass NULL as the object data, and
      have the function read and hash the object data using streaming API
      to recompute the object name, without having to hold everything in
      core at the same time.  This is most useful in parse_object() that
      parses a blob object, because this caller does not have to keep the
      actual blob data around in memory after a "struct blob" is returned.
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      090ea126
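      Caller-side sketch of the new convention (the prototype is approximated
      from the description; the exact arguments in that era's cache.h may
      differ):

        /* Passing a NULL buffer asks the function to stream the object out of
         * the object database and hash it, instead of requiring the caller to
         * hold the whole thing in core. */
        int check_sha1_signature(const unsigned char *sha1, void *buf,
                                 unsigned long size, const char *type);

        static int verify_blob_without_loading(const unsigned char *sha1)
        {
                return check_sha1_signature(sha1, NULL, 0, "blob");
        }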
  14. 25 Feb 2012, 1 commit
    • do not stream large files to pack when filters are in use · 4f22b101
      Committed by Jeff King
      Because git's object format requires us to specify the
      number of bytes in the object in its header, we must know
      the size before streaming a blob into the object database.
      This is not a problem when adding a regular file, as we can
      get the size from stat(). However, when filters are in use
      (such as autocrlf, or the ident, filter, or eol
      gitattributes), we have no idea what the ultimate size will
      be.
      
      The current code just punts on the whole issue and ignores
      filter configuration entirely for files larger than
      core.bigfilethreshold. This can generate confusing results
      if you use filters for large binary files, as the filter
      will suddenly stop working as the file goes over a certain
      size.  Rather than try to handle unknown input sizes with
      streaming, this patch just turns off the streaming
      optimization when filters are in use.
      
      This causes a slight performance regression in a very specific
      case: if you have autocrlf on, but no gitattributes, a large
      binary file will avoid the streaming code path because we
      don't know beforehand whether it will need conversion or
      not. But if you are handling large binary files, you should
      be marking them as such via attributes (or at least not
      using autocrlf, and instead marking your text files as
      such). And the flip side is that if you have a large
      _non_-binary file, there is a correctness improvement;
      before we did not apply the conversion at all.
      
      The first half of the new t1051 script covers these failures
      on input. The second half tests the matching output code
      paths. These already work correctly, and do not need any
      adjustment.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      4f22b101
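      The decision being described reduces to a small predicate; a hedged
      sketch (parameter names are illustrative, not git's):

        /* Stream a blob straight into the object database only when we can
         * know its final size up front, i.e. when no filter can rewrite it. */
        static int can_stream_blob(unsigned long size,
                                   unsigned long big_file_threshold,
                                   int filters_apply)
        {
                if (size < big_file_threshold)
                        return 0;        /* small: the in-core path is fine */
                if (filters_apply)
                        return 0;        /* converted size unknown: no streaming */
                return 1;
        }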
  15. 07 Feb 2012, 1 commit
    • fsck: give accurate error message on empty loose object files · 33e42de0
      Committed by Matthieu Moy
      Since 3ba7a065 (A loose object is not corrupt if it
      cannot be read due to EMFILE), "git fsck" on a repository with an empty
      loose object file complains with the error message
      
        fatal: failed to read object <sha1>: Invalid argument
      
      This comes from a failure of mmap on this empty file, which sets errno to
      EINVAL. Instead of calling xmmap on an empty file, we display a clean
      error message ourselves and return a NULL pointer. The new message is
      
        error: object file .git/objects/09/<rest-of-sha1> is empty
        fatal: loose object <sha1> (stored in .git/objects/09/<rest-of-sha1>) is corrupt
      
      The second line was already there before the regression in 3ba7a065,
      and the first is an additional message that should help the user
      diagnose the problem.
      Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      33e42de0
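      A sketch of the added guard (simplified; git's error reporting and xmmap
      wrapper are reduced to their plain libc equivalents):

        #include <stdio.h>
        #include <sys/mman.h>
        #include <sys/stat.h>

        static void *map_loose_object(const char *path, int fd, unsigned long *size)
        {
                struct stat st;
                void *map;

                if (fstat(fd, &st) < 0)
                        return NULL;
                if (!st.st_size) {
                        /* mmap(2) on a zero-length file fails with EINVAL; say
                         * something useful instead of "Invalid argument". */
                        fprintf(stderr, "error: object file %s is empty\n", path);
                        return NULL;
                }
                *size = st.st_size;
                map = mmap(NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0);
                return map == MAP_FAILED ? NULL : map;
        }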
  16. 02 Feb 2012, 2 commits
  17. 22 Dec 2011, 1 commit
    • Appease Sun Studio by renaming "tmpfile" · ab1900a3
      Committed by Ævar Arnfjörð Bjarmason
      On Solaris the system headers define the "tmpfile" name, which'll
      cause Git compiled with Sun Studio 12 Update 1 to whine about us
      redefining the name:
      
          "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile     (E_PRAGMA_REDEFINE_STATIC)
          "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
          "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile   (E_PRAGMA_REDEFINE_STATIC)
          "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
      
      Just renaming the "tmpfile" variable to "tmp_file" in the relevant
      places is the easiest way to fix this.
      Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      ab1900a3
  18. 02 Dec 2011, 1 commit
  19. 16 Nov 2011, 1 commit
  20. 28 Oct 2011, 1 commit
  21. 15 Oct 2011, 2 commits
    • downgrade "packfile cannot be accessed" errors to warnings · 58a6a9cc
      Committed by Jeff King
      These can happen if another process simultaneously prunes a
      pack. But that is not usually an error condition, because a
      properly-running prune should have repacked the object into
      a new pack. So we will notice that the pack has disappeared
      unexpectedly, print a message, try other packs (possibly
      after re-scanning the list of packs), and find it in the new
      pack.
      Acked-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      58a6a9cc
    • pack-objects: protect against disappearing packs · 4c080182
      Committed by Jeff King
      It's possible that while pack-objects is running, a
      simultaneously running prune process might delete a pack
      that we are interested in. Because we load the pack indices
      early on, we know that the pack contains our item, but by
      the time we try to open and map it, it is gone.
      
      Since c715f783, we already protect against this in the normal
      object access code path, but pack-objects accesses the packs
      at a lower level.  In the normal access path, we call
      find_pack_entry, which will call find_pack_entry_one on each
      pack index, which does the actual lookup. If it gets a hit,
      we will actually open and verify the validity of the
      matching packfile (using c715f783's is_pack_valid). If we
      can't open it, we'll issue a warning and pretend that we
      didn't find it, causing us to go on to the next pack (or on
      to loose objects).
      
      Furthermore, we will cache the descriptor to the opened
      packfile. Which means that later, when we actually try to
      access the object, we are likely to still have that packfile
      opened, and won't care if it has been unlinked from the
      filesystem.
      
      Notice the "likely" above. If there is another pack access
      in the interim, and we run out of descriptors, we could
      close the pack. And then a later attempt to access the
      closed pack could fail (we'll try to re-open it, of course,
      but it may have been deleted). In practice, this doesn't
      happen because we tend to look up items and then access them
      immediately.
      
      Pack-objects does not follow this code path. Instead, it
      accesses the packs at a much lower level, using
      find_pack_entry_one directly. This means we skip the
      is_pack_valid check, and may end up with the name of a
      packfile, but no open descriptor.
      
      We can add the same is_pack_valid check here. Unfortunately,
      the access patterns of pack-objects are not quite as nice
      for keeping lookup and object access together. We look up
      each object as we find out about it, and only later, when
      writing the packfile, do we necessarily access it. Which
      means that the opened packfile may be closed in the interim.
      
      In practice, however, adding this check still has value, for
      three reasons.
      
        1. If you have a reasonable number of packs and/or a
           reasonable file descriptor limit, you can keep all of
           your packs open simultaneously. If this is the case,
           then the race is impossible to trigger.
      
        2. Even if you can't keep all packs open at once, you
           may end up keeping the deleted one open (i.e., you may
           get lucky).
      
        3. The race window is shortened. You may notice early that
           the pack is gone, and not try to access it. Triggering
           the problem without this check means deleting the pack
           any time after we read the list of index files, but
           before we access the looked-up objects.  Triggering it
           with this check means deleting the pack after we do a
           lookup (and successfully access the packfile), but
           before we access the object. Which is a smaller window.
      Acked-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      4c080182
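      A sketch of where the is_pack_valid() check slots in (the struct and
      prototypes below are trimmed stand-ins for the real declarations):

        #include <sys/types.h>

        struct packed_git {
                struct packed_git *next;
                /* ... the real struct has many more fields ... */
        };

        off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *p);
        int is_pack_valid(struct packed_git *p);

        static struct packed_git *find_pack_for(const unsigned char *sha1,
                                                struct packed_git *packs)
        {
                struct packed_git *p;

                for (p = packs; p; p = p->next) {
                        if (!find_pack_entry_one(sha1, p))
                                continue;       /* not in this pack's index */
                        if (!is_pack_valid(p))
                                continue;       /* pack vanished or cannot be
                                                   opened: pretend we missed */
                        return p;
                }
                return NULL;    /* fall back to loose objects, or report */
        }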
  22. 08 Sep 2011, 1 commit
  23. 24 Aug 2011, 1 commit
    • clone: clone from a repository with relative alternates · e6baf4a1
      Committed by Junio C Hamano
      Cloning from a local repository blindly copies or hardlinks all the files
      under the objects/ hierarchy. This results in two issues:
      
       - If the repository cloned has an "objects/info/alternates" file, and the
         command line of clone specifies --reference, the ones specified on the
         command line get overwritten by the copy from the original repository.
      
       - An entry in an "objects/info/alternates" file can specify the object
         stores it borrows objects from as a path relative to the "objects/"
         directory. When cloning a repository with such an alternates file, if
         the new repository is not sitting next to the original repository, such
         relative paths need to be adjusted so that they can be used in the new
         repository.
      
      This updates add_to_alternates_file() to take the path to the alternate
      object store, including the "/objects" part at the end (earlier, it was
      taking the path to $GIT_DIR and was adding "/objects" itself), as it is
      technically possible to specify in objects/info/alternates file the path
      of a directory whose name does not end with "/objects".
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      e6baf4a1
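      Caller-side sketch of the updated convention (the path below is purely
      hypothetical, and the prototype is abbreviated):

        void add_to_alternates_file(const char *objects_dir);

        static void borrow_from_reference(void)
        {
                /* pass the object store itself, "/objects" included, rather
                 * than a $GIT_DIR to which "/objects" gets appended */
                add_to_alternates_file("/srv/repos/shared.git/objects");
        }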
  24. 12 Aug 2011, 1 commit
    • Tolerate zlib deflation with window size < 32Kb · 7f684a2a
      Committed by Roberto Tyley
      Git currently reports loose objects as 'corrupt' if they've been
      deflated using a window size less than 32Kb, because the
      experimental_loose_object() function doesn't recognise the header
      byte as a zlib header. This patch makes the function tolerant of
      all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice
      its accuracy in distinguishing the standard loose-object format
      from the experimental (now abandoned) format.
      
      On memory-constrained systems zlib may use a much smaller window
      size - working on Agit, I found that Android uses a 4KB window;
      giving a header byte of 0x48, not 0x78. Consequently all loose
      objects generated appear 'corrupt', which is why Agit is a read-only
      Git client at this time - I don't want my client to generate Git
      repos that other clients treat as broken :(
      
      This patch makes Git tolerant of different deflate settings - it
      might appear that it changes experimental_loose_object() to the point
      where it could incorrectly identify the experimental format as the
      standard one, but the two criteria (bitmask & checksum) can only
      give a false result for an experimental object where both of the
      following are true:
      
      1) object size is exactly 8 bytes when uncompressed (bitmask)
      2) [single-byte in-pack git type&size header] * 256
         + [1st byte of the following zlib header] % 31 = 0 (checksum)
      
      As it happens, for all possible combinations of valid object type
      (1-4) and window bits (0-7), the only time when the checksum will be
      divisible by 31 is for 0x1838 - i.e. object type *1*, a Commit - which,
      due to the fields all Commit objects must contain, could never be as
      small as 8 bytes in size.
      
      Given this, the combination of the two criteria (bitmask & checksum)
      always correctly determines the buffer format, and is more tolerant
      than the previous version.
      
      The alternative to this patch is simply removing support for the
      experimental format, which I am also totally cool with.
      
      References:
      
      Android uses a 4KB window for deflation:
      http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53
      
      Code snippet searching for false positives with the zlib checksum:
      https://gist.github.com/1118177
      Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      7f684a2a
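      The two criteria combine into a small test; a self-contained sketch per
      RFC 1950 (this mirrors the idea, not the exact experimental_loose_object()
      code):

        static int looks_like_zlib_header(unsigned char b0, unsigned char b1)
        {
                if ((b0 & 0x0f) != 0x08)
                        return 0;       /* CM must be 8: deflate */
                if ((b0 & 0xf0) > 0x70)
                        return 0;       /* CINFO 0..7: 256-byte..32KB window */
                return ((b0 << 8) | b1) % 31 == 0;  /* FCHECK: multiple of 31 */
        }

      With this, the common 0x78 first byte (32KB window) and the Android-style
      0x48 byte (4KB window) are both accepted as candidate zlib headers, while
      the FCHECK test on the second byte still has to pass.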
  25. 07 Jul 2011, 1 commit
    • core: log offset pack data accesses happened · 5f44324d
      Committed by Junio C Hamano
      In a workload other than "git log" (without a pathspec or any option that
      causes us to inspect trees and blobs), the recency pack order is said to
      cause the access to jump around quite a bit. Add a hook to allow us to
      observe how bad it is.
      
      "git config core.logpackaccess /var/tmp/pal.txt" will give you the log
      in the specified file.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      5f44324d
  26. 11 Jun 2011, 2 commits
    • zlib: zlib can only process 4GB at a time · ef49a7a0
      Committed by Junio C Hamano
      The size of objects we read from the repository and data we try to put
      into the repository are represented in "unsigned long", so that on larger
      architectures we can handle objects that weigh more than 4GB.
      
      But the interface defined in zlib.h to communicate with inflate/deflate
      limits avail_in (how many bytes of input are we calling zlib with) and
      avail_out (how many bytes of output from zlib are we ready to accept)
      fields effectively to 4GB by defining their type to be uInt.
      
      In many places in our code, we allocate a large buffer (e.g. mmap'ing a
      large loose object file) and tell zlib its size by assigning the size to
      avail_in field of the stream, but that will truncate the high octets of
      the real size. The worst part of this story is that we often pass around
      z_stream (the state object used by zlib) to keep track of the number of
      used bytes in input/output buffer by inspecting these two fields, which
      practically limits our callchain to the same 4GB limit.
      
      Wrap z_stream in another structure git_zstream that can express avail_in
      and avail_out in unsigned long. For now, just die() when the caller gives
      a size that cannot be given to a single zlib call. In later patches in the
      series, we would make git_inflate() and git_deflate() internally loop to
      give callers an illusion that our "improved" version of zlib interface can
      operate on a buffer larger than 4GB in one go.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      ef49a7a0
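      A sketch of the wrapping idea (the real git_zstream carries the same
      widened counters; the helper name and clamp value here are illustrative):

        #include <limits.h>
        #include <zlib.h>

        struct git_zstream_sketch {
                z_stream z;                     /* what zlib actually sees */
                unsigned long avail_in;         /* widened caller-visible sizes */
                unsigned long avail_out;
                unsigned long total_in;
                unsigned long total_out;
                unsigned char *next_in;
                unsigned char *next_out;
        };

        /* feed zlib at most what a uInt can express in one call */
        static unsigned int zlib_buf_cap(unsigned long len)
        {
                return len > UINT_MAX ? UINT_MAX : (unsigned int)len;
        }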
    • zlib: wrap deflate side of the API · 55bb5c91
      Committed by Junio C Hamano
      Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
      of deflateInit2 in remote-curl.c to tell the library to use gzip header
      and trailer in git_deflate_init_gzip().
      
      There is only one caller that cares about the status from deflateEnd().
      Introduce git_deflate_end_gently() to let that sole caller retrieve the
      status and act on it (i.e. die) for now, but we would probably want to
      make inflate_end/deflate_end die when they run out of memory and get
      rid of the _gently() kind.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      55bb5c91
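      Caller-side sketch of the wrapped deflate calls (prototypes abbreviated
      from the description; git_zstream is the shape sketched under ef49a7a0,
      redefined here only so the snippet stands alone):

        #include <zlib.h>

        typedef struct {
                z_stream z;
                unsigned long avail_in, avail_out, total_in, total_out;
                unsigned char *next_in, *next_out;
        } git_zstream;

        void git_deflate_init(git_zstream *, int level);
        int git_deflate(git_zstream *, int flush);       /* returns zlib status */
        void git_deflate_end(git_zstream *);

        static unsigned long deflate_all(unsigned char *in, unsigned long in_len,
                                         unsigned char *out, unsigned long out_len)
        {
                git_zstream s;

                git_deflate_init(&s, Z_DEFAULT_COMPRESSION);
                s.next_in = in;   s.avail_in = in_len;
                s.next_out = out; s.avail_out = out_len;
                while (git_deflate(&s, Z_FINISH) == Z_OK)
                        ;       /* keep going until Z_STREAM_END (or an error) */
                git_deflate_end(&s);
                return s.total_out;
        }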
  27. 09 Jun 2011, 1 commit
    • sha1_file.c: "legacy" is really the current format · cc5c54e7
      Committed by Junio C Hamano
      Every time I look at the read-loose-object codepath, legacy_loose_object()
      function makes my brain go through mental contortion. When we were playing
      with the experimental loose object format, it may have made sense to call
      the traditional format "legacy", in the hope that the experimental one
      will some day replace it to become official, but it never happened.
      
      This renames the function (and negates its return value) to detect if we
      are looking at the experimental format, and moves the code around in its
      caller, which used to do "if we are looking at legacy, do this special case,
      otherwise the normal case is this". The codepath to read from the loose
      objects in experimental format is the "unlikely" case.
      
      Someday after Git 2.0, we should drop the support of this format.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      cc5c54e7
  28. 06 Jun 2011, 1 commit
    • verify-pack: use index-pack --verify · 3de89c9d
      Committed by Junio C Hamano
      This finally gets rid of the inefficient verify-pack implementation that
      walks objects in the packfile in their object name order and replaces it
      with a call to index-pack --verify. As a side effect, it also removes
      packed_object_info_detail() API which is rather expensive.
      
      As this changes the way errors are reported (verify-pack used to rely on
      the usual runtime error detection routine unpack_entry() to diagnose the
      CRC errors in an entry in the *.idx file; index-pack --verify checks the
      whole *.idx file in one go), update a test that expected the string "CRC"
      to appear in the error message.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      3de89c9d
  29. 27 May 2011, 1 commit
  30. 21 May 2011, 3 commits
  31. 20 May 2011, 1 commit
    • sha1_object_info_extended(): expose a bit more info · 9a490590
      Committed by Junio C Hamano
      The original interface for sha1_object_info() takes an object name and
      gives back a type and its size (the latter is given only when it was
      asked).  The new interface wraps its implementation and exposes a few
      more pieces of information that the interface used to discard, namely:
      
       - where the object is stored (loose? cached? packed?)
       - if packed, in which packfile, and where within it?
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      ---
      
       * In the earlier round, this used u.pack.delta to record the length of
         the delta chain, but the caller is not necessarily interested in the
         length of the delta chain per se, and may only want to know if it is a
         delta against another object or is stored as deflated data. Calling
         packed_object_info_detail() involves walking the reverse index chain to
         compute the stored size of the object and is unnecessarily expensive.
      
         We could resurrect the code if a new caller wants to know, but I doubt
         it.
      9a490590
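      A sketch of the extra fields this exposes (names approximate the
      interface described above; the real definitions live in that era's
      cache.h):

        #include <sys/types.h>

        struct packed_git;      /* opaque here; defined elsewhere in git */

        struct object_info_sketch {
                unsigned long *sizep;           /* pre-existing query item */

                /* new: where the answer came from */
                enum { OI_LOOSE, OI_CACHED, OI_PACKED } whence;
                union {
                        struct {
                                struct packed_git *pack;   /* which packfile */
                                off_t offset;              /* where inside it */
                                unsigned int is_delta : 1; /* delta vs. deflated */
                        } packed;
                } u;
        };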