1. 03 Mar 2011, 1 commit
  2. 01 Mar 2011, 1 commit
    • Limit file descriptors used by packs · c7934306
      Shawn O. Pearce committed
      Rather than using 'errno == EMFILE' after a failed open() call
      to indicate the process is out of file descriptors and an LRU
      pack window should be closed, place a hard upper limit on the
      number of open packs based on the actual rlimit of the process.
      
      By using a hard upper limit that is below the rlimit of the current
      process, it is not necessary to check for EMFILE on every single
      fd-allocating system call.  Instead, reserving 25 file descriptors
      makes it safe to assume the system call won't fail due to being over
      the file descriptor limit.  Here 25 is chosen as a rough guess, but considers
      3 for stdin/stdout/stderr, and at least a few for other Git code
      to operate on temporary files.  An additional 20 is reserved as it
      is not known what the C library needs to perform other services on
      Git's behalf, such as nsswitch or name resolution.
      
      This fixes a case where running `git gc --auto` in a repository
      with more than 1024 packs (but an rlimit of 1024 open fds) fails
      because no file descriptor can be allocated for the temporary
      output file.  The output file is opened by pack-objects after
      object enumeration and delta compression are done, both of which
      have already opened all of the packs and fully populated the file
      descriptor table.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. 15 Feb 2011, 1 commit
    • correct type of EMPTY_TREE_SHA1_BIN · dab0d410
      Jonathan Nieder committed
      Functions such as hashcmp that expect a binary SHA-1 value take
      parameters of type "unsigned char *" to avoid accepting a textual
      SHA-1 passed by mistake.  Unfortunately, this means passing the string
      literal EMPTY_TREE_SHA1_BIN requires an ugly cast.  Tweak the
      definition of EMPTY_TREE_SHA1_BIN to produce a value of more
      convenient type.
      
      In the future the definition might change to
      
      	extern const unsigned char empty_tree_sha1_bin[20];
      	#define EMPTY_TREE_SHA1_BIN empty_tree_sha1_bin
      Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  4. 08 Feb 2011, 2 commits
  5. 21 Jan 2011, 1 commit
    • Correctly report corrupted objects · 25f3af3f
      Björn Steinbrink committed
      The errno check added in commit 3ba7a065 "A loose object is not corrupt
      if it cannot be read due to EMFILE" only checked whether errno was
      not ENOENT, and thus incorrectly treated "no error" as an error
      condition.
      
      Because of that, it never reached the code path that would report that
      the object is corrupted and instead caused funny errors like:
      
        fatal: failed to read object 333c4768ce595793fdab1ef3a036413e2a883853: Success
      
      So we have to extend the check to cover the case in which the object
      file was successfully read, but its contents are corrupted.
      Reported-by: Will Palmer <wmpalmer@gmail.com>
      Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  6. 11 Nov 2010, 2 commits
  7. 04 Nov 2010, 4 commits
  8. 15 Jul 2010, 1 commit
    • sha1_file: Show the type and path to corrupt objects · e8b15e61
      Ævar Arnfjörð Bjarmason committed
      Change the error message that's displayed when we encounter corrupt
      objects to be more specific. We now print the type (loose or packed)
      of corrupted objects, along with the full path to the file in
      question.
      
      Before:
      
          $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
          fatal: object 909ef997367880aaf2133bafa1f1a71aa28e09df is corrupted
      
      After:
      
          $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
          fatal: loose object 909ef997367880aaf2133bafa1f1a71aa28e09df (stored in .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df) is corrupted
      
      Knowing the path helps to quickly analyze what's wrong:
      
          $ file .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df
          .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df: empty
      Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  9. 26 May 2010, 1 commit
    • remove over-eager caching in sha1_file_name · 560fb6a1
      Jeff King committed
      This function takes a sha1 and produces a loose object
      filename. It caches the location of the object directory so
      that it can fill the sha1 information directly without
      allocating a new buffer (and in its original incarnation,
      without calling getenv(), though these days we cache that
      with the code in environment.c).
      
      This cached base directory can become stale, however, if in
      a single process git changes the location of the object
      directory (e.g., by running setup_work_tree, which will
      chdir to the new worktree).
      
      In most cases this isn't a problem, because we tend to set
      up the git repository location and do any chdir()s before
      actually looking up any objects, so the first lookup will
      cache the correct location. In the case of reset --hard,
      however, we do something like:
      
        1. look up the commit object
      
        2. notice we are doing --hard, run setup_work_tree
      
        3. look up the tree object to reset
      
      Step (3) fails because our cached object directory value is
      bogus.
      
      This patch simply removes the caching. We use a static
      buffer instead of allocating one each time (the original
      version treated the malloc'd buffer as a static, so there is
      no change in calling semantics).
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  10. 19 May 2010, 1 commit
  11. 20 Apr 2010, 2 commits
    • Allow parse_pack_index on temporary files · 7b64469a
      Shawn O. Pearce committed
      The easiest way to verify a pack index is to open it through the
      standard parse_pack_index function, permitting the header check
      to happen when the file is mapped.  However, the dumb HTTP client
      needs to verify a pack index before it is moved into its proper file
      name within the objects/pack directory, to prevent a corrupt index
      from being made available.  So permit the caller to specify the
      exact path of the index file.
      
      For now we're still using the final destination name within the
      sole call site in http.c, but eventually we will start to parse
      the temporary path instead.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • Introduce close_pack_index to permit replacement · fa5fc15d
      Shawn O. Pearce committed
      By closing the pack index, a caller can later overwrite the index
      with an updated index file, possibly after converting from v1 to
      the v2 format.  Because p->index_data is NULL after close, on the
      next access the index will be opened again and the other members
      will be updated with new data.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  12. 02 Apr 2010, 2 commits
    • make commit_tree a library function · 40d52ff7
      Jeff King committed
      Until now, this has been part of the commit-tree builtin.
      However, it is already used by other builtins (like commit,
      merge, and notes), and it would be useful to access it from
      library code.
      
      The check_valid helper has to come along, too, but is given
      a more library-ish name of "assert_sha1_type".
      
      Otherwise, the code is unchanged. There are still a few
      rough edges for a library function, like printing the utf8
      warning to stderr, but we can address those if and when they
      come up as inappropriate.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • fix const-correctness of write_sha1_file · c00e657d
      Jeff King committed
      These should take const buffers as input data, but zlib's
      next_in pointer is not const-correct. Let's fix it at the
      zlib level, though, so the cast happens in one obvious
      place. This should be safe, as a similar cast is used in
      zlib's example code for a const array.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  13. 24 Feb 2010, 1 commit
    • move encode_in_pack_object_header() to a better place · f965c525
      Nicolas Pitre committed
      Commit 1b22b6c8 made duplicated versions of encode_header() into a
      common version called encode_in_pack_object_header().  There is,
      however, a better location than sha1_file.c for such a function, as
      sha1_file.c contains nothing related to the creation of packs, and
      it is quite populated already.
      
      Also the comment that was moved to the header file should really remain
      near the function as it covers implementation details and provides no
      information about the actual function interface.
      Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  14. 23 Feb 2010, 1 commit
  15. 22 Feb 2010, 3 commits
    • sha1_file: be paranoid when creating loose objects · 748af44c
      Nicolas Pitre committed
      We don't want the data being deflated and stored into loose objects
      to be different from what we expect.  While the deflated data is
      protected by a CRC which is good enough for safe data retrieval
      operations, we still want to be doubly sure that the source data used
      at object creation time is still what we expected once that data has
      been deflated and its CRC32 computed.
      
      The most plausible data corruption may occur if the source file is
      modified while Git is deflating and writing it out in a loose object.
      Or Git itself could have a bug causing memory corruption.  Or even bad
      RAM could cause trouble.  So it is best to make sure everything is
      coherent and checksum protected from beginning to end.
      
      To do so we compute the SHA1 of the data being deflated _after_ the
      deflate operation has consumed that data, and make sure it matches
      the expected SHA1.  This way we can rely on the CRC32 checked by
      the inflate operation to provide a good indication that the data is still
      coherent with its SHA1 hash.  One pathological case we ignore is when
      the data is modified before (or during) the deflate call, but changed back
      before it is hashed.
      
      There is some overhead of course. Using 'git add' on a set of large files:
      
      Before:
      
      	real    0m25.210s
      	user    0m23.783s
      	sys     0m1.408s
      
      After:
      
      	real    0m26.537s
      	user    0m25.175s
      	sys     0m1.358s
      
      The overhead is around 5% for full data coherency guarantee.
      Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • hash-object: don't use mmap() for small files · ea68b0ce
      Dmitry Potapov committed
      Using read() instead of mmap() yields a 39% speedup for 1 KB files
      and a 1% speedup for 1 MB files.  For larger files, it is better to
      use mmap(), because the difference between the two is not
      significant, and when there is not enough memory, mmap() performs
      much better because it avoids swapping.
      Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • sha1_file: don't malloc the whole compressed result when writing out objects · 9892beba
      Nicolas Pitre committed
      There is no real advantage to mallocing the whole output buffer and
      deflating the data in a single pass when writing loose objects.  It
      is only about 1% faster while using more memory, especially with
      large files where memory usage is far higher.  It is best to deflate
      and write the data out in small chunks, reusing the same memory
      instead.
      
      For example, using 'git add' on a few large files averaging 40 MB ...
      
      Before:
      21.45user 1.10system 0:22.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+828040outputs (0major+142640minor)pagefaults 0swaps
      
      After:
      21.50user 1.25system 0:22.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+828040outputs (0major+104408minor)pagefaults 0swaps
      
      While the runtime stayed relatively the same, the number of minor page
      faults went down significantly.
      Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  16. 18 Feb 2010, 1 commit
  17. 17 Feb 2010, 1 commit
  18. 27 Jan 2010, 2 commits
  19. 22 Jan 2010, 1 commit
    • slim down "git show-index" · a5031214
      Linus Torvalds committed
      As the documentation says, this is primarily for debugging, and
      in the longer term we should rename it to test-show-index or something.
      
      In the meantime, just avoid xmalloc (which slurps in the rest of git), and
      separate out the trivial hex functions into "hex.o".
      
      This results in
      
        [torvalds@nehalem git]$ size git-show-index
             text    data     bss     dec     hex filename
           222818    2276  112688  337782   52776 git-show-index (before)
             5696     624    1264    7584    1da0 git-show-index (after)
      
      which is a whole lot better.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  20. 12 Jan 2010, 1 commit
  21. 22 Oct 2009, 1 commit
    • Fix incorrect error check while reading deflated pack data · 39eea7bd
      Junio C Hamano committed
      The loop in get_size_from_delta() feeds deflated delta data from the
      pack stream _until_ we get an inflated result of 20 bytes[*] or we
      reach the end of the stream.
      
          Side note. This magic number 20 does not have anything to do with the
          size of the hash we use, but comes from 1a3b55c6 (reduce delta head
          inflated size, 2006-10-18).
      
      The loop reads like this:
      
          do {
              in = use_pack();
              stream.next_in = in;
              st = git_inflate(&stream, Z_FINISH);
              curpos += stream.next_in - in;
          } while ((st == Z_OK || st == Z_BUF_ERROR) &&
                   stream.total_out < sizeof(delta_head));
      
      This git_inflate() can return:
      
       - Z_STREAM_END, if use_pack() fed it enough input and the delta itself
         was smaller than 20 bytes;
      
       - Z_OK, when some progress has been made;
      
       - Z_BUF_ERROR, if no progress is possible, because we either ran out of
         input (due to corrupt pack), or we ran out of output before we saw the
         end of the stream.
      
      The fix attempted in b3118bdc (sha1_file: Fix infinite loop when pack is
      corrupted, 2009-10-14) was against a corruption that appears to be a
      valid stream producing a result larger than the output buffer, but we
      are not even trying to read the stream to the end in this loop.  If
      avail_out becomes zero, total_out will be the same as
      sizeof(delta_head), so the loop will terminate without the "fix".
      In other words, no fix from b3118bdc is needed for this loop.
      
      The loop in unpack_compressed_entry() is quite a different story.  It
      feeds a deflated stream (either delta or base) and allows the stream to
      produce output up to what we expect but no more.
      
          do {
              in = use_pack();
              stream.next_in = in;
              st = git_inflate(&stream, Z_FINISH);
              curpos += stream.next_in - in;
          } while (st == Z_OK || st == Z_BUF_ERROR)
      
      This _does_ risk falling into an endless iteration, as we can exhaust
      avail_out if the length we expect is smaller than what the stream wants to
      produce (due to pack corruption).  In such a case, avail_out will become
      zero and inflate() will return Z_BUF_ERROR, while avail_in may (or may
      not) be zero.
      
      But this is not a right fix:
      
          do {
              in = use_pack();
              stream.next_in = in;
              st = git_inflate(&stream, Z_FINISH);
      +       if (st == Z_BUF_ERROR && (stream.avail_in || !stream.avail_out))
      +               break; /* wants more input??? */
              curpos += stream.next_in - in;
          } while (st == Z_OK || st == Z_BUF_ERROR)
      
      as Z_BUF_ERROR from inflate() may be telling us that avail_in has also run
      out before reading the end of stream marker.  In such a case, both avail_in
      and avail_out would be zero, and the loop should iterate to allow the end
      of stream marker to be seen by inflate from the input stream.
      
      The right fix for this loop is likely to be to increment the initial
      avail_out by one (we allocate one extra byte to terminate it with NUL
      anyway, so there is no risk to overrun the buffer), and break out if we
      see that avail_out has become zero, in order to detect that the stream
      wants to produce more than what we expect.  After the loop, we have a
      check that exactly tests this condition:
      
          if ((st != Z_STREAM_END) || stream.total_out != size) {
              free(buffer);
              return NULL;
          }
      
      So here is a patch (without my previous botched attempts) to fix this
      issue.  The first hunk reverts the corresponding hunk from b3118bdc, and
      the second hunk is the same fix proposed earlier.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  22. 15 Oct 2009, 1 commit
    • sha1_file: Fix infinite loop when pack is corrupted · b3118bdc
      Shawn O. Pearce committed
      Some types of corruption to a pack may confuse the deflate stream
      which stores an object.  In Andy's reported case a 36 byte region
      of the pack was overwritten, leading to what appeared to be a valid
      deflate stream that was trying to produce a result larger than our
      allocated output buffer could accept.
      
      Z_BUF_ERROR is returned from inflate() if either the input buffer
      needs more input bytes, or the output buffer has run out of space.
      Previously we only considered the former case, as it meant we needed
      to move the stream's input buffer to the next window in the pack.
      
      We now abort the loop if inflate() returns Z_BUF_ERROR without
      consuming the entire input buffer it was given, or has filled
      the entire output buffer but has not yet returned Z_STREAM_END.
      Either state is a clear indicator that this loop is not working
      as expected, and should not continue.
      
      This problem cannot occur with loose objects as we open the entire
      loose object as a single buffer and treat Z_BUF_ERROR as an error.
      Reported-by: Andy Isaacson <adi@hexapodia.org>
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Acked-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  23. 23 Jul 2009, 1 commit
  24. 28 Jun 2009, 1 commit
  25. 19 Jun 2009, 1 commit
    • Fix big left-shifts of unsigned char · 48fb7deb
      Linus Torvalds committed
      Shifting 'unsigned char' or 'unsigned short' left can result in sign
      extension errors, since the C integer promotion rules mean that the
      unsigned char/short will get implicitly promoted to a signed 'int' due to
      the shift (or due to other operations).
      
      This normally doesn't matter, but if you shift things up sufficiently, it
      will now set the sign bit in 'int', and a subsequent cast to a bigger type
      (eg 'long' or 'unsigned long') will now sign-extend the value despite the
      original expression being unsigned.
      
      One example of this would be something like
      
      	unsigned long size;
      	unsigned char c;
      
      	size += c << 24;
      
      where despite all the variables being unsigned, 'c << 24' ends up being a
      signed entity, and will get sign-extended when then doing the addition in
      an 'unsigned long' type.
      
      Since git uses 'unsigned char' pointers extensively, we actually have this
      bug in a couple of places.
      
      I may have missed some, but this is the result of looking at
      
      	git grep '[^0-9 	][ 	]*<<[ 	][a-z]' -- '*.c' '*.h'
      	git grep '<<[   ]*24'
      
      which catches at least the common byte cases (shifting variables by a
      variable amount, and shifting by 24 bits).
      
      I also grepped for just 'unsigned char' variables in general, and
      converted the ones that most obviously ended up getting implicitly cast
      immediately anyway (eg hash_name(), encode_85()).
      
      In addition to just avoiding 'unsigned char', this patch also tries to use
      a common idiom for the delta header size thing. We had three different
      variations on it: "& 0x7fUL" in one place (getting the sign extension
      right), and "& ~0x80" and "& 0x7f" in two other places (not getting it
      right). Apart from making them all just avoid using "unsigned char" at
      all, I also unified them to then use a simple "& 0x7f".
      
      I considered making a sparse extension which warns about doing implicit
      casts from unsigned types to signed types, but it gets rather complex very
      quickly, so this is just a hack.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  26. 01 Jun 2009, 2 commits
  27. 21 May 2009, 1 commit
  28. 02 May 2009, 1 commit
  29. 30 Apr 2009, 1 commit
    • replace direct calls to unlink(2) with unlink_or_warn · 691f1a28
      Alex Riesen committed
      This helps us notice when something is going wrong, especially on
      systems which lock open files.
      
      I used the following criteria when selecting the code for replacement:
      - it was already printing a warning for the unlink failures
      - it is in a function which already prints something or is
        called from such a function
      - it is in a static function, returning void and the function is only
        called from a builtin main function (cmd_)
      - it is in a function which handles emergency exit (signal handlers)
      - it is in a function which is obviously cleaning up the lockfiles
      Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>