1. 18 Mar, 2013 1 commit
    • avoid segfaults on parse_object failure · 75a95490
      Jeff King committed
      Many call-sites of parse_object assume that they will get a
      non-NULL return value; this is not the case if we encounter
      an error while parsing the object.
      
      This patch adds a wrapper function around parse_object that
      handles dying automatically, and uses it anywhere we
      immediately try to access the return value as a non-NULL
      pointer (i.e., anywhere that we would currently segfault).
      
      This wrapper may also be useful in other places. The most
      obvious one is code like:
      
        o = parse_object(sha1);
        if (!o)
      	  die(...);
      
      However, these should not be mechanically converted to
      parse_object_or_die, as the die message is sometimes
      customized. Later patches can address these sites on a
      case-by-case basis.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
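The wrapper idea from the parse_object commit above can be sketched in a few lines. This is only an illustration under stated assumptions: `struct object` here is a toy stand-in, and `toy_parse_object` and `toy_parse_object_or_die` are hypothetical names, not git's actual implementation. The exit status 128 mirrors what git's die() conventionally uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for git's "struct object"; not the real definition. */
struct object {
	const char *name;
	int parsed;
};

/* Hypothetical parser: returns NULL on failure, as parse_object can. */
struct object *toy_parse_object(const char *name)
{
	static struct object obj;

	if (!name || !*name)
		return NULL;	/* simulated parse error */
	obj.name = name;
	obj.parsed = 1;
	return &obj;
}

/* Wrapper that never returns NULL: it reports the error and exits. */
struct object *toy_parse_object_or_die(const char *name)
{
	struct object *o = toy_parse_object(name);

	if (!o) {
		fprintf(stderr, "fatal: unable to parse object: %s\n",
			name ? name : "(null)");
		exit(128);
	}
	return o;
}
```

A caller that would otherwise dereference the result right away can use the wrapper and drop its own NULL check. Call sites with a customized die message keep their explicit check, as the commit message notes.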
  2. 01 May, 2012 1 commit
  3. 30 Mar, 2012 1 commit
  4. 08 Mar, 2012 1 commit
    • parse_object: avoid putting whole blob in core · 090ea126
      Nguyễn Thái Ngọc Duy committed
      Traditionally, all the callers of check_sha1_signature() first
      called read_sha1_file() to prepare the whole object data in core,
      and called this function.  The function is used to revalidate what
      we read from the object database actually matches the object name we
      used to ask for the data from the object database.
      
      Update the API to allow callers to pass NULL as the object data, and
      have the function read and hash the object data using streaming API
      to recompute the object name, without having to hold everything in
      core at the same time.  This is most useful in parse_object() that
      parses a blob object, because this caller does not have to keep the
      actual blob data around in memory after a "struct blob" is returned.
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
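The "pass NULL to stream" convention from the commit above can be illustrated with a toy checker. Everything here is an assumption made for the sketch: a 32-bit FNV-1a hash stands in for SHA-1, `toy_mem_source` stands in for the streaming API, and none of the names are git's.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy incremental hash (32-bit FNV-1a), standing in for SHA-1. */
struct toy_hash_ctx {
	uint32_t state;
};

void toy_hash_init(struct toy_hash_ctx *c)
{
	c->state = 2166136261u;	/* FNV offset basis */
}

void toy_hash_update(struct toy_hash_ctx *c, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		c->state ^= *p++;
		c->state *= 16777619u;	/* FNV prime */
	}
}

/* Hypothetical in-memory "stream" that hands out data in small chunks. */
struct toy_mem_source {
	const unsigned char *data;
	size_t len;
	size_t pos;
};

size_t toy_mem_read(struct toy_mem_source *src, unsigned char *out, size_t max)
{
	size_t n = src->len - src->pos;

	if (n > max)
		n = max;
	memcpy(out, src->data + src->pos, n);
	src->pos += n;
	return n;
}

/*
 * Returns 0 if the data hashes to "expected", -1 otherwise.
 * If buf is NULL, the data is pulled from src in fixed-size chunks,
 * so only one small chunk is held in memory at a time.
 */
int toy_check_signature(uint32_t expected, const void *buf, size_t len,
			struct toy_mem_source *src)
{
	struct toy_hash_ctx c;

	toy_hash_init(&c);
	if (buf) {
		toy_hash_update(&c, buf, len);
	} else {
		unsigned char chunk[8];
		size_t n;

		while ((n = toy_mem_read(src, chunk, sizeof(chunk))) > 0)
			toy_hash_update(&c, chunk, n);
	}
	return c.state == expected ? 0 : -1;
}
```

Both paths produce the same hash because the hash is updated incrementally. The streaming path simply never needs the whole object in one buffer.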
  5. 06 Jan, 2012 1 commit
    • parse_object: try internal cache before reading object db · ccdc6037
      Jeff King committed
      When parse_object is called, we do the following:
      
        1. read the object data into a buffer via read_sha1_file
      
        2. call parse_object_buffer, which then:
      
           a. calls the appropriate lookup_{commit,tree,blob,tag}
      	to either create a new "struct object", or to find
      	an existing one. We know the appropriate type from
      	the lookup in step 1.
      
           b. calls the appropriate parse_{commit,tree,blob,tag}
              to parse the buffer for the new (or existing) object
      
      In step 2b, all of the called functions are no-ops for
      object "X" if "X->object.parsed" is set. I.e., when we have
      already parsed an object, we end up going to a lot of work
      just to find out at a low level that there is nothing left
      for us to do (and we throw away the data from read_sha1_file
      unread).
      
      We can optimize this by moving the check for "do we have an
      in-memory object" from 2a before the expensive call to
      read_sha1_file in step 1.
      
      This might seem circular, since step 2a uses the type
      information determined in step 1 to call the appropriate
      lookup function. However, we can notice that all of the
      lookup_* functions are backed by lookup_object. In other
      words, all of the objects are kept in a master hash table,
      and we don't actually need the type to do the "do we have
      it" part of the lookup, only to do the "and create it if it
      doesn't exist" part.
      
      This can save time whenever we call parse_object on the same
      sha1 twice in a single program. Some code paths already
      perform this optimization manually, with either:
      
        if (!obj->parsed)
      	  obj = parse_object(obj->sha1);
      
      if you already have a "struct object", or:
      
        struct object *obj = lookup_unknown_object(sha1);
        if (!obj || !obj->parsed)
      	  obj = parse_object(sha1);
      
      if you don't.  This patch moves the optimization into
      parse_object itself.
      
      Most git operations won't notice any impact. Either they
      don't parse a lot of duplicate sha1s, or the calling code
      takes special care not to re-parse objects. I timed two
      code paths that do benefit (there may be more, but these two
      were immediately obvious and easy to time).
      
      The first is fast-export, which calls parse_object on each
      object it outputs, like this:
      
        object = parse_object(sha1);
        if (!object)
      	  die(...);
        if (object->flags & SHOWN)
      	  return;
      
      which means that just to realize we have already shown an
      object, we will read the whole object from disk!
      
      With this patch, my best-of-five time for "fast-export --all" on
      git.git dropped from 26.3s to 21.3s.
      
      The second case is upload-pack, which will call parse_object
      for each advertised ref (because it needs to peel tags to
      show "^{}" entries). This doesn't matter for most
      repositories, because they don't have a lot of refs pointing
      to the same objects. However, if you have a big alternates
      repository with a shared object db for a number of child
      repositories, then the alternates repository will have
      duplicated refs representing each of its children.
      
      For example, GitHub's alternates repository for git.git has
      ~120,000 refs, of which only ~3200 are unique. The time for
      upload-pack to print its list of advertised refs dropped
      from 3.4s to 0.76s.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
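The reordering described in the commit above (check the in-memory table first, read from the object database only on a miss) might look roughly like the following. This is a toy model, not git's code: the table is a small linear array rather than a hash table, and `disk_reads` is a hypothetical counter standing in for the cost of read_sha1_file.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OBJECTS 16

struct toy_object {
	char name[41];
	int parsed;
};

static struct toy_object table[MAX_OBJECTS];
static size_t nr_objects;
static int disk_reads;	/* counts simulated object-db reads */

/* Type-agnostic lookup: find an already-known object, or NULL. */
struct toy_object *toy_lookup_object(const char *name)
{
	size_t i;

	for (i = 0; i < nr_objects; i++)
		if (!strcmp(table[i].name, name))
			return &table[i];
	return NULL;
}

/* Stand-in for the expensive read-and-parse path. */
static struct toy_object *read_and_parse(const char *name)
{
	struct toy_object *obj = toy_lookup_object(name);

	disk_reads++;
	if (!obj) {
		if (nr_objects == MAX_OBJECTS)
			return NULL;	/* toy table is full */
		obj = &table[nr_objects++];
		strncpy(obj->name, name, sizeof(obj->name) - 1);
		obj->name[sizeof(obj->name) - 1] = '\0';
	}
	obj->parsed = 1;
	return obj;
}

/* Check the in-memory table before touching the "disk". */
struct toy_object *cached_parse_object(const char *name)
{
	struct toy_object *obj = toy_lookup_object(name);

	if (obj && obj->parsed)
		return obj;	/* cache hit: no read needed */
	return read_and_parse(name);
}
```

Calling `cached_parse_object` twice with the same name performs one simulated read instead of two, which is the saving the commit measures for fast-export and upload-pack.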
  6. 17 Nov, 2011 1 commit
  7. 16 May, 2011 1 commit
  8. 06 Sep, 2010 1 commit
  9. 04 Sep, 2010 1 commit
    • parse_object: pass on the original sha1, not the replaced one · 2e3400c0
      Nguyễn Thái Ngọc Duy committed
      Commit 0e87c367 (object: call "check_sha1_signature" with the
      replacement sha1) changed the first argument passed to
      parse_object_buffer() from "sha1" to "repl". With that change,
      the returned obj pointer has the replacement SHA1 in obj->sha1,
      not the original one.
      
      But when using lookup_commit() and then parse_commit() on a
      commit, we get an object pointer with the original sha1, but
      the commit content comes from the replacement commit.
      
      So the result we get from using parse_object() is different
      from the one we get from using lookup_commit() followed by
      parse_commit().
      
      It looks much simpler and safer to fix this inconsistency by
      passing "sha1" to parse_object_buffer() instead of "repl".
      
      The commit comment should be used to tell that the replacement
      commit is replacing another commit and why. So it should be
      easy to see that we have a replacement commit instead of an
      original one.
      
      And it is not a problem if the content of the commit is not
      consistent with the sha1 as cat-file piped to hash-object can
      be used to see the difference.
      Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  10. 20 Apr, 2010 1 commit
    • fix "bundle --stdin" segfault · 97a20eea
      Jonathan Nieder committed
      When passed an empty list, objects_array_remove_duplicates() corrupts it
      by changing the number of entries from 0 to 1.
      
      The problem lies in the condition of its main loop:
      
      	for (ref = 0; ref < array->nr - 1; ref++) {
      
      The loop body manipulates the supplied object array.  In the case of an
      empty array, it should not be doing anything at all.  But array->nr is an
      unsigned quantity, so the code enters the loop, in particular increasing
      array->nr.  Fix this by comparing (ref + 1 < array->nr) instead.
      
      This bug can be triggered by git bundle --stdin:
      
      	$ echo HEAD | git bundle create some.bundle --stdin
      	Segmentation fault (core dumped)
      
      The list of commits to bundle appears to be empty because of another bug:
      by the time the revision-walking machinery gets to look at it, standard
      input has already been consumed by rev-list, so this function gets an
      empty list of revisions.
      
      After this patch, git bundle --stdin still does not work; it just doesn’t
      segfault any more.
      Reported-by: Joey Hess <joey@kitenet.net>
      Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
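The loop-condition bug from the "bundle --stdin" commit above is easy to reproduce in isolation. The sketch below only counts iterations; the function names and the `cap` parameter (which keeps the buggy loop from running billions of times) are additions for illustration and are not part of git.

```c
/*
 * Buggy form: with unsigned arithmetic, nr - 1 wraps around to a
 * huge value when nr == 0, so the loop body is entered.
 */
unsigned int iterations_buggy(unsigned int nr, unsigned int cap)
{
	unsigned int ref, count = 0;

	for (ref = 0; ref < nr - 1 && count < cap; ref++)
		count++;
	return count;
}

/* Fixed form: ref + 1 < nr is simply false for an empty array. */
unsigned int iterations_fixed(unsigned int nr, unsigned int cap)
{
	unsigned int ref, count = 0;

	for (ref = 0; ref + 1 < nr && count < cap; ref++)
		count++;
	return count;
}
```

For a non-empty array both forms iterate the same number of times. They differ only when `nr` is 0, which is the empty-list case the commit describes.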
  11. 18 Jan, 2010 1 commit
  12. 01 Jun, 2009 1 commit
  13. 20 May, 2009 1 commit
  14. 17 May, 2009 1 commit
  15. 18 Jan, 2009 1 commit
  16. 04 Feb, 2008 1 commit
  17. 23 Dec, 2007 1 commit
  18. 07 Jun, 2007 1 commit
  19. 25 May, 2007 1 commit
  20. 24 Apr, 2007 1 commit
  21. 17 Apr, 2007 2 commits
  22. 21 Mar, 2007 1 commit
  23. 27 Feb, 2007 3 commits
  24. 17 Sep, 2006 1 commit
  25. 28 Aug, 2006 1 commit
  26. 24 Aug, 2006 1 commit
    • Convert memcpy(a,b,20) to hashcpy(a,b). · e702496e
      Shawn Pearce committed
      This abstracts away the size of the hash values when copying them
      from memory location to memory location, much as the introduction
      of hashcmp abstracted away hash value comparison.
      
      A few call sites were using char* rather than unsigned char* so
      I added the cast rather than open hashcpy to be void*.  This is a
      reasonable tradeoff as most call sites already use unsigned char*
      and the existing hashcmp is also declared to be unsigned char*.
      
      [jc: Split the patch to "master" part, to be followed by a
       patch for merge-recursive.c which is not in "master" yet.
      
       Fixed the cast in the latter hunk to combine-diff.c which was
       wrong in the original.
      
       Also converted ones left-over in combine-diff.c, diff-lib.c and
       upload-pack.c ]
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
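The abstraction in the hashcpy commit above amounts to hiding the literal 20 behind small helpers. A minimal sketch, assuming a 20-byte SHA-1 digest; the `toy_` names and `TOY_HASH_SIZE` are made up for this example and are not git's definitions:

```c
#include <string.h>

#define TOY_HASH_SIZE 20	/* SHA-1 digest length in bytes */

/* Copy one hash value; callers no longer spell out the size. */
static inline void toy_hashcpy(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, TOY_HASH_SIZE);
}

/* Compare two hash values; returns 0 when they are equal. */
static inline int toy_hashcmp(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, TOY_HASH_SIZE);
}
```

Because the parameters are `unsigned char *`, a caller holding a plain `char *` buffer needs an explicit cast, which is the tradeoff the commit message describes.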
  27. 18 Aug, 2006 1 commit
  28. 13 Jul, 2006 1 commit
  29. 05 Jul, 2006 1 commit
  30. 03 Jul, 2006 1 commit
  31. 02 Jul, 2006 1 commit
    • git object hash cleanups · 0556a11a
      Linus Torvalds committed
      This IMNSHO cleans up the object hashing.
      
      The hash expansion is separated out into a function of its own, the hash
      array (and size) names are made more obvious, and the code is generally
      made to look a bit more like the object-ref hashing.
      
      It also gets rid of "find_object()" returning an index (or negative
      position if no object is found), since that is made redundant by the
      simplified object rehashing. The basic operation is now "lookup_object()"
      which just returns the object itself.
      
      There's an almost unmeasurable speed increase, but more importantly, I
      think the end result is more readable.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  32. 30 Jun, 2006 1 commit
    • Abstract out accesses to object hash array · fc046a75
      Linus Torvalds committed
      There are a few special places where some programs accessed the object
      hash array directly, which bothered me because I wanted to play with some
      simple re-organizations.
      
      So this patch makes the object hash array data structures all entirely
      local to object.c, and the few users who wanted to look at it now get to
      use a function to query how many object index entries there can be, and to
      actually access the array.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  33. 20 Jun, 2006 1 commit
    • Add "named object array" concept · 1f1e895f
      Linus Torvalds committed
      We've had this notion of a "object_list" for a long time, which eventually
      grew a "name" member because some users (notably git-rev-list) wanted to
      name each object as it is generated.
      
      That object_list is great for some things, but it isn't all that wonderful
      for others, and the "name" member is generally not used by everybody.
      
      This patch splits the users of the object_list array up into two: the
      traditional list users, who want the list-like format, and who don't
      actually use or want the name. And another class of users that really used
      the list as an extensible array, and generally wanted to name the objects.
      
      The patch is fairly straightforward, but it's also biggish. Most of it
      really just cleans things up: switching the revision parsing and listing
      over to the array makes things like the builtin-diff usage much simpler
      (we now see exactly how many members the array has, and we don't get the
      objects reversed from the order they were on the command line).
      
      One of the main reasons for doing this at all is that the malloc overhead
      of the simple object list was actually pretty high, and the array is just
      a lot denser. So this patch brings down memory usage by git-rev-list by
      just under 3% (on top of all the other memory use optimizations) on the
      mozilla archive.
      
      It does add more lines than it removes, and more importantly, it adds a
      whole new infrastructure for maintaining lists of objects, but on the
      other hand, the new dynamic array code is pretty obvious. The change to
      builtin-diff-tree.c shows a fairly good example of why an array interface
      is sometimes more natural, and just much simpler for everybody.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  34. 19 Jun, 2006 1 commit
    • Remove "refs" field from "struct object" · 3e4339e6
      Linus Torvalds committed
      This shrinks "struct object" to the absolutely minimal size possible.
      It now contains /only/ the object flags and the SHA1 hash name of the
      object.
      
      The "refs" field, which is really needed only for fsck, is maintained in
      a separate hashed lookup-table, allowing all normal users to totally
      ignore it.
      
      This helps memory usage, although not as much as I hoped: it looks like
      the allocation overhead of malloc (and the alignment constraints in
      particular) means that while the structure size shrinks, the actual
      allocation overhead mostly does not.
      
      [ That said: memory usage is actually down, but not as much as it should
        be: I suspect just one of the object types actually ended up shrinking
        its effective allocation size.
      
        To get to the next level, we probably need specialized allocators that
        don't pad the allocation more than necessary. ]
      
      The separation makes for some code cleanup, though, and makes the ref
      tracking that fsck wants a clearly separate thing.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  35. 18 Jun, 2006 1 commit
    • Shrink "struct object" a bit · 885a86ab
      Linus Torvalds committed
      This shrinks "struct object" by a small amount, by getting rid of the
      "struct type *" pointer and replacing it with a 3-bit bitfield instead.
      
      In addition, we merge the bitfields and the "flags" field, which
      incidentally should also remove a useless 4-byte padding from the object
      when in 64-bit mode.
      
      Now, our "struct object" is still too damn large, but it's now less
      obviously bloated, and of the remaining fields, only the "util" (which is
      not used by most things) is clearly something that should be eventually
      discarded.
      
      This shrinks the "git-rev-list --all" memory use by about 2.5% on the
      kernel archive (and, perhaps more importantly, on the larger mozilla
      archive). That may not sound like much, but I suspect it's more on a
      64-bit platform.
      
      There are other remaining inefficiencies (the parent lists, for example,
      probably have horrible malloc overhead), but this was pretty obvious.
      
      Most of the patch is just changing the comparison of the "type" pointer
      from one of the constant string pointers to the appropriate new TYPE_xxx
      small integer constant.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
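The size reduction described in the commit above comes from replacing a type pointer with a 3-bit field that shares a word with the flags. The two layouts below are illustrative only: the field lists, the 27-bit flag width, and all names are assumptions chosen to show the effect, not copies of git's definitions. Exact sizes depend on the platform ABI.

```c
#include <stddef.h>

/* "Before" layout: the type is a pointer to a constant string. */
struct object_with_pointer {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned int flags;
	unsigned char sha1[20];
	const char *type;
	void *util;
};

/* Small integer constants that replace the string pointers. */
enum toy_type {
	TOY_TYPE_BAD,
	TOY_TYPE_BLOB,
	TOY_TYPE_TREE,
	TOY_TYPE_COMMIT,
	TOY_TYPE_TAG
};

#define TOY_FLAG_BITS 27

/* "After" layout: type and flags are merged into one bitfield word. */
struct object_with_bitfield {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned type : 3;
	unsigned flags : TOY_FLAG_BITS;
	unsigned char sha1[20];
	void *util;
};

/* Type checks become integer comparisons instead of pointer comparisons. */
int toy_is_commit(const struct object_with_bitfield *o)
{
	return o->type == TOY_TYPE_COMMIT;
}
```

On a typical 64-bit ABI the second struct also avoids the padding that the pointer's alignment forces after `sha1`, which matches the "useless 4-byte padding" remark in the commit message.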
  36. 30 May, 2006 2 commits