1. 08 May, 2017 1 commit
  2. 23 Feb, 2017 1 commit
  3. 10 May, 2016 1 commit
  4. 17 Mar, 2016 2 commits
    • J
      list-objects: pass full pathname to callbacks · 2824e184
      Jeff King committed
      When we find a blob at "a/b/c", we currently pass this to
      our show_object_fn callbacks as two components: "a/b/" and
      "c". Callbacks which want the full value then call
      path_name(), which concatenates the two. But this is an
      inefficient interface; the path is a strbuf, and we could
      simply append "c" to it temporarily, then roll back the
      length, without creating a new copy.
      
      So we could improve this by teaching the callsites of
      path_name() this trick (and there are only 3). But we can
      also notice that no callback actually cares about the
      broken-down representation, and simply pass each callback
      the full path "a/b/c" as a string. The callback code becomes
      even simpler, then, as we do not have to worry about freeing
      an allocated buffer, nor rolling back our modification to
      the strbuf.
      
      This is theoretically less efficient, as some callbacks
      would not bother to format the final path component. But in
      practice this is not measurable. Since we use the same
      strbuf over and over, our work to grow it is amortized, and
      we really only pay to memcpy a few bytes.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      2824e184
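The append-then-roll-back idea from this message can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct pathbuf`, `pathbuf_add`, `pathbuf_setlen`, `show_entry`, and `record_path` are hypothetical stand-ins, not git's actual strbuf API or callback signature.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable string buffer (not git's strbuf). */
struct pathbuf {
	char *buf;
	size_t len;
	size_t alloc;
};

/* Append a NUL-terminated string, growing the allocation when needed. */
static void pathbuf_add(struct pathbuf *pb, const char *s)
{
	size_t n = strlen(s);

	if (pb->len + n + 1 > pb->alloc) {
		size_t want = (pb->len + n + 1) * 2;
		char *p = realloc(pb->buf, want);
		assert(p != NULL);
		pb->buf = p;
		pb->alloc = want;
	}
	memcpy(pb->buf + pb->len, s, n + 1); /* copies the trailing NUL too */
	pb->len += n;
}

/* Roll the buffer back to an earlier, shorter length. */
static void pathbuf_setlen(struct pathbuf *pb, size_t len)
{
	assert(pb->buf != NULL && len <= pb->len);
	pb->len = len;
	pb->buf[len] = '\0';
}

/* Hypothetical callback type: receives the full path as a plain string. */
typedef void (*show_fn)(const char *full_path, void *data);

/* Append the entry name, hand the full path to the callback, roll back. */
static void show_entry(struct pathbuf *pb, const char *name,
		       show_fn fn, void *data)
{
	size_t base = pb->len;

	pathbuf_add(pb, name);
	fn(pb->buf, data);
	pathbuf_setlen(pb, base);
}

/* Example callback: copy the path into a caller-supplied, large-enough buffer. */
static void record_path(const char *full_path, void *data)
{
	strcpy((char *)data, full_path);
}
```

Because the same buffer is reused for every entry, the callback sees "a/b/c" without a fresh allocation, and the directory prefix "a/b/" is intact again once `show_entry` returns.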
    • J
      list-objects: drop name_path entirely · dc06dc88
      Jeff King committed
      In the previous commit, we left name_path as a thin wrapper
      around a strbuf. This patch drops it entirely. As a result,
      every show_object_fn callback needs to be adjusted. However,
      none of their code needs to be changed at all, because the
      only use was to pass it to path_name(), which now handles
      the bare strbuf.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      dc06dc88
  5. 13 Feb, 2016 2 commits
    • J
      list-objects: pass full pathname to callbacks · de1e67d0
      Jeff King committed
      When we find a blob at "a/b/c", we currently pass this to
      our show_object_fn callbacks as two components: "a/b/" and
      "c". Callbacks which want the full value then call
      path_name(), which concatenates the two. But this is an
      inefficient interface; the path is a strbuf, and we could
      simply append "c" to it temporarily, then roll back the
      length, without creating a new copy.
      
      So we could improve this by teaching the callsites of
      path_name() this trick (and there are only 3). But we can
      also notice that no callback actually cares about the
      broken-down representation, and simply pass each callback
      the full path "a/b/c" as a string. The callback code becomes
      even simpler, then, as we do not have to worry about freeing
      an allocated buffer, nor rolling back our modification to
      the strbuf.
      
      This is theoretically less efficient, as some callbacks
      would not bother to format the final path component. But in
      practice this is not measurable. Since we use the same
      strbuf over and over, our work to grow it is amortized, and
      we really only pay to memcpy a few bytes.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      de1e67d0
    • J
      list-objects: drop name_path entirely · bd64516a
      Jeff King committed
      In the previous commit, we left name_path as a thin wrapper
      around a strbuf. This patch drops it entirely. As a result,
      every show_object_fn callback needs to be adjusted. However,
      none of their code needs to be changed at all, because the
      only use was to pass it to path_name(), which now handles
      the bare strbuf.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      bd64516a
  6. 09 Oct, 2015 1 commit
  7. 26 May, 2015 2 commits
  8. 21 Apr, 2015 1 commit
    • J
      reachable: only mark local objects as recent · 1385bb7b
      Jeff King committed
      When pruning and repacking a repository that has an
      alternate object store configured, we may traverse a large
      number of objects in the alternate. This serves no purpose,
      and may be expensive to do. A longer explanation is below.
      
      Commits d3038d22 and abcb8655 taught prune and pack-objects
      (respectively) to treat "recent" objects as tips for
      reachability, so that we keep whole chunks of history. They
      built on the object traversal in 660c889e (sha1_file: add
      for_each iterators for loose and packed objects,
      2014-10-15), which covers both local and alternate objects.
      
      In both cases, covering alternate objects is unnecessary, as
      both commands can only drop objects from the local
      repository. In the case of prune, we traverse only the local
      object directory. And in the case of repacking, while we may
      or may not include local objects in our pack, we will never
      reach into the alternate with "repack -d". The "-l" option
      is only a question of whether we are migrating objects from
      the alternate into our repository, or leaving them
      untouched.
      
      It is possible that we may drop an object that is depended
      upon by another object in the alternate. For example,
      imagine two repositories, A and B, with A pointing to B as
      an alternate. Now imagine a commit that is in B which
      references a tree that is only in A. Traversing from recent
      objects in B might prevent A from dropping that tree. But
      this case isn't worth covering. Repo B should take
      responsibility for its own objects. It would never have had
      the commit in the first place if it did not also have the
      tree, and assuming it is using the same "keep recent chunks
      of history" scheme, then it would itself keep the tree, as
      well.
      
      So checking the alternate objects is not worth doing, and
      comes with a significant performance impact. In both cases,
      we skip any recent objects that have already been marked
      SEEN (i.e., that we know are already reachable for prune, or
      included in the pack for a repack). So there is a slight
      waste of time in opening the alternate packs at all, only to
      notice that we have already considered each object. But much
      worse, the alternate repository may have a large number of
      objects that are not reachable from the local repository at
      all, and we end up adding them to the traversal.
      
      We can fix this by considering only local unseen objects.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      1385bb7b
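A minimal sketch of "consider only local unseen objects" might look like the following. The `struct obj` record and the `add_recent_local` function are hypothetical simplifications for illustration; the real code iterates over loose and packed objects rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>

#define SEEN 1u

/* Hypothetical, simplified object record; not git's struct object. */
struct obj {
	unsigned flags; /* SEEN once the main traversal has reached it */
	int is_local;   /* 0 if it lives only in an alternate object store */
	int is_recent;  /* mtime is newer than the expiration cutoff */
};

/*
 * Mark the objects that would become extra traversal tips: recent, local,
 * and not already SEEN. Returns how many were added.
 */
static size_t add_recent_local(struct obj *objs, size_t n)
{
	size_t added = 0;

	assert(objs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct obj *o = &objs[i];

		if (!o->is_local)
			continue; /* we can never drop objects from the alternate */
		if (!o->is_recent || (o->flags & SEEN))
			continue; /* old, or already handled by the main walk */
		o->flags |= SEEN;
		added++;
	}
	return added;
}
```

The locality check comes first so that alternate objects are rejected before any other work is done on them, which is the point of the patch.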
  9. 20 Oct, 2014 1 commit
  10. 17 Oct, 2014 5 commits
    • J
      pack-objects: match prune logic for discarding objects · abcb8655
      Jeff King committed
      A recent commit taught git-prune to keep non-recent objects
      that are reachable from recent ones. However, pack-objects,
      when loosening unreachable objects, tries to optimize out
      the write in the case that the object will be immediately
      pruned. It now gets this wrong, since its rule does not
      reflect the new prune code (and this can be seen by running
      t6501 with a strategically placed repack).
      
      Let's teach pack-objects similar logic.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      abcb8655
    • J
      prune: keep objects reachable from recent objects · d3038d22
      Jeff King committed
      Our current strategy with prune is that an object falls into
      one of three categories:
      
        1. Reachable (from ref tips, reflogs, index, etc).
      
        2. Not reachable, but recent (based on the --expire time).
      
        3. Not reachable and not recent.
      
      We keep objects from (1) and (2), but prune objects in (3).
      The point of (2) is that these objects may be part of an
      in-progress operation that has not yet updated any refs.
      
      However, it is not always the case that objects for an
      in-progress operation will have a recent mtime. For example,
      the object database may have an old copy of a blob (from an
      abandoned operation, a branch that was deleted, etc). If we
      create a new tree that points to it, a simultaneous prune
      will leave our tree, but delete the blob. Referencing that
      tree with a commit will then work (we check that the tree is
      in the object database, but not that all of its referred
      objects are), as will mentioning the commit in a ref. But
      the resulting repo is corrupt; we are missing the blob
      reachable from a ref.
      
      One way to solve this is to be more thorough when
      referencing a sha1: make sure that not only do we have that
      sha1, but that we have objects it refers to, and so forth
      recursively. The problem is that this is very expensive.
      Creating a parent link would require traversing the entire
      object graph!
      
      Instead, this patch pushes the extra work onto prune, which
      runs less frequently (and has to look at the whole object
      graph anyway). It creates a new category of objects: objects
      which are not recent, but which are reachable from a recent
      object. We do not prune these objects, just like the
      reachable and recent ones.
      
      This lets us avoid the recursive check above, because if we
      have an object, even if it is unreachable, we should have
      its referent. We can make a simple inductive argument that
      with this patch, this property holds (that there are no
      objects with missing referents in the repository):
      
        0. When we have no objects, we have nothing to refer or be
           referred to, so the property holds.
      
        1. If we add objects to the repository, their direct
           referents must generally exist (e.g., if you create a
           tree, the blobs it references must exist; if you create
           a commit to point at the tree, the tree must exist).
           This is already the case before this patch. And it is
           not 100% foolproof (you can make bogus objects using
           `git hash-object`, for example), but it should be the
           case for normal usage.
      
           Therefore for any sequence of object additions, the
           property will continue to hold.
      
        2. If we remove objects from the repository, then we will
           not remove a child object (like a blob) if an object
           that refers to it is being kept. That is the part
           implemented by this patch.
      
           Note, however, that our reachability check and the
           actual pruning are not atomic. So it _is_ still
           possible to violate the property (e.g., an object
           becomes referenced just as we are deleting it). This
           patch is shooting for eliminating problems where the
           mtimes of dependent objects differ by hours or days,
           and one is dropped without the other. It does nothing
           to help with short races.
      
      Naively, the simplest way to implement this would be to add
      all recent objects as tips to the reachability traversal.
      However, this does not perform well. In a recently-packed
      repository, all reachable objects will also be recent, and
      therefore we have to look at each object twice. This patch
      instead performs the reachability traversal, then follows up
      with a second traversal for recent objects, skipping any
      that have already been marked.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      d3038d22
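The two-pass scheme described in this message (reachability traversal first, then a follow-up traversal from recent objects that skips anything already marked) can be illustrated on a toy graph. Everything here is a hypothetical sketch: `struct node`, `mark_from`, `mark_keepers`, and `is_prunable` are invented names, and array indices stand in for object IDs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 4
#define SEEN 1u

/* Hypothetical toy object graph; indices stand in for object IDs. */
struct node {
	int refs[MAX_REFS]; /* indices of referenced objects, -1 terminated */
	int is_recent;      /* mtime is newer than the expiration cutoff */
	unsigned flags;
};

/* Depth-first marking of everything reachable from idx. */
static void mark_from(struct node *g, int idx)
{
	assert(idx >= 0);
	if (g[idx].flags & SEEN)
		return;
	g[idx].flags |= SEEN;
	for (int i = 0; i < MAX_REFS && g[idx].refs[i] >= 0; i++)
		mark_from(g, g[idx].refs[i]);
}

/*
 * Pass 1: mark from the ref tips.
 * Pass 2: treat each recent object that is still unmarked as an extra tip.
 */
static void mark_keepers(struct node *g, size_t n,
			 const int *tips, size_t ntips)
{
	for (size_t i = 0; i < ntips; i++)
		mark_from(g, tips[i]);
	for (size_t i = 0; i < n; i++)
		if (g[i].is_recent && !(g[i].flags & SEEN))
			mark_from(g, (int)i);
}

/* Anything left unmarked after both passes may be pruned. */
static int is_prunable(const struct node *g, int idx)
{
	return !(g[idx].flags & SEEN);
}
```

With a recent tree that points at an old blob, the blob ends up marked in the second pass and is kept, while an old object that nothing refers to stays prunable.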
    • J
      reachable: mark index blobs as SEEN · 37254279
      Jeff King committed
      When we mark all reachable objects for pruning, that
      includes blobs mentioned by the index. However, we do not
      mark these with the SEEN flag, as we do for objects that we
      find by traversing (we also do not add them to the pending
      list, but that is because there is nothing further to
      traverse with them).
      
      This doesn't cause any problems with prune, because it
      checks only that the object exists in the global object
      hash, and not its flags. However, let's mark these objects
      to be consistent and avoid any later surprises.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      37254279
    • J
      reachable: reuse revision.c "add all reflogs" code · 718ccc97
      Jeff King committed
      We want to add all reflog entries as tips for finding
      reachable objects. The revision machinery can already do
      this (to support "rev-list --reflog"); we can reuse that
      code.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      718ccc97
    • J
      reachable: use traverse_commit_list instead of custom walk · 5f78a431
      Jeff King committed
      To find the set of reachable objects, we add a bunch of
      possible sources to our rev_info, call prepare_revision_walk,
      and then launch into a custom walker that handles each
      object top. This is a subset of what traverse_commit_list
      does, so we can just reuse that code (it can also handle
      more complex cases like UNINTERESTING commits and pathspecs,
      but we don't use those features).
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      5f78a431
  11. 04 Sep, 2014 1 commit
  12. 07 Jun, 2013 1 commit
    • J
      clear parsed flag when we free tree buffers · 6e454b9a
      Jeff King committed
      Many code paths will free a tree object's buffer and set it
      to NULL after finishing with it in order to keep memory
      usage down during a traversal. However, out of 8 sites that
      do this, only one actually unsets the "parsed" flag back.
      Those sites that don't are setting a trap for later users of
      the tree object; even after calling parse_tree, the buffer
      will remain NULL, causing potential segfaults.
      
      It is not known whether this is triggerable in the current
      code. Most commands do not do an in-memory traversal
      followed by actually using the objects again. However, it
      does not hurt to be safe for future callers.
      
      In most cases, we can abstract this out to a
      "free_tree_buffer" helper. However, there are two
      exceptions:
      
        1. The fsck code relies on the parsed flag to know that we
           were able to parse the object at one point. We can
           switch this to using a flag in the "flags" field.
      
        2. The index-pack code sets the buffer to NULL but does
           not free it (it is freed by a caller). We should still
           unset the parsed flag here, but we cannot use our
           helper, as we do not want to free the buffer.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      6e454b9a
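The trap described in this message, and the helper that avoids it, can be sketched as follows. The `struct tree` here is a hypothetical, reduced stand-in, and this `parse_tree` merely allocates a dummy buffer; only the interaction between the buffer and the "parsed" flag is meant to be representative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced tree object; the real struct has more fields. */
struct tree {
	int parsed;
	void *buffer;
	unsigned long size;
};

/* Parse only if not already parsed; "parsing" here just loads a buffer. */
static int parse_tree(struct tree *t)
{
	assert(t != NULL);
	if (t->parsed)
		return 0; /* trusts the flag: the buffer is assumed valid */
	t->buffer = malloc(16);
	if (!t->buffer)
		return -1;
	memset(t->buffer, 0, 16);
	t->size = 16;
	t->parsed = 1;
	return 0;
}

/* Release the buffer and clear "parsed" so a later parse_tree() reloads it. */
static void free_tree_buffer(struct tree *t)
{
	assert(t != NULL);
	free(t->buffer);
	t->buffer = NULL;
	t->size = 0;
	t->parsed = 0;
}
```

If `free_tree_buffer` left `parsed` set, the second `parse_tree` call would return early and the caller would be left holding a NULL buffer.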
  13. 18 Mar, 2013 1 commit
    • J
      use parse_object_or_die instead of die("bad object") · f7892d18
      Jeff King committed
      Some call-sites do:
      
        o = parse_object(sha1);
        if (!o)
      	  die("bad object %s", some_name);
      
      We can now handle that as a one-liner, and get more
      consistent output.
      
      In the third case of this patch, it looks like we are losing
      information, as the existing message also outputs the sha1
      hex; however, parse_object will already have written a more
      specific complaint about the sha1, so there is no point in
      repeating it here.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      f7892d18
  14. 08 Nov, 2011 2 commits
  15. 23 Mar, 2011 1 commit
  16. 30 Aug, 2010 1 commit
  17. 09 Apr, 2009 1 commit
    • B
      process_{tree,blob}: Remove useless xstrdup calls · de551d47
      Björn Steinbrink committed
      The name of the processed object was duplicated for passing it to
      add_object(), but that already calls path_name, which allocates a new
      string anyway. So the memory allocated by the xstrdup calls just went
      nowhere, leaking memory.
      
      This reduces the RSS usage for a "rev-list --all --objects" by about 10% on
      the gentoo repo (fully packed) as well as linux-2.6.git:
      
          gentoo:
                          | old           | new
          ----------------|-------------------------------
          RSS             |       1537284 |       1388408
          VSZ             |       1816852 |       1667952
          time elapsed    |       1:49.62 |       1:48.99
          min. page faults|        417178 |        379919
      
          linux-2.6.git:
                          | old           | new
          ----------------|-------------------------------
          RSS             |        324452 |        292996
          VSZ             |        491792 |        460376
          time elapsed    |       0:14.53 |       0:14.28
          min. page faults|         89360 |         81613
      Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      de551d47
  18. 19 Feb, 2008 3 commits
  19. 18 Feb, 2008 1 commit
  20. 22 Jan, 2008 1 commit
    • L
      Make on-disk index representation separate from in-core one · 7a51ed66
      Linus Torvalds committed
      This converts the index explicitly on read and write to its on-disk
      format, allowing the in-core format to contain more flags, and be
      simpler.
      
      In particular, the in-core format is now host-endian (as opposed to the
      on-disk one that is network endian in order to be able to be shared
      across machines) and as a result we can dispense with all the
      htonl/ntohl on accesses to the cache_entry fields.
      
      This will make it easier to make use of various temporary flags that do
      not exist in the on-disk format.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7a51ed66
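Converting once at the read/write boundary, rather than on every field access, could look roughly like this. The two structs and their field names are hypothetical and far smaller than the real index entry; `htonl` and `ntohl` are the standard POSIX byte-order functions.

```c
#include <arpa/inet.h> /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field on-disk entry; fields are in network byte order. */
struct ondisk_entry {
	uint32_t mode;
	uint32_t size;
};

/* Hypothetical in-core entry; fields are in host byte order. */
struct incore_entry {
	uint32_t mode;
	uint32_t size;
	unsigned flags; /* temporary in-memory flags, never written to disk */
};

/* Convert once when reading; later accesses need no ntohl(). */
static void entry_from_disk(struct incore_entry *dst,
			    const struct ondisk_entry *src)
{
	assert(dst != NULL && src != NULL);
	dst->mode = ntohl(src->mode);
	dst->size = ntohl(src->size);
	dst->flags = 0;
}

/* Convert once when writing; in-core-only flags are simply dropped. */
static void entry_to_disk(struct ondisk_entry *dst,
			  const struct incore_entry *src)
{
	assert(dst != NULL && src != NULL);
	dst->mode = htonl(src->mode);
	dst->size = htonl(src->size);
}
```

Keeping the two representations separate is also what leaves room for the extra `flags` field, since it never has to fit the on-disk layout.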
  21. 03 Jul, 2007 1 commit
    • A
      Make git-prune submodule aware (and fix a SEGFAULT in the process) · 8d2244ba
      Andy Parkins committed
      I ran git-prune on a repository and got this:
      
       $ git-prune
       error: Object 228f8065b930120e35fc0c154c237487ab02d64a is a blob, not a commit
       Segmentation fault (core dumped)
      
      This repository was a strange one in that it was being used to provide
      its own submodule.  That is, the repository was cloned into a
      subdirectory, an independent branch checked out in that subdirectory,
      and then it was marked as a submodule.  git-prune then failed in the
      above manner.
      
      The problem was that git-prune was not submodule aware in two areas.
      
      Linus said:
      
       > So what happens is that something traverses a tree object, looks at each
       > entry, sees that it's not a tree, and tries to look it up as a blob. But
       > subprojects are commits, not blobs, and then when you look at the object
       > more closely, you get the above kind of object type confusion.
      
      and included a patch to add an S_ISGITLINK() test to reachable.c's
      process_tree() function.  That fixed the first git-prune error, and
      stopped it from trying to process the gitlink entries in trees as if
      they were pointers to other trees (and of course failing, because
      gitlinks _aren't_ trees).  That part of this patch is his.
      
      The second area is add_cache_refs().  This is called before starting the
      reachability analysis, and was calling lookup_blob() on every object
      hash found in the index.  However, it is no longer true that every hash
      in the index is a pointer to a blob; some of them are gitlinks, which are
      not backed by any object at all: they are commits in another repository.
      Normally this bug was not causing any problems, but in the case of the
      self-referencing repository described above, it meant that the gitlink
      hash was being marked as being of type OBJ_BLOB by add_cache_refs()'s call
      to lookup_blob().  Then later, because that hash was also pointed to by
      a ref, add_one_ref() would treat it as a commit; lookup_commit() would
      return a NULL because that object was already noted as being an
      OBJ_BLOB, not an OBJ_COMMIT; and parse_commit_buffer() would SEGFAULT on
      that NULL pointer.
      
      The fix made by this patch is to not blindly call lookup_blob() in
      reachable.c's add_cache_refs(), and instead skip any index entries that
      are S_ISGITLINK().
      Signed-off-by: Andy Parkins <andyparkins@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      8d2244ba
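The fix in add_cache_refs() amounts to a mode check before treating an index entry as a blob. In the sketch below, `is_gitlink` plays the role of S_ISGITLINK(); the mode value 0160000 and the 0170000 file-type mask are stated here as assumptions about the conventional definitions, and `struct index_entry` and `mark_index_blobs` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index entry: just a mode and a "marked as blob" bit. */
struct index_entry {
	unsigned mode;
	int marked_blob;
};

/*
 * Stand-in for S_ISGITLINK(): assumes gitlinks use mode 0160000 under the
 * usual 0170000 file-type mask.
 */
static int is_gitlink(unsigned mode)
{
	return (mode & 0170000) == 0160000;
}

/* Mark index entries as blobs, skipping gitlinks, which are not blobs. */
static size_t mark_index_blobs(struct index_entry *e, size_t n)
{
	size_t marked = 0;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (is_gitlink(e[i].mode))
			continue; /* a commit in another repository */
		e[i].marked_blob = 1;
		marked++;
	}
	return marked;
}
```

Skipping the entry means the hash is never recorded with a blob type, so a later lookup of the same hash as a commit is not poisoned by the earlier guess.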
  22. 22 Mar, 2007 1 commit
  23. 04 Feb, 2007 1 commit
  24. 09 Jan, 2007 1 commit
    • J
      Sanitize for_each_reflog_ent() · 883d60fa
      Johannes Schindelin committed
      It used to ignore the return value of the helper function; now, it
      expects it to return 0, and stops iteration upon non-zero return
      values; this value is then passed on as the return value of
      for_each_reflog_ent().
      
      Further, it makes no sense to force the parsing upon the helper
      functions; for_each_reflog_ent() now calls the helper function with
      old and new sha1, the email, the timestamp & timezone, and the message.
      Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      883d60fa
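The iteration contract described here (stop at the first non-zero return and pass that value on) is a common callback pattern. The sketch below is hypothetical: it bundles the pre-parsed fields into a `struct reflog_ent` for brevity, whereas the message says the helper receives them as separate arguments, and `for_each_ent` and `count_until` are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-parsed reflog entry; field names are illustrative. */
struct reflog_ent {
	const char *old_id;
	const char *new_id;
	const char *email;
	unsigned long timestamp;
	int tz;
	const char *message;
};

typedef int (*each_ent_fn)(const struct reflog_ent *ent, void *cb_data);

/* Call fn for each entry; stop at the first non-zero return and pass it on. */
static int for_each_ent(const struct reflog_ent *ents, size_t n,
			each_ent_fn fn, void *cb_data)
{
	assert(fn != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = fn(&ents[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: consume a budget, then abort the iteration with 1. */
static int count_until(const struct reflog_ent *ent, void *cb_data)
{
	int *remaining = cb_data;

	(void)ent; /* the entry itself is not inspected in this example */
	if (*remaining == 0)
		return 1;
	(*remaining)--;
	return 0;
}
```

Since the iterator does the parsing, a callback that only wants to stop early, like `count_until`, never has to touch the raw reflog line.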
  25. 07 Jan, 2007 3 commits
  26. 06 Jan, 2007 1 commit
    • J
      builtin-prune: memory diet. · 16157b80
      Junio C Hamano committed
      Somehow we forgot to turn save_commit_buffer off while walking
      the reachable objects.  Releasing the memory for commit object
      data that we do not use matters for large projects (for example,
      about 90MB is saved while traversing linux-2.6 history).
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      16157b80
  27. 21 Dec, 2006 1 commit
  28. 24 Nov, 2006 1 commit