1. 26 May, 2015 1 commit
    • M
      each_ref_fn: change to take an object_id parameter · 2b2a5be3
      Michael Haggerty committed
      Change typedef each_ref_fn to take a "const struct object_id *oid"
      parameter instead of "const unsigned char *sha1".
      
      To aid this transition, implement an adapter that can be used to wrap
      old-style functions (matching the old typedef, which is now called
      "each_ref_sha1_fn"), and make such functions callable via the new
      interface. This requires the old function and its cb_data to be
      wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be
      used as the cb_data for an adapter function, each_ref_fn_adapter().
      
      This is an enormous diff, but most of it consists of simple,
      mechanical changes to the sites that call any of the "for_each_ref"
      family of functions. Subsequent to this change, the call sites can be
      rewritten one by one to use the new interface.
      Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
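
The adapter described in the commit message above can be sketched roughly as follows. This is a simplified illustration and not git's actual code: the 20-byte `struct object_id`, the field names `original_fn` and `original_cb_data`, and the helpers `demo_sum_first_byte()` and `demo_for_each_ref()` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for git's struct object_id (assumed 20-byte SHA-1). */
struct object_id {
	unsigned char hash[20];
};

/* Old-style callback: receives the raw hash bytes. */
typedef int each_ref_sha1_fn(const char *refname, const unsigned char *sha1,
			     int flags, void *cb_data);

/* New-style callback: receives a pointer to struct object_id. */
typedef int each_ref_fn(const char *refname, const struct object_id *oid,
			int flags, void *cb_data);

/* Bundles an old-style function with its original cb_data. */
struct each_ref_fn_sha1_adapter {
	each_ref_sha1_fn *original_fn;
	void *original_cb_data;
};

/* New-style function that unwraps the adapter and calls the old function. */
int each_ref_fn_adapter(const char *refname, const struct object_id *oid,
			int flags, void *cb_data)
{
	struct each_ref_fn_sha1_adapter *adapter = cb_data;

	assert(adapter && adapter->original_fn);
	return adapter->original_fn(refname, oid->hash, flags,
				    adapter->original_cb_data);
}

/* Hypothetical old-style callback: adds the first hash byte to an int. */
int demo_sum_first_byte(const char *refname, const unsigned char *sha1,
			int flags, void *cb_data)
{
	(void)refname;
	(void)flags;
	*(int *)cb_data += sha1[0];
	return 0;
}

/* Hypothetical iterator standing in for the for_each_ref() family. */
int demo_for_each_ref(each_ref_fn *fn, void *cb_data)
{
	struct object_id oids[2];
	size_t i;

	memset(oids, 0, sizeof(oids));
	oids[0].hash[0] = 3;
	oids[1].hash[0] = 4;
	for (i = 0; i < 2; i++) {
		int ret = fn("refs/heads/demo", &oids[i], 0, cb_data);
		if (ret)
			return ret;
	}
	return 0;
}
```

A caller that still has an old-style function would fill in a `struct each_ref_fn_sha1_adapter` with that function and its data, then pass `each_ref_fn_adapter` and the adapter's address to the iterator.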
  2. 21 Apr, 2015 1 commit
    • J
      reachable: only mark local objects as recent · 1385bb7b
      Jeff King committed
      When pruning and repacking a repository that has an
      alternate object store configured, we may traverse a large
      number of objects in the alternate. This serves no purpose,
      and may be expensive to do. A longer explanation is below.
      
      Commits d3038d22 and abcb8655 taught prune and pack-objects
      (respectively) to treat "recent" objects as tips for
      reachability, so that we keep whole chunks of history. They
      built on the object traversal in 660c889e (sha1_file: add
      for_each iterators for loose and packed objects,
      2014-10-15), which covers both local and alternate objects.
      
      In both cases, covering alternate objects is unnecessary, as
      both commands can only drop objects from the local
      repository. In the case of prune, we traverse only the local
      object directory. And in the case of repacking, while we may
      or may not include local objects in our pack, we will never
      reach into the alternate with "repack -d". The "-l" option
      is only a question of whether we are migrating objects from
      the alternate into our repository, or leaving them
      untouched.
      
      It is possible that we may drop an object that is depended
      upon by another object in the alternate. For example,
      imagine two repositories, A and B, with A pointing to B as
      an alternate. Now imagine a commit that is in B which
      references a tree that is only in A. Traversing from recent
      objects in B might prevent A from dropping that tree. But
      this case isn't worth covering. Repo B should take
      responsibility for its own objects. It would never have had
      the commit in the first place if it did not also have the
      tree, and assuming it is using the same "keep recent chunks
      of history" scheme, then it would itself keep the tree, as
      well.
      
      So checking the alternate objects is not worth doing, and
      comes with a significant performance impact. In both cases,
      we skip any recent objects that have already been marked
      SEEN (i.e., that we know are already reachable for prune, or
      included in the pack for a repack). So there is a slight
      waste of time in opening the alternate packs at all, only to
      notice that we have already considered each object. But much
      worse, the alternate repository may have a large number of
      objects that are not reachable from the local repository at
      all, and we end up adding them to the traversal.
      
      We can fix this by considering only local unseen objects.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
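
The rule "consider only local unseen objects" can be illustrated with a small filter. This is a sketch under stated assumptions, not the code from the patch: `struct demo_object`, `DEMO_SEEN`, and `demo_add_recent_local_tips()` are hypothetical names, and setting the flag stands in for adding the object to the traversal.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SEEN (1u << 0) /* stand-in for git's SEEN object flag */

/* Hypothetical, simplified object record. */
struct demo_object {
	unsigned flags;
	int is_local; /* 1 if stored in the local object directory */
	long mtime;   /* last modification time */
};

/*
 * Add recent objects as extra traversal tips, skipping anything that lives
 * only in an alternate, is already marked, or is older than the cutoff.
 * Returns the number of objects added.
 */
size_t demo_add_recent_local_tips(struct demo_object *objs, size_t nr,
				  long cutoff)
{
	size_t added = 0;
	size_t i;

	assert(objs || !nr);
	for (i = 0; i < nr; i++) {
		struct demo_object *o = &objs[i];

		if (!o->is_local)
			continue; /* alternate objects are never dropped by us */
		if (o->flags & DEMO_SEEN)
			continue; /* already reachable or already packed */
		if (o->mtime <= cutoff)
			continue; /* not recent */
		o->flags |= DEMO_SEEN;
		added++;
	}
	return added;
}
```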
  3. 20 Oct, 2014 1 commit
  4. 17 Oct, 2014 5 commits
    • J
      pack-objects: match prune logic for discarding objects · abcb8655
      Jeff King committed
      A recent commit taught git-prune to keep non-recent objects
      that are reachable from recent ones. However, pack-objects,
      when loosening unreachable objects, tries to optimize out
      the write in the case that the object will be immediately
      pruned. It now gets this wrong, since its rule does not
      reflect the new prune code (and this can be seen by running
      t6501 with a strategically placed repack).
      
      Let's teach pack-objects similar logic.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • J
      prune: keep objects reachable from recent objects · d3038d22
      Jeff King committed
      Our current strategy with prune is that an object falls into
      one of three categories:
      
        1. Reachable (from ref tips, reflogs, index, etc).
      
        2. Not reachable, but recent (based on the --expire time).
      
        3. Not reachable and not recent.
      
      We keep objects from (1) and (2), but prune objects in (3).
      The point of (2) is that these objects may be part of an
      in-progress operation that has not yet updated any refs.
      
      However, it is not always the case that objects for an
      in-progress operation will have a recent mtime. For example,
      the object database may have an old copy of a blob (from an
      abandoned operation, a branch that was deleted, etc). If we
      create a new tree that points to it, a simultaneous prune
      will leave our tree, but delete the blob. Referencing that
      tree with a commit will then work (we check that the tree is
      in the object database, but not that all of its referred
      objects are), as will mentioning the commit in a ref. But
      the resulting repo is corrupt; we are missing the blob
      reachable from a ref.
      
      One way to solve this is to be more thorough when
      referencing a sha1: make sure that not only do we have that
      sha1, but that we have objects it refers to, and so forth
      recursively. The problem is that this is very expensive.
      Creating a parent link would require traversing the entire
      object graph!
      
      Instead, this patch pushes the extra work onto prune, which
      runs less frequently (and has to look at the whole object
      graph anyway). It creates a new category of objects: objects
      which are not recent, but which are reachable from a recent
      object. We do not prune these objects, just like the
      reachable and recent ones.
      
      This lets us avoid the recursive check above, because if we
      have an object, even if it is unreachable, we should have
      its referent. We can make a simple inductive argument that
      with this patch, this property holds (that there are no
      objects with missing referents in the repository):
      
        0. When we have no objects, we have nothing to refer or be
           referred to, so the property holds.
      
        1. If we add objects to the repository, their direct
           referents must generally exist (e.g., if you create a
           tree, the blobs it references must exist; if you create
           a commit to point at the tree, the tree must exist).
           This is already the case before this patch. And it is
           not 100% foolproof (you can make bogus objects using
           `git hash-object`, for example), but it should be the
           case for normal usage.
      
           Therefore for any sequence of object additions, the
           property will continue to hold.
      
        2. If we remove objects from the repository, then we will
           not remove a child object (like a blob) if an object
           that refers to it is being kept. That is the part
           implemented by this patch.
      
           Note, however, that our reachability check and the
           actual pruning are not atomic. So it _is_ still
           possible to violate the property (e.g., an object
           becomes referenced just as we are deleting it). This
           patch is shooting for eliminating problems where the
           mtimes of dependent objects differ by hours or days,
           and one is dropped without the other. It does nothing
           to help with short races.
      
      Naively, the simplest way to implement this would be to add
      all recent objects as tips to the reachability traversal.
      However, this does not perform well. In a recently-packed
      repository, all reachable objects will also be recent, and
      therefore we have to look at each object twice. This patch
      instead performs the reachability traversal, then follows up
      with a second traversal for recent objects, skipping any
      that have already been marked.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
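
The two-pass approach from the last paragraph of the commit message above can be sketched on a toy object graph. This is only an illustration: the array-based graph, `struct demo_object`, and `demo_mark_and_count_prunable()` are assumptions for the example, and git's real traversal goes through its revision machinery instead.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CHILDREN 2
#define DEMO_NO_CHILD (-1)

/* Hypothetical object: refers to other objects by array index. */
struct demo_object {
	int children[DEMO_MAX_CHILDREN];
	long mtime;
	int seen;
};

/* Mark an object and everything it refers to, skipping marked objects. */
static void demo_mark_reachable(struct demo_object *objs, int idx)
{
	int i;

	if (idx == DEMO_NO_CHILD || objs[idx].seen)
		return;
	objs[idx].seen = 1;
	for (i = 0; i < DEMO_MAX_CHILDREN; i++)
		demo_mark_reachable(objs, objs[idx].children[i]);
}

/* Returns how many objects remain unmarked, i.e. would be pruned. */
size_t demo_mark_and_count_prunable(struct demo_object *objs, size_t nr,
				    const int *tips, size_t nr_tips,
				    long expire)
{
	size_t i, prunable = 0;

	assert(objs && tips);
	/* Pass 1: ordinary reachability from the ref tips. */
	for (i = 0; i < nr_tips; i++)
		demo_mark_reachable(objs, tips[i]);
	/* Pass 2: recent objects not yet marked become tips as well. */
	for (i = 0; i < nr; i++)
		if (!objs[i].seen && objs[i].mtime > expire)
			demo_mark_reachable(objs, (int)i);
	for (i = 0; i < nr; i++)
		if (!objs[i].seen)
			prunable++;
	return prunable;
}
```

In this sketch an old blob that is referenced only by a recent, unreachable tree is kept, while an old object that nothing refers to is still counted as prunable.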
    • J
      reachable: mark index blobs as SEEN · 37254279
      Jeff King committed
      When we mark all reachable objects for pruning, that
      includes blobs mentioned by the index. However, we do not
      mark these with the SEEN flag, as we do for objects that we
      find by traversing (we also do not add them to the pending
      list, but that is because there is nothing further to
      traverse with them).
      
      This doesn't cause any problems with prune, because it
      checks only that the object exists in the global object
      hash, and not its flags. However, let's mark these objects
      to be consistent and avoid any later surprises.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • J
      reachable: reuse revision.c "add all reflogs" code · 718ccc97
      Jeff King committed
      We want to add all reflog entries as tips for finding
      reachable objects. The revision machinery can already do
      this (to support "rev-list --reflog"); we can reuse that
      code.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • J
      reachable: use traverse_commit_list instead of custom walk · 5f78a431
      Jeff King committed
      To find the set of reachable objects, we add a bunch of
      possible sources to our rev_info, call prepare_revision_walk,
      and then launch into a custom walker that handles each
      object type. This is a subset of what traverse_commit_list
      does, so we can just reuse that code (it can also handle
      more complex cases like UNINTERESTING commits and pathspecs,
      but we don't use those features).
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. 04 Sep, 2014 1 commit
  6. 07 Jun, 2013 1 commit
    • J
      clear parsed flag when we free tree buffers · 6e454b9a
      Jeff King committed
      Many code paths will free a tree object's buffer and set it
      to NULL after finishing with it in order to keep memory
      usage down during a traversal. However, out of 8 sites that
      do this, only one actually unsets the "parsed" flag back.
      Those sites that don't are setting a trap for later users of
      the tree object; even after calling parse_tree, the buffer
      will remain NULL, causing potential segfaults.
      
      It is not known whether this is triggerable in the current
      code. Most commands do not do an in-memory traversal
      followed by actually using the objects again. However, it
      does not hurt to be safe for future callers.
      
      In most cases, we can abstract this out to a
      "free_tree_buffer" helper. However, there are two
      exceptions:
      
        1. The fsck code relies on the parsed flag to know that we
           were able to parse the object at one point. We can
           switch this to using a flag in the "flags" field.
      
        2. The index-pack code sets the buffer to NULL but does
           not free it (it is freed by a caller). We should still
           unset the parsed flag here, but we cannot use our
           helper, as we do not want to free the buffer.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
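
The idea behind the "free_tree_buffer" helper can be shown with a reduced tree structure. This is a sketch, not git's implementation: `struct demo_tree`, `demo_parse_tree()`, and `demo_free_tree_buffer()` are hypothetical, and the "parse" step merely allocates a placeholder buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced tree object. */
struct demo_tree {
	int parsed;
	void *buffer;
	unsigned long size;
};

/* Load the buffer only if the tree is not already marked as parsed. */
int demo_parse_tree(struct demo_tree *tree)
{
	assert(tree);
	if (tree->parsed)
		return 0; /* trusts the flag: buffer is assumed valid */
	tree->buffer = malloc(16);
	if (!tree->buffer)
		return -1;
	tree->size = 16;
	tree->parsed = 1;
	return 0;
}

/* Free the buffer AND clear the flag, so a later parse reloads it. */
void demo_free_tree_buffer(struct demo_tree *tree)
{
	assert(tree);
	free(tree->buffer);
	tree->buffer = NULL;
	tree->size = 0;
	tree->parsed = 0;
}
```

If the helper left `parsed` set, the second `demo_parse_tree()` call would return early and the caller would be handed a NULL buffer, which is the trap the commit message describes.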
  7. 18 Mar, 2013 1 commit
    • J
      use parse_object_or_die instead of die("bad object") · f7892d18
      Jeff King committed
      Some call-sites do:
      
        o = parse_object(sha1);
        if (!o)
                die("bad object %s", some_name);
      
      We can now handle that as a one-liner, and get more
      consistent output.
      
      In the third case of this patch, it looks like we are losing
      information, as the existing message also outputs the sha1
      hex; however, parse_object will already have written a more
      specific complaint about the sha1, so there is no point in
      repeating it here.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
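
A one-liner of this kind might look roughly like the sketch below. It is an assumption-laden illustration and not git's real function: objects are looked up in a hypothetical in-memory table by hex string, and the wording of the error message and the exit code 128 are assumed for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object record keyed by an abbreviated hex name. */
struct demo_object {
	const char *hex;
	const char *type;
};

static struct demo_object demo_store[] = {
	{ "aaaa1111", "commit" },
	{ "bbbb2222", "tree" },
};

/* Returns NULL when the object cannot be found. */
struct demo_object *demo_parse_object(const char *hex)
{
	size_t i;

	for (i = 0; i < sizeof(demo_store) / sizeof(demo_store[0]); i++)
		if (!strcmp(demo_store[i].hex, hex))
			return &demo_store[i];
	return NULL;
}

/* Never returns NULL: reports the failure once and exits instead. */
struct demo_object *demo_parse_object_or_die(const char *hex,
					     const char *name)
{
	struct demo_object *o;

	assert(hex);
	o = demo_parse_object(hex);
	if (!o) {
		fprintf(stderr, "fatal: unable to parse object: %s\n",
			name ? name : hex);
		exit(128);
	}
	return o;
}
```

Call sites then shrink to a single assignment such as `o = demo_parse_object_or_die(hex, some_name);` and all report failures with the same message format.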
  8. 08 Nov, 2011 2 commits
  9. 23 Mar, 2011 1 commit
  10. 30 Aug, 2010 1 commit
  11. 09 Apr, 2009 1 commit
    • B
      process_{tree,blob}: Remove useless xstrdup calls · de551d47
      Björn Steinbrink committed
      The name of the processed object was duplicated for passing it to
      add_object(), but that already calls path_name, which allocates a new
      string anyway. So the memory allocated by the xstrdup calls just went
      nowhere, leaking memory.
      
      This reduces the RSS usage for a "rev-list --all --objects" by about 10% on
      the gentoo repo (fully packed) as well as linux-2.6.git:
      
          gentoo:
                          | old           | new
          ----------------|-------------------------------
          RSS             |       1537284 |       1388408
          VSZ             |       1816852 |       1667952
          time elapsed    |       1:49.62 |       1:48.99
          min. page faults|        417178 |        379919
      
          linux-2.6.git:
                          | old           | new
          ----------------|-------------------------------
          RSS             |        324452 |        292996
          VSZ             |        491792 |        460376
          time elapsed    |       0:14.53 |       0:14.28
          min. page faults|         89360 |         81613
      Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  12. 19 Feb, 2008 3 commits
  13. 18 Feb, 2008 1 commit
  14. 22 Jan, 2008 1 commit
    • L
      Make on-disk index representation separate from in-core one · 7a51ed66
      Linus Torvalds committed
      This converts the index explicitly on read and write to its on-disk
      format, allowing the in-core format to contain more flags, and be
      simpler.
      
      In particular, the in-core format is now host-endian (as opposed to the
      on-disk one that is network endian in order to be able to be shared
      across machines) and as a result we can dispense with all the
      htonl/ntohl on accesses to the cache_entry fields.
      
      This will make it easier to make use of various temporary flags that do
      not exist in the on-disk format.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
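
The split between an on-disk and an in-core representation can be sketched with a two-field entry. This is a simplified illustration, not git's index format: the structures, the 16-bit on-disk flags field, and `DEMO_CE_IN_CORE_ONLY` are assumptions, and the byte order is handled with explicit shifts where the commit message mentions htonl/ntohl.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CE_ONDISK_MASK  0xffffu    /* flag bits that exist on disk */
#define DEMO_CE_IN_CORE_ONLY (1u << 16) /* hypothetical temporary flag */

/* On-disk form: fixed-width fields stored in network (big-endian) order. */
struct demo_ondisk_entry {
	unsigned char size[4];
	unsigned char flags[2];
};

/* In-core form: host-endian integers, with room for extra flag bits. */
struct demo_cache_entry {
	uint32_t size;
	unsigned int flags;
};

static uint32_t demo_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void demo_put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Convert once on read; later field accesses need no byte swapping. */
void demo_read_entry(struct demo_cache_entry *ce,
		     const struct demo_ondisk_entry *ondisk)
{
	assert(ce && ondisk);
	ce->size = demo_get_be32(ondisk->size);
	ce->flags = ((unsigned int)ondisk->flags[0] << 8) | ondisk->flags[1];
}

/* Convert once on write; in-core-only flag bits are dropped. */
void demo_write_entry(struct demo_ondisk_entry *ondisk,
		      const struct demo_cache_entry *ce)
{
	unsigned int flags;

	assert(ce && ondisk);
	flags = ce->flags & DEMO_CE_ONDISK_MASK;
	demo_put_be32(ondisk->size, ce->size);
	ondisk->flags[0] = (unsigned char)(flags >> 8);
	ondisk->flags[1] = (unsigned char)(flags & 0xff);
}
```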
  15. 03 Jul, 2007 1 commit
    • A
      Make git-prune submodule aware (and fix a SEGFAULT in the process) · 8d2244ba
      Andy Parkins committed
      I ran git-prune on a repository and got this:
      
       $ git-prune
       error: Object 228f8065b930120e35fc0c154c237487ab02d64a is a blob, not a commit
       Segmentation fault (core dumped)
      
      This repository was a strange one in that it was being used to provide
      its own submodule.  That is, the repository was cloned into a
      subdirectory, an independent branch checked out in that subdirectory,
      and then it was marked as a submodule.  git-prune then failed in the
      above manner.
      
      The problem was that git-prune was not submodule aware in two areas.
      
      Linus said:
      
       > So what happens is that something traverses a tree object, looks at each
       > entry, sees that it's not a tree, and tries to look it up as a blob. But
       > subprojects are commits, not blobs, and then when you look at the object
       > more closely, you get the above kind of object type confusion.
      
      and included a patch to add an S_ISGITLINK() test to reachable.c's
      process_tree() function.  That fixed the first git-prune error, and
      stopped it from trying to process the gitlink entries in trees as if
      they were pointers to other trees (and of course failing, because
      gitlinks _aren't_ trees).  That part of this patch is his.
      
      The second area is add_cache_refs().  This is called before starting the
      reachability analysis, and was calling lookup_blob() on every object
      hash found in the index.  However, it is no longer true that every hash
      in the index is a pointer to a blob; some of them are gitlinks, which are
      not backed by any object at all because they are commits in another
      repository.  Normally this bug was not causing any problems, but in the
      case of the self-referencing repository described above, it meant that
      the gitlink hash was being marked as being of type OBJ_BLOB by the
      add_cache_refs() call to lookup_blob().  Then later, because that hash
      was also pointed to by
      a ref, add_one_ref() would treat it as a commit; lookup_commit() would
      return a NULL because that object was already noted as being an
      OBJ_BLOB, not an OBJ_COMMIT; and parse_commit_buffer() would SEGFAULT on
      that NULL pointer.
      
      The fix made by this patch is to not blindly call lookup_blob() in
      reachable.c's add_cache_refs(), and instead skip any index entries that
      are S_ISGITLINK().
      Signed-off-by: Andy Parkins <andyparkins@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
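
The fix for add_cache_refs() amounts to a mode check before treating an index entry as a blob. The sketch below is illustrative only: the `DEMO_` macros mirror the usual mode-bit test for gitlinks (mode 0160000), while `struct demo_index_entry` and `demo_add_cache_refs()` are hypothetical names, and setting a field stands in for the lookup_blob() call.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_S_IFMT      0170000
#define DEMO_S_IFGITLINK 0160000
#define DEMO_S_ISGITLINK(m) (((m) & DEMO_S_IFMT) == DEMO_S_IFGITLINK)

/* Hypothetical, reduced index entry. */
struct demo_index_entry {
	unsigned int mode;
	int marked_as_blob;
};

/*
 * Mark index entries as blobs, skipping gitlinks: those name commits in
 * another repository and must not be recorded as blobs here.
 * Returns the number of entries marked.
 */
size_t demo_add_cache_refs(struct demo_index_entry *entries, size_t nr)
{
	size_t marked = 0;
	size_t i;

	assert(entries || !nr);
	for (i = 0; i < nr; i++) {
		if (DEMO_S_ISGITLINK(entries[i].mode))
			continue;
		entries[i].marked_as_blob = 1;
		marked++;
	}
	return marked;
}
```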
  16. 22 Mar, 2007 1 commit
  17. 04 Feb, 2007 1 commit
  18. 09 Jan, 2007 1 commit
    • J
      Sanitize for_each_reflog_ent() · 883d60fa
      Johannes Schindelin committed
      It used to ignore the return value of the helper function; now, it
      expects it to return 0, and stops iteration upon non-zero return
      values; this value is then passed on as the return value of
      for_each_reflog_ent().
      
      Further, it makes no sense to force the parsing upon the helper
      functions; for_each_reflog_ent() now calls the helper function with
      old and new sha1, the email, the timestamp & timezone, and the message.
      Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
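
The iteration contract described above (stop on the first non-zero return and pass that value up) can be sketched as follows. This is not git's code: the entries are assumed to be already parsed into a hypothetical `struct demo_reflog_entry` array, hashes are shown as hex strings, and `demo_stop_at()` is an example helper.

```c
#include <assert.h>
#include <stddef.h>

/* Helper signature: receives the already-parsed fields of one entry. */
typedef int demo_reflog_ent_fn(const char *old_hex, const char *new_hex,
			       const char *email, unsigned long timestamp,
			       int tz, const char *message, void *cb_data);

/* Hypothetical pre-parsed reflog entry. */
struct demo_reflog_entry {
	const char *old_hex;
	const char *new_hex;
	const char *email;
	unsigned long timestamp;
	int tz;
	const char *message;
};

/* Calls fn for each entry; a non-zero return stops and is passed on. */
int demo_for_each_reflog_ent(const struct demo_reflog_entry *entries,
			     size_t nr, demo_reflog_ent_fn *fn, void *cb_data)
{
	size_t i;

	assert(fn);
	for (i = 0; i < nr; i++) {
		const struct demo_reflog_entry *e = &entries[i];
		int ret = fn(e->old_hex, e->new_hex, e->email, e->timestamp,
			     e->tz, e->message, cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example helper state: stop once an entry reaches the given timestamp. */
struct demo_stop_state {
	unsigned long limit;
	int visited;
};

int demo_stop_at(const char *old_hex, const char *new_hex, const char *email,
		 unsigned long timestamp, int tz, const char *message,
		 void *cb_data)
{
	struct demo_stop_state *state = cb_data;

	(void)old_hex; (void)new_hex; (void)email; (void)tz; (void)message;
	state->visited++;
	return timestamp >= state->limit ? 42 : 0;
}
```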
  19. 07 Jan, 2007 3 commits
  20. 06 Jan, 2007 1 commit
    • J
      builtin-prune: memory diet. · 16157b80
      Junio C Hamano committed
      Somehow we forgot to turn save_commit_buffer off while walking
      the reachable objects.  Releasing the memory for commit object
      data that we do not use matters for large projects (for example,
      about 90MB is saved while traversing linux-2.6 history).
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  21. 21 Dec, 2006 1 commit
  22. 24 Nov, 2006 1 commit
  23. 22 Nov, 2006 1 commit
    • A
      Improve git-prune -n output · 21f88ac8
      Andy Parkins committed
      prune_object() in show_only mode would previously just show the path to the
      object that would be deleted.  The path the object is stored in shouldn't be
      shown to users; they only know about sha1 identifiers, so show that instead.
      
      Further, the sha1 alone isn't that useful for examining what is going to be
      deleted.  This patch also adds the object type to the output, which makes it
      easy to pick out, say, the commits and use git-show to display them.
      Signed-off-by: Andy Parkins <andyparkins@gmail.com>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  24. 23 Oct, 2006 1 commit
    • J
      Make prune also run prune-packed · 2eb53e65
      J. Bruce Fields committed
      Both the git-prune manpage and everyday.txt say that git-prune should also prune
      unpacked objects that are also found in packs, by running git prune-packed.
      
      Junio thought this was "a regression when prune was rewritten as a built-in."
      
      So modify prune to call prune-packed again.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
  25. 21 Sep, 2006 2 commits
    • J
      Tell between packed, unpacked and symbolic refs. · 8da19775
      Junio C Hamano committed
      This adds an "int *flag" parameter to resolve_ref() and makes the
      for_each_ref() family call the callback function with an extra
      "int flag" parameter.  They are used to give two bits of
      information (REF_ISSYMREF and REF_ISPACKED) about the ref.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • J
      Add callback data to for_each_ref() family. · cb5d709f
      Junio C Hamano committed
      This is a long overdue fix to the API for for_each_ref() family
      of functions.  It allows the callers to specify a callback data
      pointer, so that the caller does not have to use static
      variables to communicate with the callback function.
      
      The updated for_each_ref() family takes a function of type
      
      	int (*fn)(const char *, const unsigned char *, void *)
      
      and a void pointer as parameters, and calls the function with
      the name of the ref and its SHA-1 with the caller-supplied void
      pointer as parameters.
      
      The commit updates two callers, builtin-name-rev.c and
      builtin-pack-refs.c as an example.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
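
The benefit of the callback data pointer can be seen in a small sketch that counts refs under a prefix without any static variable. The callback type follows the signature quoted in the commit message; everything else (`struct demo_ref`, `demo_for_each_ref()`, `demo_count_prefix()`, and the in-memory ref array) is assumed for the example and is not git's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback type matching the signature quoted in the commit message. */
typedef int demo_each_ref_fn(const char *refname, const unsigned char *sha1,
			     void *cb_data);

/* Hypothetical in-memory ref record. */
struct demo_ref {
	const char *name;
	unsigned char sha1[20];
};

/* Calls fn for every ref, handing the caller's pointer through untouched. */
int demo_for_each_ref(const struct demo_ref *refs, size_t nr,
		      demo_each_ref_fn *fn, void *cb_data)
{
	size_t i;

	assert(fn);
	for (i = 0; i < nr; i++) {
		int ret = fn(refs[i].name, refs[i].sha1, cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Per-call state lives in the caller, not in a static variable. */
struct demo_count_state {
	const char *prefix;
	size_t matches;
};

int demo_count_prefix(const char *refname, const unsigned char *sha1,
		      void *cb_data)
{
	struct demo_count_state *state = cb_data;

	(void)sha1;
	if (!strncmp(refname, state->prefix, strlen(state->prefix)))
		state->matches++;
	return 0;
}
```

Because the state is passed in explicitly, two iterations with different prefixes can run independently of each other.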
  26. 02 Sep, 2006 1 commit
    • S
      Replace uses of strdup with xstrdup. · 9befac47
      Shawn Pearce committed
      Like xmalloc and xrealloc, xstrdup dies with a useful message if
      the native strdup() implementation returns NULL rather than a
      valid pointer.
      
      I just tried to use xstrdup in new code and found it to be missing.
      However I expected it to be present as xmalloc and xrealloc are
      already commonly used throughout the code.
      
      [jc: removed the part that deals with last_XXX, which I am
       finding more and more dubious these days.]
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  27. 16 Aug, 2006 1 commit
  28. 04 Aug, 2006 1 commit
  29. 29 Jul, 2006 2 commits