1. 18 Jun, 2006 2 commits
    • Some more memory leak avoidance · cb115748
      Linus Torvalds committed
      This is really the dregs of my effort to not waste memory in git-rev-list,
      and makes barely one percent of a difference in the memory footprint, but
      hey, it's also a pretty small patch.
      
      It discards the parent lists and the commit buffer after the commit has
      been shown by git-rev-list (and "git log" - which already did the commit
      buffer part), and frees the commit list entry that was used by the
      revision walker.
      
      The big win would be to get rid of the "refs" pointer in the object
      structure (another 5%), because it's only used by fsck. That would require
      some pretty major surgery to fsck, though, so I'm timid and did the less
      interesting but much easier part instead.
      
      This (percentually) makes a bigger difference to "git log" and friends,
      since those are walking _just_ commits, and thus the list entries tend to
      be a bigger percentage of the memory use. But the "list all objects" case
      does improve too.
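      
      A minimal sketch of that cleanup, not the actual patch: the struct and
      function names below (sketch_commit, sketch_release_shown_commit and
      friends) are hypothetical stand-ins for git's commit structures, and only
      the two fields discussed above are modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; git's real structures carry more fields. */
struct sketch_commit;

struct sketch_commit_list {
	struct sketch_commit *item;
	struct sketch_commit_list *next;
};

struct sketch_commit {
	char *buffer;                        /* raw commit text, malloc'ed */
	struct sketch_commit_list *parents;  /* list nodes, malloc'ed */
};

/* Free only the list nodes; the commits they point at stay alive. */
void sketch_free_commit_list(struct sketch_commit_list *list)
{
	while (list) {
		struct sketch_commit_list *next = list->next;
		free(list);
		list = next;
	}
}

/* Call once the commit has been shown and its data is no longer needed. */
void sketch_release_shown_commit(struct sketch_commit *commit)
{
	sketch_free_commit_list(commit->parents);
	commit->parents = NULL;
	free(commit->buffer);
	commit->buffer = NULL;
}

/* Build a commit with one parent, release it, and report whether both
 * fields ended up cleared (1) or not (0). */
int sketch_demo_release(void)
{
	struct sketch_commit parent = { NULL, NULL };
	struct sketch_commit child = { NULL, NULL };

	child.buffer = malloc(16);
	child.parents = malloc(sizeof(*child.parents));
	if (!child.buffer || !child.parents) {
		free(child.buffer);
		free(child.parents);
		return 0;
	}
	strcpy(child.buffer, "commit text");
	child.parents->item = &parent;
	child.parents->next = NULL;

	sketch_release_shown_commit(&child);
	return child.buffer == NULL && child.parents == NULL;
}
```

      Resetting the pointers to NULL after freeing keeps a later walker from
      touching stale memory if it revisits the same commit.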
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • Shrink "struct object" a bit · 885a86ab
      Linus Torvalds committed
      This shrinks "struct object" by a small amount, by getting rid of the
      "struct type *" pointer and replacing it with a 3-bit bitfield instead.
      
      In addition, we merge the bitfields and the "flags" field, which
      incidentally should also remove a useless 4-byte padding from the object
      when in 64-bit mode.
      
      Now, our "struct object" is still too damn large, but it's now less
      obviously bloated, and of the remaining fields, only the "util" (which is
      not used by most things) is clearly something that should be eventually
      discarded.
      
      This shrinks the "git-rev-list --all" memory use by about 2.5% on the
      kernel archive (and, perhaps more importantly, on the larger mozilla
      archive). That may not sound like much, but I suspect it's more on a
      64-bit platform.
      
      There are other remaining inefficiencies (the parent lists, for example,
      probably have horrible malloc overhead), but this was pretty obvious.
      
      Most of the patch is just changing the comparison of the "type" pointer
      from one of the constant string pointers to the appropriate new TYPE_xxx
      small integer constant.
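      
      To illustrate the idea, here is a rough sketch of the two layouts. The
      field order, the 27-bit width of "flags" and all sketch_/SKETCH_ names are
      assumptions made for the example and are not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes; three bits can hold values 0..7. */
enum sketch_type {
	SKETCH_NONE,
	SKETCH_BLOB,
	SKETCH_TREE,
	SKETCH_COMMIT,
	SKETCH_TAG
};

/* Before: small bitfields, a separate flags word and a type-name pointer. */
struct sketch_object_old {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned int flags;
	unsigned char sha1[20];
	const char *type;
	void *util;
};

/* After: the type is a 3-bit code that shares one word with the flags. */
struct sketch_object_new {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned type : 3;
	unsigned flags : 27;
	unsigned char sha1[20];
	void *util;
};

/* Type checks become integer compares instead of pointer compares. */
int sketch_is_commit(const struct sketch_object_new *obj)
{
	return obj->type == SKETCH_COMMIT;
}

/* Bytes saved per object on the platform this is compiled for. */
long sketch_saved_bytes(void)
{
	return (long)sizeof(struct sketch_object_old) -
	       (long)sizeof(struct sketch_object_new);
}
```

      The exact saving depends on pointer size and alignment rules, which is
      why the message expects a larger effect on 64-bit platforms.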
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  2. 06 Jun, 2006 1 commit
  3. 31 May, 2006 1 commit
    • tree_entry(): new tree-walking helper function · 4c068a98
      Linus Torvalds committed
      This adds a "tree_entry()" function that combines the common operation of
      doing a "tree_entry_extract()" + "update_tree_entry()".
      
      It also has a simplified calling convention, designed for simple loops
      that traverse over a whole tree: the arguments are pointers to the tree
      descriptor and a name_entry structure to fill in, and it returns a boolean
      "true" if there was an entry left to be gotten in the tree.
      
      This allows tree traversal with
      
      	struct tree_desc desc;
      	struct name_entry entry;
      
      	desc.buf = tree->buffer;
      	desc.size = tree->size;
      	while (tree_entry(&desc, &entry)) {
      		... use "entry.{path, sha1, mode, pathlen}" ...
      	}
      
      which is not only shorter than writing it out in full, it's hopefully less
      error prone too.
      
      [ It's actually a tad faster too - we don't need to recalculate the entry
        pathlength in both extract and update, but need to do it only once.
        Also, some callers can avoid doing a "strlen()" on the result, since
        it's returned as part of the name_entry structure.
      
        However, by now we're talking just 1% speedup on "git-rev-list --objects
        --all", and we're definitely at the point where tree walking is no
        longer the issue any more. ]
      
      NOTE! Not everybody wants to use this new helper function, since some of
      the tree walkers very much on purpose do the descriptor update separately
      from the entry extraction. So the "extract + update" sequence still
      remains as the core sequence, this is just a simplified interface.
      
      We should probably add a silly two-line inline helper function for
      initializing the descriptor from the "struct tree" too, just to cut down
      on the noise from that common "desc" initializer.
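      
      For readers who want to see the combined "extract + update" step in
      isolation, below is a self-contained sketch that walks a raw tree buffer
      in the "<octal mode> <path>\0<20-byte sha1>" layout. The sketch_ names are
      hypothetical and the code is an illustration, not git's tree-walk
      implementation.

```c
#include <assert.h>
#include <string.h>

struct sketch_tree_desc {
	const unsigned char *buf;
	unsigned long size;
};

struct sketch_name_entry {
	const unsigned char *sha1;
	const char *path;
	unsigned int mode;
	int pathlen;
};

/* Fill in *entry from the front of the buffer and advance past it.
 * Returns 1 if an entry was read, 0 at the end or on malformed input. */
int sketch_tree_entry(struct sketch_tree_desc *desc,
		      struct sketch_name_entry *entry)
{
	const unsigned char *p = desc->buf;
	const unsigned char *end = desc->buf + desc->size;
	const unsigned char *nul;
	unsigned int mode = 0;

	if (!desc->size)
		return 0;
	/* Octal mode, terminated by a single space. */
	while (p < end && *p >= '0' && *p <= '7')
		mode = (mode << 3) | (unsigned int)(*p++ - '0');
	if (p >= end || *p++ != ' ')
		return 0;
	/* NUL-terminated path, followed by 20 raw SHA-1 bytes. */
	nul = memchr(p, '\0', (size_t)(end - p));
	if (!nul || end - nul - 1 < 20)
		return 0;
	entry->mode = mode;
	entry->path = (const char *)p;   /* points into the buffer, no copy */
	entry->pathlen = (int)(nul - p); /* computed once, reused by callers */
	entry->sha1 = nul + 1;
	desc->size = (unsigned long)(end - (nul + 21));
	desc->buf = nul + 21;
	return 1;
}

/* Build a two-entry buffer by hand and count what the walker finds. */
int sketch_demo_walk(void)
{
	unsigned char buf[64];
	size_t n = 0;
	struct sketch_tree_desc desc;
	struct sketch_name_entry entry;
	int count = 0;

	memcpy(buf + n, "100644 a.c", 11); n += 11; /* includes the NUL */
	memset(buf + n, 0xaa, 20);         n += 20;
	memcpy(buf + n, "40000 dir", 10);  n += 10; /* includes the NUL */
	memset(buf + n, 0xbb, 20);         n += 20;

	desc.buf = buf;
	desc.size = n;
	while (sketch_tree_entry(&desc, &entry))
		count++;
	return count;
}
```

      Because the path pointer refers to the buffer itself, the caller must
      keep the buffer alive for as long as it uses the entry.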
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  4. 30 May, 2006 3 commits
  5. 29 May, 2006 4 commits
    • Remove "tree->entries" tree-entry list from tree parser · 097dc3d8
      Linus Torvalds committed
      This finally removes the tree-entry list from "struct tree", since most of
      the users can just use the tree-walk infrastructure to walk the raw tree
      buffers instead of the tree-entry list.
      
      The tree-entry list is inefficient, and generates tons of small
      allocations for no good reason. The tree-walk infrastructure is generally
      no harder to use than following a linked list, and allows us to do most
      tree parsing in-place.
      
      Some programs still use the old tree-entry lists, and are a bit painful to
      convert without major surgery. For them we have a helper function that
      creates a temporary tree-entry list on demand. We can convert those too
      eventually, but with this they no longer affect any users who don't need
      the explicit lists.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • Make "tree_entry" have a SHA1 instead of a union of object pointers · a755dfe4
      Linus Torvalds committed
      This is preparatory work for further cleanups, where we try to make
      tree_entry look more like the more efficient tree-walk descriptor.
      
      Instead of having a union of pointers to blob/tree/objects, this just
      makes "struct tree_entry" have the raw SHA1, and makes all the users use
      that instead (often that implies adding a "lookup_tree(..)" on the sha1,
      but sometimes the user just wanted the SHA1 in the first place, and it
      just avoids an unnecessary indirection).
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • Add raw tree buffer info to "struct tree" · d2eafb76
      Linus Torvalds committed
      This allows us to avoid allocating information for names etc, because
      we can just use the information from the tree buffer directly.
      
      We still keep the old "tree_entry_list" in struct tree as well, so old
      users aren't affected, apart from the fact that the allocations are
      different (if you free a tree entry, you should no longer free the name
      allocation for it, since it's allocated as part of "tree->buffer")
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • Fix memory leak in "git rev-list --objects" · 91b452cb
      Linus Torvalds committed
      Martin Langhoff points out that "git repack -a" ends up using up a lot of
      memory for big archives, and that git cvsimport probably should do only
      incremental repacks in order to avoid having repacking flush all the
      caches.
      
      The big majority of the memory usage of repacking is from git rev-list
      tracking all objects, and this patch should go a long way in avoiding the
      excessive memory usage: the bulk of it was due to the object names being
      leaked from the tree parser.
      
      For the historic Linux kernel archive, this simple patch does:
      
      Before:
      	/usr/bin/time git-rev-list --all --objects > /dev/null
      
      	72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
      	0inputs+0outputs (0major+125376minor)pagefaults 0swaps
      
      After:
      	/usr/bin/time git-rev-list --all --objects > /dev/null
      
      	75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
      	0inputs+0outputs (0major+43921minor)pagefaults 0swaps
      
      where we do end up wasting a bit of time on some extra strdup()s (which
      could be avoided, but that would require tracking where the pathnames came
      from), but we avoid a lot of memory usage.
      
      Minor page faults track maximum RSS very closely (each page fault maps in
      one page into memory), so the reduction from 125376 page faults to 43921
      means a rough reduction of VM footprint from almost half a gigabyte to
      about a third of that. Those numbers were also double-checked by looking
      at "top" while the process was running.
      
      (Side note: at least part of the remaining VM footprint is the mapping of
      the 177MB pack-file, so the remaining memory use is at least partly "well
      behaved" from a project caching perspective).
      
      For the current git archive itself, the memory usage for a "--all
      --objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to
      9MB), so the reduction seems to hold for much smaller projects too.
      
      For regular "git-rev-list" usage (ie without the "--objects" flag) this
      patch has no impact.
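      
      The page-fault arithmetic above can be checked with a small helper. This
      is a sketch that assumes 4096-byte pages (the message does not state the
      page size) and one page mapped per minor fault; the function name is
      made up for the example.

```c
#include <assert.h>

/* Convert a minor-page-fault count into MiB, assuming one page per fault. */
double sketch_pages_to_mib(unsigned long pages, unsigned long page_size)
{
	return (double)pages * (double)page_size / (1024.0 * 1024.0);
}

/* Ratio of the footprint after the patch to the footprint before it. */
double sketch_footprint_ratio(unsigned long before, unsigned long after)
{
	return (double)after / (double)before;
}
```

      With 4 KiB pages, 125376 faults come to roughly 490 MiB and 43921 to
      roughly 172 MiB, which matches "almost half a gigabyte" shrinking to
      about a third; 7128 and 2318 pages give roughly 28 MiB and 9 MiB for the
      git archive.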
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  6. 21 May, 2006 1 commit
  7. 19 May, 2006 1 commit
  8. 06 May, 2006 1 commit
  9. 18 Apr, 2006 2 commits
    • Log message printout cleanups · 91539833
      Linus Torvalds committed
      On Sun, 16 Apr 2006, Junio C Hamano wrote:
      >
      > In the mid-term, I am hoping we can drop the generate_header()
      > callchain _and_ the custom code that formats commit log in-core,
      > found in cmd_log_wc().
      
      Ok, this was nastier than expected, just because the dependencies between
      the different log-printing stuff were absolutely _everywhere_, but here's
      a patch that does exactly that.
      
      The patch is not very easy to read, and the "--patch-with-stat" thing is
      still broken (it does not call the "show_log()" thing properly for
      merges). That's not a new bug. In the new world order it _should_ do
      something like
      
      	if (rev->logopt)
      		show_log(rev, rev->logopt, "---\n");
      
      but it doesn't. I haven't looked at the --with-stat logic, so I left it
      alone.
      
      That said, this patch removes more lines than it adds, and in particular,
      the "cmd_log_wc()" loop is now a very clean:
      
      	while ((commit = get_revision(rev)) != NULL) {
      		log_tree_commit(rev, commit);
      		free(commit->buffer);
      		commit->buffer = NULL;
      	}
      
      so it doesn't get much prettier than this. All the complexity is entirely
      hidden in log-tree.c, and any code that needs to flush the log literally
      just needs to do the "if (rev->logopt) show_log(...)" incantation.
      
      I had to make the combined_diff() logic take a "struct rev_info" instead
      of just a "struct diff_options", but that part is pretty clean.
      
      This does change "git whatchanged" from using "diff-tree" as the commit
      descriptor to "commit", and I changed one of the tests to reflect that new
      reality. Otherwise everything still passes, and my other tests look fine
      too.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • rev-list --header: output format fix · db89665f
      Junio C Hamano committed
      Initial fix prepared by Johannes, but I did it slightly differently.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  10. 17 Apr, 2006 1 commit
  11. 16 Apr, 2006 1 commit
  12. 15 Apr, 2006 2 commits
    • Fix up rev-list option parsing. · 8c1f0b44
      Junio C Hamano committed
      rev-list does not take diff options, so barf after seeing some.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • rev-list --bisect: limit list before bisecting. · 4e1dc640
      Junio C Hamano committed
      I noticed bisect does not work well without both good and bad.
      Running this script in git.git repository would give you quite
      different results:
      
              #!/bin/sh
              initial=e83c5163
      
              mid0=`git rev-list --bisect ^$initial --all`
      
              git rev-list $mid0 | wc -l
              git rev-list ^$mid0 --all | wc -l
      
              mid1=`git rev-list --bisect --all`
      
              git rev-list $mid1 | wc -l
              git rev-list ^$mid1 --all | wc -l
      
      The $initial commit is the very first commit you made.  The
      first midpoint bisects things evenly as designed, but the latter
      does not.
      
      The reason I got interested in this was because I was wondering
      if something like the following would help people converting a
      huge repository from foreign SCM, or preparing a repository to
      be fetched over plain dumb HTTP only:
      
              #!/bin/sh
      
              N=4
              P=.git/objects/pack
              bottom=
      
              while test 0 \< $N
              do
                      N=$((N-1))
                      if test -z "$bottom"
                      then
                              newbottom=`git rev-list --bisect --all`
                      else
                              newbottom=`git rev-list --bisect ^$bottom --all`
                      fi
                      if test -z "$bottom"
                      then
                              rev_list="$newbottom"
                      elif test 0 = $N
                      then
                              rev_list="^$bottom --all"
                      else
                              rev_list="^$bottom $newbottom"
                      fi
                      p=$(git rev-list --unpacked --objects $rev_list |
                          git pack-objects $P/pack)
                      git show-index <$P/pack-$p.idx | wc -l
                      bottom=$newbottom
              done
      
      The idea is to pack older half of the history to one pack, then
      older half of the remaining history to another, to continue a
      few times, using finer granularity as we get closer to the tip.
      
      This may not matter, since for a truly huge history, running
      bisect number of times could be quite time consuming, and we
      might be better off running "git rev-list --all" once into a
      temporary file, and manually pick cut-off points from the
      resulting list of commits.  After all we are talking about
      "approximately half" for such a usage, and older history does
      not matter much.
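      
      As a rough illustration of picking cut-off points from a flat list of
      commits instead of running bisect repeatedly, the sketch below computes
      how many commits each pack would get. It treats history as a simple
      linear list, which is only an approximation of what "--bisect" does on a
      real commit graph, and the function name is hypothetical.

```c
#include <assert.h>

/* Split `total` commits (ordered oldest first) into `packs` groups: each
 * group takes about half of what remains, and the last one takes the rest. */
void sketch_pack_sizes(unsigned long total, int packs, unsigned long *sizes)
{
	unsigned long remaining = total;
	int i;

	for (i = 0; i < packs; i++) {
		unsigned long take =
			(i == packs - 1) ? remaining : remaining / 2;
		sizes[i] = take;
		remaining -= take;
	}
}
```

      With N=4 as in the script above, the oldest half of the history lands in
      the first pack and the granularity gets finer toward the tip.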
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  13. 11 Apr, 2006 1 commit
  14. 09 Apr, 2006 1 commit
    • Make "--parents" logs also be incremental · 3381c790
      Linus Torvalds committed
      The parent rewriting feature caused us to create the whole history in one
      go, and then simplify it later, because of how rewrite_parents() had been
      written. However, with a little tweaking, it's perfectly possible to do
      even that one incrementally.
      
      Right now, this doesn't really matter much, because every user of
      "--parents" will probably generally _also_ use "--topo-order", which will
      cause the old non-incremental behaviour anyway. However, I'm hopeful that
      we could make even the topological sort incremental, or at least
      _partially_ so (for example, make it incremental up to the first merge).
      
      In the meantime, this at least moves things in the right direction, and
      removes a strange special case.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  15. 07 Apr, 2006 1 commit
  16. 01 Apr, 2006 1 commit
  17. 30 Mar, 2006 1 commit
    • tree/diff header cleanup. · 1b0c7174
      Junio C Hamano committed
      Introduce tree-walk.[ch] and move "struct tree_desc" and
      associated functions from various places.
      
      Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
      move it to cache.h.  This macro returns the canonicalized
      st_mode value in the host byte order for files, symlinks and
      directories -- to be compared with a tree_desc entry.
      create_ce_mode(mode) in cache.h is similar but is intended to be
      used for index entries (so it does not work for directories) and
      returns the value in the network byte order.
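      
      A small sketch of what such a canonicalization could look like. The
      exact rule used here (regular files collapse to 0755 or 0644 depending
      on the owner-execute bit, symlinks and directories keep only their type
      bits) is an assumption for illustration, and sketch_canon_mode is a
      made-up name rather than the macro from cache.h.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>

/* Reduce an st_mode value to a canonical form, in host byte order. */
unsigned int sketch_canon_mode(unsigned int mode)
{
	if (S_ISREG(mode))
		/* Assumption: only the owner-execute bit is significant. */
		return S_IFREG | ((mode & 0100) ? 0755 : 0644);
	if (S_ISLNK(mode))
		return S_IFLNK;
	/* Everything else is treated as a directory in this sketch. */
	return S_IFDIR;
}
```

      Two files that differ only in group or other permission bits compare
      equal after canonicalization, which is what makes the result usable for
      comparison against a tree entry.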
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  18. 29 Mar, 2006 2 commits
    • rev-list --boundary · 384e99a4
      Junio C Hamano committed
      With the new --boundary flag, the output from rev-list includes
      the UNINTERESTING commits at the boundary, which are usually not
      shown.  Their object names are prefixed with '-'.
      
      For example, with this graph:
      
                    C side
                   /
              A---B---D master
      
      You would get something like this:
      
              $ git rev-list --boundary --header --parents side..master
              D B
              tree D^{tree}
              parent B
              ... log message for commit D here ...
              \0-B A
              tree B^{tree}
              parent A
              ... log message for commit B here ...
              \0
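      
      A consumer of this output only needs to look at the first character of
      each record to tell boundary commits apart. The helper below is a
      hypothetical sketch of that check, not code from git.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the line names a boundary commit (leading '-'), else 0.
 * Either way, *name is set to the start of the object name. */
int sketch_parse_rev_line(const char *line, const char **name)
{
	int boundary = (line[0] == '-');

	*name = boundary ? line + 1 : line;
	return boundary;
}
```

      For the example above, "D B" is reported as a regular commit and "-B A"
      as a boundary commit whose name starts at "B".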
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • rev-list: memory usage reduction. · 9181ca2c
      Junio C Hamano committed
      We do not need to track object refs, nor do we need to save the commit
      unless we are showing the verbose header.  A lot of traversal happens
      inside prepare_revision_walk() these days so setting things up before
      calling that function is necessary.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      Acked-by: Linus Torvalds <torvalds@osdl.org>
  19. 22 Mar, 2006 1 commit
  20. 11 Mar, 2006 1 commit
  21. 01 Mar, 2006 3 commits
    • git-log (internal): more options. · 7ae0b0cb
      Junio C Hamano committed
      This ports the following options from rev-list based git-log
      implementation:
      
       * -<n>, -n<n>, and -n <n>.  I am still wondering if we want
          this natively supported by setup_revisions(), which already
          takes --max-count.  We may want to move them in the next
          round.  Also I am not sure if we can get away with not
          setting revs->limited when we set max-count.  The latest
          rev-list.c and revision.c in this series do not, so I left
          them as they are.
      
       * --pretty and --pretty=<fmt>.
      
       * --abbrev=<n> and --no-abbrev.
      
      The previous commit already handles time-based limiters
      (--since, --until and friends).  The remaining things that
      rev-list based git-log happens to do are not useful for pure
      log-viewing purposes, and are not ported:
      
       * --bisect (obviously).
      
       * --header.  I am actually in favor of doing the NUL
         terminated record format, but rev-list based one always
         passed --pretty, which defeated this option.  Maybe next
         round.
      
       * --parents.  I cannot think of a reason a log viewer wants
         this.  The flag is primarily for feeding squashed history
         via pipe to downstream tools.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • Rip out merge-order and make "git log <paths>..." work again. · 765ac8ec
      Linus Torvalds committed
      Well, assuming breaking --merge-order is fine, here's a patch (on top of
      the other ones) that makes
      
      	git log <filename>
      
      actually work, as far as I can tell.
      
      I didn't add the logic for --before/--after flags, but that should be
      pretty trivial, and is independent of this anyway.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • git-rev-list libification: rev-list walking · a4a88b2b
      Linus Torvalds committed
      This actually moves the "meat" of the revision walking from rev-list.c
      to the new library code in revision.h. It introduces the new functions
      
      	void prepare_revision_walk(struct rev_info *revs);
      	struct commit *get_revision(struct rev_info *revs);
      
      to prepare and then walk the revisions that we have.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  22. 28 Feb, 2006 1 commit
  23. 27 Feb, 2006 2 commits
    • rev-list split: minimum fixup. · d9cfb964
      Junio C Hamano committed
      This fixes "the other end has commit X but since then we tagged
      that commit with tag T, and he says he wants T -- what is the
      list of objects we need to send him?" question:
      
      	git-rev-list --objects ^X T
      
      We ended up sending everything since the beginning of time X-<.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • First cut at libifying revlist generation · ae563542
      Linus Torvalds committed
      This really just splits things up partially, and creates the
      interface to set things up by parsing the command line.
      
      No real code changes so far, although the parsing of filenames is a bit
      stricter. In particular, if there is a "--", then we do not accept any
      filenames before it, and if there isn't any "--", then we check that _all_
      paths listed are valid, not just the first one.
      
      The new argument parsing automatically also gives us "--default" and
      "--not" handling as in git-rev-parse.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      Signed-off-by: NJunio C Hamano <junkio@cox.net>
      ae563542
  24. 24 2月, 2006 2 次提交
  25. 22 2月, 2006 1 次提交
  26. 20 2月, 2006 1 次提交
    • J
      rev-list --objects-edge · c6496575
      Junio C Hamano 提交于
      This new flag is similar to --objects, but causes rev-list to
      show the list of "uninteresting" commits that appear on the edge,
      each prefixed with '-'.
      
      Downstream pack-objects will be changed to take these as hints
      to use the trees and blobs contained within them as base objects
      of resulting pack, producing an incomplete (not self-contained)
      pack.
      
      Such a pack cannot be used in .git/objects/pack (it is prevented
      by git-index-pack erroring out if it is fed to git-fetch-pack -k
      or git-clone-pack), but would be useful when transferring only
      small changes to huge blobs.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  27. 16 Feb, 2006 1 commit