1. 29 May, 2006 2 commits
    • L
      Add raw tree buffer info to "struct tree" · d2eafb76
      Linus Torvalds committed
      This allows us to avoid allocating information for names etc, because
      we can just use the information from the tree buffer directly.
      
      We still keep the old "tree_entry_list" in struct tree as well, so old
      users aren't affected, apart from the fact that the allocations are
      different (if you free a tree entry, you should no longer free the name
      allocation for it, since it's allocated as part of "tree->buffer")
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      d2eafb76
    • L
      Fix memory leak in "git rev-list --objects" · 91b452cb
      Linus Torvalds committed
      Martin Langhoff points out that "git repack -a" ends up using up a lot of
      memory for big archives, and that git cvsimport probably should do only
      incremental repacks in order to avoid having repacking flush all the
      caches.
      
      The big majority of the memory usage of repacking is from git rev-list
      tracking all objects, and this patch should go a long way in avoiding the
      excessive memory usage: the bulk of it was due to the object names being
      leaked from the tree parser.
      
      For the historic Linux kernel archive, this simple patch does:
      
      Before:
      	/usr/bin/time git-rev-list --all --objects > /dev/null
      
      	72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
      	0inputs+0outputs (0major+125376minor)pagefaults 0swaps
      
      After:
      	/usr/bin/time git-rev-list --all --objects > /dev/null
      
      	75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
      	0inputs+0outputs (0major+43921minor)pagefaults 0swaps
      
      where we do end up wasting a bit of time on some extra strdup()s (which
      could be avoided, but that would require tracking where the pathnames came
      from), but we avoid a lot of memory usage.
      
      Minor page faults track maximum RSS very closely (each page fault maps in
      one page into memory), so the reduction from 125376 page faults to 43921
      means a rough reduction of VM footprint from almost half a gigabyte to
      about a third of that. Those numbers were also double-checked by looking
      at "top" while the process was running.
      
      (Side note: at least part of the remaining VM footprint is the mapping of
      the 177MB pack-file, so the remaining memory use is at least partly "well
      behaved" from a project caching perspective).
      
      For the current git archive itself, the memory usage for a "--all
      --objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to
      9MB), so the reduction seems to hold for much smaller projects too.
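
      The page counts quoted above can be turned into sizes with a little
      arithmetic. This is a minimal sketch, assuming 4 KiB pages (the message
      does not state the page size); pages_to_mib is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; the commit message does not say


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a count of mapped pages into mebibytes."""
    return pages * page_size / (1024 * 1024)


if __name__ == "__main__":
    # Page counts taken from the commit message above.
    samples = [
        ("kernel archive, before", 125376),
        ("kernel archive, after", 43921),
        ("git archive, before", 7128),
        ("git archive, after", 2318),
    ]
    for label, pages in samples:
        print(f"{label}: {pages} pages ~ {pages_to_mib(pages):.1f} MiB")
```

      Under that assumption the figures line up with the "almost half a
      gigabyte" and "27MB to 9MB" statements in the message.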
      
      For regular "git-rev-list" usage (ie without the "--objects" flag) this
      patch has no impact.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      91b452cb
  2. 21 May, 2006 1 commit
  3. 19 May, 2006 1 commit
  4. 06 May, 2006 1 commit
  5. 18 April, 2006 2 commits
    • L
      Log message printout cleanups · 91539833
      Linus Torvalds committed
      On Sun, 16 Apr 2006, Junio C Hamano wrote:
      >
      > In the mid-term, I am hoping we can drop the generate_header()
      > callchain _and_ the custom code that formats commit log in-core,
      > found in cmd_log_wc().
      
      Ok, this was nastier than expected, just because the dependencies between
      the different log-printing stuff were absolutely _everywhere_, but here's
      a patch that does exactly that.
      
      The patch is not very easy to read, and the "--patch-with-stat" thing is
      still broken (it does not call the "show_log()" thing properly for
      merges). That's not a new bug. In the new world order it _should_ do
      something like
      
      	if (rev->logopt)
      		show_log(rev, rev->logopt, "---\n");
      
      but it doesn't. I haven't looked at the --with-stat logic, so I left it
      alone.
      
      That said, this patch removes more lines than it adds, and in particular,
      the "cmd_log_wc()" loop is now a very clean:
      
      	while ((commit = get_revision(rev)) != NULL) {
      		log_tree_commit(rev, commit);
      		free(commit->buffer);
      		commit->buffer = NULL;
      	}
      
      so it doesn't get much prettier than this. All the complexity is entirely
      hidden in log-tree.c, and any code that needs to flush the log literally
      just needs to do the "if (rev->logopt) show_log(...)" incantation.
      
      I had to make the combined_diff() logic take a "struct rev_info" instead
      of just a "struct diff_options", but that part is pretty clean.
      
      This does change "git whatchanged" from using "diff-tree" as the commit
      descriptor to "commit", and I changed one of the tests to reflect that new
      reality. Otherwise everything still passes, and my other tests look fine
      too.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      91539833
    • J
      rev-list --header: output format fix · db89665f
      Junio C Hamano committed
      Initial fix prepared by Johannes, but I did it slightly differently.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      db89665f
  6. 17 April, 2006 1 commit
  7. 16 April, 2006 1 commit
  8. 15 April, 2006 2 commits
    • J
      Fix up rev-list option parsing. · 8c1f0b44
      Junio C Hamano committed
      rev-list does not take diff options, so barf after seeing some.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      8c1f0b44
    • J
      rev-list --bisect: limit list before bisecting. · 4e1dc640
      Junio C Hamano committed
      I noticed bisect does not work well without both good and bad.
      Running this script in git.git repository would give you quite
      different results:
      
              #!/bin/sh
              initial=e83c5163
      
              mid0=`git rev-list --bisect ^$initial --all`
      
              git rev-list $mid0 | wc -l
              git rev-list ^$mid0 --all | wc -l
      
              mid1=`git rev-list --bisect --all`
      
              git rev-list $mid1 | wc -l
              git rev-list ^$mid1 --all | wc -l
      
      The $initial commit is the very first commit you made.  The
      first midpoint bisects things evenly as designed, but the latter
      does not.
      
      The reason I got interested in this was because I was wondering
      if something like the following would help people converting a
      huge repository from foreign SCM, or preparing a repository to
      be fetched over plain dumb HTTP only:
      
              #!/bin/sh
      
              N=4
              P=.git/objects/pack
              bottom=
      
              while test 0 \< $N
              do
                      N=$((N-1))
                      if test -z "$bottom"
                      then
                              newbottom=`git rev-list --bisect --all`
                      else
                              newbottom=`git rev-list --bisect ^$bottom --all`
                      fi
                      if test -z "$bottom"
                      then
                              rev_list="$newbottom"
                      elif test 0 = $N
                      then
                              rev_list="^$bottom --all"
                      else
                              rev_list="^$bottom $newbottom"
                      fi
                      p=$(git rev-list --unpacked --objects $rev_list |
                          git pack-objects $P/pack)
                      git show-index <$P/pack-$p.idx | wc -l
                      bottom=$newbottom
              done
      
      The idea is to pack older half of the history to one pack, then
      older half of the remaining history to another, to continue a
      few times, using finer granularity as we get closer to the tip.
      
      This may not matter, since for a truly huge history, running
      bisect a number of times could be quite time consuming, and we
      might be better off running "git rev-list --all" once into a
      temporary file, and manually picking cut-off points from the
      resulting list of commits.  After all we are talking about
      "approximately half" for such a usage, and older history does
      not matter much.
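
      The "pick cut-off points from a list" alternative can be sketched as
      follows. This is only an illustration under a strong assumption: the
      history is treated as a flat, oldest-first list (real history is a
      graph, and "git rev-list --all" prints newest first, so its output
      would need reversing). split_for_packs is a hypothetical helper name.

```python
def split_for_packs(commits, n_packs):
    """Split an oldest-first commit list into n_packs chunks.

    Every chunk except the last takes the older half of whatever is still
    unassigned; the last chunk takes everything that remains.
    """
    chunks = []
    remaining = list(commits)
    for i in range(n_packs):
        if i == n_packs - 1:
            take = len(remaining)       # final pack: the rest of the history
        else:
            take = len(remaining) // 2  # older half of what is left
        chunks.append(remaining[:take])
        remaining = remaining[take:]
    return chunks


if __name__ == "__main__":
    history = [f"c{i}" for i in range(16)]  # made-up commit names
    for chunk in split_for_packs(history, 4):
        print(len(chunk), chunk[0], "..", chunk[-1])
```

      With N=4 this mirrors the shell loop above: one pack for the older
      half, then progressively smaller packs toward the tip.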
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      4e1dc640
  9. 11 April, 2006 1 commit
  10. 09 April, 2006 1 commit
    • L
      Make "--parents" logs also be incremental · 3381c790
      Linus Torvalds committed
      The parent rewriting feature caused us to create the whole history in one
      go, and then simplify it later, because of how rewrite_parents() had been
      written. However, with a little tweaking, it's perfectly possible to do
      even that one incrementally.
      
      Right now, this doesn't really matter much, because every user of
      "--parents" will probably generally _also_ use "--topo-order", which will
      cause the old non-incremental behaviour anyway. However, I'm hopeful that
      we could make even the topological sort incremental, or at least
      _partially_ so (for example, make it incremental up to the first merge).
      
      In the meantime, this at least moves things in the right direction, and
      removes a strange special case.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      3381c790
  11. 07 April, 2006 1 commit
  12. 01 April, 2006 1 commit
  13. 30 March, 2006 1 commit
    • J
      tree/diff header cleanup. · 1b0c7174
      Junio C Hamano committed
      Introduce tree-walk.[ch] and move "struct tree_desc" and
      associated functions from various places.
      
      Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
      move it to cache.h.  This macro returns the canonicalized
      st_mode value in the host byte order for files, symlinks and
      directories -- to be compared with a tree_desc entry.
      create_ce_mode(mode) in cache.h is similar but is intended to be
      used for index entries (so it does not work for directories) and
      returns the value in the network byte order.
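
      What "canonicalized" means for a mode can be illustrated with a small
      sketch. The message does not spell out the rule, so the following is an
      assumption: regular files collapse to 0755 or 0644 depending on the
      owner-execute bit, while symlinks and directories keep only their type
      bits. The Python function names simply mirror the macro names and are
      not the C definitions.

```python
import stat


def ce_permissions(mode: int) -> int:
    """Assumed rule: executable files become 0755, everything else 0644."""
    return 0o755 if mode & 0o100 else 0o644


def canon_mode(mode: int) -> int:
    """Canonicalize an st_mode value (host byte order) for comparison."""
    if stat.S_ISREG(mode):
        return stat.S_IFREG | ce_permissions(mode)
    if stat.S_ISLNK(mode):
        return stat.S_IFLNK
    return stat.S_IFDIR  # assumption: anything else is treated as a directory


if __name__ == "__main__":
    for mode in (0o100664, 0o100775, 0o120777, 0o040755):
        print(oct(mode), "->", oct(canon_mode(mode)))
```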
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      1b0c7174
  14. 29 March, 2006 2 commits
    • J
      rev-list --boundary · 384e99a4
      Junio C Hamano committed
      With the new --boundary flag, the output from rev-list includes
      the UNINTERESTING commits at the boundary, which are usually not
      shown.  Their object names are prefixed with '-'.
      
      For example, with this graph:
      
                    C side
                   /
              A---B---D master
      
      You would get something like this:
      
      	$ git rev-list --boundary --header --parents side..master
      	D B
              tree D^{tree}
              parent B
              ... log message for commit D here ...
              \0-B A
              tree B^{tree}
              parent A
              ... log message for commit B here ...
              \0
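
      A consumer of this output might split it as in the sketch below. The
      record layout is assumed from the example alone (NUL-terminated
      records, first line holding the commit followed by its parents, a
      leading '-' marking a boundary commit); parse_boundary_records is a
      hypothetical helper, not part of git.

```python
def parse_boundary_records(output: str):
    """Return (commit, parents, is_boundary, body) for each NUL-terminated record."""
    records = []
    for raw in output.split("\0"):
        if not raw.strip():
            continue  # skip the empty piece after the final NUL
        first_line, _, body = raw.lstrip("\n").partition("\n")
        ids = first_line.split()
        is_boundary = ids[0].startswith("-")
        commit = ids[0][1:] if is_boundary else ids[0]
        records.append((commit, ids[1:], is_boundary, body))
    return records


if __name__ == "__main__":
    # Symbolic names stand in for real object names, as in the example above.
    sample = (
        "D B\ntree D^{tree}\nparent B\nlog message for D\n\0"
        "-B A\ntree B^{tree}\nparent A\nlog message for B\n\0"
    )
    for commit, parents, boundary, _ in parse_boundary_records(sample):
        print(commit, parents, "boundary" if boundary else "interesting")
```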
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      384e99a4
    • J
      rev-list: memory usage reduction. · 9181ca2c
      Junio C Hamano committed
      We do not need to track object refs, nor do we need to save the commit
      unless we are doing a verbose header.  A lot of traversal happens
      inside prepare_revision_walk() these days, so setting things up before
      calling that function is necessary.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      Acked-by: Linus Torvalds <torvalds@osdl.org>
      9181ca2c
  15. 22 March, 2006 1 commit
  16. 11 March, 2006 1 commit
  17. 01 March, 2006 3 commits
    • J
      git-log (internal): more options. · 7ae0b0cb
      Junio C Hamano committed
      This ports the following options from rev-list based git-log
      implementation:
      
       * -<n>, -n<n>, and -n <n>.  I am still wondering if we want
          this natively supported by setup_revisions(), which already
          takes --max-count.  We may want to move them in the next
          round.  Also I am not sure if we can get away with not
          setting revs->limited when we set max-count.  The latest
          rev-list.c and revision.c in this series do not, so I left
          them as they are.
      
       * --pretty and --pretty=<fmt>.
      
       * --abbrev=<n> and --no-abbrev.
      
      The previous commit already handles time-based limiters
      (--since, --until and friends).  The remaining things that
      rev-list based git-log happens to do are not useful for pure
      log-viewing purposes, and are not ported:
      
       * --bisect (obviously).
      
       * --header.  I am actually in favor of doing the NUL
         terminated record format, but rev-list based one always
         passed --pretty, which defeated this option.  Maybe next
         round.
      
       * --parents.  I cannot think of a reason a log viewer wants
         this.  The flag is primarily for feeding squashed history
         via pipe to downstream tools.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      7ae0b0cb
    • L
      Rip out merge-order and make "git log <paths>..." work again. · 765ac8ec
      Linus Torvalds committed
      Well, assuming breaking --merge-order is fine, here's a patch (on top of
      the other ones) that makes
      
      	git log <filename>
      
      actually work, as far as I can tell.
      
      I didn't add the logic for --before/--after flags, but that should be
      pretty trivial, and is independent of this anyway.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      765ac8ec
    • L
      git-rev-list libification: rev-list walking · a4a88b2b
      Linus Torvalds committed
      This actually moves the "meat" of the revision walking from rev-list.c
      to the new library code in revision.h. It introduces the new functions
      
      	void prepare_revision_walk(struct rev_info *revs);
      	struct commit *get_revision(struct rev_info *revs);
      
      to prepare and then walk the revisions that we have.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      a4a88b2b
  18. 28 February, 2006 1 commit
  19. 27 February, 2006 2 commits
    • J
      rev-list split: minimum fixup. · d9cfb964
      Junio C Hamano committed
      This fixes "the other end has commit X but since then we tagged
      that commit with tag T, and he says he wants T -- what is the
      list of objects we need to send him?" question:
      
      	git-rev-list --objects ^X T
      
      We ended up sending everything since the beginning of time X-<.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      d9cfb964
    • L
      First cut at libifying revlist generation · ae563542
      Linus Torvalds committed
      This really just splits things up partially, and creates the
      interface to set things up by parsing the command line.
      
      No real code changes so far, although the parsing of filenames is a bit
      stricter. In particular, if there is a "--", then we do not accept any
      filenames before it, and if there isn't any "--", then we check that _all_
      paths listed are valid, not just the first one.
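
      The stricter rule can be pictured with the sketch below. It is an
      illustration only, not the C implementation: option flags are ignored,
      and is_revision and path_exists are hypothetical predicates supplied
      by the caller.

```python
def split_revs_and_paths(args, is_revision, path_exists):
    """Split command-line arguments into (revisions, paths)."""
    if "--" in args:
        i = args.index("--")
        before, paths = args[:i], args[i + 1:]
        for arg in before:
            # With "--" present, nothing before it may be a filename.
            if not is_revision(arg):
                raise ValueError(f"not a revision before '--': {arg}")
        return before, paths  # paths after "--" are not checked for existence
    revs, paths = [], []
    for arg in args:
        if not paths and is_revision(arg):
            revs.append(arg)
        elif path_exists(arg):
            paths.append(arg)  # without "--", every path must be valid
        else:
            raise ValueError(f"neither a revision nor an existing path: {arg}")
    return revs, paths


if __name__ == "__main__":
    known_revs = {"HEAD", "v1.0"}   # made-up data for the demonstration
    known_paths = {"Makefile"}
    print(split_revs_and_paths(["HEAD", "--", "gone.c"],
                               known_revs.__contains__,
                               known_paths.__contains__))
```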
      
      The new argument parsing automatically also gives us "--default" and
      "--not" handling as in git-rev-parse.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      ae563542
  20. 24 February, 2006 2 commits
  21. 22 February, 2006 1 commit
  22. 20 February, 2006 1 commit
    • J
      rev-list --objects-edge · c6496575
      Junio C Hamano committed
      This new flag is similar to --objects, but causes rev-list to
      also show the "uninteresting" commits that appear on the edge,
      each prefixed with '-'.
      
      Downstream pack-objects will be changed to take these as hints
      to use the trees and blobs contained within them as base objects
      of the resulting pack, producing an incomplete (not self-contained)
      pack.
      
      Such a pack cannot be used in .git/objects/pack (it is prevented
      by git-index-pack erroring out if it is fed to git-fetch-pack -k
      or git-clone-pack), but would be useful when transferring only
      small changes to huge blobs.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      c6496575
  23. 16 February, 2006 1 commit
  24. 11 February, 2006 1 commit
    • J
      rev-list: default to abbreviate merge parent names under --pretty. · 9da5c2f0
      Junio C Hamano committed
      When we prettyprint commit log messages, merge parent names were
      often very long and there was no way to abbreviate them.
      
      This changes them to be abbreviated by default, and non-default
      abbreviations can be specified with --no-abbrev or --abbrev=<n>
      options.
      
      Note that this affects only the prettyprinted parent names.  The
      output from --show-parents is meant for machine consumption and
      is not affected by this flag.
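
      One way to picture abbreviation is the sketch below. The message does
      not describe how the abbreviated length is chosen beyond --abbrev=<n>,
      so this is an assumption: take the shortest prefix of at least n
      characters that is still unique among the known object names. The
      helper name and the default of 7 are hypothetical.

```python
def abbreviate(name, known_names, min_len=7):
    """Shortest prefix of name, at least min_len long, unique in known_names."""
    for length in range(min_len, len(name) + 1):
        prefix = name[:length]
        matches = sum(1 for other in known_names if other.startswith(prefix))
        if matches <= 1:
            return prefix
    return name  # no shorter unique prefix exists


if __name__ == "__main__":
    names = ["abcdef1234", "abcdef1299", "ffff000000"]  # made-up object names
    for n in names:
        print(n, "->", abbreviate(n, names))
```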
      9da5c2f0
  25. 02 February, 2006 1 commit
    • J
      rev-list: omit duplicated parents. · 88494423
      Junio C Hamano committed
      Showing the same parent more than once for a commit does not
      make much sense downstream, so stop it.
      
      This can happen with an incorrectly made merge commit that
      merges the same parent twice, but can happen in an otherwise
      sane development history while squishing the history by taking
      into account only commits that touch specified paths.
      
      For example,
      
      	$ git rev-list --max-count=1 --parents addafaf9 -- rev-list.c
      
      would have to show this commit ancestry graph:
      
                        .---o---.
                       /         \
                      .---*---o---.
                     /    93b74bca  \
         ---*---o---o-----o---o-----o addafaf9
            d8f6b342  \             /
                      .---o---o---.
                       \         /
                        .---*---.
                            3815f423
      
      where there are 5 independent development tracks, only two of
      which have changes in the specified paths since they forked.  The
      last change for the other three development tracks was done by the
      same commit before they forked, and we were showing that three
      times.
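
      The effect on the parent list can be sketched like this. It is a
      minimal illustration, not the C code; unique_parents is a hypothetical
      helper, and keeping first-seen order is an assumption.

```python
def unique_parents(parents):
    """Drop repeated parents while keeping first-seen order."""
    seen = set()
    result = []
    for parent in parents:
        if parent not in seen:
            seen.add(parent)
            result.append(parent)
    return result


if __name__ == "__main__":
    # Five tracks from the graph above: three of them simplify to d8f6b342.
    rewritten = ["93b74bca", "d8f6b342", "d8f6b342", "d8f6b342", "3815f423"]
    print(unique_parents(rewritten))
```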
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      88494423
  26. 01 February, 2006 2 commits
  27. 28 January, 2006 4 commits
    • J
    • J
      diff-tree: abbreviate merge parent object names with --abbrev --pretty. · b2d4c56f
      Junio C Hamano committed
      When --abbrev is in effect, abbreviate the merge parent names
      in prettyprinted output.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      b2d4c56f
    • J
      93b74bca
    • L
      rev-list: stop when the file disappears · 461cf59f
      Linus Torvalds committed
      The one thing I've considered doing (I really should) is to add a "stop
      when you don't find the file" option to "git-rev-list". This patch does
      some of the work towards that: it removes the "parent" thing when the
      file disappears, so a "git annotate" could do something like
      
      	git-rev-list --remove-empty --parents HEAD -- "$filename"
      
      and it would get a good graph that stops when the filename disappears
      (it's not perfect though: it won't remove all the uninteresting commits).
      
      It also simplifies the logic of finding tree differences a bit, at the
      cost of making it a tad less efficient.
      
      The old logic was two-phase: it would first simplify _only_ merges as
      it traversed the tree, and then simplify the linear parts of the remainder
      independently. That was pretty optimal from an efficiency standpoint
      because it avoids doing any comparisons that we can see are unnecessary,
      but it made it much harder to understand than it really needed to be.
      
      The new logic is a lot more straightforward, and compares the trees as it
      traverses the graph (ie everything is a single phase). That makes it much
      easier to stop graph traversal at any point where a file disappears.
      
      As an example, let's say that you have a git repository that has had a
      file called "A" some time in the past. That file gets renamed to B, and
      then gets renamed back again to A. The old "git-rev-list" would show two
      commits: the commit that renames B to A (because it changes A) _and_ as
      its parent the commit that renames A to B (because it changes A).
      
      With the new --remove-empty flag, git-rev-list will show just the commit
      that renames B to A as the "root" commit, and stop traversal there
      (because that's what you want for "annotate" - you want to stop there, and
      for every "root" commit you then separately see if it really is a new
      file, or if the path's history disappeared because it was renamed from some
      other file).
      
      With this patch, you should be able to basically do a "poor man's 'git
      annotate'" with a fairly simple loop:
      
      	push("HEAD", "$filename")
      	while (revision,filename = pop()) {
      		for each i in $(git-rev-list --parents --remove-empty $revision -- "$filename")
      
      		pseudo-parents($i) = git-rev-list parents for that line
      
      		if (pseudo-parents($i) is non-empty) {
      			show diff of $i against pseudo-parents
      			continue
      		}
      
      		/* See if the _real_ parents of $i had a rename */
      		parent($i) = real-parent($i)
      		if (find-rename in $parent($i)->$i)
      			push $parent($i), "old-name"
      	}
      
      which should be doable in perl or something (doing stacks in shell is just
      too painful to be worth it, so I'm not going to do this).
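
      The loop above can be sketched in Python with the git invocations
      abstracted away. Everything here is hypothetical scaffolding: rev_list,
      real_parents and find_rename stand in for commands the pseudocode
      names, and the seen set is an addition to guard against revisiting the
      same (revision, path) pair.

```python
def poor_mans_annotate(start, filename, rev_list, real_parents, find_rename):
    """Follow a file's history across renames.

    rev_list(rev, path)               -> list of (commit, pseudo_parents)
    real_parents(commit)              -> list of the commit's actual parents
    find_rename(parent, commit, path) -> old path name, or None
    Returns (steps, roots): commits to diff against their pseudo-parents,
    and the commits where a path's history ended.
    """
    steps, roots = [], []
    stack = [(start, filename)]
    seen = set()
    while stack:
        revision, path = stack.pop()
        if (revision, path) in seen:
            continue
        seen.add((revision, path))
        for commit, pseudo_parents in rev_list(revision, path):
            if pseudo_parents:
                steps.append((commit, path, pseudo_parents))
                continue
            # The path's history stops here: look for a rename in the
            # real parents and continue from the old name if there is one.
            roots.append((commit, path))
            for parent in real_parents(commit):
                old_name = find_rename(parent, commit, path)
                if old_name is not None:
                    stack.append((parent, old_name))
    return steps, roots


if __name__ == "__main__":
    # Made-up history: c1 creates A, c2 renames A to B, c3 edits B.
    history = {("c3", "B"): [("c3", ["c2"]), ("c2", [])], ("c1", "A"): [("c1", [])]}
    parents = {"c3": ["c2"], "c2": ["c1"], "c1": []}
    renames = {("c1", "c2", "B"): "A"}
    print(poor_mans_annotate("c3", "B",
                             lambda r, p: history.get((r, p), []),
                             lambda c: parents[c],
                             lambda p, c, path: renames.get((p, c, path))))
```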
      
      Anybody want to try?
      
      		Linus
      461cf59f
  28. 26 January, 2006 1 commit
    • L
      Make git-rev-list and git-rev-parse argument parsing stricter · d8f6b342
      Linus Torvalds committed
      If you pass it a filename without the "--" marker to separate it from
      revision information and flags, we now require that the file in question
      actually exists. This makes mis-typed revision information not be silently
      just considered a strange filename.
      
      With the "--" marker, you can continue to pass in filenames that do not
      actually exist - useful for querying what happened to a file that you
      no longer have in the repository.
      
      [ All scripts should use the "--" format regardless, to make things
        unambiguous. So this change should not affect any existing tools ]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      d8f6b342