1. 19 Jan, 2010 1 commit
    • Performance optimization for detection of modified submodules · e3d42c47
      Jens Lehmann authored
      In the worst case is_submodule_modified() got called three times for
      each submodule. The information we got from scanning the whole
      submodule tree the first time can be reused instead.
      
      New parameters have been added to diff_change() and diff_addremove(),
      the information is stored in a new member of struct diff_filespec. Its
      value is then reused instead of calling is_submodule_modified() again.
      
      When no explicit "-dirty" is needed in the output the call to
      is_submodule_modified() is not necessary when the submodules HEAD
      already disagrees with the ref of the superproject, as this alone
      marks it as modified. To achieve that, get_stat_data() got an extra
      argument.
      Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
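The caching idea in the commit above (scan once, store the answer next to the file's data, reuse it) can be sketched as follows. This is only an illustration under assumed names: `filespec_sketch`, `expensive_scan()` and `submodule_dirty()` are hypothetical stand-ins, not git's actual `struct diff_filespec` or `is_submodule_modified()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a filespec; only the cache fields matter here. */
struct filespec_sketch {
	const char *path;
	int dirty_known; /* 1 once the expensive scan has run */
	int dirty;       /* cached result of that scan */
};

/* Counts how often the expensive scan runs, so the saving is observable. */
int scan_calls;

/* Placeholder for the expensive check; a real one would walk the submodule tree. */
int expensive_scan(const char *path)
{
	assert(path != NULL);
	scan_calls++;
	return path[0] == 'd'; /* arbitrary rule, for the sketch only */
}

/* Return the dirty state, scanning at most once per filespec. */
int submodule_dirty(struct filespec_sketch *spec)
{
	assert(spec != NULL);
	if (!spec->dirty_known) {
		spec->dirty = expensive_scan(spec->path);
		spec->dirty_known = 1;
	}
	return spec->dirty;
}
```

However many callers ask about the same filespec, the scan counter in this sketch increases only on the first request.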
  2. 30 Jul, 2009 2 commits
    • diff: Rename QUIET internal option to QUICK · 90b19941
      Junio C Hamano authored
      The option "QUIET" primarily meant "find if we have _any_ difference as
      quick as possible and report", which means we often do not even have to
      look at blobs if we know the trees are different by looking at the higher
      level (e.g. "diff-tree A B").  As a side effect, because there is no point
      showing one change that we happened to have found first, it also enables
      NO_OUTPUT and EXIT_WITH_STATUS options, making the end result look quiet.
      
      Rename the internal option to QUICK to reflect this better; it also makes
      grepping the source tree much easier, as there are other kinds of QUIET
      option everywhere.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff: change semantics of "ignore whitespace" options · f245194f
      Junio C Hamano authored
      Traditionally, the --ignore-whitespace* options have merely meant to tell
      the diff output routine that some class of differences are not worth
      showing in the textual diff output, so that the end user has easier time
      to review the remaining (presumably more meaningful) changes.  These
      options never affected the outcome of the command, given as the exit
      status when the --exit-code option was in effect (either directly or
      indirectly).
      
      When you have only whitespace changes, however, you might expect
      
      	git diff -b --exit-code
      
      to report that there is _no_ change with zero exit status.
      
      Change the semantics of --ignore-whitespace* options to mean more than
      "omit showing the difference in text".
      
      The exit status, when --exit-code is in effect, is computed by checking if
      we found any differences at the path level, while diff frontends feed
      filepairs to the diffcore engine.  When "ignore whitespace" options are in
      effect, we defer this determination until the very end of diffcore
      transformation.  We simply do not know until the textual diff is
      generated, which comes very late in the pipeline.
      
      When --quiet is in effect, various diff frontends optimize by breaking out
      early from the loop that enumerates the filepairs, when we find the first
      path level difference; when --ignore-whitespace* is used the above change
      automatically disables this optimization.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. 14 Jun, 2009 1 commit
  4. 23 Apr, 2009 1 commit
  5. 02 Apr, 2009 1 commit
  6. 01 Sep, 2008 1 commit
  7. 17 Jul, 2008 1 commit
  8. 13 Dec, 2007 1 commit
  9. 12 Nov, 2007 1 commit
  10. 21 Oct, 2007 1 commit
    • Fix diffcore-break total breakage · 6dd4b66f
      Linus Torvalds authored
      Ok, so on the kernel list, some people noticed that "git log --follow"
      doesn't work too well with some files in the x86 merge, because a lot of
      files got renamed in very special ways.
      
      In particular, there was a pattern of doing single commits with renames
      that looked basically like
      
       - rename "filename.h" -> "filename_64.h"
       - create new "filename.h" that includes "filename_32.h" or
         "filename_64.h" depending on whether we're 32-bit or 64-bit.
      
      which was preparatory for smushing the two trees together.
      
      Now, there's two issues here:
      
       - "filename.h" *remained*. Yes, it was a rename, but there was a new file
         created with the old name in the same commit. This was important,
         because we wanted each commit to compile properly, so that it was
         bisectable, so splitting the rename into one commit and the "create
         helper file" into another was *not* an option.
      
         So we need to break associations where the contents change too much.
         Fine. We have the -B flag for that. When we break things up, then the
         rename detection will be able to figure out whether there are better
         alternatives.
      
       - "git log --follow" didn't work with -B.
      
      Now, the second case was really simple: we use a different "diffopt"
      structure for the rename detection than the basic one (which we use for
      showing the diffs). So that second case is trivially fixed by a trivial
      one-liner that just copies the break_opt values from the "real" diffopts
      to the one used for rename following. So now "git log -B --follow" works
      fine:
      
      	diff --git a/tree-diff.c b/tree-diff.c
      	index 26bdbdd..7c261fd 100644
      	--- a/tree-diff.c
      	+++ b/tree-diff.c
      	@@ -319,6 +319,7 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
      	 	diff_opts.detect_rename = DIFF_DETECT_RENAME;
      	 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
      	 	diff_opts.single_follow = opt->paths[0];
      	+	diff_opts.break_opt = opt->break_opt;
      	 	paths[0] = NULL;
      	 	diff_tree_setup_paths(paths, &diff_opts);
      	 	if (diff_setup_done(&diff_opts) < 0)
      
      however, the end result does *not* work. Because our diffcore-break.c
      logic is totally bogus!
      
      In particular:
      
       - it used to do
      
      	if (base_size < MINIMUM_BREAK_SIZE)
      		return 0; /* we do not break too small filepair */
      
         which basically says "don't bother to break small files". But that
         "base_size" is the *smaller* of the two sizes, which means that if some
         large file was rewritten into one that just includes another file, we
         would look at the (small) result, and decide that it's smaller than the
         break size, so it cannot be worth it to break it up! Even if the other
         side was ten times bigger and looked *nothing* like the small file!
      
         That's clearly bogus. I replaced "base_size" with "max_size", so that
         we compare the *bigger* of the filepair with the break size.
      
       - It calculated a "merge_score", which was the score needed to merge it
         back together if nothing else wanted it. But even if it was *so*
         different that we would never want to merge it back, we wouldn't
         consider it a break! That makes no sense. So I added
      
      	if (*merge_score_p > break_score)
      		return 1;
      
         to make it clear that if we wouldn't want to merge it at the end, it
         was *definitely* a break.
      
       - It compared the whole "extent of damage", counting all inserts and
         deletes, but it based this score on the "base_size", and generated the
         damage score with
      
      	delta_size = src_removed + literal_added;
      	damage_score = delta_size * MAX_SCORE / base_size;
      
         but that makes no sense either, since quite often, this will result in
         a number that is *bigger* than MAX_SCORE! Why? Because base_size is
         (again) the smaller of the two files we compare, and when you start out
         from a small file and add a lot (or start out from a large file and
         remove a lot), the base_size is going to be much smaller than the
         damage!
      
         Again, the fix was to replace "base_size" with "max_size", at which
         point the damage actually becomes a sane percentage of the whole.
      
      With these changes in place, not only does "git log -B --follow" work for
      the case that triggered this in the first place, ie now
      
      	git log -B --follow arch/x86/kernel/vmlinux_64.lds.S
      
      actually gives reasonable results. But I also wanted to verify it in
      general, by doing a full-history
      
      	git log --stat -B -C
      
      on my kernel tree with the old code and the new code.
      
      There's some tweaking to be done, but generally, the new code generates
      much better results wrt breaking up files (and then finding better rename
      candidates). Here's a few examples of the "--stat" output:
      
       - This:
      	include/asm-x86/Kbuild        |    2 -
      	include/asm-x86/debugreg.h    |   79 +++++++++++++++++++++++++++++++++++------
      	include/asm-x86/debugreg_32.h |   64 ---------------------------------
      	include/asm-x86/debugreg_64.h |   65 ---------------------------------
      	4 files changed, 68 insertions(+), 142 deletions(-)
      
            Becomes:
      
      	include/asm-x86/Kbuild                        |    2 -
      	include/asm-x86/{debugreg_64.h => debugreg.h} |    9 +++-
      	include/asm-x86/debugreg_32.h                 |   64 -------------------------
      	3 files changed, 7 insertions(+), 68 deletions(-)
      
       - This:
      	include/asm-x86/bug.h    |   41 +++++++++++++++++++++++++++++++++++++++--
      	include/asm-x86/bug_32.h |   37 -------------------------------------
      	include/asm-x86/bug_64.h |   34 ----------------------------------
      	3 files changed, 39 insertions(+), 73 deletions(-)
      
            Becomes
      
      	include/asm-x86/{bug_64.h => bug.h} |   20 +++++++++++++-----
      	include/asm-x86/bug_32.h            |   37 -----------------------------------
      	2 files changed, 14 insertions(+), 43 deletions(-)
      
      Now, in some other cases, it does actually turn a rename into a real
      "delete+create" pair, and then the diff is usually bigger, so truth in
      advertizing: it doesn't always generate a nicer diff. But for what -B was
      meant for, I think this is a big improvement, and I suspect those cases
      where it generates a bigger diff are tweakable.
      
      So I think this diff fixes a real bug, but we might still want to tweak
      the default values and perhaps the exact rules for when a break happens.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
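The effect of dividing the damage by the smaller versus the larger file size, as described in the commit above, can be checked with a small calculation. This is a sketch under assumptions: the scale constant `MAX_SCORE_SKETCH` and the helpers `smaller_of()`, `larger_of()` and `damage_score()` are hypothetical names, and the sizes are made up for illustration.

```c
#include <assert.h>

/* Assumed scale for the sketch: a score equal to this value means "100%". */
#define MAX_SCORE_SKETCH 60000UL

unsigned long smaller_of(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

unsigned long larger_of(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Damage score as in the commit message: (removed + added) scaled by a denominator. */
unsigned long damage_score(unsigned long src_removed,
			   unsigned long literal_added,
			   unsigned long denom)
{
	unsigned long delta_size = src_removed + literal_added;

	assert(denom > 0);
	return delta_size * MAX_SCORE_SKETCH / denom;
}
```

For a 1000-byte file that grows to 10000 bytes by adding 9000 literal bytes, dividing by the smaller size gives 540000, nine times the assumed maximum, while dividing by the larger size gives 54000, that is, 90% of it.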
  11. 23 Jun, 2007 2 commits
    • Fix up "git log --follow" a bit.. · 9f38e1ef
      Linus Torvalds authored
      This fixes "git log --follow" to hopefully not leak memory any more, and
      also cleans it up a bit to look more like some of the other functions that
      use "diff_queued_diff" (by *not* using it directly as a global in the
      code, but by instead just taking a pointer to the diff queue and using
      that).
      
      As to "diff_queued_diff", I think it would be better off not as a global
      at all, but as being just an entry in the "struct diff_options" structure,
      but that's a separate issue, and there may be some subtle reason for why
      it's currently a global.
      
      Anyway, no real changes. Instead of having a magical first entry in the
      diff-queue, we now end up just keeping the diff-queue clean, and keeping
      our "preferred" file pairing in an internal "choice" variable. That makes
      it easy to switch the choice around when we find a better one.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • Finally implement "git log --follow" · 750f7b66
      Linus Torvalds authored
      Ok, I've really held off doing this too damn long, because I'm lazy, and I
      was always hoping that somebody else would do it.
      
      But no, people keep asking for it, but nobody actually did anything, so I
      decided I might as well bite the bullet, and instead of telling people
      they could add a "--follow" flag to "git log" to do what they want to do,
      I decided that it looks like I just have to do it for them..
      
      The code wasn't actually that complicated, in that the diffstat for this
      patch literally says "70 insertions(+), 1 deletions(-)", but I will have
      to admit that in order to get to this fairly simple patch, you did have to
      know and understand the internal git diff generation machinery pretty
      well, and had to really be able to follow how commit generation interacts
      with generating patches and generating the log.
      
      So I suspect that while I was right that it wasn't that hard, I might have
      been expecting too much of random people - this patch does seem to be
      firmly in the core "Linus or Junio" territory.
      
      To make a long story short: I'm sorry for it taking so long until I just
      did it.
      
      I'm not going to guarantee that this works for everybody, but you really
      can just look at the patch, and after the appropriate appreciative noises
      ("Ooh, aah") over how clever I am, you can then just notice that the code
      itself isn't really that complicated.
      
      All the real new code is in the new "try_to_follow_renames()" function. It
      really isn't rocket science: we notice that the pathname we were looking
      at went away, so we start a full tree diff and try to see if we can
      instead make that pathname be a rename or a copy from some other previous
      pathname. And if we can, we just continue, except we show *that*
      particular diff, and ever after we use the _previous_ pathname.
      
      One thing to look out for: the "rename detection" is considered to be a
      singular event in the _linear_ "git log" output! That's what people want
      to do, but I just wanted to point out that this patch is *not* carrying
      around a "commit,pathname" kind of pair and it's *not* going to be able to
      notice the file coming from multiple *different* files in earlier history.
      
      IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
      of "files have single identities" kind of semantics, and git log will just
      pick the identity based on the normal move/copy heuristics _as_if_ the
      history could be linearized.
      
      Put another way: I think the model is broken, but given the broken model,
      I think this patch does just about as well as you can do. If you have
      merges with the same "file" having different filenames over the two
      branches, git will just end up picking _one_ of the pathnames at the point
      where the newer one goes away. It never looks at multiple pathnames in
      parallel.
      
      And if you understood all that, you probably didn't need it explained, and
      if you didn't understand the above blathering, it doesn't really matter to
      you. What matters to you is that you can now do
      
      	git log -p --follow builtin-rev-list.c
      
      and it will find the point where the old "rev-list.c" got renamed to
      "builtin-rev-list.c" and show it as such.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
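The "single identity in a linearized history" model described in the commit above can be sketched as a walk over a newest-first list of commits, switching to the previous pathname whenever a rename to the tracked path is seen. The types and names here (`commit_sketch`, `follow_back()`) are hypothetical. Real rename detection is a similarity heuristic over a full tree diff, which this sketch replaces with a precomputed rename record per commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One commit in a linear history; both fields are NULL if nothing was renamed. */
struct commit_sketch {
	const char *renamed_from;
	const char *renamed_to;
};

/*
 * Walk the history from newest (index 0) to oldest, tracking one pathname.
 * When the tracked name was produced by a rename, continue with the old name.
 * Returns the name the file had before the oldest commit in the list.
 */
const char *follow_back(const struct commit_sketch *history, size_t n,
			const char *path)
{
	assert(history != NULL && path != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct commit_sketch *c = &history[i];

		if (c->renamed_to && strcmp(c->renamed_to, path) == 0)
			path = c->renamed_from; /* ever after, use the previous pathname */
	}
	return path;
}
```

Because only one pathname is carried along, the sketch shares the limitation the commit message points out: it cannot notice a file that came from several different files in earlier history.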
  12. 22 Mar, 2007 4 commits
  13. 19 Mar, 2007 2 commits
    • Set up for better tree diff optimizations · 5d865017
      Linus Torvalds authored
      This is mainly just a cleanup patch, and sets up for later changes where
      the tree-diff.c "interesting()" function can return more than just a
      yes/no value.
      
      In particular, it should be quite possible to say "no subsequent entries
      in this tree can possibly be interesting any more", and thus allow the
      callers to short-circuit the tree entirely.
      
      In fact, changing the callers to do so is trivial, and is really all this
      patch really does, because changing "interesting()" itself to say that
      nothing further is going to be interesting is definitely more complicated,
      considering that we may have arbitrary pathspecs.
      
      But in cleaning up the callers, this actually fixes a potential small
      performance issue in diff_tree(): if the second tree has a lot of
      uninteresting crud in it, we would keep on doing the "is it interesting?"
      check on the first tree for each uninteresting entry in the second one.
      
      The answer is obviously not going to change, so that was just not helping.
      The new code is clearer and simpler and avoids this issue entirely.
      
      I also renamed "interesting()" to "tree_entry_interesting()", because I
      got frustrated by the fact that
      
       - we actually had *another* function called "interesting()" in another
         file, and I couldn't tell from the profiles which one was the one that
         mattered more.
      
       - when rewriting it to return a ternary value, you can't just do
      
      	if (interesting(...))
      		...
      
         any more, but want to assign the return value to a local variable. The
         name of choice for that variable would normally be "interesting", so
         I just wanted to make the function name be more specific, and avoid
         that whole issue (even though I then didn't choose that name for either
         of the users, just to avoid confusion in the patch itself ;)
      
      In other words, this doesn't really change anything, but I think it's a
      good thing to do, and if somebody comes along and writes the logic for
      "yeah, none of the pathspecs you have are interesting", we now support
      that trivially.
      
      It could easily be a meaningful optimization for things like "blame",
      where there's just one pathspec, and stopping when you've seen it would
      allow you to avoid about 50% of the tree traversals on average.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
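The ternary return value that the commit above prepares for can be illustrated with a simplified walk over sorted entry names. This is a sketch with hypothetical names (`interest_sketch`, `classify_entry()`, `walk_sorted()`), and it assumes a single exact-name pathspec and plain `strcmp()` ordering, which is simpler than git's real tree ordering and pathspec matching.

```c
#include <assert.h>
#include <string.h>

/* Three-way answer: skip this entry, use it, or stop scanning the tree. */
enum interest_sketch {
	NOTHING_FURTHER = -1,
	NOT_INTERESTING = 0,
	INTERESTING = 1
};

/* Entries are assumed sorted, so anything past the pathspec ends the search. */
enum interest_sketch classify_entry(const char *entry, const char *spec)
{
	int cmp;

	assert(entry != NULL && spec != NULL);
	cmp = strcmp(entry, spec);
	if (cmp == 0)
		return INTERESTING;
	return cmp > 0 ? NOTHING_FURTHER : NOT_INTERESTING;
}

/* Returns how many entries were examined; *found is set if the spec matched. */
int walk_sorted(const char *const *entries, int n, const char *spec, int *found)
{
	int examined = 0;

	assert(entries != NULL && found != NULL);
	*found = 0;
	for (int i = 0; i < n; i++) {
		enum interest_sketch r = classify_entry(entries[i], spec);

		examined++;
		if (r == NOTHING_FURTHER)
			break; /* short-circuit the rest of the tree */
		if (r == INTERESTING)
			*found = 1;
	}
	return examined;
}
```

With five sorted entries and a pathspec that matches the second one, the walk in this sketch stops after examining three entries instead of all five.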
    • Avoid unnecessary strlen() calls · 304de2d2
      Linus Torvalds authored
      This is a micro-optimization that grew out of the mailing list discussion
      about "strlen()" showing up in profiles.
      
      We used to pass regular C strings around to the low-level tree walking
      routines, and while this worked fine, it meant that we needed to call
      strlen() on strings that the caller always actually knew the size of
      anyway.
      
      So pass the length of the string down with the string, and avoid
      unnecessary calls to strlen(). Also, when extracting a pathname from a
      tree entry, use "tree_entry_len()" instead of strlen(), since the length
      of the pathname is directly calculable from the decoded tree entry itself
      without having to actually do another strlen().
      
      This shaves off another ~5-10% from some loads that are very tree
      intensive (notably doing commit filtering by a pathspec).
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
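Passing a length alongside each string, as the commit above describes, lets a low-level routine compare names without calling `strlen()` and without requiring NUL termination. The function below is a minimal sketch with a hypothetical name (`has_base()`), not one of git's tree-walking routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the first baselen bytes of name equal base.
 * Both lengths come from the caller, so no strlen() is needed here and
 * neither buffer has to be NUL-terminated.
 */
int has_base(const char *name, size_t namelen,
	     const char *base, size_t baselen)
{
	assert(name != NULL && base != NULL);
	return namelen >= baselen && memcmp(name, base, baselen) == 0;
}
```

A caller that has just decoded a tree entry already knows the name's length, so it can pass that value straight through.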
  14. 15 Mar, 2007 1 commit
  15. 27 Feb, 2007 1 commit
    • convert object type handling from a string to a number · 21666f1a
      Nicolas Pitre authored
      We currently have two parallel notations for dealing with object types
      in the code: a string and a numerical value.  One of them is obviously
      redundant, and the most used one requires more stack space and a bunch
      of strcmp() all over the place.
      
      This is an initial step for the removal of the version using a char array
      found in object reading code paths.  The patch is unfortunately large but
      there is no sane way to split it in smaller parts without breaking the
      system.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
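The move from string type names to numeric values described in the commit above can be sketched as an enum plus a small name table, so that strings are converted once at the boundary and compared as integers everywhere else. The enum values and function names below (`obj_type_sketch`, `type_from_name()`, `name_from_type()`) are assumptions made for illustration and are not taken from git's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical numeric object types; TYPE_BAD marks an unknown name. */
enum obj_type_sketch {
	TYPE_BAD = -1,
	TYPE_COMMIT = 1,
	TYPE_TREE,
	TYPE_BLOB,
	TYPE_TAG
};

/* Index 0 is unused so that the enum value doubles as the table index. */
static const char *const type_names[] = { NULL, "commit", "tree", "blob", "tag" };

/* Convert a name to its number once; later checks are integer comparisons. */
enum obj_type_sketch type_from_name(const char *name)
{
	assert(name != NULL);
	for (int i = TYPE_COMMIT; i <= TYPE_TAG; i++)
		if (strcmp(name, type_names[i]) == 0)
			return (enum obj_type_sketch)i;
	return TYPE_BAD;
}

/* Convert back to a name only where text output is needed. */
const char *name_from_type(enum obj_type_sketch t)
{
	return (t >= TYPE_COMMIT && t <= TYPE_TAG) ? type_names[t] : NULL;
}
```

After the conversion, a check such as `type == TYPE_TREE` replaces a `strcmp()` call, and the type fits in an integer instead of a char array on the stack.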
  16. 27 Oct, 2006 1 commit
    • Make git-cherry handle root trees · 2b60356d
      Rene Scharfe authored
      This patch on top of 'next' makes built-in git-cherry handle root
      commits.
      
      It moves the static function log-tree.c::diff_root_tree() to
      tree-diff.c and makes it more similar to diff_tree_sha1() by
      shuffling around arguments and factoring out the call to
      log_tree_diff_flush().  Consequently the name is changed to
      diff_root_tree_sha1().  It is a version of diff_tree_sha1() that
      compares the empty tree (= root tree) against a single 'real' tree.
      
      This function is then used in get_patch_id() to compute patch IDs
      for initial commits instead of SEGFAULTing, as the current code
      does if confronted with parentless commits.
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  17. 18 Aug, 2006 1 commit
  18. 15 Aug, 2006 1 commit
  19. 11 Apr, 2006 1 commit
    • tree-diff: do not assume we use only one pathspec · a8baa7b9
      Junio C Hamano authored
      The way tree-diff was set up assumed we would use only one set
      of pathspec during the entire life of the program.  Move the
      pathspec related static variables out to diff_options structure
      so that we can filter commits with one set of paths while showing
      the actual diffs using a different set of paths.
      
      I suspect this breaks blame.c, and makes "git log paths..." default
      to --full-diff, the latter of which is dealt with in the next
      commit.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  20. 04 Apr, 2006 1 commit
  21. 30 Mar, 2006 1 commit
    • tree/diff header cleanup. · 1b0c7174
      Junio C Hamano authored
      Introduce tree-walk.[ch] and move "struct tree_desc" and
      associated functions from various places.
      
      Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
      move it to cache.h.  This macro returns the canonicalized
      st_mode value in the host byte order for files, symlinks and
      directories -- to be compared with a tree_desc entry.
      create_ce_mode(mode) in cache.h is similar but is intended to be
      used for index entries (so it does not work for directories) and
      returns the value in the network byte order.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  22. 01 Feb, 2006 1 commit
  23. 27 Dec, 2005 1 commit
  24. 23 Oct, 2005 1 commit
    • Split up tree diff functions into tree-diff.c library · ac1b3d12
      Linus Torvalds authored
      This makes the tree diff functionality independent of the "git-diff-tree"
      program, by splitting the core functionality up into a library file.
      
      This will be needed for when we teach git-rev-list to only follow a
      specified set of pathnames, rather than the global revision history.
      
      Most of it is a fairly straightforward code move, but it also involves
      some calling convention cleanup, and moving some of the static variables
      from diff-tree.c into the options structure.
      
      The actual tree change callback routines also become parameterized by the
      diff_options structure, allowing the library functionality to do something
      else than just show the diff on stdout.
      
      Right now the only user of this functionality remains git-diff-tree
      itself.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>