1. 14 Dec, 2006 1 commit
    • Bypass expensive content comparison during rename detection. · 7da41f48
      Shawn O. Pearce committed
      When comparing file contents during the second loop through a rename
      detection attempt we can skip the expensive byte-by-byte comparison
      if both source and destination files have valid SHA1 values.  This
      improves performance by avoiding either an expensive open/mmap to
      read the working tree copy, or an expensive inflate of a blob object.
      
      Unfortunately we still have to at least initialize the sizes of the
      source and destination files even if the SHA1 values don't match.
      Failing to initialize the sizes causes a number of test cases to fail
      and start reporting different copy/rename behavior than was expected.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
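The shortcut described in the 7da41f48 entry can be sketched roughly as follows. This is only an illustration, not the code from the commit: `struct file_spec`, its fields, and `contents_identical()` are hypothetical names, and the `used_hash` out-parameter exists purely to make the chosen path visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one side of a comparison. */
struct file_spec {
    unsigned char sha1[20]; /* object name of the content */
    bool sha1_valid;        /* false when the SHA1 is not known */
    const char *data;       /* content; only read on the slow path */
    size_t size;
};

/*
 * Returns true when both files have identical content.
 * *used_hash reports whether the SHA1 shortcut decided the answer.
 */
static bool contents_identical(const struct file_spec *src,
                               const struct file_spec *dst,
                               bool *used_hash)
{
    if (src->sha1_valid && dst->sha1_valid) {
        /* Fast path: no open/mmap or inflate is needed. */
        *used_hash = true;
        return memcmp(src->sha1, dst->sha1, 20) == 0;
    }

    /* Slow path: fall back to comparing sizes, then bytes. */
    *used_hash = false;
    if (src->size != dst->size)
        return false;
    return memcmp(src->data, dst->data, src->size) == 0;
}
```

The sketch leaves out the size initialization that the commit message says is still required even when the SHA1 values differ.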
  2. 05 Nov, 2006 1 commit
  3. 18 Aug, 2006 1 commit
  4. 04 Aug, 2006 1 commit
    • diff.c: do not use pathname comparison to tell renames · ef677686
      Junio C Hamano committed
      The final output from diff used to compare pathnames between
      preimage and postimage to tell if the filepair is a rename/copy.
      By explicitly marking the filepair created by diffcore_rename(),
      the output routine, resolve_rename_copy(), does not have to do
      so anymore.  This helps feeding a filepair that has different
      pathnames in one and two elements to the diff machinery (most
      notably, comparing two blobs).
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  5. 07 Jul, 2006 1 commit
  6. 09 Apr, 2006 1 commit
  7. 13 Mar, 2006 2 commits
    • Fix up diffcore-rename scoring · 90bd932c
      Linus Torvalds committed
      The "score" calculation for diffcore-rename was totally broken.
      
      It scaled "score" as
      
      	score = src_copied * MAX_SCORE / dst->size;
      
      which means that you got a 100% similarity score even if src and dest were
      different, if just every byte of dst was copied from src, even if source
      was much larger than dst (eg we had copied 85% of the bytes, but _deleted_
      the remaining 15%).
      
      That's clearly bogus. We should do the score calculation relative not to
      the destination size, but to the max size of the two.
      
      This seems to fix it.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
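To make the scoring problem in the 90bd932c entry concrete, here is a small sketch of the two formulas side by side. The helper names are hypothetical, and `MAX_SCORE` is assumed to be 60000 based on the -B commit message elsewhere in this log. The sketch follows the commit message, not the actual patch.

```c
#define MAX_SCORE 60000UL /* assumed value; see the -B entry in this log */

/* Old formula: scaled by the destination size only. */
static unsigned long score_by_dst(unsigned long src_copied,
                                  unsigned long dst_size)
{
    return dst_size ? src_copied * MAX_SCORE / dst_size : 0;
}

/* Fixed formula as described: scaled by the larger of the two sizes. */
static unsigned long score_by_max(unsigned long src_copied,
                                  unsigned long src_size,
                                  unsigned long dst_size)
{
    unsigned long max_size = src_size > dst_size ? src_size : dst_size;

    return max_size ? src_copied * MAX_SCORE / max_size : 0;
}
```

With a 1000-byte source, an 850-byte destination, and 850 bytes copied, `score_by_dst(850, 850)` yields 60000 (100%), while `score_by_max(850, 1000, 850)` yields 51000 (85%), which reflects the 15% that was deleted.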
    • diffcore-delta: make the hash a bit denser. · 2821104d
      Junio C Hamano committed
      To reduce wasted memory, wait until the hash fills up more
      densely before we rehash.  This reduces the working set size a
      bit further.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  8. 12 Mar, 2006 1 commit
  9. 03 Mar, 2006 1 commit
    • diffcore-rename: similarity estimator fix. · 1706306a
      Junio C Hamano committed
      The "similarity" logic was giving added material way too much
      negative weight.  What we wanted to see was how similar the
      post-change image was compared to the pre-change image, so the
      natural definition of similarity is how much common things are
      there, relative to the post-change image's size.
      
      This simplifies things a lot.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  10. 02 Mar, 2006 1 commit
    • diff-delta: allow reusing of the reference buffer index · 38fd0721
      Nicolas Pitre committed
      When a reference buffer is used multiple times then its index can be
      computed only once and reused multiple times.  This patch adds an extra
      pointer to a pointer argument (from_index) to diff_delta() for this.
      
      If from_index is NULL then everything is like before.
      
      If from_index is non NULL and *from_index is NULL then the index is
      created and its location stored to *from_index.  In this case the caller
      has the responsibility to free the memory pointed to by *from_index.
      
      If from_index and *from_index are non NULL then the index is reused as
      is.
      
      This currently saves about 10% of CPU time to repack the git archive.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
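The three `from_index` cases in the 38fd0721 entry can be sketched as below. This shows only the ownership protocol: `struct ref_index`, `build_index()`, `delta_with_index()`, and the `index_builds` counter are hypothetical, and no delta is actually computed.

```c
#include <stdlib.h>

/* Hypothetical index over a reference buffer. */
struct ref_index {
    const void *buf;
    unsigned long size;
};

static int index_builds; /* counts how often an index is computed */

static struct ref_index *build_index(const void *buf, unsigned long size)
{
    struct ref_index *idx = malloc(sizeof(*idx));

    if (!idx)
        return NULL;
    idx->buf = buf;
    idx->size = size;
    index_builds++;
    return idx;
}

/* Returns 0 on success, -1 if the index could not be allocated. */
static int delta_with_index(const void *from, unsigned long from_size,
                            struct ref_index **from_index)
{
    struct ref_index *idx;

    if (from_index && *from_index) {
        idx = *from_index;      /* both non-NULL: reuse the index as is */
    } else {
        idx = build_index(from, from_size);
        if (!idx)
            return -1;
        if (from_index)
            *from_index = idx;  /* caller now owns and must free it */
    }

    /* ... delta generation against idx would happen here ... */

    if (!from_index)
        free(idx);              /* from_index is NULL: behave as before */
    return 0;
}
```

A caller that deltas many targets against one reference would pass the address of a pointer initialized to NULL, reuse it across calls, and free it once at the end.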
  11. 01 Mar, 2006 1 commit
  12. 23 Feb, 2006 1 commit
  13. 27 Dec, 2005 1 commit
  14. 22 Nov, 2005 1 commit
    • rename detection with -M100 means "exact renames only". · 9f70b806
      Junio C Hamano committed
      When the user is interested in pure renames, there is no point
      doing the similarity scores.  This changes the score argument
      parsing to special case -M100 (otherwise, it is a precision
      scaled value 0 <= v < 1 and would mean 0.1, not 1.0 --- if you
      do mean 0.1, you can say -M1), and optimizes the diffcore_rename
      transformation to only look at pure renames in that case.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
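The score parsing described in the 9f70b806 entry can be illustrated with a small sketch. The function name `parse_score()`, the four-digit precision cap, and `MAX_SCORE` being 60000 are assumptions made for this example. The digits after -M are read as a decimal fraction, except for the literal "100".

```c
#include <string.h>

#define MAX_SCORE 60000UL /* assumed value; see the -B entry in this log */

/*
 * Hypothetical parser for the digits following -M.
 * Returns the scaled score, or -1 if the argument is not all digits.
 */
static long parse_score(const char *digits)
{
    unsigned long num = 0, scale = 1;
    const char *cp;

    if (!strcmp(digits, "100"))
        return (long)MAX_SCORE; /* special case: exact renames only */
    if (!*digits)
        return -1;

    for (cp = digits; *cp; cp++) {
        if (*cp < '0' || *cp > '9')
            return -1;
        if (scale < 10000) {    /* ignore digits beyond four places */
            num = num * 10 + (unsigned long)(*cp - '0');
            scale *= 10;
        }
    }
    /* "1" is 0.1 and "50" is 0.50, so the result is below MAX_SCORE. */
    return (long)(num * MAX_SCORE / scale);
}
```

Under these assumptions, `parse_score("1")` gives 6000 (0.1), `parse_score("50")` gives 30000 (0.5), and `parse_score("100")` gives the full 60000.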
  15. 17 Nov, 2005 1 commit
  16. 16 Nov, 2005 1 commit
    • diff: make default rename detection limit configurable. · 3299c6f6
      Junio C Hamano committed
      A while ago, a rename-detection limit logic was implemented as a
      response to this thread:
      
      	http://marc.theaimsgroup.com/?l=git&m=112413080630175
      
      where gitweb was found to be using a lot of time and memory to
      detect renames on huge commits.  git-diff family takes -l<num>
      flag, and if the number of paths that are rename destination
      candidates (i.e. new paths with -M, or modified paths with -C)
      are larger than that number, skips rename/copy detection even
      when -M or -C is specified on the command line.
      
      This commit makes the rename detection limit easier to use.  You
      can have:
      
      	[diff]
      		renamelimit = 30
      
      in your .git/config file to specify the default rename detection
      limit.  You can override this from the command line; giving 0
      means 'unlimited':
      
      	git diff -M -l0
      
      We might want to change the default behaviour, when you do not
      have the configuration, to limit it to say 20 paths or so.  This
      would also help the diffstat generation after a big 'git pull'.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
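The interaction between the configured default and the -l override in the 3299c6f6 entry might be sketched like this. `should_detect_renames()` is a hypothetical helper. The sketch assumes a negative `cmdline_limit` means "-l was not given" and that a configured value of 0 also means unlimited.

```c
#include <stdbool.h>

/*
 * Hypothetical decision helper.
 *   num_candidates: number of rename destination candidates
 *   cmdline_limit:  value of -l<num>, or negative if -l was not given
 *   config_limit:   diff.renamelimit from the config, 0 if unset
 */
static bool should_detect_renames(int num_candidates,
                                  int cmdline_limit,
                                  int config_limit)
{
    /* The command line overrides the configured default. */
    int limit = cmdline_limit >= 0 ? cmdline_limit : config_limit;

    if (limit <= 0)
        return true;                /* 0 means unlimited */
    return num_candidates <= limit; /* skip detection above the limit */
}
```

With `renamelimit = 30` configured, a diff with 31 candidate paths would skip detection unless the user passes `-l0` or a larger limit.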
  17. 25 Sep, 2005 1 commit
  18. 16 Sep, 2005 1 commit
    • Plug diff leaks. · 5098bafb
      Junio C Hamano committed
      It is a bit embarrassing that it took this long for a fix since the
      problem was first reported on Aug 13th.
      
          Message-ID: <87y876gl1r.wl@mail2.atmark-techno.com>
          From: Yasushi SHOJI <yashi@atmark-techno.com>
          Newsgroups: gmane.comp.version-control.git
          Subject: [patch] possible memory leak in diff.c::diff_free_filepair()
          Date: Sat, 13 Aug 2005 19:58:56 +0900
      
      This time I used valgrind to make sure that it does not overeagerly
      discard memory that is still being used.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  19. 11 Sep, 2005 1 commit
    • Fix copy marking from diffcore-rename. · 6bac10d8
      Junio C Hamano committed
      When (A,B) ==> (B,C) rename-copy was detected, we incorrectly said
      that C was created by copying B.  This is because we only check if the
      path of rename/copy source still exists in the resulting tree to see
      if the file is renamed out of existence.  In this case, the new B is
      created by copying or renaming A, so the original B is lost and we
      should say C is a rename of B not a copy of B.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  20. 29 Jun, 2005 1 commit
  21. 26 Jun, 2005 1 commit
    • Add a "max_size" parameter to diff_delta() · 75c42d8c
      Linus Torvalds committed
      Anything that generates a delta to see if two objects are close usually
      isn't interested in the delta if it ends up being bigger than some
      specified size, and this allows us to stop delta generation early when
      that happens.
  22. 13 Jun, 2005 1 commit
  23. 04 Jun, 2005 2 commits
  24. 31 May, 2005 3 commits
    • [PATCH] Add -B flag to diff-* brothers. · f345b0a0
      Junio C Hamano committed
      A new diffcore transformation, diffcore-break.c, is introduced.
      
      When the -B flag is given, a patch that represents a complete
      rewrite is broken into a deletion followed by a creation.  This
      makes it easier to review such a complete rewrite patch.
      
      The -B flag takes the same syntax as the -M and -C flags to
      specify the minimum amount of non-source material the resulting
      file needs to have to be considered a complete rewrite, and
      defaults to 99% if not specified.
      
      As the new test t4008-diff-break-rewrite.sh demonstrates, if a
      file is a complete rewrite, it is broken into a delete/create
      pair, which can further be subjected to the usual rename
      detection if -M or -C is used.  For example, if file0 gets
      completely rewritten to make it as if it were rather based on
      file1 which itself disappeared, the following happens:
      
          The original change looks like this:
      
      	file0     --> file0' (quite different from file0)
      	file1     --> /dev/null
      
          After diffcore-break runs, it would become this:
      
      	file0     --> /dev/null
      	/dev/null --> file0'
      	file1     --> /dev/null
      
          Then diffcore-rename matches them up:
      
      	file1     --> file0'
      
      The internal score values are finer grained now.  The earlier
      maximum of 10000 has been raised to 60000; there are no user
      visible changes but there is no reason to waste available bits.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] diff: fix the culling of unneeded delete record. · 2cd68882
      Junio C Hamano committed
      The commit 15d061b4
      
          [PATCH] Fix the way diffcore-rename records unremoved source.
      
      still leaves unneeded delete records in its output stream by
      mistake, which was covered up by having an extra check to turn
      such a delete into a no-op downstream.  Fix the check in the
      diffcore-rename to simplify the output routine.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] diff: code clean-up and removal of rename hack. · 01c4e70f
      Junio C Hamano committed
      A new macro, DIFF_PAIR_RENAME(), is introduced to distinguish a
      filepair that is a rename/copy (the definition of which is src
      and dst are different paths, of course).  This removes the hack
      used in the record_rename_pair() to always put a non-zero value
      in the score field.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  25. 30 May, 2005 5 commits
  26. 25 May, 2005 1 commit
  27. 24 May, 2005 4 commits
    • [PATCH] Redo rename/copy detection logic. · 25d5ea41
      Junio C Hamano committed
      Earlier implementation had a major screw-up in the memory
      management area.  Rename/copy logic sometimes borrowed a pointer
      to a structure without any provision for downstream to determine
      which pointer is shared and which is not.  This caused the
      later clean-up code to sometimes double free such a structure,
      resulting in a segfault.  This made -M and -C useless.
      
      Another problem the earlier implementation had was that it
      reordered the patches, and forced the logic to differentiate
      renames and copies to depend on that particular order.  This
      problem was fixed by teaching rename/copy detection logic not to
      do any reordering, and rename-copy differentiator not to depend
      on the order of the patches.  The diffs will leave rename/copy
      detector in the same destination path order as the patch that
      was fed into it.  Some test vectors have been reordered to
      accommodate this change.
      
      It also adds a sanity check logic to the human-readable diff-raw
      output to detect paths with embedded TAB and LF characters,
      which cannot be expressed with that format.  This idea came up
      during a discussion with Chris Wedgwood.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix diff-pruning logic which was running prune too early. · bceafe75
      Junio C Hamano committed
      For later stages to reorder patches, pruning logic and rename detection
      logic should not decide which delete to discard (because another entry
      said it will take over the file as a rename) until the very end.
      
      Also fix some tests that were assuming the earlier "last one is rename
      or keep everything else is copy" semantics of diff-raw format, which no
      longer is true.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Rename/copy detection fix. · f7c1512a
      Junio C Hamano committed
      The rename/copy detection logic in earlier round was only good
      enough to show patch output and discussion on the mailing list
      about the diff-raw format updates revealed many problems with
      it.  This patch fixes all the ones known to me, without making
      things I want to do later impossible, mostly related to patch
      reordering.
      
       (1) Earlier rename/copy detector determined which one is rename
           and which one is copy too early, which made it impossible
           to later introduce diffcore transformers to reorder
           patches.  This patch fixes it by moving that logic to the
           very end of the processing.
      
       (2) Earlier output routine diff_flush() was pruning all the
           "no-change" entries indiscriminately.  This was done due
           to my false assumption that one of the requirements in the
           diff-raw output was not to show such an entry (which
           resulted in my incorrect comment about "diff-helper never
           being able to be equivalent to built-in diff driver").  My
           special thanks go to Linus for correcting me about this.
           When we produce diff-raw output, for the downstream to be
           able to tell renames from copies, sometimes it _is_
           necessary to output "no-change" entries, and this patch
           adds diffcore_prune() function for doing it.
      
       (3) Earlier diff_filepair structure was trying to be not too
           specific about rename/copy operations, but the purpose of
           the structure was to record one or two paths, which _was_
           indeed about rename/copy.  This patch discards xfrm_msg
           field which was trying to be generic for this wrong reason,
           and introduces a couple of fields (rename_score and
           rename_rank) that are explicitly specific to rename/copy
           logic.  One thing to note is that the information in a
           single diff_filepair structure _still_ does not distinguish
           renames from copies, and it is deliberately so.  This is to
           allow patches to be reordered in later stages.
      
       (4) This patch also adds some tests about diff-raw format
           output and makes sure that necessary "no-change" entries
           appear on the output.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Be careful with symlinks when detecting renames and copies. · 60896c7b
      Junio C Hamano committed
      Earlier round was not treating symbolic links carefully enough,
      and would have produced diff output that renamed/copied then
      edited the contents of a symbolic link, which made no practical
      sense.  Change it to detect only pure renames.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  28. 23 May, 2005 2 commits