1. 24 May, 2017 · 1 commit
  2. 01 May, 2017 · 1 commit
  3. 31 Mar, 2017 · 1 commit
    • diff: avoid fixed-size buffer for patch-ids · 977db6b4
      Committed by Jeff King
      To generate a patch id, we format the diff header into a
      fixed-size buffer, and then feed the result to our sha1
      computation. The fixed buffer has size '4*PATH_MAX + 20',
      which in theory accommodates the four filenames plus some
      extra data. Except:
      
        1. The filenames may not be constrained to PATH_MAX. The
           static value may not be a real limit on the current
           filesystem. Moreover, we may compute patch-ids for
           names stored only in git, without touching the current
           filesystem at all.
      
        2. The 20 bytes is not nearly enough to cover the
           extra content we put in the buffer.
      
      As a result, the data we feed to the sha1 computation may be
      truncated, and it's possible that a commit with a very long
      filename could erroneously collide in the patch-id space
      with another commit. For instance, this could happen if one
      commit modified "really-long-filename/foo" and another
      modified "bar" in the same directory.
      
      In practice this is unlikely. Because the filenames are
      repeated, and because there's a single cutoff at the end of
      the buffer, the offending filename would have to be on the
      order of four times larger than PATH_MAX.
      
      We could fix this by moving to a strbuf. However, we can
      observe that the purpose of formatting this in the first
      place is to feed it to git_SHA1_Update(). So instead, let's
      just feed each part of the formatted string directly. This
      actually ends up more readable, and we can even factor out
      some duplicated bits from the various conditional branches.
      
      Technically this may change the output of patch-id for very
      long filenames, but it's not worth making an exception for
      this in the --stable output. It was a bug, and one that only
      affected an unlikely set of paths.  And anyway, the exact
      value would have varied from platform to platform depending
      on the value of PATH_MAX, so there is no "stable" value.
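The difference between the two approaches can be sketched in a few lines of C. This is only an illustration under stated assumptions: `hash_update()` is a hypothetical FNV-1a stand-in for git_SHA1_Update(), the 32-byte buffer plays the role of '4*PATH_MAX + 20', and the header format is simplified.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HASH_INIT 14695981039346656037ULL /* FNV-1a offset basis */

/* Hypothetical stand-in for git_SHA1_Update(): FNV-1a, fed in pieces. */
static uint64_t hash_update(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		h ^= *p++;
		h *= 1099511628211ULL; /* FNV-1a prime */
	}
	return h;
}

/* Old approach: format into a fixed buffer, then hash what fits. */
static uint64_t header_id_fixed(const char *old_name, const char *new_name)
{
	char buf[32]; /* deliberately tiny, standing in for 4*PATH_MAX + 20 */
	int n = snprintf(buf, sizeof(buf), "diff--gita/%sb/%s", old_name, new_name);
	size_t len = (n < 0) ? 0 : (size_t)n;

	assert(old_name && new_name);
	if (len >= sizeof(buf))
		len = sizeof(buf) - 1; /* the tail of the header was cut off */
	return hash_update(HASH_INIT, buf, len);
}

/* New approach: feed each piece straight to the hash, with no length limit. */
static uint64_t header_id_streamed(const char *old_name, const char *new_name)
{
	uint64_t h = HASH_INIT;

	h = hash_update(h, "diff--gita/", 11);
	h = hash_update(h, old_name, strlen(old_name));
	h = hash_update(h, "b/", 2);
	h = hash_update(h, new_name, strlen(new_name));
	return h;
}
```

With short names both functions hash the same bytes. Once the names outgrow the buffer, `header_id_fixed()` gives two paths that differ only near the end the same value, while `header_id_streamed()` still tells them apart.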
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  4. 27 Mar, 2017 · 1 commit
  5. 22 Mar, 2017 · 2 commits
    • prefix_filename: return newly allocated string · e4da43b1
      Committed by Jeff King
      The prefix_filename() function returns a pointer to static
      storage, which makes it easy to use dangerously. We already
      fixed one buggy caller in hash-object recently, and the
      calls in apply.c are suspicious (I didn't dig in enough to
      confirm that there is a bug, but we call the function once
      in apply_all_patches() and then again indirectly from
      parse_chunk()).
      
      Let's make it harder to get wrong by allocating the return
      value. For simplicity, we'll do this even when the prefix is
      empty (and we could just return the original file pointer).
      That will cause us to allocate sometimes when we wouldn't
      otherwise need to, but this function isn't called in
      performance critical code-paths (and it already _might_
      allocate on any given call, so a caller that cares about
      performance is questionable anyway).
      
      The downside is that the callers need to remember to free()
      the result to avoid leaking. Most of them already used
      xstrdup() on the result, so we know they are OK. The
      remainder have been converted to use free() as appropriate.
      
      I considered retaining a prefix_filename_unsafe() for cases
      where we know the static lifetime is OK (and handling the
      cleanup is awkward). This is only a handful of cases,
      though, and it's not worth the mental energy in worrying
      about whether the "unsafe" variant is OK to use in any
      situation.
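A minimal sketch of the allocating interface, assuming a simplified join of prefix and path. `prefix_filename_sketch()` is a hypothetical name, and unlike the real function it does not treat absolute paths specially.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for prefix_filename(): joins prefix and path and
 * always returns a freshly allocated string that the caller must free().
 * A NULL prefix is treated as empty, so callers need no special case.
 */
static char *prefix_filename_sketch(const char *prefix, const char *path)
{
	size_t plen = prefix ? strlen(prefix) : 0;
	size_t len;
	char *out;

	assert(path != NULL);
	len = strlen(path);
	out = malloc(plen + len + 1);
	if (!out)
		abort(); /* simplification: give up on allocation failure */
	if (plen)
		memcpy(out, prefix, plen);
	memcpy(out + plen, path, len + 1); /* copies the trailing NUL too */
	return out;
}
```

Because every call returns its own buffer, a second call can no longer overwrite the result of the first, which is the hazard with static storage. The cost is that each caller now owns the string and has to free() it.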
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • prefix_filename: drop length parameter · 116fb64e
      Committed by Jeff King
      This function takes the prefix as a ptr/len pair, but in
      every caller the length is exactly strlen(ptr). Let's
      simplify the interface and just take the string. This saves
      callers specifying it (and in some cases handling a NULL
      prefix).
      
      In a handful of cases we had the length already without
      calling strlen, so this is technically slower. But it's not
      likely to matter (after all, if the prefix is non-empty
      we'll allocate and copy it into a buffer anyway).
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  6. 03 Mar, 2017 · 1 commit
    • diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec() · 12426e11
      Committed by Junio C Hamano
      Callers of diff_populate_filespec() can choose to ask only for the
      size of the blob without grabbing the blob data, and the function,
      after running lstat() when the filespec points at a working tree
      file, returns by copying the value in size field of the stat
      structure into the size field of the filespec when this is the case.
      
      However, this short-cut cannot be taken if the contents from the
      path needs to go through convert_to_git(), whose resulting real blob
      data may be different from what is in the working tree file.
      
      As "git diff --quiet" compares the .size fields of filespec
      structures to skip content comparison, this bug manifests as a
      false "there are differences" for a file that needs eol conversion,
      for example.
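To see why the stat size is not a safe substitute, consider a rough sketch in which CRLF-to-LF conversion is the only thing convert_to_git() does. Both function names are hypothetical and the conversion is heavily simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the blob a CRLF-to-LF conversion would produce (simplified). */
static size_t size_after_eol_conversion(const char *buf, size_t len)
{
	size_t out = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
			continue; /* drop the CR of a CRLF pair */
		out++;
	}
	return out;
}

/* The stat size is only trustworthy when no conversion applies. */
static size_t filespec_size(const char *buf, size_t st_size, int needs_conversion)
{
	if (!needs_conversion)
		return st_size; /* short-cut: contents need not be examined */
	return size_after_eol_conversion(buf, st_size);
}
```

A working tree file containing "a\r\nb\r\n" occupies 6 bytes on disk, but the blob it converts to is 4 bytes long. Comparing the 6 against the size of the stored blob would report a difference that does not exist.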
      Reported-by: Mike Crowe <mac@mcrowe.com>
      Helped-by: Torsten Bögershausen <tboegi@web.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  7. 09 Feb, 2017 · 1 commit
    • diff: print line prefix for --name-only output · f5022b5f
      Committed by Jeff King
      If you run "git log --graph --name-only", the pathnames are
      not indented to go along with their matching commits (unlike
      all of the other diff formats). We need to output the line
      prefix for each item before writing it.
      
      The tests cover both --name-status and --name-only. The
      former actually gets this right already, because it builds
      on the --raw format functions. It's only --name-only which
      uses its own code (and this fix mirrors the code in
      diff_flush_raw()).
      
      Note that the tests don't follow our usual style of setting
      up the "expect" output inside the test block. This matches
      the surrounding style, but more importantly it is easier to
      read: we don't have to worry about embedded single-quotes,
      and the leading indentation is more obvious.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  8. 31 Jan, 2017 · 3 commits
  9. 13 Jan, 2017 · 1 commit
  10. 24 Dec, 2016 · 1 commit
    • diff: retire "compaction" heuristics · 3cde4e02
      Committed by Junio C Hamano
      When a patch inserts a block of lines, whose last lines are the
      same as the existing lines that appear before the inserted block,
      "git diff" can choose any place between these existing lines as the
      boundary between the pre-context and the added lines (adjusting the
      end of the inserted block as appropriate) to come up with variants
      of the same patch, and some variants are easier to read than others.
      
      We have been trying to improve the choice of this boundary, and Git
      2.11 shipped with an experimental "compaction-heuristic".  Since
      then another attempt to improve the logic further resulted in a new
      "indent-heuristic" logic.  It is agreed that the latter gives better
      results overall, and the former has outlived its usefulness.
      
      Retire "compaction", and keep "indent" as an experimental feature.
      The latter hopefully will be turned on by default in a future
      release, but that should be done as a separate step.
      Suggested-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  11. 09 Dec, 2016 · 1 commit
    • diff: handle --no-abbrev in no-index case · 43d1948b
      Committed by Jack Bates
      There are two different places where the --no-abbrev option is parsed,
      and two different places where SHA-1s are abbreviated. We normally parse
      --no-abbrev with setup_revisions(), but in the no-index case, "git diff"
      calls diff_opt_parse() directly, and diff_opt_parse() didn't handle
      --no-abbrev until now. (It did handle --abbrev, however.) We normally
      abbreviate SHA-1s with find_unique_abbrev(), but commit 4f03666a ("diff:
      handle sha1 abbreviations outside of repository", 2016-10-20) recently
      introduced a special case when you run "git diff" outside of a
      repository.
      
      setup_revisions() does also call diff_opt_parse(), but not for --abbrev
      or --no-abbrev, which it handles itself. setup_revisions() sets
      rev_info->abbrev, and later copies that to diff_options->abbrev. It
      handles --no-abbrev by setting abbrev to zero. (This change doesn't
      touch that.)
      
      Setting abbrev to zero was broken in the outside-of-a-repository special
      case, which until now resulted in a truly zero-length SHA-1, rather than
      taking zero to mean do not abbreviate. The only way to trigger this bug,
      however, was by running "git diff --raw" without either the --abbrev or
      --no-abbrev options, because 1) without --raw it doesn't respect abbrev
      (which is bizarre, but has been that way forever), 2) we silently clamp
      --abbrev=0 to MINIMUM_ABBREV, and 3) --no-abbrev wasn't handled until
      now.
      
      The outside-of-a-repository case is one of three no-index cases. The
      other two are when one of the files you're comparing is outside of the
      repository you're in, and the --no-index option.
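The intended meaning of a zero abbrev outside a repository can be sketched as follows. The helper name and the fallback length of 7 are assumptions made for illustration, and out-of-range values are clamped here rather than treated as errors.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HEXSZ 40          /* length of a full SHA-1 in hex */
#define FALLBACK_ABBREV 7 /* assumed static default length */

/*
 * Copy an object name into out, shortened to abbrev hex digits. With no
 * object database there are no collisions to check, so the name is
 * simply cut at the requested length.
 */
static void abbrev_outside_repo(const char *hex, int abbrev, char *out, size_t outsz)
{
	assert(strlen(hex) == HEXSZ && outsz > HEXSZ);
	if (abbrev < 0)
		abbrev = FALLBACK_ABBREV; /* "auto" cannot be sized without objects */
	else if (abbrev == 0 || abbrev > HEXSZ)
		abbrev = HEXSZ;           /* zero means "do not abbreviate" */
	snprintf(out, outsz, "%.*s", abbrev, hex);
}
```

The bug described above corresponds to cutting the string at position zero, which yields an empty name instead of all 40 hex digits.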
      Signed-off-by: Jack Bates <jack@nottheoilrig.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  12. 15 Nov, 2016 · 1 commit
  13. 27 Oct, 2016 · 3 commits
    • diff: handle sha1 abbreviations outside of repository · 4f03666a
      Committed by Jeff King
      When generating diffs outside a repository (e.g., with "diff
      --no-index"), we may write abbreviated sha1s as part of
      "--raw" output or the "index" lines of "--patch" output.
      Since we have no object database, we never find any
      collisions, and these sha1s get whatever static abbreviation
      length is configured (typically 7).
      
      However, we do blindly look in ".git/objects" to see if any
      objects exist, even though we know we are not in a
      repository. This is usually harmless because such a
      directory is unlikely to exist, but could be wrong in rare
      circumstances.
      
      Let's instead notice when we are not in a repository and
      behave as if the object database is empty (i.e., just use
      the default abbrev length). It would perhaps make sense to
      be conservative and show full sha1s in that case, but
      showing the default abbreviation is what we've always done
      (and is certainly less ugly).
      
      Note that this does mean that:
      
        cd /not/a/repo
        GIT_OBJECT_DIRECTORY=/some/real/objdir git diff --no-index ...
      
      used to look for collisions in /some/real/objdir but now
      does not. This could be considered either a bugfix (we do
      not look at objects if we have no repository) or a
      regression, but it seems unlikely that anybody would care
      much either way.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff_aligned_abbrev: use "struct oid" · d6cece51
      Committed by Jeff King
      Since we're modifying this function anyway, it's a good time
      to update it to the more modern "struct oid". We can also
      drop some of the magic numbers in favor of GIT_SHA1_HEXSZ,
      along with some descriptive comments.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff_unique_abbrev: rename to diff_aligned_abbrev · d5e3b01e
      Committed by Jeff King
      The word "align" describes how the function actually differs
      from find_unique_abbrev, and will make it less confusing
      when we add more diff-specific abbreviation functions that do
      not do this alignment.
      
      Since this is a globally available function, let's also move
      its descriptive comment to the header file, where we
      typically document function interfaces.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  14. 25 Oct, 2016 · 1 commit
  15. 18 Oct, 2016 · 1 commit
  16. 05 Oct, 2016 · 3 commits
  17. 04 Oct, 2016 · 1 commit
    • abbrev: prepare for new world order · 7b5b7721
      Committed by Junio C Hamano
      The code that sets custom abbreviation length, in response to
      command line argument, often does something like this:
      
      	if (skip_prefix(arg, "--abbrev=", &arg))
      		abbrev = atoi(arg);
      	else if (!strcmp("--abbrev", arg))
      		abbrev = DEFAULT_ABBREV;
      	/* make the value sane */
      	if (abbrev < 0 || 40 < abbrev)
      		abbrev = ... some sane value ...
      
      However, it is pointless to sanity-check and tweak the value
      obtained from DEFAULT_ABBREV.  We are going to allow it to be
      initially set to -1 to signal that the default abbreviation length
      must be auto sized upon the first request to abbreviate, based on
      the number of objects in the repository, and when that happens,
      rejecting or tweaking a negative value to a "saner" one will
      negatively interfere with the auto sizing.  The codepaths for
      
          git rev-parse --short <object>
          git diff --raw --abbrev
      
      do exactly that; allow them to pass possibly negative abbrevs
      intact, that will come from DEFAULT_ABBREV in the future.
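One way to arrange the parsing so that only a user-supplied number is sanity-checked is sketched below. The function name, the minimum of 4, and the maximum of 40 are assumptions for illustration; the sketch uses strncmp() rather than Git's skip_prefix() helper so that it stands alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MINIMUM_ABBREV_SKETCH 4 /* assumed lower bound */
#define FULL_HEXSZ 40           /* full SHA-1 hex length */

/*
 * Returns 1 if arg is an --abbrev option and stores the length in *abbrev.
 * Only a number given by the user is clamped; default_abbrev is passed
 * through untouched, even when it is -1 ("auto-size later").
 */
static int parse_abbrev_sketch(const char *arg, int default_abbrev, int *abbrev)
{
	static const char with_value[] = "--abbrev=";
	size_t n = sizeof(with_value) - 1;

	assert(arg && abbrev);
	if (!strncmp(arg, with_value, n)) {
		int v = atoi(arg + n);

		if (v < MINIMUM_ABBREV_SKETCH)
			v = MINIMUM_ABBREV_SKETCH;
		else if (v > FULL_HEXSZ)
			v = FULL_HEXSZ;
		*abbrev = v;
		return 1;
	}
	if (!strcmp(arg, "--abbrev")) {
		*abbrev = default_abbrev; /* may be negative; do not "fix" it */
		return 1;
	}
	return 0;
}
```

Moving the clamp inside the "--abbrev=" branch is the whole point: a bare "--abbrev" with a default of -1 now reaches the abbreviation code as -1.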
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  18. 01 Oct, 2016 · 1 commit
    • diff_unique_abbrev(): document its assumption and limitation · d709f1fb
      Committed by Junio C Hamano
      This function is used to add "..." to displayed object names in
      "diff --raw --abbrev[=<n>]" output.  It bases its behaviour on an
      untold assumption that the abbreviation length requested by the
      caller is "reasonable", i.e. most of the objects will abbreviate
      within the requested length and the resulting length would never
      exceed it by more than a few hexdigits (otherwise the resulting
      columns would not align).  Explain that in a comment.
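The alignment idea can be illustrated with a small sketch that takes an already-abbreviated name and pads it with dots. The function name and exact thresholds are assumptions chosen to match the description above, not a copy of the real function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Pad an already-abbreviated object name with dots. Names that needed up
 * to two more digits than requested get fewer dots, so they still fill a
 * column of width requested + 3. Longer names get three dots anyway and
 * break the alignment.
 */
static void align_abbrev_sketch(const char *abbrev, int requested, char *out, size_t outsz)
{
	int abblen = (int)strlen(abbrev);
	int dots = 3;

	assert(outsz > (size_t)abblen + 3);
	if (requested < abblen && abblen <= requested + 2)
		dots = requested + 3 - abblen; /* spend dots on the extra digits */
	snprintf(out, outsz, "%s%.*s", abbrev, dots, "...");
}
```

With a requested length of 7, a 7-digit name and a 9-digit name both come out 10 columns wide, whereas an 11-digit name comes out 14 columns wide. That last case is what happens when the requested length is not "reasonable".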
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  19. 30 Sep, 2016 · 1 commit
  20. 28 Sep, 2016 · 1 commit
  21. 22 Sep, 2016 · 2 commits
  22. 20 Sep, 2016 · 2 commits
    • blame: honor the diff heuristic options and config · 5b162879
      Committed by Michael Haggerty
      Teach "git blame" and "git annotate" the --compaction-heuristic and
      --indent-heuristic options that are now supported by "git diff".
      
      Also teach them to honor the `diff.compactionHeuristic` and
      `diff.indentHeuristic` configuration options.
      
      It would be conceivable to introduce separate configuration options for
      "blame" and "annotate"; for example `blame.compactionHeuristic` and
      `blame.indentHeuristic`. But it would be confusing to users if blame
      output is inconsistent with diff output, so it makes more sense for them
      to respect the same configuration.
      Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff: improve positioning of add/delete blocks in diffs · 433860f3
      Committed by Michael Haggerty
      Some groups of added/deleted lines in diffs can be slid up or down,
      because lines at the edges of the group are not unique. Picking good
      shifts for such groups is not a matter of correctness but definitely has
      a big effect on aesthetics. For example, consider the following two
      diffs. The first is what standard Git emits:
      
          --- a/9c572b21:git-send-email.perl
          +++ b/6dcfa306:git-send-email.perl
          @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
           }
      
           if (!$smtp_server) {
          +       $smtp_server = $repo->config('sendemail.smtpserver');
          +}
          +if (!$smtp_server) {
                  foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                          if (-x $_) {
                                  $smtp_server = $_;
      
      The following diff is equivalent, but is obviously preferable from an
      aesthetic point of view:
      
          --- a/9c572b21:git-send-email.perl
          +++ b/6dcfa306:git-send-email.perl
          @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
                  $initial_reply_to =~ s/(^\s+|\s+$)//g;
           }
      
          +if (!$smtp_server) {
          +       $smtp_server = $repo->config('sendemail.smtpserver');
          +}
           if (!$smtp_server) {
                  foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                          if (-x $_) {
      
      This patch teaches Git to pick better positions for such "diff sliders"
      using heuristics that take the positions of nearby blank lines and the
      indentation of nearby lines into account.
      
      The existing Git code basically always shifts such "sliders" as far down
      in the file as possible. The only exception is when the slider can be
      aligned with a group of changed lines in the other file, in which case
      Git favors depicting the change as one add+delete block rather than one
      add and a slightly offset delete block. This naive algorithm often
      yields ugly diffs.
      
      Commit d634d61e improved the situation somewhat by preferring to
      position add/delete groups to make their last line a blank line, when
      that is possible. This heuristic does more good than harm, but (1) it
      can only help if there are blank lines in the right places, and (2)
      always picks the last blank line, even if there are others that might be
      better. The end result is that it makes perhaps 1/3 as many errors as
      the default Git algorithm, but that still leaves a lot of ugly diffs.
      
      This commit implements a new and much better heuristic for picking
      optimal "slider" positions using the following approach: First observe
      that each hypothetical positioning of a diff slider introduces two
      splits: one between the context lines preceding the group and the first
      added/deleted line, and the other between the last added/deleted line
      and the first line of context following it. It tries to find the
      positioning that creates the least bad splits.
      
      Splits are evaluated based only on the presence and locations of nearby
      blank lines, and the indentation of lines near the split. Basically, it
      prefers to introduce splits adjacent to blank lines, between lines that
      are indented less, and between lines with the same level of indentation.
      In more detail:
      
      1. It measures the following characteristics of a proposed splitting
         position in a `struct split_measurement`:
      
         * the number of blank lines above the proposed split
         * whether the line directly after the split is blank
         * the number of blank lines following that line
         * the indentation of the nearest non-blank line above the split
         * the indentation of the line directly below the split
         * the indentation of the nearest non-blank line after that line
      
      2. It combines the measured attributes using a bunch of
         empirically-optimized weighting factors to derive a `struct
         split_score` that measures the "badness" of splitting the text at
         that position.
      
      3. It combines the `split_score` for the top and the bottom of the
         slider at each of its possible positions, and selects the position
         that has the best `split_score`.
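The three steps can be sketched in C as follows. This is a toy version under loud assumptions: the weights `HYP_CHILD_PENALTY` and `HYP_BLANK_BONUS` are invented for illustration and are not the empirically optimized ones, the badness is collapsed into a single integer, and the caller supplies the range of valid slider positions.

```c
#include <assert.h>
#include <stddef.h>

#define HYP_CHILD_PENALTY 20 /* hypothetical weight */
#define HYP_BLANK_BONUS 10   /* hypothetical weight */

/* Indentation of a line in columns, or -1 if the line is blank. */
static int get_indent(const char *line)
{
	int col = 0;

	for (; *line; line++) {
		if (*line == ' ')
			col++;
		else if (*line == '\t')
			col += 8 - col % 8;
		else
			return col;
	}
	return -1;
}

struct split_measurement {
	int indent;      /* line directly below the split; -1 if blank or EOF */
	int pre_blank;   /* blank lines directly above the split */
	int pre_indent;  /* nearest non-blank line above; -1 if none */
	int post_blank;  /* blank lines following the line below the split */
	int post_indent; /* nearest non-blank line after that; -1 if none */
};

/* Step 1: measure a split between lines[split - 1] and lines[split]. */
static void measure_split(const char **lines, int n, int split,
			  struct split_measurement *m)
{
	int i;

	m->indent = (split < n) ? get_indent(lines[split]) : -1;
	m->pre_blank = m->post_blank = 0;
	m->pre_indent = m->post_indent = -1;
	for (i = split - 1; i >= 0; i--) {
		m->pre_indent = get_indent(lines[i]);
		if (m->pre_indent != -1)
			break;
		m->pre_blank++;
	}
	for (i = split + 1; i < n; i++) {
		m->post_indent = get_indent(lines[i]);
		if (m->post_indent != -1)
			break;
		m->post_blank++;
	}
}

/* Step 2: combine the measurements into one badness value (lower is better). */
static int split_badness(const struct split_measurement *m)
{
	int indent = (m->indent != -1) ? m->indent : m->post_indent;
	int blanks = m->pre_blank + (m->indent == -1) + m->post_blank;
	int badness = 0;

	if (indent > 0)
		badness += indent; /* splitting before indented code is worse */
	if (m->pre_indent != -1 && indent > m->pre_indent)
		badness += HYP_CHILD_PENALTY; /* separates a header from its body */
	return badness - HYP_BLANK_BONUS * blanks;
}

/* Step 3: try every start in [first, last] for a group of glen lines. */
static int best_slider_start(const char **lines, int n, int glen, int first, int last)
{
	int best = first, best_badness = 0, s;

	assert(first <= last && last + glen <= n);
	for (s = first; s <= last; s++) {
		struct split_measurement top, bottom;
		int badness;

		measure_split(lines, n, s, &top);
		measure_split(lines, n, s + glen, &bottom);
		badness = split_badness(&top) + split_badness(&bottom);
		if (s == first || badness < best_badness) {
			best = s;
			best_badness = badness;
		}
	}
	return best;
}
```

Applied to the git-send-email.perl lines from the example above, this toy scoring prefers the position where the added group begins right after the blank line, that is, the second and nicer of the two diffs.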
      
      I determined the initial set of weighting factors by collecting a corpus
      of Git histories from 29 open-source software projects in various
      programming languages. I generated many diffs from this corpus, and
      determined the best positioning "by eye" for about 6600 diff sliders. I
      used about half of the repositories in the corpus (corresponding to
      about 2/3 of the sliders) as a training set, and optimized the weights
      against this corpus using a crude automated search of the parameter
      space to get the best agreement with the manually-determined values.
      Then I tested the resulting heuristic against the full corpus. The
      results are summarized in the following table, in column `indent-1`:
      
      | repository            | count |      Git 2.9.0 |     compaction | compaction-fixed |       indent-1 |       indent-2 |
      | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
      | afnetworking          |   109 |    89  (81.7%) |    37  (33.9%) |      37  (33.9%) |     2   (1.8%) |     2   (1.8%) |
      | alamofire             |    30 |    18  (60.0%) |    14  (46.7%) |      15  (50.0%) |     0   (0.0%) |     0   (0.0%) |
      | angular               |   184 |   127  (69.0%) |    39  (21.2%) |      23  (12.5%) |     5   (2.7%) |     5   (2.7%) |
      | animate               |   313 |     2   (0.6%) |     2   (0.6%) |       2   (0.6%) |     2   (0.6%) |     2   (0.6%) |
      | ant                   |   380 |   356  (93.7%) |   152  (40.0%) |     148  (38.9%) |    15   (3.9%) |    15   (3.9%) | *
      | bugzilla              |   306 |   263  (85.9%) |   109  (35.6%) |      99  (32.4%) |    14   (4.6%) |    15   (4.9%) | *
      | corefx                |   126 |    91  (72.2%) |    22  (17.5%) |      21  (16.7%) |     6   (4.8%) |     6   (4.8%) |
      | couchdb               |    78 |    44  (56.4%) |    26  (33.3%) |      28  (35.9%) |     6   (7.7%) |     6   (7.7%) | *
      | cpython               |   937 |   158  (16.9%) |    50   (5.3%) |      49   (5.2%) |     5   (0.5%) |     5   (0.5%) | *
      | discourse             |   160 |    95  (59.4%) |    42  (26.2%) |      36  (22.5%) |    18  (11.2%) |    13   (8.1%) |
      | docker                |   307 |   194  (63.2%) |   198  (64.5%) |     253  (82.4%) |     8   (2.6%) |     8   (2.6%) | *
      | electron              |   163 |   132  (81.0%) |    38  (23.3%) |      39  (23.9%) |     6   (3.7%) |     6   (3.7%) |
      | git                   |   536 |   470  (87.7%) |    73  (13.6%) |      78  (14.6%) |    16   (3.0%) |    16   (3.0%) | *
      | gitflow               |   127 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
      | ionic                 |   133 |    89  (66.9%) |    29  (21.8%) |      38  (28.6%) |     1   (0.8%) |     1   (0.8%) |
      | ipython               |   482 |   362  (75.1%) |   167  (34.6%) |     169  (35.1%) |    11   (2.3%) |    11   (2.3%) | *
      | junit                 |   161 |   147  (91.3%) |    67  (41.6%) |      66  (41.0%) |     1   (0.6%) |     1   (0.6%) | *
      | lighttable            |    15 |     5  (33.3%) |     0   (0.0%) |       2  (13.3%) |     0   (0.0%) |     0   (0.0%) |
      | magit                 |    88 |    75  (85.2%) |    11  (12.5%) |       9  (10.2%) |     1   (1.1%) |     0   (0.0%) |
      | neural-style          |    28 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
      | nodejs                |   781 |   649  (83.1%) |   118  (15.1%) |     111  (14.2%) |     4   (0.5%) |     5   (0.6%) | *
      | phpmyadmin            |   491 |   481  (98.0%) |    75  (15.3%) |      48   (9.8%) |     2   (0.4%) |     2   (0.4%) | *
      | react-native          |   168 |   130  (77.4%) |    79  (47.0%) |      81  (48.2%) |     0   (0.0%) |     0   (0.0%) |
      | rust                  |   171 |   128  (74.9%) |    30  (17.5%) |      27  (15.8%) |    16   (9.4%) |    14   (8.2%) |
      | spark                 |   186 |   149  (80.1%) |    52  (28.0%) |      52  (28.0%) |     2   (1.1%) |     2   (1.1%) |
      | tensorflow            |   115 |    66  (57.4%) |    48  (41.7%) |      48  (41.7%) |     5   (4.3%) |     5   (4.3%) |
      | test-more             |    19 |    15  (78.9%) |     2  (10.5%) |       2  (10.5%) |     1   (5.3%) |     1   (5.3%) | *
      | test-unit             |    51 |    34  (66.7%) |    14  (27.5%) |       8  (15.7%) |     2   (3.9%) |     2   (3.9%) | *
      | xmonad                |    23 |    22  (95.7%) |     2   (8.7%) |       2   (8.7%) |     1   (4.3%) |     1   (4.3%) | *
      | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
      | totals                |  6668 |  4391  (65.9%) |  1496  (22.4%) |    1491  (22.4%) |   150   (2.2%) |   144   (2.2%) |
      | totals (training set) |  4552 |  3195  (70.2%) |  1053  (23.1%) |    1061  (23.3%) |    86   (1.9%) |    88   (1.9%) |
      | totals (test set)     |  2116 |  1196  (56.5%) |   443  (20.9%) |     430  (20.3%) |    64   (3.0%) |    56   (2.6%) |
      
      In this table, the numbers are the count and percentage of human-rated
      sliders that the corresponding algorithm got *wrong*. The columns are
      
      * "repository" - the name of the repository used. I used the diffs
        between successive non-merge commits on the HEAD branch of the
        corresponding repository.
      
      * "count" - the number of sliders that were human-rated. I chose most,
        but not all, sliders to rate from those among which the various
        algorithms gave different answers.
      
      * "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.
      
      * "compaction" - the heuristic used by `git diff --compaction-heuristic`
        in Git 2.9.0.
      
      * "compaction-fixed" - the heuristic used by `git diff
        --compaction-heuristic` after the fixes from earlier in this patch
        series. Note that the results are not dramatically different than
        those for "compaction". Both produce non-ideal diffs only about 1/3 as
        often as the default `git diff`.
      
      * "indent-1" - the new `--indent-heuristic` algorithm, using the first
        set of weighting factors, determined as described above.
      
      * "indent-2" - the new `--indent-heuristic` algorithm, using the final
        set of weighting factors, determined as described below.
      
      * `*` - indicates that repo was part of training set used to determine
        the first set of weighting factors.
      
      The fact that the heuristic performed nearly as well on the test set as
      on the training set in column "indent-1" is a good indication that the
      heuristic was not over-trained. Given that fact, I ran a second round of
      optimization, using the entire corpus as the training set. The resulting
      set of weights gave the results in column "indent-2". These are the
      weights included in this patch.
      
      The final result gives consistently and significantly better results
      across the whole corpus than either `git diff` or `git diff
      --compaction-heuristic`. It makes only about 1/30 as many errors as the
      former and about 1/10 as many errors as the latter. (And a good fraction
      of the remaining errors are for diffs that involve weirdly-formatted
      code, sometimes apparently machine-generated.)
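The "1/30" and "1/10" figures follow from the totals row of the table: out of 6668 rated sliders, Git 2.9.0 got 4391 wrong, the compaction heuristic 1496, and indent-2 144. A tiny helper (hypothetical, only here to make the arithmetic checkable) shows the ratios:

```c
#include <assert.h>

/* How many times more errors the baseline makes than the candidate. */
static double error_ratio(int baseline_errors, int candidate_errors)
{
	assert(candidate_errors > 0);
	return (double)baseline_errors / candidate_errors;
}
```

Here `error_ratio(4391, 144)` is a little over 30 and `error_ratio(1496, 144)` is a little over 10, which matches the rounded fractions quoted above.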
      
      The tools that were used to do this optimization and analysis, along
      with the human-generated data values, are recorded in a separate project
      [1].
      
      This patch adds a new command-line option `--indent-heuristic`, and a
      new configuration setting `diff.indentHeuristic`, that activate this
      heuristic. This interface is only meant for testing purposes, and should
      be finalized before including this change in any release.
      
      [1] https://github.com/mhagger/diff-slider-tools
      Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  23. 09 Sep, 2016 · 3 commits
  24. 08 Sep, 2016 · 1 commit
  25. 01 Sep, 2016 · 5 commits
    • diff: teach diff to display submodule difference with an inline diff · fd47ae6a
      Committed by Jacob Keller
      Teach git-diff and friends a new format for displaying the difference
      of a submodule. The new format is an inline diff of the contents of the
      submodule between the commit range of the update. This allows the user
      to see the actual code change caused by a submodule update.
      
      Add tests for the new format and option.
      Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • submodule: convert show_submodule_summary to use struct object_id * · 602a283a
      Committed by Jacob Keller
      Since we're going to be changing this function in a future patch, let's
      go ahead and convert this to use object_id now.
      Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff: prepare for additional submodule formats · 61cfbc05
      Committed by Jacob Keller
      A future patch will add a new format for displaying the difference of
      a submodule. Make it easier by changing how we store the current
      selected format. Replace the DIFF_OPT flag with an enumeration, as each
      format will be mutually exclusive.
      Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • graph: add support for --line-prefix on all graph-aware output · 660e113c
      Committed by Jacob Keller
      Add an extension to git-diff and git-log (and any other graph-aware
      displayable output) such that "--line-prefix=<string>" will print the
      additional line-prefix on every line of output.
      
      To make this work, we have to fix a few bugs in the graph API that force
      graph_show_commit_msg to be used only when you have a valid graph.
      Additionally, we extend the default_diff_output_prefix handler to work
      even when no graph is enabled.
      
      This is somewhat of a hack on top of the graph API, but I think it
      should be acceptable here.
      
      This will be used by a future extension of submodule display which
      displays the submodule diff as the actual diff between the pre and post
      commit in the submodule project.
      
      Add some tests for both git-log and git-diff to ensure that the prefix
      is honored correctly.
      Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff.c: remove output_prefix_length field · cd48dadb
      Committed by Junio C Hamano
      "diff/log --stat" has a logic that determines the display columns
      available for the diffstat part of the output and apportions it for
      pathnames and diffstat graph automatically.
      
      5e71a84a (Add output_prefix_length to diff_options, 2012-04-16)
      added the output_prefix_length field to diff_options structure to
      allow this logic to subtract the display columns used for the
      history graph part from the total "terminal width"; this matters
      when the "git log --graph -p" option is in use.
      
      The field must be set to the number of display columns needed to
      show the output from the output_prefix() callback, which is error
      prone.  As there is only one user of the field, and the user has the
      actual value of the prefix string, let's get rid of the field and
      have the user count the display width itself.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>