提交 · 3a35cb2ea8ab5d44d6ea6290b7af8a7c8623e4c2 · 李少辉-开发者 / git

24 5月, 2017 1 次提交

blame: move textconv_object with related functions · 3a35cb2e

由 Jeff Smith 提交于 5月 24, 2017

textconv_object is used in places other than blame.c and should be moved
to a more appropriate location.  Other textconv related functions are
located in diff.c so that seems as good a place as any.
Signed-off-by: NJeff Smith <whydoubt@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

3a35cb2e

01 5月, 2017 1 次提交

fix minor typos · 5621760f

由 René Genz 提交于 4月 30, 2017

Helped-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NRené Genz <liebundartig@freenet.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

5621760f

31 3月, 2017 1 次提交

diff: avoid fixed-size buffer for patch-ids · 977db6b4

由 Jeff King 提交于 3月 30, 2017

To generate a patch id, we format the diff header into a
fixed-size buffer, and then feed the result to our sha1
computation. The fixed buffer has size '4*PATH_MAX + 20',
which in theory accommodates the four filenames plus some
extra data. Except:

  1. The filenames may not be constrained to PATH_MAX. The
     static value may not be a real limit on the current
     filesystem. Moreover, we may compute patch-ids for
     names stored only in git, without touching the current
     filesystem at all.

  2. The 20 bytes is not nearly enough to cover the
     extra content we put in the buffer.

As a result, the data we feed to the sha1 computation may be
truncated, and it's possible that a commit with a very long
filename could erroneously collide in the patch-id space
with another commit. For instance, if one commit modified
"really-long-filename/foo" and another modified "bar" in the
same directory.

In practice this is unlikely. Because the filenames are
repeated, and because there's a single cutoff at the end of
the buffer, the offending filename would have to be on the
order of four times larger than PATH_MAX.

We could fix this by moving to a strbuf. However, we can
observe that the purpose of formatting this in the first
place is to feed it to git_SHA1_Update(). So instead, let's
just feed each part of the formatted string directly. This
actually ends up more readable, and we can even factor out
some duplicated bits from the various conditional branches.

Technically this may change the output of patch-id for very
long filenames, but it's not worth making an exception for
this in the --stable output. It was a bug, and one that only
affected an unlikely set of paths.  And anyway, the exact
value would have varied from platform to platform depending
on the value of PATH_MAX, so there is no "stable" value.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

977db6b4

27 3月, 2017 1 次提交

Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ · dc01505f

由 brian m. carlson 提交于 3月 26, 2017

Since we will likely be introducing a new hash function at some point,
and that hash function might be longer than 40 hex characters, use the
constant GIT_MAX_HEXSZ, which is designed to be suitable for
allocations, instead of GIT_SHA1_HEXSZ.  This will ease the transition
down the line by distinguishing between places where we need to allocate
memory suitable for the largest hash from those where we need to handle
the current hash.
Signed-off-by: Nbrian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

dc01505f

22 3月, 2017 2 次提交

prefix_filename: return newly allocated string · e4da43b1

由 Jeff King 提交于 3月 20, 2017

The prefix_filename() function returns a pointer to static
storage, which makes it easy to use dangerously. We already
fixed one buggy caller in hash-object recently, and the
calls in apply.c are suspicious (I didn't dig in enough to
confirm that there is a bug, but we call the function once
in apply_all_patches() and then again indirectly from
parse_chunk()).

Let's make it harder to get wrong by allocating the return
value. For simplicity, we'll do this even when the prefix is
empty (and we could just return the original file pointer).
That will cause us to allocate sometimes when we wouldn't
otherwise need to, but this function isn't called in
performance critical code-paths (and it already _might_
allocate on any given call, so a caller that cares about
performance is questionable anyway).

The downside is that the callers need to remember to free()
the result to avoid leaking. Most of them already used
xstrdup() on the result, so we know they are OK. The
remainder have been converted to use free() as appropriate.

I considered retaining a prefix_filename_unsafe() for cases
where we know the static lifetime is OK (and handling the
cleanup is awkward). This is only a handful of cases,
though, and it's not worth the mental energy in worrying
about whether the "unsafe" variant is OK to use in any
situation.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

e4da43b1

prefix_filename: drop length parameter · 116fb64e

由 Jeff King 提交于 3月 20, 2017

This function takes the prefix as a ptr/len pair, but in
every caller the length is exactly strlen(ptr). Let's
simplify the interface and just take the string. This saves
callers specifying it (and in some cases handling a NULL
prefix).

In a handful of cases we had the length already without
calling strlen, so this is technically slower. But it's not
likely to matter (after all, if the prefix is non-empty
we'll allocate and copy it into a buffer anyway).
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

116fb64e

03 3月, 2017 1 次提交

diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec() · 12426e11

由 Junio C Hamano 提交于 3月 01, 2017

Callers of diff_populate_filespec() can choose to ask only for the
size of the blob without grabbing the blob data, and the function,
after running lstat() when the filespec points at a working tree
file, returns by copying the value in size field of the stat
structure into the size field of the filespec when this is the case.

However, this short-cut cannot be taken if the contents from the
path needs to go through convert_to_git(), whose resulting real blob
data may be different from what is in the working tree file.

As "git diff --quiet" compares the .size fields of filespec
structures to skip content comparison, this bug manifests as a
false "there are differences" for a file that needs eol conversion,
for example.
Reported-by: NMike Crowe <mac@mcrowe.com>
Helped-by: NTorsten Bögershausen <tboegi@web.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

12426e11

09 2月, 2017 1 次提交

diff: print line prefix for --name-only output · f5022b5f

由 Jeff King 提交于 2月 08, 2017

If you run "git log --graph --name-only", the pathnames are
not indented to go along with their matching commits (unlike
all of the other diff formats). We need to output the line
prefix for each item before writing it.

The tests cover both --name-status and --name-only. The
former actually gets this right already, because it builds
on the --raw format functions. It's only --name-only which
uses its own code (and this fix mirrors the code in
diff_flush_raw()).

Note that the tests don't follow our usual style of setting
up the "expect" output inside the test block. This matches
the surrounding style, but more importantly it is easier to
read: we don't have to worry about embedded single-quotes,
and the leading indentation is more obvious.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

f5022b5f

31 1月, 2017 3 次提交

use oid_to_hex_r() for converting struct object_id hashes to hex strings · 2490574d

由 René Scharfe 提交于 1月 28, 2017

Patch generated by Coccinelle and contrib/coccinelle/object_id.cocci.
Signed-off-by: NRene Scharfe <l.s.r@web.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

2490574d

diff: use SWAP macro · 402bf8e1

由 René Scharfe 提交于 1月 28, 2017

Use the macro SWAP to exchange the value of pairs of variables instead
of swapping them manually with the help of a temporary variable. The
resulting code is shorter and easier to read.

The two cases were not transformed by the semantic patch swap.cocci
because it's extra careful and handles only cases where the types of all
variables are the same -- and here we swap two ints and use an unsigned
temporary variable for that. Nevertheless the conversion is safe, as
the value range is preserved with and without the patch.
Signed-off-by: NRene Scharfe <l.s.r@web.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

402bf8e1

use SWAP macro · 35d803bc

由 René Scharfe 提交于 1月 28, 2017

Apply the semantic patch swap.cocci to convert hand-rolled swaps to use
the macro SWAP.  The resulting code is shorter and easier to read, the
object code is effectively unchanged.

The patch for object.c had to be hand-edited in order to preserve the
comment before the change; Coccinelle tried to eat it for some reason.
Signed-off-by: NRene Scharfe <l.s.r@web.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

35d803bc

13 1月, 2017 1 次提交

diff: add interhunk context config option · c4888677

由 Vegard Nossum 提交于 1月 12, 2017

The --inter-hunk-context= option was added in commit 6d0e674a
("diff: add option to show context between close hunks"). This patch
allows configuring a default for this option.
Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

c4888677

24 12月, 2016 1 次提交

diff: retire "compaction" heuristics · 3cde4e02

由 Junio C Hamano 提交于 12月 23, 2016

When a patch inserts a block of lines, whose last lines are the
same as the existing lines that appear before the inserted block,
"git diff" can choose any place between these existing lines as the
boundary between the pre-context and the added lines (adjusting the
end of the inserted block as appropriate) to come up with variants
of the same patch, and some variants are easier to read than others.

We have been trying to improve the choice of this boundary, and Git
2.11 shipped with an experimental "compaction-heuristic". Since
then another attempt to improve the logic further resulted in a new
"indent-heuristic" logic. It is agreed that the latter gives better
result overall, and the former outlived its usefulness.

Retire "compaction", and keep "indent" as an experimental feature.
The latter hopefully will be turned on by default in a future
release, but that should be done as a separate step.
Suggested-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

3cde4e02

09 12月, 2016 1 次提交

diff: handle --no-abbrev in no-index case · 43d1948b

由 Jack Bates 提交于 12月 06, 2016

There are two different places where the --no-abbrev option is parsed,
and two different places where SHA-1s are abbreviated. We normally parse
--no-abbrev with setup_revisions(), but in the no-index case, "git diff"
calls diff_opt_parse() directly, and diff_opt_parse() didn't handle
--no-abbrev until now. (It did handle --abbrev, however.) We normally
abbreviate SHA-1s with find_unique_abbrev(), but commit 4f03666a ("diff:
handle sha1 abbreviations outside of repository, 2016-10-20) recently
introduced a special case when you run "git diff" outside of a
repository.

setup_revisions() does also call diff_opt_parse(), but not for --abbrev
or --no-abbrev, which it handles itself. setup_revisions() sets
rev_info->abbrev, and later copies that to diff_options->abbrev. It
handles --no-abbrev by setting abbrev to zero. (This change doesn't
touch that.)

Setting abbrev to zero was broken in the outside-of-a-repository special
case, which until now resulted in a truly zero-length SHA-1, rather than
taking zero to mean do not abbreviate. The only way to trigger this bug,
however, was by running "git diff --raw" without either the --abbrev or
--no-abbrev options, because 1) without --raw it doesn't respect abbrev
(which is bizarre, but has been that way forever), 2) we silently clamp
--abbrev=0 to MINIMUM_ABBREV, and 3) --no-abbrev wasn't handled until
now.

The outside-of-a-repository case is one of three no-index cases. The
other two are when one of the files you're comparing is outside of the
repository you're in, and the --no-index option.
Signed-off-by: NJack Bates <jack@nottheoilrig.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

43d1948b

15 11月, 2016 1 次提交

diffcore-delta: remove unused parameter to diffcore_count_changes() · 974e0044

由 Tobias Klauser 提交于 11月 14, 2016

The delta_limit parameter to diffcore_count_changes() has been unused
since commit ba23bbc8 ("diffcore-delta: make change counter to byte
oriented again.", 2006-03-04).

Remove the parameter and adjust all callers.
Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

974e0044

27 10月, 2016 3 次提交

diff: handle sha1 abbreviations outside of repository · 4f03666a

由 Jeff King 提交于 10月 20, 2016

When generating diffs outside a repository (e.g., with "diff
--no-index"), we may write abbreviated sha1s as part of
"--raw" output or the "index" lines of "--patch" output.
Since we have no object database, we never find any
collisions, and these sha1s get whatever static abbreviation
length is configured (typically 7).

However, we do blindly look in ".git/objects" to see if any
objects exist, even though we know we are not in a
repository. This is usually harmless because such a
directory is unlikely to exist, but could be wrong in rare
circumstances.

Let's instead notice when we are not in a repository and
behave as if the object database is empty (i.e., just use
the default abbrev length). It would perhaps make sense to
be conservative and show full sha1s in that case, but
showing the default abbreviation is what we've always done
(and is certainly less ugly).

Note that this does mean that:

  cd /not/a/repo
  GIT_OBJECT_DIRECTORY=/some/real/objdir git diff --no-index ...

used to look for collisions in /some/real/objdir but now
does not. This could be considered either a bugfix (we do
not look at objects if we have no repository) or a
regression, but it seems unlikely that anybody would care
much either way.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

4f03666a

diff_aligned_abbrev: use "struct oid" · d6cece51

由 Jeff King 提交于 10月 20, 2016

Since we're modifying this function anyway, it's a good time
to update it to the more modern "struct oid". We can also
drop some of the magic numbers in favor of GIT_SHA1_HEXSZ,
along with some descriptive comments.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

d6cece51

diff_unique_abbrev: rename to diff_aligned_abbrev · d5e3b01e

由 Jeff King 提交于 10月 20, 2016

The word "align" describes how the function actually differs
from find_unique_abbrev, and will make it less confusing
when we add more diff-specific abbrevation functions that do
not do this alignment.

Since this is a globally available function, let's also move
its descriptive comment to the header file, where we
typically document function interfaces.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

d5e3b01e

25 10月, 2016 1 次提交

diff: add --ita-[in]visible-in-index · b42b4519

由 Nguyễn Thái Ngọc Duy 提交于 10月 24, 2016

The option --ita-invisible-in-index exposes the "ita_invisible_in_index"
diff flag to outside to allow easier experimentation with this new mode.
The "plan" is to make --ita-invisible-in-index default to keep consistent
behavior with 'status' and 'commit', but a bunch other commands like
'apply', 'merge', 'reset'.... need to be taken into consideration as well.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

b42b4519

18 10月, 2016 1 次提交

i18n: diff: mark warnings for translation · db424979

由 Vasco Almeida 提交于 10月 17, 2016

Mark rename_limit_warning and degrade_cc_to_c_warning and
rename_limit_warning for translation.
Signed-off-by: NVasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

db424979

05 10月, 2016 3 次提交

diff: introduce diff.wsErrorHighlight option · a17505f2

由 Junio C Hamano 提交于 10月 04, 2016

With the preparatory steps, it has become trivial to teach the
system a new diff.wsErrorHighlight configuration that gives the
default value for --ws-error-highlight command line option.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

a17505f2

diff.c: move ws-error-highlight parsing helpers up · 0b4b42e7

由 Junio C Hamano 提交于 10月 04, 2016

These need to be usable from git_diff_ui_config() code to help
parsing a configuration variable, so move them up.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

0b4b42e7

diff.c: refactor parse_ws_error_highlight() · 077965f8

由 Junio C Hamano 提交于 10月 04, 2016

Rename the function to parse_ws_error_highlight_opt(), because it is
meant to parse a command line option, and then refactor the meat of
the function into a helper function that reports the parsed result
which is typically a small unsigned int (these are OR'ed bitmask
after all), or a negative offset that indicates where in the input
string a parse error happened.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

077965f8

04 10月, 2016 1 次提交

abbrev: prepare for new world order · 7b5b7721

由 Junio C Hamano 提交于 9月 30, 2016

The code that sets custom abbreviation length, in response to
command line argument, often does something like this:

	if (skip_prefix(arg, "--abbrev=", &arg))
		abbrev = atoi(arg);
	else if (!strcmp("--abbrev", &arg))
		abbrev = DEFAULT_ABBREV;
	/* make the value sane */
	if (abbrev < 0 || 40 < abbrev)
		abbrev = ... some sane value ...

However, it is pointless to sanity-check and tweak the value
obtained from DEFAULT_ABBREV.  We are going to allow it to be
initially set to -1 to signal that the default abbreviation length
must be auto sized upon the first request to abbreviate, based on
the number of objects in the repository, and when that happens,
rejecting or tweaking a negative value to a "saner" one will
negatively interfere with the auto sizing.  The codepaths for

    git rev-parse --short <object>
    git diff --raw --abbrev

do exactly that; allow them to pass possibly negative abbrevs
intact, that will come from DEFAULT_ABBREV in the future.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

7b5b7721

01 10月, 2016 1 次提交

diff_unique_abbrev(): document its assumption and limitation · d709f1fb

由 Junio C Hamano 提交于 9月 30, 2016

This function is used to add "..." to displayed object names in
"diff --raw --abbrev[=<n>]" output.  It bases its behaviour on an
untold assumption that the abbreviation length requested by the
caller is "reasonble", i.e. most of the objects will abbreviate
within the requested length and the resulting length would never
exceed it by more than a few hexdigits (otherwise the resulting
columns would not align).  Explain that in a comment.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

d709f1fb

30 9月, 2016 1 次提交

use QSORT · 9ed0d8d6

由 René Scharfe 提交于 9月 29, 2016

Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT.  The resulting code is
shorter and supports empty arrays with NULL pointers.
Signed-off-by: NRene Scharfe <l.s.r@web.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

9ed0d8d6

28 9月, 2016 1 次提交

use strbuf_add_unique_abbrev() for adding short hashes, part 2 · f937d785

由 René Scharfe 提交于 9月 27, 2016

Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
instead of taking detours through find_unique_abbrev() and its static
buffer.  This is shorter and a bit more efficient.

1eb47f16 already converted six cases,
this patch covers three more.

A semantic patch for Coccinelle is included for easier checking for
new cases that might be introduced in the future.
Signed-off-by: NRene Scharfe <l.s.r@web.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

f937d785

22 9月, 2016 2 次提交

regex: use regexec_buf() · b7d36ffc

由 Johannes Schindelin 提交于 9月 21, 2016

The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G <regex>` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3).  By converting them to use regexec_buf(), the code has
become much cleaner.
Signed-off-by: NJohannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

b7d36ffc

i18n: i18n: diff: mark die messages for translation · a2f05c94

由 Jean-Noël AVILA 提交于 9月 20, 2016

While marking individual messages for translation, consolidate some
messages "option 'foo' requires a value" that is used for many
options into one by introducing a helper function to die with the
message with the option name embedded in it, and ask the translators
to localize that single message instead.
Signed-off-by: NVasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: NJean-Noel Avila <jn.avila@free.fr>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

a2f05c94

20 9月, 2016 2 次提交

blame: honor the diff heuristic options and config · 5b162879

由 Michael Haggerty 提交于 9月 05, 2016

Teach "git blame" and "git annotate" the --compaction-heuristic and
--indent-heuristic options that are now supported by "git diff".

Also teach them to honor the `diff.compactionHeuristic` and
`diff.indentHeuristic` configuration options.

It would be conceivable to introduce separate configuration options for
"blame" and "annotate"; for example `blame.compactionHeuristic` and
`blame.indentHeuristic`. But it would be confusing to users if blame
output is inconsistent with diff output, so it makes more sense for them
to respect the same configuration.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

5b162879

diff: improve positioning of add/delete blocks in diffs · 433860f3

由 Michael Haggerty 提交于 9月 05, 2016

Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:

    --- a/9c572b21:git-send-email.perl
    +++ b/6dcfa306:git-send-email.perl
    @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
     }

     if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
    +if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {
                            $smtp_server = $_;

The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:

    --- a/9c572b21:git-send-email.perl
    +++ b/6dcfa306:git-send-email.perl
    @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
            $initial_reply_to =~ s/(^\s+|\s+$)//g;
     }

    +if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
     if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {

This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.

The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.

Commit d634d61e improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2)
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.

This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.

Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:

1. It measures the following characteristics of a proposed splitting
   position in a `struct split_measurement`:

   * the number of blank lines above the proposed split
   * whether the line directly after the split is blank
   * the number of blank lines following that line
   * the indentation of the nearest non-blank line above the split
   * the indentation of the line directly below the split
   * the indentation of the nearest non-blank line after that line

2. It combines the measured attributes using a bunch of
   empirically-optimized weighting factors to derive a `struct
   split_score` that measures the "badness" of splitting the text at
   that position.

3. It combines the `split_score` for the top and the bottom of the
   slider at each of its possible positions, and selects the position
   that has the best `split_score`.

I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:

| repository            | count |      Git 2.9.0 |     compaction | compaction-fixed |       indent-1 |       indent-2 |
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| afnetworking          |   109 |    89  (81.7%) |    37  (33.9%) |      37  (33.9%) |     2   (1.8%) |     2   (1.8%) |
| alamofire             |    30 |    18  (60.0%) |    14  (46.7%) |      15  (50.0%) |     0   (0.0%) |     0   (0.0%) |
| angular               |   184 |   127  (69.0%) |    39  (21.2%) |      23  (12.5%) |     5   (2.7%) |     5   (2.7%) |
| animate               |   313 |     2   (0.6%) |     2   (0.6%) |       2   (0.6%) |     2   (0.6%) |     2   (0.6%) |
| ant                   |   380 |   356  (93.7%) |   152  (40.0%) |     148  (38.9%) |    15   (3.9%) |    15   (3.9%) | *
| bugzilla              |   306 |   263  (85.9%) |   109  (35.6%) |      99  (32.4%) |    14   (4.6%) |    15   (4.9%) | *
| corefx                |   126 |    91  (72.2%) |    22  (17.5%) |      21  (16.7%) |     6   (4.8%) |     6   (4.8%) |
| couchdb               |    78 |    44  (56.4%) |    26  (33.3%) |      28  (35.9%) |     6   (7.7%) |     6   (7.7%) | *
| cpython               |   937 |   158  (16.9%) |    50   (5.3%) |      49   (5.2%) |     5   (0.5%) |     5   (0.5%) | *
| discourse             |   160 |    95  (59.4%) |    42  (26.2%) |      36  (22.5%) |    18  (11.2%) |    13   (8.1%) |
| docker                |   307 |   194  (63.2%) |   198  (64.5%) |     253  (82.4%) |     8   (2.6%) |     8   (2.6%) | *
| electron              |   163 |   132  (81.0%) |    38  (23.3%) |      39  (23.9%) |     6   (3.7%) |     6   (3.7%) |
| git                   |   536 |   470  (87.7%) |    73  (13.6%) |      78  (14.6%) |    16   (3.0%) |    16   (3.0%) | *
| gitflow               |   127 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| ionic                 |   133 |    89  (66.9%) |    29  (21.8%) |      38  (28.6%) |     1   (0.8%) |     1   (0.8%) |
| ipython               |   482 |   362  (75.1%) |   167  (34.6%) |     169  (35.1%) |    11   (2.3%) |    11   (2.3%) | *
| junit                 |   161 |   147  (91.3%) |    67  (41.6%) |      66  (41.0%) |     1   (0.6%) |     1   (0.6%) | *
| lighttable            |    15 |     5  (33.3%) |     0   (0.0%) |       2  (13.3%) |     0   (0.0%) |     0   (0.0%) |
| magit                 |    88 |    75  (85.2%) |    11  (12.5%) |       9  (10.2%) |     1   (1.1%) |     0   (0.0%) |
| neural-style          |    28 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| nodejs                |   781 |   649  (83.1%) |   118  (15.1%) |     111  (14.2%) |     4   (0.5%) |     5   (0.6%) | *
| phpmyadmin            |   491 |   481  (98.0%) |    75  (15.3%) |      48   (9.8%) |     2   (0.4%) |     2   (0.4%) | *
| react-native          |   168 |   130  (77.4%) |    79  (47.0%) |      81  (48.2%) |     0   (0.0%) |     0   (0.0%) |
| rust                  |   171 |   128  (74.9%) |    30  (17.5%) |      27  (15.8%) |    16   (9.4%) |    14   (8.2%) |
| spark                 |   186 |   149  (80.1%) |    52  (28.0%) |      52  (28.0%) |     2   (1.1%) |     2   (1.1%) |
| tensorflow            |   115 |    66  (57.4%) |    48  (41.7%) |      48  (41.7%) |     5   (4.3%) |     5   (4.3%) |
| test-more             |    19 |    15  (78.9%) |     2  (10.5%) |       2  (10.5%) |     1   (5.3%) |     1   (5.3%) | *
| test-unit             |    51 |    34  (66.7%) |    14  (27.5%) |       8  (15.7%) |     2   (3.9%) |     2   (3.9%) | *
| xmonad                |    23 |    22  (95.7%) |     2   (8.7%) |       2   (8.7%) |     1   (4.3%) |     1   (4.3%) | *
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| totals                |  6668 |  4391  (65.9%) |  1496  (22.4%) |    1491  (22.4%) |   150   (2.2%) |   144   (2.2%) |
| totals (training set) |  4552 |  3195  (70.2%) |  1053  (23.1%) |    1061  (23.3%) |    86   (1.9%) |    88   (1.9%) |
| totals (test set)     |  2116 |  1196  (56.5%) |   443  (20.9%) |     430  (20.3%) |    64   (3.0%) |    56   (2.6%) |

In this table, the numbers are the count and percentage of human-rated
sliders that the corresponding algorithm got *wrong*. The columns are

* "repository" - the name of the repository used. I used the diffs
  between successive non-merge commits on the HEAD branch of the
  corresponding repository.

* "count" - the number of sliders that were human-rated. I chose most,
  but not all, sliders to rate from those among which the various
  algorithms gave different answers.

* "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.

* "compaction" - the heuristic used by `git diff --compaction-heuristic`
  in Git 2.9.0.

* "compaction-fixed" - the heuristic used by `git diff
  --compaction-heuristic` after the fixes from earlier in this patch
  series. Note that the results are not dramatically different than
  those for "compaction". Both produce non-ideal diffs only about 1/3 as
  often as the default `git diff`.

* "indent-1" - the new `--indent-heuristic` algorithm, using the first
  set of weighting factors, determined as described above.

* "indent-2" - the new `--indent-heuristic` algorithm, using the final
  set of weighting factors, determined as described below.

* `*` - indicates that repo was part of training set used to determine
  the first set of weighting factors.

The fact that the heuristic performed nearly as well on the test set as
on the training set in column "indent-1" is a good indication that the
heuristic was not over-trained. Given that fact, I ran a second round of
optimization, using the entire corpus as the training set. The resulting
set of weights gave the results in column "indent-2". These are the
weights included in this patch.

The final result gives consistently and significantly better results
across the whole corpus than either `git diff` or `git diff
--compaction-heuristic`. It makes only about 1/30 as many errors as the
former and about 1/10 as many errors as the latter. (And a good fraction
of the remaining errors are for diffs that involve weirdly-formatted
code, sometimes apparently machine-generated.)

The tools that were used to do this optimization and analysis, along
with the human-generated data values, are recorded in a separate project
[1].

This patch adds a new command-line option `--indent-heuristic`, and a
new configuration setting `diff.indentHeuristic`, that activate this
heuristic. This interface is only meant for testing purposes, and should
be finalized before including this change in any release.

[1] https://github.com/mhagger/diff-slider-toolsSigned-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

433860f3

09 9月, 2016 3 次提交

diff: remove dead code · ca9b37e5

由 Stefan Beller 提交于 9月 07, 2016

When `len < 1`, len has to be 0 or negative, emit_line will then remove the
first character and by then `len` would be negative. As this doesn't
happen, it is safe to assume it is dead code.

This continues to simplify the code, which was started in b8d9c1a6
(2009-09-03,  diff.c: the builtin_diff() deals with only two-file
comparison).
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ca9b37e5

diff: omit found pointer from emit_callback · ba16233c

由 Stefan Beller 提交于 9月 07, 2016

We keep the actual data in the diff options, which are just as accessible.
Remove the pointer stored in struct emit_callback for readability.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ba16233c

diff.c: use diff_options directly · fb33b62c

由 Stefan Beller 提交于 9月 07, 2016

The value of `ecbdata->opt` is accessible via the short variable `o`
already, so let's use that instead.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

fb33b62c

08 9月, 2016 1 次提交

cache: convert struct cache_entry to use struct object_id · 99d1a986

由 brian m. carlson 提交于 9月 05, 2016

Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: Nbrian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

99d1a986

01 9月, 2016 5 次提交

diff: teach diff to display submodule difference with an inline diff · fd47ae6a

由 Jacob Keller 提交于 8月 31, 2016

Teach git-diff and friends a new format for displaying the difference
of a submodule. The new format is an inline diff of the contents of the
submodule between the commit range of the update. This allows the user
to see the actual code change caused by a submodule update.

Add tests for the new format and option.
Signed-off-by: NJacob Keller <jacob.keller@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

fd47ae6a

submodule: convert show_submodule_summary to use struct object_id * · 602a283a

由 Jacob Keller 提交于 8月 31, 2016

Since we're going to be changing this function in a future patch, lets
go ahead and convert this to use object_id now.
Signed-off-by: NJacob Keller <jacob.keller@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

602a283a

diff: prepare for additional submodule formats · 61cfbc05

由 Jacob Keller 提交于 8月 31, 2016

A future patch will add a new format for displaying the difference of
a submodule. Make it easier by changing how we store the current
selected format. Replace the DIFF_OPT flag with an enumeration, as each
format will be mutually exclusive.
Signed-off-by: NJacob Keller <jacob.keller@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

61cfbc05

graph: add support for --line-prefix on all graph-aware output · 660e113c

由 Jacob Keller 提交于 8月 31, 2016

Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=<string>" will print the
additional line-prefix on every line of output.

To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.

This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.

This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.

Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.
Signed-off-by: NJacob Keller <jacob.keller@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

660e113c

diff.c: remove output_prefix_length field · cd48dadb

由 Junio C Hamano 提交于 8月 31, 2016

"diff/log --stat" has a logic that determines the display columns
available for the diffstat part of the output and apportions it for
pathnames and diffstat graph automatically.

5e71a84a (Add output_prefix_length to diff_options, 2012-04-16)
added the output_prefix_length field to diff_options structure to
allow this logic to subtract the display columns used for the
history graph part from the total "terminal width"; this matters
when the "git log --graph -p" option is in use.

The field must be set to the number of display columns needed to
show the output from the output_prefix() callback, which is error
prone.  As there is only one user of the field, and the user has the
actual value of the prefix string, let's get rid of the field and
have the user count the display width itself.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

cd48dadb

李少辉-开发者 / git 与 Fork 源项目一致

李少辉-开发者 / git
与 Fork 源项目一致