提交 · 23b65f9528a1fa19619d1dcb59932cc55075808c · 李少辉-开发者 / git

15 8月, 2017 1 次提交

diff: avoid redundantly clearing a flag · 23b65f95

由 Jonathan Tan 提交于 8月 11, 2017

No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in
mark_color_as_moved(), so it is redundant to clear it for the current
line. Therefore, clear it only for previous lines.

This makes a refactoring in a subsequent patch easier.
Signed-off-by: NJonathan Tan <jonathantanmy@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

23b65f95

01 7月, 2017 24 次提交

diff.c: add dimming to moved line detection · 86b452e2

由 Stefan Beller 提交于 6月 30, 2017

Any lines inside a moved block of code are not interesting. Boundaries
of blocks are only interesting if they are next to another block of moved
code.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

86b452e2

diff.c: color moved lines differently, plain mode · 176841f0

由 Stefan Beller 提交于 6月 30, 2017

Add the 'plain' mode for move detection of code. This omits the checking
for adjacent blocks, so it is not as useful. If you have a lot of the
same blocks moved in the same patch, the 'Zebra' would end up slow as it
is O(n^2) (n is number of same blocks). So this may be useful there and
is generally easy to add. Instead be very literal at the move detection,
do not skip over short blocks here.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

176841f0

diff.c: color moved lines differently · 2e2d5ac1

由 Stefan Beller 提交于 6月 30, 2017

When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OM]  -        if (!is_authorized_user())
    [OM]  -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OM]  -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NM]  +        sensitive_stuff(spanning,
    [NM]  +                        multiple,
    [NM]  +                        lines);
    [NM]  +}

However adjacent blocks may be problematic. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OMA] -        if (!is_authorized_user())
    [OMA] -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OMA] -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NMA] +        sensitive_stuff(spanning,
    [NMA] +                        multiple,
    [NMA] +                        lines);
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NMA] +}

If the moved code is larger, it is easier to hide some permutation in the
code, which is why some alternative coloring is needed.

This patch implements the first mode:
* basic alternating 'Zebra' mode
  This conveys all information needed to the user.  Defer customization to
  later patches.

First I implemented an alternative design, which would try to fingerprint
a line by its neighbors to detect if we are in a block or at the boundary.
This idea iss error prone as it inspected each line and its neighboring
lines to determine if the line was (a) moved and (b) if was deep inside
a hunk by having matching neighboring lines. This is unreliable as the
we can construct hunks which have equal neighbors that just exceed the
number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter
as a line, that is permutated to AXYZCXYZBXYZD..').

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then has several modes on highlighting bounds.

A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.

It is useful to have moved lines colored, but there are annoying corner
cases, such as a single line moved, that is very common. For example
in a typical patch of C code, we have closing braces that end statement
blocks or functions.

While it is technically true that these lines are moved as they show up
elsewhere, it is harmful for the review as the reviewers attention is
drawn to such a minor side annoyance.

For now let's have a simple solution of hardcoding the number of
moved lines to be at least 3 before coloring them. Note, that the
length is applied across all blocks to find the 'lonely' blocks
that pollute new code, but do not interfere with a permutated
block where each permutation has less lines than 3.
Helped-by: NJonathan Tan <jonathantanmy@google.com>
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

2e2d5ac1

diff.c: buffer all output if asked to · e6e045f8

由 Stefan Beller 提交于 6月 29, 2017

Introduce a new option 'emitted_symbols' in the struct diff_options which
controls whether all output is buffered up until all output is available.
It is set internally in diff.c when necessary.

We'll have a new struct 'emitted_string' in diff.c which will be used to
buffer each line. The emitted_string will duplicate the memory of the
line to buffer as that is easiest to reason about for now. In a future
patch we may want to decrease the memory usage by not duplicating all
output for buffering but rather we may want to store offsets into the
file or in case of hunk descriptions such as the similarity score, we
could just store the relevant number and reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with less memory footprint.

emit_diff_symbol factors out the emission part and depending on the
diff_options->emitted_symbols the emission will be performed directly
when calling emit_diff_symbol or after the whole process is done, i.e.
by buffering we have add the possibility for a second pass over the
whole output before doing the actual output.

In 6440d341 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.
When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the
main buffer.

Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing, i.e. all the diff tests from the
test suite pass without having reordering issues (i.e. only parts
of the output got buffered, and we forgot to buffer other parts).
The test suite passes, which gives confidence that we converted all
functions to use emit_string for output.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

e6e045f8

S
diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY · 146fdb0d
由 Stefan Beller 提交于 6月 29, 2017
```
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
146fdb0d
S
diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP · 30b7e1e7
由 Stefan Beller 提交于 6月 29, 2017
```
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
30b7e1e7

diff.c: convert word diffing to use emit_diff_symbol · bd033291

由 Stefan Beller 提交于 6月 29, 2017

The word diffing is not line oriented and would need some serious
effort to be transformed into a line oriented approach, so
just go with a symbol DIFF_SYMBOL_WORD_DIFF that is a partial line.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

bd033291

diff.c: convert show_stats to use emit_diff_symbol · 0911c475

由 Stefan Beller 提交于 6月 29, 2017

We call print_stat_summary from builtin/apply, so we still
need the version with a file pointer, so introduce
print_stat_summary_0 that uses emit_string machinery and
keep print_stat_summary with the same arguments around.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

0911c475

S
diff.c: convert emit_binary_diff_body to use emit_diff_symbol · 4eed0ebd
由 Stefan Beller 提交于 6月 29, 2017
```
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
4eed0ebd

submodule.c: migrate diff output to use emit_diff_symbol · f3597138

由 Stefan Beller 提交于 6月 29, 2017

As the submodule process is no longer attached to the same file pointer
'o->file' as the superprojects process, there is a different result in
color.c::check_auto_color. That is why we need to pass coloring explicitly,
such that the submodule coloring decision will be made by the child process
processing the submodule. Only DIFF_SYMBOL_SUBMODULE_PIPETHROUGH contains
color, the other symbols are for embedding the submodule output into the
superprojects output.

Remove the colors from the function signatures, as all the coloring
decisions will be made either inside the child process or the final
emit_diff_symbol, but not in the functions driving the submodule diff.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

f3597138

S
diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF · 5af6ea95
由 Stefan Beller 提交于 6月 29, 2017
```
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
5af6ea95

diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES · 4acaaa7a

由 Stefan Beller 提交于 6月 29, 2017

we could save a little bit of memory when buffering in a later mode
by just passing the inner part ("%s and %s", file1, file 2), but
those a just a few bytes, so instead let's reuse the implementation from
DIFF_SYMBOL_HEADER and keep the whole line around.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

4acaaa7a

diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER · a29b0a13

由 Stefan Beller 提交于 6月 29, 2017

The header is constructed lazily including line breaks, so just emit
the raw string as is.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

a29b0a13

diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS} · 3ee8b7bf

由 Stefan Beller 提交于 6月 29, 2017

We have to use fprintf instead of emit_line, because we want to emit the
tab after the color. This is important for ancient versions of gnu patch
AFAICT, although we probably do not want to feed colored output to the
patch utility, such that it would not matter if the trailing tab is
colored. Keep the corner case as-is though.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

3ee8b7bf

diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE · f2bb1218

由 Stefan Beller 提交于 6月 29, 2017

The context marker use the exact same output pattern, so reuse it.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

f2bb1218

S
diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN] · ff958679
由 Stefan Beller 提交于 6月 29, 2017
```
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
ff958679

diff.c: migrate emit_line_checked to use emit_diff_symbol · 091f8e28

由 Stefan Beller 提交于 6月 29, 2017

Add a new flags field to emit_diff_symbol, that will be used by
context lines for:
* white space rules that are applicable (The first 12 bits)
  Take a note in cahe.c as well, when this ws rules are extended we have
  to fix the bits in the flags field.
* how the rules are evaluated (actually this double encodes the sign
  of the line, but the code is easier to keep this way, bits 13,14,15)
* if the line a blank line at EOF (bit 16)

The check if new lines need to be marked up as extra lines at the end of
file, is now done unconditionally. That should be ok, as
'new_blank_line_at_eof' has a quick early return.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

091f8e28

S
diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF · b9cbfde6
由 Stefan Beller 提交于 6月 29, 2017
```
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
b9cbfde6
S
diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO · 68abc6f1
由 Stefan Beller 提交于 6月 29, 2017
```
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
68abc6f1
S
diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_MARKER · c64b420b
由 Stefan Beller 提交于 6月 29, 2017
```
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
c64b420b

diff.c: introduce emit_diff_symbol · 36a4cefd

由 Stefan Beller 提交于 6月 29, 2017

In a later patch we want to buffer all output before emitting it as a
new feature ("markup moved lines") conceptually cannot be implemented
in a single pass over the output.

There are different approaches to buffer all output such as:
* Buffering on the char level, i.e. we'd have a char[] which would
  grow at approximately 80 characters a line. This would keep the
  output completely unstructured, but might be very easy to implement,
  such as redirecting all output to a temporary file and working off
  that. The later passes over the buffer are quite complicated though,
  because we have to parse back any output and then decide if it should
  be modified.

* Buffer on a line level. As the output is mostly line oriented already,
  this would make sense, but it still is a bit awkward as we'd have to
  make sense of it again by looking at the first characters of a line
  to decide what part of a diff a line is.

* Buffer semantically. Imagine there is a formal grammar for the diff
  output and we'd keep the symbols of this grammar around. This keeps
  the highest level of structure in the buffered data, such that the
  actual memory requirements are less than say the first option. Instead
  of buffering the characters of the line, we'll buffer what we intend
  to do plus additional information for the specifics. An output of

    diff --git a/new.txt b/new.txt
    index fa69b07..412428c 100644
    Binary files a/new.txt and b/new.txt differ

  could be buffered as
     DIFF_SYMBOL_DIFF_START + new.txt
     DIFF_SYMBOL_INDEX_MODE + fa69b07 412428c "non-executable" flag
     DIFF_SYMBOL_BINARY_FILES + new.txt

This and the following patches introduce the third option of buffering
by first moving any output to emit_diff_symbol, and then introducing the
buffering in this function.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

36a4cefd

diff.c: factor out diff_flush_patch_all_file_pairs · ec331506

由 Stefan Beller 提交于 6月 29, 2017

In a later patch we want to do more things before and after all filepairs
are flushed. So factor flushing out all file pairs into its own function
that the new code can be plugged in easily.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ec331506

diff.c: move line ending check into emit_hunk_header · dfb7728f

由 Stefan Beller 提交于 6月 29, 2017

The emit_hunk_header() function is responsible for assembling a
hunk header and calling emit_line() to send the hunk header
to the output file.  Its only caller fn_out_consume() needs
to prepare for a case where the function emits an incomplete
line and add the terminating LF.

Instead make sure emit_hunk_header() to always send a
completed line to emit_line().
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

dfb7728f

diff.c: readability fix · f2d2a5de

由 Stefan Beller 提交于 6月 29, 2017

We already have dereferenced 'p->two' into a local variable 'two'.
Use that.
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

f2d2a5de

17 6月, 2017 1 次提交

coccinelle: make use of the "type" FREE_AND_NULL() rule · 6a83d902

由 Ævar Arnfjörð Bjarmason 提交于 6月 15, 2017

Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.
Signed-off-by: NÆvar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

6a83d902

16 6月, 2017 1 次提交

config: don't include config.h by default · b2141fc1

由 Brandon Williams 提交于 6月 14, 2017

Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.
Signed-off-by: NBrandon Williams <bmwill@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

b2141fc1

14 6月, 2017 1 次提交
- B
  convert: convert convert_to_git to take an index · 82b474e0
  由 Brandon Williams 提交于 6月 12, 2017
```
Signed-off-by: NBrandon Williams <bmwill@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
  82b474e0
05 6月, 2017 1 次提交
- B
  diff: rename diff_fill_sha1_info to diff_fill_oid_info · 94e327e9
  由 Brandon Williams 提交于 5月 30, 2017
```
Signed-off-by: NBrandon Williams <bmwill@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
  94e327e9
02 6月, 2017 6 次提交

B
diff: convert diff_flush_patch_id to struct object_id · bd25f288
由 Brandon Williams 提交于 5月 30, 2017
```
Signed-off-by: NBrandon Williams <bmwill@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
bd25f288
B
diff: finish conversion for prepare_temp_file to struct object_id · 74014152
由 Brandon Williams 提交于 5月 30, 2017
```
Signed-off-by: NBrandon Williams <bmwill@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
74014152
B
diff: convert reuse_worktree_file to struct object_id · fb4a1c0d
由 Brandon Williams 提交于 5月 30, 2017
```
Signed-off-by: NBrandon Williams <bmwill@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
fb4a1c0d

diff: convert fill_filespec to struct object_id · f9704c2d

由 Brandon Williams 提交于 5月 30, 2017

Signed-off-by: NBrandon Williams <bmwill@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

f9704c2d

diff: convert diff_change to struct object_id · 94a0097a

由 Brandon Williams 提交于 5月 30, 2017

Convert diff_change to take a struct object_id.  In addition convert the
function pointer type 'change_fn_t' to also take a struct object_id.
Signed-off-by: NBrandon Williams <bmwill@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

94a0097a

diff: convert diff_addremove to struct object_id · c26022ea

由 Brandon Williams 提交于 5月 30, 2017

Convert diff_addremove to take a struct object_id.  In addtion convert
the function pointer type 'add_remove_fn_t' to also take a struct
object_id.
Signed-off-by: NBrandon Williams <bmwill@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

c26022ea

26 5月, 2017 1 次提交

use xfopen() in more places · 23a9e071

由 Nguyễn Thái Ngọc Duy 提交于 5月 03, 2017

xfopen()

 - provides error details
 - explains error on reading, or writing, or whatever operation
 - has l10n support
 - prints file name in the error

Some of these are missing in the places that are replaced with xfopen(),
which is a clear win. In some other places, it's just less code (not as
clearly a win as the previous case but still is).

The only slight regresssion is in remote-testsvn, where we don't report
the file class (marks files) in the error messages anymore. But since
this is a _test_ svn remote transport, I'm not too concerned.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

23a9e071

24 5月, 2017 1 次提交

blame: move textconv_object with related functions · 3a35cb2e

由 Jeff Smith 提交于 5月 24, 2017

textconv_object is used in places other than blame.c and should be moved
to a more appropriate location.  Other textconv related functions are
located in diff.c so that seems as good a place as any.
Signed-off-by: NJeff Smith <whydoubt@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

3a35cb2e

09 5月, 2017 2 次提交

diff: enable indent heuristic by default · 33de7163

由 Stefan Beller 提交于 5月 08, 2017

The feature was included in v2.11 (released 2016-11-29) and we got no
negative feedback. Quite the opposite, all feedback we got was positive.

Turn it on by default. Users who dislike the feature can turn it off
by setting diff.indentHeuristic (which also configures plumbing commands,
see prior patches).

The change to t/t4051-diff-function-context.sh is needed because the
heuristic shifts the changed hunk in the patch.  To get the same result
regardless of the heuristic configuration, we modify the test file
differently:  We insert a completely new line after line 2, instead of
simply duplicating it.
Helped-by: NJeff King <peff@peff.net>
Signed-off-by: NStefan Beller <sbeller@google.com>
Signed-off-by: NMarc Branchaud <marcnarc@xiplink.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

33de7163

diff: make the indent heuristic part of diff's basic configuration · cf5e7722

由 Marc Branchaud 提交于 5月 08, 2017

This heuristic was originally introduced as an experimental feature,
and therefore part of the UI configuration.

But the user often sees diffs generated by plumbing commands like
diff-tree.  Moving the indent heuristic into diff's basic configuration
prepares the way for diff plumbing commands to respect the setting.

The heuristic itself merely makes the diffs more aesthetically
pleasing, without changing their correctness.  Scripts that rely on
the diff plumbing commands should not care whether or not the heuristic
is employed.
Signed-off-by: NMarc Branchaud <marcnarc@xiplink.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

cf5e7722

08 5月, 2017 1 次提交

notes-cache: convert to struct object_id · 569aa376

由 brian m. carlson 提交于 5月 06, 2017

Convert as many instances of unsigned char [20] as possible.  Update the
callers of notes_cache_get and notes_cache_put to use the new interface.
Among the functions updated are callers of
lookup_commit_reference_gently, which we will soon convert.
Signed-off-by: Nbrian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

569aa376

李少辉-开发者 / git 与 Fork 源项目一致

李少辉-开发者 / git
与 Fork 源项目一致