1. 21 8月, 2017 1 次提交
  2. 18 8月, 2017 1 次提交
    • J
      diff: retire sane_truncate_fn · 08a8509e
      Junio C Hamano 提交于
      Long time ago, 23707811 ("diff: do not chomp hunk-header in the
      middle of a character", 2008-01-02) introduced sane_truncate_line()
      helper function to trim the "function header" line that is shown at
      the end of the hunk header line, in order to avoid chomping it in
      the middle of a single UTF-8 character.  It also added a facility to
      define a custom callback function to make it possible to extend it
      to non UTF-8 encodings.
      
      During the following 8 1/2 years, nobody found need for this custom
      callback facility.
      
      A custom callback function is a wrong design to use here anyway---if
      your contents need support for non UTF-8 encoding, you shouldn't
      have to write a custom function and recompile Git to plumb it in.  A
      better approach would be to extend sane_truncate_line() function and
      have a new member in emit_callback to conditionally trigger it.
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      08a8509e
  3. 17 8月, 2017 2 次提交
    • J
      diff: define block by number of alphanumeric chars · f0b8fb6e
      Jonathan Tan 提交于
      The existing behavior of diff --color-moved=zebra does not define the
      minimum size of a block at all, instead relying on a heuristic applied
      later to filter out sets of adjacent moved lines that are shorter than 3
      lines long. This can be confusing, because a block could thus be colored
      as moved at the source but not at the destination (or vice versa),
      depending on its neighbors.
      
      Instead, teach diff that the minimum size of a block is 20 alphanumeric
      characters, the same heuristic used by "git blame". This allows diff to
      still exclude uninteresting lines appearing on their own (such as those
      solely consisting of one or a few closing braces), as was the intention
      of the adjacent-moved-line heuristic.
      
      This requires a change in some tests in that some of their lines are no
      longer considered to be part of a block, because they are too short.
      Signed-off-by: NJonathan Tan <jonathantanmy@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      f0b8fb6e
    • J
      diff: respect MIN_BLOCK_LENGTH for last block · 09153277
      Jonathan Tan 提交于
      Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line
      that does not belong to the current block. In particular, this means
      that MIN_BLOCK_LENGTH is not checked after all lines are encountered.
      
      Perform that check.
      Signed-off-by: NJonathan Tan <jonathantanmy@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      09153277
  4. 15 8月, 2017 1 次提交
  5. 04 8月, 2017 1 次提交
  6. 14 7月, 2017 1 次提交
    • J
      color: check color.ui in git_default_config() · 136c8c8b
      Jeff King 提交于
      Back in prehistoric times, our decision on whether or not to
      show color by default relied on using a config callback that
      either did or didn't load color config like color.diff.
      When we introduced color.ui, we put it in the same boat:
      commands had to manually respect it by using git_color_config()
      or its git_color_default_config() convenience wrapper.
      
      But in 4c7f1819 (make color.ui default to 'auto',
      2013-06-10), that changed. Since then, we default color.ui
      to auto in all programs, meaning that even plumbing commands
      like "git diff-tree --pretty" might colorize the output.
      Nobody seems to have complained in the intervening years,
      presumably because the "is stdout a tty" check does a good
      job of catching the right cases.
      
      But that leaves an interesting curiosity: color.ui defaults
      to auto even in plumbing, but you can't actually _disable_
      the color via config. So if you really hate color and set
      "color.ui" to false, diff-tree will still show color (but
      porcelain like git-diff won't).  Nobody noticed that either,
      probably because very few people disable color.
      
      One could argue that the plumbing should _always_ disable
      color unless an explicit --color option is given on the
      command line. But in practice, this creates a lot of
      complications for scripts which do want plumbing to show
      user-visible output. They can't just pass "--color" blindly;
      they need to check the user's config and decide what to
      send.
      
      Given that nobody has complained about the current behavior,
      let's assume it's a good path, and follow it to its
      conclusion: supporting color.ui everywhere.
      
      Note that you can create havoc by setting color.ui=always in
      your config, but that's more or less already the case. We
      could disallow it entirely, but it is handy for one-offs
      like:
      
        git -c color.ui=always foo >not-a-tty
      
      when "foo" does not take a --color option itself.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      136c8c8b
  7. 11 7月, 2017 1 次提交
  8. 01 7月, 2017 24 次提交
    • S
      diff.c: add dimming to moved line detection · 86b452e2
      Stefan Beller 提交于
      Any lines inside a moved block of code are not interesting. Boundaries
      of blocks are only interesting if they are next to another block of moved
      code.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      86b452e2
    • S
      diff.c: color moved lines differently, plain mode · 176841f0
      Stefan Beller 提交于
      Add the 'plain' mode for move detection of code. This omits the checking
      for adjacent blocks, so it is not as useful. If you have a lot of the
      same blocks moved in the same patch, the 'Zebra' would end up slow as it
      is O(n^2) (n is number of same blocks). So this may be useful there and
      is generally easy to add. Instead be very literal at the move detection,
      do not skip over short blocks here.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      176841f0
    • S
      diff.c: color moved lines differently · 2e2d5ac1
      Stefan Beller 提交于
      When a patch consists mostly of moving blocks of code around, it can
      be quite tedious to ensure that the blocks are moved verbatim, and not
      undesirably modified in the move. To that end, color blocks that are
      moved within the same patch differently. For example (OM, del, add,
      and NM are different colors):
      
          [OM]  -void sensitive_stuff(void)
          [OM]  -{
          [OM]  -        if (!is_authorized_user())
          [OM]  -                die("unauthorized");
          [OM]  -        sensitive_stuff(spanning,
          [OM]  -                        multiple,
          [OM]  -                        lines);
          [OM]  -}
      
                 void another_function()
                 {
          [del] -        printf("foo");
          [add] +        printf("bar");
                 }
      
          [NM]  +void sensitive_stuff(void)
          [NM]  +{
          [NM]  +        if (!is_authorized_user())
          [NM]  +                die("unauthorized");
          [NM]  +        sensitive_stuff(spanning,
          [NM]  +                        multiple,
          [NM]  +                        lines);
          [NM]  +}
      
      However adjacent blocks may be problematic. For example, in this
      potentially malicious patch, the swapping of blocks can be spotted:
      
          [OM]  -void sensitive_stuff(void)
          [OM]  -{
          [OMA] -        if (!is_authorized_user())
          [OMA] -                die("unauthorized");
          [OM]  -        sensitive_stuff(spanning,
          [OM]  -                        multiple,
          [OM]  -                        lines);
          [OMA] -}
      
                 void another_function()
                 {
          [del] -        printf("foo");
          [add] +        printf("bar");
                 }
      
          [NM]  +void sensitive_stuff(void)
          [NM]  +{
          [NMA] +        sensitive_stuff(spanning,
          [NMA] +                        multiple,
          [NMA] +                        lines);
          [NM]  +        if (!is_authorized_user())
          [NM]  +                die("unauthorized");
          [NMA] +}
      
      If the moved code is larger, it is easier to hide some permutation in the
      code, which is why some alternative coloring is needed.
      
      This patch implements the first mode:
      * basic alternating 'Zebra' mode
        This conveys all information needed to the user.  Defer customization to
        later patches.
      
      First I implemented an alternative design, which would try to fingerprint
      a line by its neighbors to detect if we are in a block or at the boundary.
      This idea iss error prone as it inspected each line and its neighboring
      lines to determine if the line was (a) moved and (b) if was deep inside
      a hunk by having matching neighboring lines. This is unreliable as the
      we can construct hunks which have equal neighbors that just exceed the
      number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter
      as a line, that is permutated to AXYZCXYZBXYZD..').
      
      Instead this provides a dynamic programming greedy algorithm that finds
      the largest moved hunk and then has several modes on highlighting bounds.
      
      A note on the options '--submodule=diff' and '--color-words/--word-diff':
      In the conversion to use emit_line in the prior patches both submodules
      as well as word diff output carefully chose to call emit_line with sign=0.
      All output with sign=0 is ignored for move detection purposes in this
      patch, such that no weird looking output will be generated for these
      cases. This leads to another thought: We could pass on '--color-moved' to
      submodules such that they color up moved lines for themselves. If we'd do
      so only line moves within a repository boundary are marked up.
      
      It is useful to have moved lines colored, but there are annoying corner
      cases, such as a single line moved, that is very common. For example
      in a typical patch of C code, we have closing braces that end statement
      blocks or functions.
      
      While it is technically true that these lines are moved as they show up
      elsewhere, it is harmful for the review as the reviewers attention is
      drawn to such a minor side annoyance.
      
      For now let's have a simple solution of hardcoding the number of
      moved lines to be at least 3 before coloring them. Note, that the
      length is applied across all blocks to find the 'lonely' blocks
      that pollute new code, but do not interfere with a permutated
      block where each permutation has less lines than 3.
      Helped-by: NJonathan Tan <jonathantanmy@google.com>
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      2e2d5ac1
    • S
      diff.c: buffer all output if asked to · e6e045f8
      Stefan Beller 提交于
      Introduce a new option 'emitted_symbols' in the struct diff_options which
      controls whether all output is buffered up until all output is available.
      It is set internally in diff.c when necessary.
      
      We'll have a new struct 'emitted_string' in diff.c which will be used to
      buffer each line.  The emitted_string will duplicate the memory of the
      line to buffer as that is easiest to reason about for now. In a future
      patch we may want to decrease the memory usage by not duplicating all
      output for buffering but rather we may want to store offsets into the
      file or in case of hunk descriptions such as the similarity score, we
      could just store the relevant number and reproduce the text later on.
      
      This approach was chosen as a first step because it is quite simple
      compared to the alternative with less memory footprint.
      
      emit_diff_symbol factors out the emission part and depending on the
      diff_options->emitted_symbols the emission will be performed directly
      when calling emit_diff_symbol or after the whole process is done, i.e.
      by buffering we have add the possibility for a second pass over the
      whole output before doing the actual output.
      
      In 6440d341 (2012-03-14, diff: tweak a _copy_ of diff_options with
      word-diff) we introduced a duplicate diff options struct for word
      emissions as we may have different regex settings in there.
      When buffering the output, we need to operate on just one buffer,
      so we have to copy back the emissions of the word buffer into the
      main buffer.
      
      Unconditionally enable output via buffer in this patch as it yields
      a great opportunity for testing, i.e. all the diff tests from the
      test suite pass without having reordering issues (i.e. only parts
      of the output got buffered, and we forgot to buffer other parts).
      The test suite passes, which gives confidence that we converted all
      functions to use emit_string for output.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      e6e045f8
    • S
      146fdb0d
    • S
      30b7e1e7
    • S
      diff.c: convert word diffing to use emit_diff_symbol · bd033291
      Stefan Beller 提交于
      The word diffing is not line oriented and would need some serious
      effort to be transformed into a line oriented approach, so
      just go with a symbol DIFF_SYMBOL_WORD_DIFF that is a partial line.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      bd033291
    • S
      diff.c: convert show_stats to use emit_diff_symbol · 0911c475
      Stefan Beller 提交于
      We call print_stat_summary from builtin/apply, so we still
      need the version with a file pointer, so introduce
      print_stat_summary_0 that uses emit_string machinery and
      keep print_stat_summary with the same arguments around.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      0911c475
    • S
    • S
      submodule.c: migrate diff output to use emit_diff_symbol · f3597138
      Stefan Beller 提交于
      As the submodule process is no longer attached to the same file pointer
      'o->file' as the superprojects process, there is a different result in
      color.c::check_auto_color. That is why we need to pass coloring explicitly,
      such that the submodule coloring decision will be made by the child process
      processing the submodule. Only DIFF_SYMBOL_SUBMODULE_PIPETHROUGH contains
      color, the other symbols are for embedding the submodule output into the
      superprojects output.
      
      Remove the colors from the function signatures, as all the coloring
      decisions will be made either inside the child process or the final
      emit_diff_symbol, but not in the functions driving the submodule diff.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      f3597138
    • S
      5af6ea95
    • S
      diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES · 4acaaa7a
      Stefan Beller 提交于
      we could save a little bit of memory when buffering in a later mode
      by just passing the inner part ("%s and %s", file1, file 2), but
      those a just a few bytes, so instead let's reuse the implementation from
      DIFF_SYMBOL_HEADER and keep the whole line around.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      4acaaa7a
    • S
      diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER · a29b0a13
      Stefan Beller 提交于
      The header is constructed lazily including line breaks, so just emit
      the raw string as is.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      a29b0a13
    • S
      diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS} · 3ee8b7bf
      Stefan Beller 提交于
      We have to use fprintf instead of emit_line, because we want to emit the
      tab after the color. This is important for ancient versions of gnu patch
      AFAICT, although we probably do not want to feed colored output to the
      patch utility, such that it would not matter if the trailing tab is
      colored. Keep the corner case as-is though.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      3ee8b7bf
    • S
      diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE · f2bb1218
      Stefan Beller 提交于
      The context marker use the exact same output pattern, so reuse it.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      f2bb1218
    • S
    • S
      diff.c: migrate emit_line_checked to use emit_diff_symbol · 091f8e28
      Stefan Beller 提交于
      Add a new flags field to emit_diff_symbol, that will be used by
      context lines for:
      * white space rules that are applicable (The first 12 bits)
        Take a note in cahe.c as well, when this ws rules are extended we have
        to fix the bits in the flags field.
      * how the rules are evaluated (actually this double encodes the sign
        of the line, but the code is easier to keep this way, bits 13,14,15)
      * if the line a blank line at EOF (bit 16)
      
      The check if new lines need to be marked up as extra lines at the end of
      file, is now done unconditionally. That should be ok, as
      'new_blank_line_at_eof' has a quick early return.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      091f8e28
    • S
      b9cbfde6
    • S
    • S
      c64b420b
    • S
      diff.c: introduce emit_diff_symbol · 36a4cefd
      Stefan Beller 提交于
      In a later patch we want to buffer all output before emitting it as a
      new feature ("markup moved lines") conceptually cannot be implemented
      in a single pass over the output.
      
      There are different approaches to buffer all output such as:
      * Buffering on the char level, i.e. we'd have a char[] which would
        grow at approximately 80 characters a line. This would keep the
        output completely unstructured, but might be very easy to implement,
        such as redirecting all output to a temporary file and working off
        that. The later passes over the buffer are quite complicated though,
        because we have to parse back any output and then decide if it should
        be modified.
      
      * Buffer on a line level. As the output is mostly line oriented already,
        this would make sense, but it still is a bit awkward as we'd have to
        make sense of it again by looking at the first characters of a line
        to decide what part of a diff a line is.
      
      * Buffer semantically. Imagine there is a formal grammar for the diff
        output and we'd keep the symbols of this grammar around. This keeps
        the highest level of structure in the buffered data, such that the
        actual memory requirements are less than say the first option. Instead
        of buffering the characters of the line, we'll buffer what we intend
        to do plus additional information for the specifics. An output of
      
          diff --git a/new.txt b/new.txt
          index fa69b07..412428c 100644
          Binary files a/new.txt and b/new.txt differ
      
        could be buffered as
           DIFF_SYMBOL_DIFF_START + new.txt
           DIFF_SYMBOL_INDEX_MODE + fa69b07 412428c "non-executable" flag
           DIFF_SYMBOL_BINARY_FILES + new.txt
      
      This and the following patches introduce the third option of buffering
      by first moving any output to emit_diff_symbol, and then introducing the
      buffering in this function.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      36a4cefd
    • S
      diff.c: factor out diff_flush_patch_all_file_pairs · ec331506
      Stefan Beller 提交于
      In a later patch we want to do more things before and after all filepairs
      are flushed. So factor flushing out all file pairs into its own function
      that the new code can be plugged in easily.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      ec331506
    • S
      diff.c: move line ending check into emit_hunk_header · dfb7728f
      Stefan Beller 提交于
      The emit_hunk_header() function is responsible for assembling a
      hunk header and calling emit_line() to send the hunk header
      to the output file.  Its only caller fn_out_consume() needs
      to prepare for a case where the function emits an incomplete
      line and add the terminating LF.
      
      Instead make sure emit_hunk_header() to always send a
      completed line to emit_line().
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      dfb7728f
    • S
      diff.c: readability fix · f2d2a5de
      Stefan Beller 提交于
      We already have dereferenced 'p->two' into a local variable 'two'.
      Use that.
      Signed-off-by: NStefan Beller <sbeller@google.com>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      f2d2a5de
  9. 17 6月, 2017 1 次提交
  10. 16 6月, 2017 1 次提交
  11. 14 6月, 2017 1 次提交
  12. 05 6月, 2017 1 次提交
  13. 02 6月, 2017 4 次提交