提交 · c876d6da88d9bf1f3377d4eed66fc7705e31c30e · 李少辉-开发者 / git

03 2月, 2012 4 次提交

grep: drop grep_buffer's "name" parameter · c876d6da

由 Jeff King 提交于 2月 02, 2012

Before the grep_source interface existed, grep_buffer was
used by two types of callers:

  1. Ones which pulled a file into a buffer, and then wanted
     to supply the file's name for the output (i.e.,
     git grep).

  2. Ones which really just wanted to grep a buffer (i.e.,
     git log --grep).

Callers in set (1) should now be using grep_source. Callers
in set (2) always pass NULL for the "name" parameter of
grep_buffer. We can therefore get rid of this now-useless
parameter.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

c876d6da

grep: refactor the concept of "grep source" into an object · e1327023

由 Jeff King 提交于 2月 02, 2012

The main interface to the low-level grep code is
grep_buffer, which takes a pointer to a buffer and a size.
This is convenient and flexible (we use it to grep commit
bodies, files on disk, and blobs by sha1), but it makes it
hard to pass extra information about what we are grepping
(either for correctness, like overriding binary
auto-detection, or for optimizations, like lazily loading
blob contents).

Instead, let's encapsulate the idea of a "grep source",
including the buffer, its size, and where the data is coming
from. This is similar to the diff_filespec structure used by
the diff code (unsurprising, since future patches will
implement some of the same optimizations found there).

The diffstat is slightly scarier than the actual patch
content. Most of the modified lines are simply replacing
access to raw variables with their counterparts that are now
in a "struct grep_source". Most of the added lines were
taken from builtin/grep.c, which partially abstracted the
idea of grep sources (for file vs sha1 sources).

Instead of dropping the now-redundant code, this patch
leaves builtin/grep.c using the traditional grep_buffer
interface (which now wraps the grep_source interface). That
makes it easy to test that there is no change of behavior
(yet).
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

e1327023

grep: move sha1-reading mutex into low-level code · b3aeb285

由 Jeff King 提交于 2月 02, 2012

The multi-threaded git-grep code needs to serialize access
to the thread-unsafe read_sha1_file call. It does this with
a mutex that is local to builtin/grep.c.

Let's instead push this down into grep.c, where it can be
used by both builtin/grep.c and grep.c. This will let us
safely teach the low-level grep.c code tricks that involve
reading from the object db.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

b3aeb285

grep: make locking flag global · 78db6ea9

由 Jeff King 提交于 2月 02, 2012

The low-level grep code traditionally didn't care about
threading, as it doesn't do any threading itself and didn't
call out to other non-thread-safe code.  That changed with
0579f91d (grep: enable threading with -p and -W using lazy
attribute lookup, 2011-12-12), which pushed the lookup of
funcname attributes (which is not thread-safe) into the
low-level grep code.

As a result, the low-level code learned about a new global
"grep_attr_mutex" to serialize access to the attribute code.
A multi-threaded caller (e.g., builtin/grep.c) is expected
to initialize the mutex and set "use_threads" in the
grep_opt structure. The low-level code only uses the lock if
use_threads is set.

However, putting the use_threads flag into the grep_opt
struct is not the most logical place. Whether threading is
in use is not something that matters for each call to
grep_buffer, but is instead global to the whole program
(i.e., if any thread is doing multi-threaded grep, every
other thread, even if it thinks it is doing its own
single-threaded grep, would need to use the locking).  In
practice, this distinction isn't a problem for us, because
the only user of multi-threaded grep is "git-grep", which
does nothing except call grep.

This patch turns the opt->use_threads flag into a global
flag. More important than the nit-picking semantic argument
above is that this means that the locking functions don't
need to actually have access to a grep_opt to know whether
to lock. Which in turn can make adding new locks simpler, as
we don't need to pass around a grep_opt.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

78db6ea9

17 12月, 2011 1 次提交

grep: enable threading with -p and -W using lazy attribute lookup · 0579f91d

由 Thomas Rast 提交于 12月 12, 2011

Lazily load the userdiff attributes in match_funcname().  Use a
separate mutex around this loading to protect the (not thread-safe)
attributes machinery.  This lets us re-enable threading with -p and
-W while reducing the overhead caused by looking up attributes.
Signed-off-by: NThomas Rast <trast@student.ethz.ch>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

0579f91d

13 12月, 2011 1 次提交

grep: load funcname patterns for -W · b8ffedca

由 Thomas Rast 提交于 12月 12, 2011

git-grep avoids loading the funcname patterns unless they are needed.
ba8ea749 (grep: add option to show whole function as context,
2011-08-01) forgot to extend this test also to the new funcbody
feature.  Do so.

The catch is that we also have to disable threading when using
userdiff, as explained in grep_threads_ok().  So we must be careful to
introduce the same test there.
Signed-off-by: NThomas Rast <trast@student.ethz.ch>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

b8ffedca

21 8月, 2011 1 次提交

Use kwset in grep · 9eceddee

由 Fredrik Kuivinen 提交于 8月 21, 2011

Benchmarks for the hot cache case:

before:
$ perf stat --repeat=5 git grep qwerty > /dev/null

Performance counter stats for 'git grep qwerty' (5 runs):

        3,478,085 cache-misses             #      2.322 M/sec   ( +-   2.690% )
       11,356,177 cache-references         #      7.582 M/sec   ( +-   2.598% )
        3,872,184 branch-misses            #      0.363 %       ( +-   0.258% )
    1,067,367,848 branches                 #    712.673 M/sec   ( +-   2.622% )
    3,828,370,782 instructions             #      0.947 IPC     ( +-   0.033% )
    4,043,832,831 cycles                   #   2700.037 M/sec   ( +-   0.167% )
            8,518 page-faults              #      0.006 M/sec   ( +-   3.648% )
              847 CPU-migrations           #      0.001 M/sec   ( +-   3.262% )
            6,546 context-switches         #      0.004 M/sec   ( +-   2.292% )
      1497.695495 task-clock-msecs         #      3.303 CPUs    ( +-   2.550% )

       0.453394396  seconds time elapsed   ( +-   0.912% )

after:
$ perf stat --repeat=5 git grep qwerty > /dev/null

Performance counter stats for 'git grep qwerty' (5 runs):

        2,989,918 cache-misses             #      3.166 M/sec   ( +-   5.013% )
       10,986,041 cache-references         #     11.633 M/sec   ( +-   4.899% )  (scaled from 95.06%)
        3,511,993 branch-misses            #      1.422 %       ( +-   0.785% )
      246,893,561 branches                 #    261.433 M/sec   ( +-   3.967% )
    1,392,727,757 instructions             #      0.564 IPC     ( +-   0.040% )
    2,468,142,397 cycles                   #   2613.494 M/sec   ( +-   0.110% )
            7,747 page-faults              #      0.008 M/sec   ( +-   3.995% )
              897 CPU-migrations           #      0.001 M/sec   ( +-   2.383% )
            6,535 context-switches         #      0.007 M/sec   ( +-   1.993% )
       944.384228 task-clock-msecs         #      3.177 CPUs    ( +-   0.268% )

       0.297257643  seconds time elapsed   ( +-   0.450% )

So we gain about 35% by using the kwset code.

As a side effect of using kwset two grep tests are fixed by this
patch. The first is fixed because kwset can deal with case-insensitive
search containing NULs, something strcasestr cannot do. The second one
is fixed because we consider patterns containing NULs as fixed strings
(regcomp cannot accept patterns with NULs).
Signed-off-by: NFredrik Kuivinen <frekui@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

9eceddee

20 8月, 2011 1 次提交

color: delay auto-color decision until point of use · daa0c3d9

由 Jeff King 提交于 8月 17, 2011

When we read a color value either from a config file or from
the command line, we use git_config_colorbool to convert it
from the tristate always/never/auto into a single yes/no
boolean value.

This has some timing implications with respect to starting
a pager.

If we start (or decide not to start) the pager before
checking the colorbool, everything is fine. Either isatty(1)
will give us the right information, or we will properly
check for pager_in_use().

However, if we decide to start a pager after we have checked
the colorbool, things are not so simple. If stdout is a tty,
then we will have already decided to use color. However, the
user may also have configured color.pager not to use color
with the pager. In this case, we need to actually turn off
color. Unfortunately, the pager code has no idea which color
variables were turned on (and there are many of them
throughout the code, and they may even have been manipulated
after the colorbool selection by something like "--color" on
the command line).

This bug can be seen any time a pager is started after
config and command line options are checked. This has
affected "git diff" since 89d07f75 (diff: don't run pager if
user asked for a diff style exit code, 2007-08-12). It has
also affect the log family since 1fda91b5 (Fix 'git log'
early pager startup error case, 2010-08-24).

This patch splits the notion of parsing a colorbool and
actually checking the configuration. The "use_color"
variables now have an additional possible value,
GIT_COLOR_AUTO. Users of the variable should use the new
"want_color()" wrapper, which will lazily determine and
cache the auto-color decision.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

daa0c3d9

02 8月, 2011 1 次提交

grep: add option to show whole function as context · ba8ea749

由 René Scharfe 提交于 8月 01, 2011

Add a new option, -W, to show the whole surrounding function of a match.

It uses the same regular expressions as -p and diff to find the beginning
of sections.

Currently it will not display comments in front of a function, but those
that are following one.  Despite this shortcoming it is already useful,
e.g. to simply see a more complete applicable context or to extract whole
functions.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ba8ea749

06 6月, 2011 3 次提交

grep: add --heading · 1d84f72e

由 René Scharfe 提交于 6月 05, 2011

With --heading, the filename is printed once before matches from that
file instead of at the start of each line, giving more screen space to
the actual search results.

This option is taken from ack (http://betterthangrep.com/).  And now
git grep can dress up like it:

	$ git config alias.ack "grep --break --heading --line-number"

	$ git ack -e --heading
	Documentation/git-grep.txt
	154:--heading::

	t/t7810-grep.sh
	785:test_expect_success 'grep --heading' '
	786:    git grep --heading -e char -e lo_w hello.c hello_world >actual &&
	808:    git grep --break --heading -n --color \
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

1d84f72e

grep: add --break · a8f0e764

由 René Scharfe 提交于 6月 05, 2011

With --break, an empty line is printed between matches from different
files, increasing readability.  This option is taken from ack
(http://betterthangrep.com/).
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

a8f0e764

grep: fix coloring of hunk marks between files · 08303c36

由 René Scharfe 提交于 6月 05, 2011

Commit 431d6e7b (grep: enable threading for context line printing)
split the printing of the "--\n" mark between results from different
files out into two places: show_line() in grep.c for the non-threaded
case and work_done() in builtin/grep.c for the threaded case. Commit
55f638bd (grep: Colorize filename, line number, and separator) updated
the former, but not the latter, so the separators between files are
not colored if threads are used.

This patch merges the two. In the threaded case, hunk marks are now
printed by show_line() for every file, including the first one, and the
very first mark is simply skipped in work_done(). This ensures that the
output is properly colored and works just as well.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

08303c36

10 5月, 2011 3 次提交

git-grep: Learn PCRE · 63e7e9d8

由 Michał Kiedrowicz 提交于 5月 09, 2011

This patch teaches git-grep the --perl-regexp/-P options (naming
borrowed from GNU grep) in order to allow specifying PCRE regexes on the
command line.

PCRE has a number of features which make them more handy to use than
POSIX regexes, like consistent escaping rules, extended character
classes, ungreedy matching etc.

git isn't build with PCRE support automatically. USE_LIBPCRE environment
variable must be enabled (like `make USE_LIBPCRE=YesPlease`).
Signed-off-by: NMichał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

63e7e9d8

grep: Extract compile_regexp_failed() from compile_regexp() · a30c148a

由 Michał Kiedrowicz 提交于 5月 09, 2011

This simplifies compile_regexp() a little and allows re-using error
handling code.
Signed-off-by: NMichał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

a30c148a

grep: Fix a typo in a comment · 8997da38

由 Michał Kiedrowicz 提交于 5月 09, 2011

Signed-off-by: NMichał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

8997da38

05 5月, 2011 1 次提交

grep: Put calls to fixmatch() and regmatch() into patmatch() · 97e77784

由 Michał Kiedrowicz 提交于 5月 05, 2011

Both match_one_pattern() and look_ahead() use fixmatch() and regmatch()
in the same way. They really want to match a pattern againt a string,
but now they need to know if the pattern is fixed or regexp.

This change cleans this up by introducing patmatch() (from "pattern
match") and also simplifies inserting other ways of matching a string.
Signed-off-by: NMichał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

97e77784

13 9月, 2010 2 次提交

log --author: take union of multiple "author" requests · 5aaeb733

由 Junio C Hamano 提交于 9月 12, 2010

In the olden days,

    log --author=me --committer=him --grep=this --grep=that

used to be turned into:

    (OR (HEADER-AUTHOR me)
        (HEADER-COMMITTER him)
        (PATTERN this)
        (PATTERN that))

showing my patches that do not have any "this" nor "that", which was
totally useless.

80235ba7 ("log --author=me --grep=it" should find intersection, not union,
2010-01-17) improved it greatly to turn the same into:

    (ALL-MATCH
      (HEADER-AUTHOR me)
      (HEADER-COMMITTER him)
      (OR (PATTERN this) (PATTERN that)))

That is, "show only patches by me and committed by him, that have either
this or that", which is a lot more natural thing to ask.

We however need to be a bit more clever when the user asks more than one
"author" (or "committer"); because a commit has only one author (and one
committer), they ought to be interpreted as asking for union to be useful.
The current implementation simply added another author/committer pattern
at the same top-level for ALL-MATCH to insist on matching all, finding
nothing.

Turn

    log --author=me --author=her \
    	--committer=him --committer=you \
	--grep=this --grep=that

into

    (ALL-MATCH
      (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her))
      (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you))
      (OR (PATTERN this) (PATTERN that)))

instead.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

5aaeb733

grep: move logic to compile header pattern into a separate helper · 95ce9ce2

由 Junio C Hamano 提交于 9月 12, 2010

The callers should be queuing only GREP_PATTERN_HEAD elements to the
header_list queue; simplify the switch and guard it with an assert.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

95ce9ce2

25 5月, 2010 7 次提交

grep: support NUL chars in search strings for -F · ed40a095

由 René Scharfe 提交于 5月 22, 2010

Search patterns in a file specified with -f can contain NUL characters.
The current code ignores all characters on a line after a NUL.

Pass the actual length of the line all the way from the pattern file to
fixmatch() and use it for case-sensitive fixed string matching.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ed40a095

grep: use REG_STARTEND for all matching if available · f96e5673

由 René Scharfe 提交于 5月 22, 2010

Refactor REG_STARTEND handling inlook_ahead() into a new helper,
regmatch(), and use it for line matching, too.  This allows regex
matching beyond NUL characters if regexec() supports the flag.  NUL
characters themselves are not matched in any way, though.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

f96e5673

grep: continue case insensitive fixed string search after NUL chars · 52d799a7

由 René Scharfe 提交于 5月 22, 2010

Functions for C strings, like strcasestr(), can't see beyond NUL
characters.  Check if there is such an obstacle on the line and try
again behind it.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

52d799a7

grep: use memmem() for fixed string search · 1baddf4b

由 René Scharfe 提交于 5月 22, 2010

Allow searching beyond NUL characters by using memmem() instead of
strstr().
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

1baddf4b

grep: --name-only over binary · 321ffcc0

由 René Scharfe 提交于 5月 22, 2010

As with the option -c/--count, git grep with the option -l/--name-only
should work the same with binary files as with text files because
there is no danger of messing up the terminal with control characters
from the contents of matching files.  GNU grep does the same.

Move the check for ->name_only before the one for binary_match_only,
thus making the latter irrelevant for git grep -l.
Reported-by: NDmitry Potapov <dpotapov@gmail.com>
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

321ffcc0

grep: --count over binary · c30c10cf

由 René Scharfe 提交于 5月 22, 2010

The intent of showing the message "Binary file xyz matches" for
binary files is to avoid annoying users by potentially messing up
their terminals by printing control characters.  In --count mode,
this precaution isn't necessary.

Display counts of matches if -c/--count was specified, even if -a
was not given.  GNU grep does the same.

Moving the check for ->count before the code for handling binary
file also avoids printing context lines if --count and -[ABC] were
used together, so we can remove the part of the comment that
mentions this behaviour.  Again, GNU grep does the same.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

c30c10cf

grep: grep: refactor handling of binary mode options · 64fcec78

由 René Scharfe 提交于 5月 22, 2010

Turn the switch inside-out and add labels for each possible value
of ->binary.  This makes the code easier to read and avoids calling
buffer_is_binary() if the option -a was given.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

64fcec78

16 3月, 2010 1 次提交

grep: enable threading for context line printing · 431d6e7b

由 René Scharfe 提交于 3月 15, 2010

If context lines are to be printed, grep separates them with hunk marks
("--\n").  These marks are printed between matches from different files,
too.  They are not printed before the first file, though.

Threading was disabled when context line printing was enabled because
avoiding to print the mark before the first line was an unsolved
synchronisation problem.  This patch separates the code for printing
hunk marks for the threaded and the unthreaded case, allowing threading
to be turned on together with the common -ABC options.

->show_hunk_mark, which controls printing of hunk marks between files in
show_line(), is now set in grep_buffer_1(), but only if some results
have already been printed and threading is disabled.  The threaded case
is handled in work_done().
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

431d6e7b

08 3月, 2010 2 次提交

grep: Colorize selected, context, and function lines · 00588bb5

由 Mark Lodato 提交于 3月 07, 2010

Colorize non-matching text of selected lines, context lines, and
function name lines.  The default for all three is no color, but they
can be configured using color.grep.<slot>.  The first two are similar
to the corresponding options in GNU grep, except that GNU grep applies
the color to the entire line, not just non-matching text.
Signed-off-by: NMark Lodato <lodatom@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

00588bb5

grep: Colorize filename, line number, and separator · 55f638bd

由 Mark Lodato 提交于 3月 07, 2010

Colorize the filename, line number, and separator in git grep output, as
GNU grep does. The colors are customizable through color.grep.<slot>.
The default is to only color the separator (in cyan), since this gives
the biggest legibility increase without overwhelming the user with
colors. GNU grep also defaults cyan for the separator, but defaults to
magenta for the filename and to green for the line number, as well.

There is one difference from GNU grep: When a binary file matches
without -a, GNU grep does not color the <file> in "Binary file <file>
matches", but we do.

Like GNU grep, if --null is given, the null separators are not colored.

For config.txt, use a a sub-list to describe the slots, rather than
a single paragraph with parentheses, since this is much more readable.

Remove the cast to int for `rm_eo - rm_so` since it is not necessary.
Signed-off-by: NMark Lodato <lodatom@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

55f638bd

04 2月, 2010 1 次提交

grep: simplify assignment of ->fixed · 79286102

由 René Scharfe 提交于 2月 03, 2010

After 885d211e, the value of the ->fixed pattern option only depends on
the grep option of the same name. Regex flags don't matter any more,
because fixed mode and regex mode are strictly separated. Thus we can
simply copy the value from struct grep_opt to struct grep_pat, as we do
already for ->word_regexp and ->ignore_case.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

79286102

27 1月, 2010 2 次提交

grep: use REG_STARTEND (if available) to speed up regexec · 24072c02

由 Benjamin Kramer 提交于 1月 26, 2010

BSD and glibc have an extension to regexec which takes a buffer + length pair
instead of a NUL-terminated string. Since we already have the length computed
this can save us a strlen call inside regexec.
Signed-off-by: NBenjamin Kramer <benny.kra@googlemail.com>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

24072c02

Threaded grep · 5b594f45

由 Fredrik Kuivinen 提交于 1月 25, 2010

Make git grep use threads when it is available.

The results below are best of five runs in the Linux repository (on a
box with two cores).

With the patch:

git grep qwerty
1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+5774minor)pagefaults 0swaps

Without:

git grep qwerty
1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+3716minor)pagefaults 0swaps

And with a pattern with quite a few matches:

With the patch:

$ /usr/bin/time git grep void
5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+5587minor)pagefaults 0swaps

Without:

$ /usr/bin/time git grep void
5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+3693minor)pagefaults 0swaps

In either case we gain about 40% by the threading.
Signed-off-by: NFredrik Kuivinen <frekui@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

5b594f45

26 1月, 2010 1 次提交

"log --author=me --grep=it" should find intersection, not union · 80235ba7

由 Junio C Hamano 提交于 1月 17, 2010

Historically, any grep filter in "git log" family of commands were taken
as restricting to commits with any of the words in the commit log message.
However, the user almost always want to find commits "done by this person
on that topic".  With "--all-match" option, a series of grep patterns can
be turned into a requirement that all of them must produce a match, but
that makes it impossible to ask for "done by me, on either this or that"
with:

	log --author=me --committer=him --grep=this --grep=that

because it will require both "this" and "that" to appear.

Change the "header" parser of grep library to treat the headers specially,
and parse it as:

	(all-match-OR (HEADER-AUTHOR me)
		      (HEADER-COMMITTER him)
		      (OR
		      	(PATTERN this)
			(PATTERN that) ) )

Even though the "log" command line parser doesn't give direct access to
the extended grep syntax to group terms with parentheses, this change will
cover the majority of the case the users would want.

This incidentally revealed that one test in t7002 was bogus.  It ran:

	log --author=Thor --grep=Thu --format='%s'

and expected (wrongly) "Thu" to match "Thursday" in the author/committer
date, but that would never match, as the timestamp in raw commit buffer
does not have the name of the day-of-the-week.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

80235ba7

13 1月, 2010 1 次提交

grep: rip out pessimization to use fixmatch() · 885d211e

由 Junio C Hamano 提交于 1月 12, 2010

Even when running without the -F (--fixed-strings) option, we checked the
pattern and used fixmatch() codepath when it does not contain any regex
magic.  Finding fixed strings with strstr() surely must be faster than
running the regular expression crud.

Not so.  It turns out that on some libc implementations, using the
regcomp()/regexec() pair is a lot faster than running strstr() and
strcasestr() the fixmatch() codepath uses.  Drop the optimization and use
the fixmatch() codepath only when the user explicitly asked for it with
the -F option.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

885d211e

12 1月, 2010 1 次提交

grep: optimize built-in grep by skipping lines that do not hit · a26345b6

由 Junio C Hamano 提交于 1月 10, 2010

The internal "grep" engine we use checks for hits line-by-line, instead of
letting the underlying regexec()/fixmatch() routines scan for the first
match from the rest of the buffer.  This was a major source of overhead
compared to the external grep.

Introduce a "look-ahead" mechanism to find the next line that would
potentially match by using regexec()/fixmatch() in the remainder of the
text to skip unmatching lines, and use it when the query criteria is
simple enough (i.e. punt for an advanced grep boolean expression like
"lines that have both X and Y but not Z" for now) and we are not running
under "-v" (aka "--invert-match") option.

Note that "-L" (aka "--files-without-match") is not a reason to disable
this optimization.  Under the option, we are interested if the file has
any hit at all, and that is what we determine reliably with or without the
optimization.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

a26345b6

17 11月, 2009 1 次提交

grep: Allow case insensitive search of fixed-strings · 5183bf67

由 Brian Collins 提交于 11月 06, 2009

"git grep" currently an error when you combine the -F and -i flags.
This isn't in line with how GNU grep handles it.

This patch allows the simultaneous use of those flags.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NBrian Collins <bricollins@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

5183bf67

03 7月, 2009 1 次提交

grep: simplify -p output · ed24e401

由 René Scharfe 提交于 7月 02, 2009

It was found a bit too loud to show == separators between the function
headers.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ed24e401

02 7月, 2009 4 次提交

grep -p: support user defined regular expressions · 60ecac98

由 René Scharfe 提交于 7月 02, 2009

Respect the userdiff attributes and config settings when looking for
lines with function definitions in git grep -p.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

60ecac98

grep: add option -p/--show-function · 2944e4e6

由 René Scharfe 提交于 7月 02, 2009

The new option -p instructs git grep to print the previous function
definition as a context line, similar to diff -p. Such context lines
are marked with an equal sign instead of a dash. This option
complements the existing context options -A, -B, -C.

Function definitions are detected using the same heuristic that diff
uses. User defined regular expressions are not supported, yet.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

2944e4e6

grep: handle pre context lines on demand · 49de3216

由 René Scharfe 提交于 7月 02, 2009

Factor out pre context line handling into the new function
show_pre_context() and change the algorithm to rewind by looking for
newline characters and roll forward again, instead of maintaining an
array of line beginnings and ends.

This is slower for hits, but the cost for non-matching lines becomes
zero.  Normally, there are far more non-matching lines, so the time
spent in total decreases.

Before this patch (current Linux kernel repo, best of five runs):

	$ time git grep --no-ext-grep -B1 memset >/dev/null

	real	0m2.134s
	user	0m1.932s
	sys	0m0.196s

	$ time git grep --no-ext-grep -B1000 memset >/dev/null

	real	0m12.059s
	user	0m11.837s
	sys	0m0.224s

The same with this patch:

	$ time git grep --no-ext-grep -B1 memset >/dev/null

	real	0m2.117s
	user	0m1.892s
	sys	0m0.228s

	$ time git grep --no-ext-grep -B1000 memset >/dev/null

	real	0m2.986s
	user	0m2.696s
	sys	0m0.288s
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

49de3216

grep: print context hunk marks between files · 046802d0

由 René Scharfe 提交于 7月 02, 2009

Print a hunk mark before matches from a new file are shown, in addition
to the current behaviour of printing them if lines have been skipped.

The result is easier to read, as (presumably unrelated) matches from
different files are separated by a hunk mark. GNU grep does the same.
Signed-off-by: NRene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

046802d0

李少辉-开发者 / git 与 Fork 源项目一致

李少辉-开发者 / git
与 Fork 源项目一致