  1. 21 May, 2012 · 1 commit
  2. 03 February, 2012 · 6 commits
    • grep: respect diff attributes for binary-ness · 41b59bfc
      Committed by Jeff King
      There is currently no way for users to tell git-grep that a
      particular path is or is not a binary file; instead, grep
      always relies on its auto-detection (or the user specifying
      "-a" to treat all binary-looking files like text).
      
      This patch teaches git-grep to use the same attribute lookup
      that is used by git-diff. We could add a new "grep" flag,
      but that is unnecessarily complex and unlikely to be useful.
      Despite the name, the "-diff" attribute (or "diff=foo" and
      the associated diff.foo.binary config option) are really
      about describing the contents of the path. It's simply
      historical that diff was the only thing that cared about
      these attributes in the past.
      
      And if this simple approach turns out to be insufficient, we
      still have a backwards-compatible path forward: we can add a
      separate "grep" attribute, and fall back to respecting
      "diff" if it is unset.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
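The decision described in this commit (an explicit attribute wins, auto-detection is only the fallback) can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the enum, the function names, and the 8000-byte NUL heuristic are assumptions made for the example. A path would typically be marked binary with a .gitattributes line such as `*.dat -diff`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tri-state for what the "diff" attribute says about a path. */
enum binary_attr {
	BINARY_UNSET = -1, /* no attribute: fall back to auto-detection */
	BINARY_NO = 0,     /* attribute says the path is text */
	BINARY_YES = 1     /* attribute says the path is binary (e.g. "-diff") */
};

/* Assumed heuristic: a NUL byte near the start marks the data as binary. */
#define FIRST_FEW_BYTES 8000

static int looks_binary(const char *buf, size_t len)
{
	if (len > FIRST_FEW_BYTES)
		len = FIRST_FEW_BYTES;
	return memchr(buf, 0, len) != NULL;
}

/* An explicit attribute always overrides the content-based guess. */
static int source_is_binary(enum binary_attr attr, const char *buf, size_t len)
{
	if (attr != BINARY_UNSET)
		return attr == BINARY_YES;
	return looks_binary(buf, len);
}
```

With this shape, a file full of NUL bytes is still searched as text when the attribute says so, and a plain-text file is skipped as binary when marked `-diff`.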
    • grep: cache userdiff_driver in grep_source · 94ad9d9e
      Committed by Jeff King
      Right now, grep only uses the userdiff_driver for one thing:
      looking up funcname patterns for "-p" and "-W".  As new uses
      for userdiff drivers are added to the grep code, we want to
      minimize attribute lookups, which can be expensive.
      
      It might seem at first that this would also optimize multiple
      lookups when the funcname pattern for a file is needed
      multiple times. However, the compiled funcname pattern is
      already cached in struct grep_opt's "priv" member, so
      multiple lookups are already suppressed.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: drop grep_buffer's "name" parameter · c876d6da
      Committed by Jeff King
      Before the grep_source interface existed, grep_buffer was
      used by two types of callers:
      
        1. Ones which pulled a file into a buffer, and then wanted
           to supply the file's name for the output (i.e.,
           git grep).
      
        2. Ones which really just wanted to grep a buffer (i.e.,
           git log --grep).
      
      Callers in set (1) should now be using grep_source. Callers
      in set (2) always pass NULL for the "name" parameter of
      grep_buffer. We can therefore get rid of this now-useless
      parameter.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: refactor the concept of "grep source" into an object · e1327023
      Committed by Jeff King
      The main interface to the low-level grep code is
      grep_buffer, which takes a pointer to a buffer and a size.
      This is convenient and flexible (we use it to grep commit
      bodies, files on disk, and blobs by sha1), but it makes it
      hard to pass extra information about what we are grepping
      (either for correctness, like overriding binary
      auto-detection, or for optimizations, like lazily loading
      blob contents).
      
      Instead, let's encapsulate the idea of a "grep source",
      including the buffer, its size, and where the data is coming
      from. This is similar to the diff_filespec structure used by
      the diff code (unsurprising, since future patches will
      implement some of the same optimizations found there).
      
      The diffstat is slightly scarier than the actual patch
      content. Most of the modified lines are simply replacing
      access to raw variables with their counterparts that are now
      in a "struct grep_source". Most of the added lines were
      taken from builtin/grep.c, which partially abstracted the
      idea of grep sources (for file vs sha1 sources).
      
      Instead of dropping the now-redundant code, this patch
      leaves builtin/grep.c using the traditional grep_buffer
      interface (which now wraps the grep_source interface). That
      makes it easy to test that there is no change of behavior
      (yet).
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: move sha1-reading mutex into low-level code · b3aeb285
      Committed by Jeff King
      The multi-threaded git-grep code needs to serialize access
      to the thread-unsafe read_sha1_file call. It does this with
      a mutex that is local to builtin/grep.c.
      
      Let's instead push this down into grep.c, where it can be
      used by both builtin/grep.c and grep.c. This will let us
      safely teach the low-level grep.c code tricks that involve
      reading from the object db.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: make locking flag global · 78db6ea9
      Committed by Jeff King
      The low-level grep code traditionally didn't care about
      threading, as it doesn't do any threading itself and didn't
      call out to other non-thread-safe code.  That changed with
      0579f91d (grep: enable threading with -p and -W using lazy
      attribute lookup, 2011-12-12), which pushed the lookup of
      funcname attributes (which is not thread-safe) into the
      low-level grep code.
      
      As a result, the low-level code learned about a new global
      "grep_attr_mutex" to serialize access to the attribute code.
      A multi-threaded caller (e.g., builtin/grep.c) is expected
      to initialize the mutex and set "use_threads" in the
      grep_opt structure. The low-level code only uses the lock if
      use_threads is set.
      
      However, putting the use_threads flag into the grep_opt
      struct is not the most logical place. Whether threading is
      in use is not something that matters for each call to
      grep_buffer, but is instead global to the whole program
      (i.e., if any thread is doing multi-threaded grep, every
      other thread, even if it thinks it is doing its own
      single-threaded grep, would need to use the locking).  In
      practice, this distinction isn't a problem for us, because
      the only user of multi-threaded grep is "git-grep", which
      does nothing except call grep.
      
      This patch turns the opt->use_threads flag into a global
      flag. More important than the nit-picking semantic argument
      above is that this means that the locking functions don't
      need to actually have access to a grep_opt to know whether
      to lock. Which in turn can make adding new locks simpler, as
      we don't need to pass around a grep_opt.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
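The benefit of a global flag is easiest to see in the lock helpers themselves: they no longer need a grep_opt argument. The sketch below is an assumption-laden illustration, not the patch itself. Only "grep_attr_mutex" is named in the commit message; the flag name `grep_use_locks` and the counter `attr_lock_count` are invented here, the latter purely so the behavior can be observed.

```c
#include <pthread.h>

/* Global rather than per-grep_opt: threading is a whole-program property. */
static int grep_use_locks;
static pthread_mutex_t grep_attr_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Illustration only: counts how often the lock was really taken. */
static int attr_lock_count;

/* No grep_opt parameter is needed to decide whether to lock. */
static void grep_attr_lock(void)
{
	if (grep_use_locks) {
		pthread_mutex_lock(&grep_attr_mutex);
		attr_lock_count++;
	}
}

static void grep_attr_unlock(void)
{
	if (grep_use_locks)
		pthread_mutex_unlock(&grep_attr_mutex);
}
```

A multi-threaded caller would set the flag once before starting its workers; single-threaded callers leave it at zero and pay no locking cost. Adding another lock later only requires another pair of helpers of the same shape.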
  3. 17 December, 2011 · 1 commit
  4. 21 August, 2011 · 1 commit
    • Use kwset in grep · 9eceddee
      Committed by Fredrik Kuivinen
      Benchmarks for the hot cache case:
      
      before:
      $ perf stat --repeat=5 git grep qwerty > /dev/null
      
      Performance counter stats for 'git grep qwerty' (5 runs):
      
              3,478,085 cache-misses             #      2.322 M/sec   ( +-   2.690% )
             11,356,177 cache-references         #      7.582 M/sec   ( +-   2.598% )
              3,872,184 branch-misses            #      0.363 %       ( +-   0.258% )
          1,067,367,848 branches                 #    712.673 M/sec   ( +-   2.622% )
          3,828,370,782 instructions             #      0.947 IPC     ( +-   0.033% )
          4,043,832,831 cycles                   #   2700.037 M/sec   ( +-   0.167% )
                  8,518 page-faults              #      0.006 M/sec   ( +-   3.648% )
                    847 CPU-migrations           #      0.001 M/sec   ( +-   3.262% )
                  6,546 context-switches         #      0.004 M/sec   ( +-   2.292% )
            1497.695495 task-clock-msecs         #      3.303 CPUs    ( +-   2.550% )
      
             0.453394396  seconds time elapsed   ( +-   0.912% )
      
      after:
      $ perf stat --repeat=5 git grep qwerty > /dev/null
      
      Performance counter stats for 'git grep qwerty' (5 runs):
      
              2,989,918 cache-misses             #      3.166 M/sec   ( +-   5.013% )
             10,986,041 cache-references         #     11.633 M/sec   ( +-   4.899% )  (scaled from 95.06%)
              3,511,993 branch-misses            #      1.422 %       ( +-   0.785% )
            246,893,561 branches                 #    261.433 M/sec   ( +-   3.967% )
          1,392,727,757 instructions             #      0.564 IPC     ( +-   0.040% )
          2,468,142,397 cycles                   #   2613.494 M/sec   ( +-   0.110% )
                  7,747 page-faults              #      0.008 M/sec   ( +-   3.995% )
                    897 CPU-migrations           #      0.001 M/sec   ( +-   2.383% )
                  6,535 context-switches         #      0.007 M/sec   ( +-   1.993% )
             944.384228 task-clock-msecs         #      3.177 CPUs    ( +-   0.268% )
      
             0.297257643  seconds time elapsed   ( +-   0.450% )
      
      So we gain about 35% by using the kwset code.
      
      As a side effect of using kwset two grep tests are fixed by this
      patch. The first is fixed because kwset can deal with case-insensitive
      search containing NULs, something strcasestr cannot do. The second one
      is fixed because we consider patterns containing NULs as fixed strings
      (regcomp cannot accept patterns with NULs).
      Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
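The side effect mentioned in this commit comes down to searching by explicit length instead of relying on NUL termination, which is what strcasestr does. The following is a naive O(n*m) stand-in, not kwset and not the code from the patch; the function name `memcasemem` is made up for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Case-insensitive search over length-delimited buffers. Because both
 * the haystack and the needle carry explicit lengths, embedded NUL
 * bytes are ordinary data. This is a naive scan used for illustration.
 */
static const char *memcasemem(const char *hay, size_t hlen,
			      const char *needle, size_t nlen)
{
	size_t i, j;

	if (nlen > hlen)
		return NULL;
	for (i = 0; i + nlen <= hlen; i++) {
		for (j = 0; j < nlen; j++)
			if (tolower((unsigned char)hay[i + j]) !=
			    tolower((unsigned char)needle[j]))
				break;
		if (j == nlen)
			return hay + i;
	}
	return NULL;
}
```

A needle such as the three bytes 'y', NUL, 'C' can be found this way, whereas a NUL-terminated API would see only "y".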
  5. 02 August, 2011 · 1 commit
  6. 06 June, 2011 · 2 commits
  7. 10 May, 2011 · 1 commit
    • git-grep: Learn PCRE · 63e7e9d8
      Committed by Michał Kiedrowicz
      This patch teaches git-grep the --perl-regexp/-P options (naming
      borrowed from GNU grep) in order to allow specifying PCRE regexes on the
      command line.
      
      PCRE has a number of features which make it handier to use than
      POSIX regexes, like consistent escaping rules, extended character
      classes, ungreedy matching etc.
      
      git isn't built with PCRE support automatically. The USE_LIBPCRE build
      variable must be set (e.g. `make USE_LIBPCRE=YesPlease`).
      Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  8. 13 September, 2010 · 1 commit
    • log --author: take union of multiple "author" requests · 5aaeb733
      Committed by Junio C Hamano
      In the olden days,
      
          log --author=me --committer=him --grep=this --grep=that
      
      used to be turned into:
      
          (OR (HEADER-AUTHOR me)
              (HEADER-COMMITTER him)
              (PATTERN this)
              (PATTERN that))
      
      showing my patches that do not have any "this" nor "that", which was
      totally useless.
      
      80235ba7 ("log --author=me --grep=it" should find intersection, not union,
      2010-01-17) improved it greatly to turn the same into:
      
          (ALL-MATCH
            (HEADER-AUTHOR me)
            (HEADER-COMMITTER him)
            (OR (PATTERN this) (PATTERN that)))
      
      That is, "show only patches by me and committed by him, that have either
      this or that", which is a lot more natural thing to ask.
      
      We however need to be a bit more clever when the user asks more than one
      "author" (or "committer"); because a commit has only one author (and one
      committer), they ought to be interpreted as asking for union to be useful.
      The current implementation simply added another author/committer pattern
      at the same top-level for ALL-MATCH to insist on matching all, finding
      nothing.
      
      Turn
      
          log --author=me --author=her \
              --committer=him --committer=you \
              --grep=this --grep=that
      
      into
      
          (ALL-MATCH
            (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her))
            (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you))
            (OR (PATTERN this) (PATTERN that)))
      
      instead.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
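The final expression in this commit is an AND of three OR groups. A minimal sketch of that evaluation follows, under simplifying assumptions: patterns are treated as fixed substrings rather than regexes, an empty group is taken to impose no restriction, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, already-extracted pieces of one commit. */
struct commit_text {
	const char *author;
	const char *committer;
	const char *body;
};

/* OR group: true if any pattern occurs; an empty group restricts nothing. */
static int any_match(const char *text, const char *const *pats, size_t n)
{
	size_t i;

	if (!n)
		return 1;
	for (i = 0; i < n; i++)
		if (strstr(text, pats[i]))
			return 1;
	return 0;
}

/* ALL-MATCH over the three OR groups: every group must be satisfied. */
static int commit_matches(const struct commit_text *c,
			  const char *const *authors, size_t na,
			  const char *const *committers, size_t nc,
			  const char *const *greps, size_t ng)
{
	return any_match(c->author, authors, na) &&
	       any_match(c->committer, committers, nc) &&
	       any_match(c->body, greps, ng);
}
```

Under the older behavior, two author patterns at the ALL-MATCH level would both have to match the single author line, which is why nothing was found.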
  9. 01 June, 2010 · 1 commit
    • enums: omit trailing comma for portability · 4b05548f
      Committed by Gary V. Vaughan
      Without this patch at least IBM VisualAge C 5.0 (I have 5.0.2) on AIX
      5.1 fails to compile git.
      
      enum style is inconsistent already, with some enums declared on one
      line, some over 3 lines with the enum values all on the middle line,
      sometimes with 1 enum value per line... and independently of that the
      trailing comma is sometimes present and other times absent, often
      mixing with/without trailing comma styles in a single file, and
      sometimes in consecutive enum declarations.
      
      Clearly, omitting the comma is the more portable style, and this patch
      changes all enum declarations to use the portable omitted dangling
      comma style consistently.
      Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
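For reference, the portable style looks like the sketch below. The enum and its members are hypothetical names chosen for the example. A comma after the last enumerator is permitted by C99 but not by C89, which is why older compilers may reject it.

```c
/*
 * Portable (C89-compatible) style: no comma after the last enumerator.
 * Writing "SLOT_FUNCTION," on the last line is valid C99 but is
 * rejected by some older compilers.
 */
enum example_color_slot {
	SLOT_MATCH,
	SLOT_CONTEXT,
	SLOT_FUNCTION
};
```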
  10. 25 May, 2010 · 1 commit
  11. 08 March, 2010 · 2 commits
    • grep: Colorize selected, context, and function lines · 00588bb5
      Committed by Mark Lodato
      Colorize non-matching text of selected lines, context lines, and
      function name lines.  The default for all three is no color, but they
      can be configured using color.grep.<slot>.  The first two are similar
      to the corresponding options in GNU grep, except that GNU grep applies
      the color to the entire line, not just non-matching text.
      Signed-off-by: Mark Lodato <lodatom@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: Colorize filename, line number, and separator · 55f638bd
      Committed by Mark Lodato
      Colorize the filename, line number, and separator in git grep output, as
      GNU grep does.  The colors are customizable through color.grep.<slot>.
      The default is to only color the separator (in cyan), since this gives
      the biggest legibility increase without overwhelming the user with
      colors.  GNU grep also defaults to cyan for the separator, but defaults to
      magenta for the filename and to green for the line number, as well.
      
      There is one difference from GNU grep: When a binary file matches
      without -a, GNU grep does not color the <file> in "Binary file <file>
      matches", but we do.
      
      Like GNU grep, if --null is given, the null separators are not colored.
      
      For config.txt, use a sub-list to describe the slots, rather than
      a single paragraph with parentheses, since this is much more readable.
      
      Remove the cast to int for `rm_eo - rm_so` since it is not necessary.
      Signed-off-by: Mark Lodato <lodatom@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  12. 27 January, 2010 · 1 commit
    • Threaded grep · 5b594f45
      Committed by Fredrik Kuivinen
      Make git grep use threads when they are available.
      
      The results below are best of five runs in the Linux repository (on a
      box with two cores).
      
      With the patch:
      
      git grep qwerty
      1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+800outputs (0major+5774minor)pagefaults 0swaps
      
      Without:
      
      git grep qwerty
      1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+800outputs (0major+3716minor)pagefaults 0swaps
      
      And with a pattern with quite a few matches:
      
      With the patch:
      
      $ /usr/bin/time git grep void
      5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+800outputs (0major+5587minor)pagefaults 0swaps
      
      Without:
      
      $ /usr/bin/time git grep void
      5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+800outputs (0major+3693minor)pagefaults 0swaps
      
      In either case we gain about 40% by the threading.
      Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  13. 26 January, 2010 · 1 commit
    • "log --author=me --grep=it" should find intersection, not union · 80235ba7
      Committed by Junio C Hamano
      Historically, any grep filter in the "git log" family of commands was taken
      as restricting to commits with any of the words in the commit log message.
      However, the user almost always wants to find commits "done by this person
      on that topic".  With "--all-match" option, a series of grep patterns can
      be turned into a requirement that all of them must produce a match, but
      that makes it impossible to ask for "done by me, on either this or that"
      with:
      
      	log --author=me --committer=him --grep=this --grep=that
      
      because it will require both "this" and "that" to appear.
      
      Change the "header" parser of grep library to treat the headers specially,
      and parse it as:
      
      	(all-match-OR (HEADER-AUTHOR me)
      		      (HEADER-COMMITTER him)
      		      (OR
      		      	(PATTERN this)
      			(PATTERN that) ) )
      
      Even though the "log" command line parser doesn't give direct access to
      the extended grep syntax to group terms with parentheses, this change will
      cover the majority of the cases the users would want.
      
      This incidentally revealed that one test in t7002 was bogus.  It ran:
      
      	log --author=Thor --grep=Thu --format='%s'
      
      and expected (wrongly) "Thu" to match "Thursday" in the author/committer
      date, but that would never match, as the timestamp in raw commit buffer
      does not have the name of the day-of-the-week.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  14. 13 January, 2010 · 1 commit
  15. 17 November, 2009 · 1 commit
  16. 08 September, 2009 · 1 commit
  17. 23 July, 2009 · 1 commit
    • grep: Add --max-depth option. · a91f453f
      Committed by Michał Kiedrowicz
      It is useful to grep directories non-recursively, e.g. when one wants to
      look for all files in the toplevel directory, but not in any subdirectory,
      or in Documentation/, but not in Documentation/technical/.
      
      This patch adds support for --max-depth <depth> option to git-grep. If it is
      given, git-grep descends at most <depth> levels of directories below paths
      specified on the command line.
      
      Note that if path specified on command line contains wildcards, this option
      makes no sense, e.g.
      
          $ git grep -l --max-depth 0 GNU -- 'contrib/*'
      
      (note the quotes) will search all files in contrib/, even in
      subdirectories, because '*' matches all files.
      
      Documentation updates, bash-completion and simple test cases are also
      provided.
      Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
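The depth rule described in this commit can be pictured as counting directory separators below the path given on the command line. The helper below is a hypothetical sketch under that assumption, not the code from the patch. It also assumes that a negative depth means "no limit".

```c
#include <string.h>

/*
 * Returns 1 if "path" lies under "base" and is at most max_depth
 * directory levels below it. "base" is expected to be empty (the top
 * level) or to end with '/'. Assumed convention: a negative max_depth
 * means unlimited depth.
 */
static int within_max_depth(const char *base, const char *path, int max_depth)
{
	size_t blen = strlen(base);
	int depth = 0;
	const char *p;

	if (strncmp(path, base, blen))
		return 0;
	if (max_depth < 0)
		return 1;
	/* Each '/' after the base is one more directory level. */
	for (p = path + blen; *p; p++)
		if (*p == '/' && ++depth > max_depth)
			return 0;
	return 1;
}
```

With a depth of 0 and a base of "Documentation/", a file directly in Documentation/ qualifies while one in Documentation/technical/ does not, matching the example in the commit message.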
  18. 02 July, 2009 · 4 commits
  19. 09 May, 2009 · 1 commit
  20. 08 March, 2009 · 3 commits
    • grep: add support for coloring with external greps · a94982ef
      Committed by René Scharfe
      Add the config variable color.grep.external, which can be used to
      switch on coloring of external greps.  To enable auto coloring with
      GNU grep, one needs to set color.grep.external to --color=always to
      defeat the pager started by git grep.  The value of the config
      variable will be passed to the external grep only if it would
      colorize internal grep's output, so automatic terminal detection
      works.  The default is to not pass any option, because the external
      grep command could be a program without color support.
      
      Also set the environment variables GREP_COLOR and GREP_COLORS to
      pass the configured color for matches to the external grep.  This
      works with GNU grep; other variables could be added as needed.
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: color patterns in output · 7e8f59d5
      Committed by René Scharfe
      Coloring matches makes them easier to spot in the output.
      
      Add two options and two parameters: color.grep (to turn coloring on
      or off), color.grep.match (to set the color of matches), --color
      and --no-color (to turn coloring on or off, respectively).
      
      The output of external greps is not changed.
      
      This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and
      Thiago Alves.
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: remove grep_opt argument from match_expr_eval() · d7eb527d
      Committed by René Scharfe
      The only use of the struct grep_opt argument of match_expr_eval()
      is to pass the option word_regexp to match_one_pattern().  By adding
      a pattern flag for it we can reduce the number of function arguments
      of these two functions, as a cleanup and preparation for adding more
      in the next patch.
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  21. 10 January, 2009 · 1 commit
    • grep: don't call regexec() for fixed strings · c822255c
      Committed by René Scharfe
      Add the new flag "fixed" to struct grep_pat and set it if the pattern
      doesn't contain any regex control characters, in addition to when the
      flag -F/--fixed-strings was specified.
      
      This gives a nice speed up on msysgit, where regexec() seems to be
      extra slow.  Before (best of five runs):
      
      	$ time git grep grep v1.6.1 >/dev/null
      
      	real    0m0.552s
      	user    0m0.000s
      	sys     0m0.000s
      
      	$ time git grep -F grep v1.6.1 >/dev/null
      
      	real    0m0.170s
      	user    0m0.000s
      	sys     0m0.015s
      
      With the patch:
      
      	$ time git grep grep v1.6.1 >/dev/null
      
      	real    0m0.173s
      	user    0m0.000s
      	sys     0m0.000s
      
      The difference is much smaller on Linux, but still measurable.
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
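The check that lets a pattern skip regexec() can be sketched as a scan for regex metacharacters. The set of characters used here is an assumption for the illustration and may not match the patch exactly. The function names are likewise hypothetical.

```c
#include <string.h>

/* Assumed set of characters that carry special meaning in a regex. */
static int is_regex_special(char c)
{
	return c != '\0' && strchr("$()*+.?[\\^{|", c) != NULL;
}

/* A pattern with no special characters can be searched as a plain string. */
static int is_fixed(const char *s)
{
	for (; *s; s++)
		if (is_regex_special(*s))
			return 0;
	return 1;
}
```

A pattern like "grep" passes this test and could be searched with a plain substring search, which is the case the timings in this commit measure.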
  22. 02 October, 2008 · 1 commit
  23. 05 September, 2008 · 1 commit
    • log --author/--committer: really match only with name part · a4d7d2c6
      Committed by Junio C Hamano
      When we tried to find commits done by AUTHOR, the first implementation
      tried to pattern match a line with "^author .*AUTHOR", which later was
      enhanced to strip leading caret and look for "^author AUTHOR" when the
      search pattern was anchored at the left end (i.e. --author="^AUTHOR").
      
      This had a few problems:
      
       * When looking for fixed strings (e.g. "git log -F --author=x --grep=y"),
         the regexp internally used "^author .*x" would never match anything;
      
       * To match at the end (e.g. "git log --author='google.com>$'"), the
         generated regexp has to also match the trailing timestamp part the
         commit header lines have.  Also, in order to determine if the '$' at
         the end means "match at the end of the line" or just a literal dollar
         sign (probably backslash-quoted), we would need to parse the regexp
         ourselves.
      
      An earlier alternative tried to make sure that a line matches "^author "
      (to limit by field name) and the user supplied pattern at the same time.
      While it solved the -F problem by introducing a special override for
      matching the "^author ", it did not solve the trailing timestamp nor tail
      match problem.  It also would have matched every commit if --author=author
      was asked for, not because the author's email part had this string, but
      because every commit header line that talks about the author begins with
      that field name, regardless of who wrote it.
      
      Instead of piling more hacks on top of hacks, this rethinks the grep
      machinery that is used to look for strings in the commit header, and makes
      sure that (1) field name matches literally at the beginning of the line,
      followed by a SP, and (2) the user supplied pattern is matched against the
      remainder of the line, excluding the trailing timestamp data.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
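The two rules in this commit (literal field name followed by a space, then the user's pattern applied only to the name and email part) can be sketched as follows. This is an illustration under assumptions, not the patch: it uses a fixed-substring search instead of a regex, treats everything after the last '>' as the timestamp, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

static int header_field_matches(const char *line, const char *field,
				const char *needle)
{
	size_t flen = strlen(field), nlen = strlen(needle);
	size_t span, i;
	const char *start, *end;

	/* (1) The field name must match literally and be followed by a space. */
	if (strncmp(line, field, flen) || line[flen] != ' ')
		return 0;
	start = line + flen + 1;

	/* (2) Exclude the trailing timestamp: keep text up to the last '>'. */
	end = strrchr(start, '>');
	end = end ? end + 1 : start + strlen(start);
	span = (size_t)(end - start);

	/* Search for the needle only inside the name and email part. */
	for (i = 0; i + nlen <= span; i++)
		if (!memcmp(start + i, needle, nlen))
			return 1;
	return 0;
}
```

Because the field name is consumed before the search starts, asking for --author=author no longer matches every commit, and digits from the timestamp cannot produce a match.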
  24. 28 September, 2006 · 2 commits
  25. 21 September, 2006 · 2 commits
    • Update grep internal for grepping only in head/body · 480c1ca6
      Committed by Junio C Hamano
      This further updates the built-in grep engine so that we can say
      something like "this pattern should match only in head".  This
      can be used to simplify grepping in the log messages.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • builtin-grep: make pieces of it available as library. · 83b5d2f5
      Committed by Junio C Hamano
      This makes three functions and associated option structures from
      builtin-grep available from other parts of the system.
      
       * options to drive the built-in grep engine are stored in struct
         grep_opt;
      
       * pattern strings and extended grep expressions are added to
         struct grep_opt with append_grep_pattern();
      
       * when finished calling append_grep_pattern(), call
         compile_grep_patterns() to prepare for execution;
      
       * call grep_buffer() to find matches in the in-core buffer.
      
      This also adds an internal option "status_only" to grep_opt,
      which suppresses any output from grep_buffer().  Callers of the
      function as library can use it to check if there is a match
      without producing any output.
      Signed-off-by: Junio C Hamano <junkio@cox.net>