1. 03 Feb, 2012 4 commits
    • grep: drop grep_buffer's "name" parameter · c876d6da
      Committed by Jeff King
      Before the grep_source interface existed, grep_buffer was
      used by two types of callers:
      
        1. Ones which pulled a file into a buffer, and then wanted
           to supply the file's name for the output (i.e.,
           git grep).
      
        2. Ones which really just wanted to grep a buffer (i.e.,
           git log --grep).
      
      Callers in set (1) should now be using grep_source. Callers
      in set (2) always pass NULL for the "name" parameter of
      grep_buffer. We can therefore get rid of this now-useless
      parameter.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: refactor the concept of "grep source" into an object · e1327023
      Committed by Jeff King
      The main interface to the low-level grep code is
      grep_buffer, which takes a pointer to a buffer and a size.
      This is convenient and flexible (we use it to grep commit
      bodies, files on disk, and blobs by sha1), but it makes it
      hard to pass extra information about what we are grepping
      (either for correctness, like overriding binary
      auto-detection, or for optimizations, like lazily loading
      blob contents).
      
      Instead, let's encapsulate the idea of a "grep source",
      including the buffer, its size, and where the data is coming
      from. This is similar to the diff_filespec structure used by
      the diff code (unsurprising, since future patches will
      implement some of the same optimizations found there).
      
      The diffstat is slightly scarier than the actual patch
      content. Most of the modified lines are simply replacing
      access to raw variables with their counterparts that are now
      in a "struct grep_source". Most of the added lines were
      taken from builtin/grep.c, which partially abstracted the
      idea of grep sources (for file vs sha1 sources).
      
      Instead of dropping the now-redundant code, this patch
      leaves builtin/grep.c using the traditional grep_buffer
      interface (which now wraps the grep_source interface). That
      makes it easy to test that there is no change of behavior
      (yet).
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
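
      The encapsulation described above can be pictured with a small sketch. This is not the structure from grep.c: the type, field, and function names below (struct src, src_contains, buffer_contains) are hypothetical, and the matcher is a plain byte search standing in for the real engine. It only illustrates how a buffer-based entry point can become a thin wrapper around a source object that carries the buffer, its size, and its origin.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; type and field names are illustrative, not git's. */
enum src_type { SRC_BUF, SRC_FILE, SRC_SHA1 };

struct src {
	enum src_type type;   /* where the data is coming from */
	const char *name;     /* label for output, NULL for a bare buffer */
	const char *buf;      /* contents; could stay NULL until loaded lazily */
	size_t size;          /* number of valid bytes in buf */
};

/* Stand-in matcher: does the source contain the literal needle? */
static int src_contains(const struct src *gs, const char *needle)
{
	size_t n = strlen(needle), i;

	if (!gs->buf || n > gs->size)
		return 0;
	for (i = 0; i + n <= gs->size; i++)
		if (!memcmp(gs->buf + i, needle, n))
			return 1;
	return 0;
}

/* The traditional buffer interface, now a wrapper around the source object. */
int buffer_contains(const char *buf, size_t size, const char *needle)
{
	struct src gs = { SRC_BUF, NULL, buf, size };

	return src_contains(&gs, needle);
}
```

      Callers that only have a buffer keep calling the wrapper, while callers that know more about the data could fill in the other fields themselves.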
    • grep: move sha1-reading mutex into low-level code · b3aeb285
      Committed by Jeff King
      The multi-threaded git-grep code needs to serialize access
      to the thread-unsafe read_sha1_file call. It does this with
      a mutex that is local to builtin/grep.c.
      
      Let's instead push this down into grep.c, where it can be
      used by both builtin/grep.c and grep.c. This will let us
      safely teach the low-level grep.c code tricks that involve
      reading from the object db.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: make locking flag global · 78db6ea9
      Committed by Jeff King
      The low-level grep code traditionally didn't care about
      threading, as it doesn't do any threading itself and didn't
      call out to other non-thread-safe code.  That changed with
      0579f91d (grep: enable threading with -p and -W using lazy
      attribute lookup, 2011-12-12), which pushed the lookup of
      funcname attributes (which is not thread-safe) into the
      low-level grep code.
      
      As a result, the low-level code learned about a new global
      "grep_attr_mutex" to serialize access to the attribute code.
      A multi-threaded caller (e.g., builtin/grep.c) is expected
      to initialize the mutex and set "use_threads" in the
      grep_opt structure. The low-level code only uses the lock if
      use_threads is set.
      
      However, putting the use_threads flag into the grep_opt
      struct is not the most logical place. Whether threading is
      in use is not something that matters for each call to
      grep_buffer, but is instead global to the whole program
      (i.e., if any thread is doing multi-threaded grep, every
      other thread, even if it thinks it is doing its own
      single-threaded grep, would need to use the locking).  In
      practice, this distinction isn't a problem for us, because
      the only user of multi-threaded grep is "git-grep", which
      does nothing except call grep.
      
      This patch turns the opt->use_threads flag into a global
      flag. More important than the nit-picking semantic argument
      above is that this means that the locking functions don't
      need to actually have access to a grep_opt to know whether
      to lock. Which in turn can make adding new locks simpler, as
      we don't need to pass around a grep_opt.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
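
      A minimal sketch of the idea, assuming POSIX threads. The names follow the commit message where it gives them (grep_attr_mutex) and are otherwise hypothetical (grep_use_locks, lookup_funcname_attr); the static mutex initializer is a simplification, since the message says the caller initializes the mutex. The point is that the lock helpers take no grep_opt argument and consult only a global flag.

```c
#include <pthread.h>

/* Hypothetical sketch, not copied from grep.c. */
int grep_use_locks;   /* global: set once by a multi-threaded caller */
pthread_mutex_t grep_attr_mutex = PTHREAD_MUTEX_INITIALIZER;

/* The helpers need no grep_opt; they look only at the global flag. */
static void grep_attr_lock(void)
{
	if (grep_use_locks)
		pthread_mutex_lock(&grep_attr_mutex);
}

static void grep_attr_unlock(void)
{
	if (grep_use_locks)
		pthread_mutex_unlock(&grep_attr_mutex);
}

/* Stand-in for a lookup that is not thread-safe on its own. */
static int lookups;

int lookup_funcname_attr(void)
{
	int n;

	grep_attr_lock();
	n = ++lookups;
	grep_attr_unlock();
	return n;
}
```

      Adding another lock under this scheme would mean adding another mutex and another pair of helpers, with no change to any function signature.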
  2. 17 Dec, 2011 1 commit
  3. 13 Dec, 2011 1 commit
  4. 21 Aug, 2011 1 commit
    • Use kwset in grep · 9eceddee
      Committed by Fredrik Kuivinen
      Benchmarks for the hot cache case:
      
      before:
      $ perf stat --repeat=5 git grep qwerty > /dev/null
      
      Performance counter stats for 'git grep qwerty' (5 runs):
      
              3,478,085 cache-misses             #      2.322 M/sec   ( +-   2.690% )
             11,356,177 cache-references         #      7.582 M/sec   ( +-   2.598% )
              3,872,184 branch-misses            #      0.363 %       ( +-   0.258% )
          1,067,367,848 branches                 #    712.673 M/sec   ( +-   2.622% )
          3,828,370,782 instructions             #      0.947 IPC     ( +-   0.033% )
          4,043,832,831 cycles                   #   2700.037 M/sec   ( +-   0.167% )
                  8,518 page-faults              #      0.006 M/sec   ( +-   3.648% )
                    847 CPU-migrations           #      0.001 M/sec   ( +-   3.262% )
                  6,546 context-switches         #      0.004 M/sec   ( +-   2.292% )
            1497.695495 task-clock-msecs         #      3.303 CPUs    ( +-   2.550% )
      
             0.453394396  seconds time elapsed   ( +-   0.912% )
      
      after:
      $ perf stat --repeat=5 git grep qwerty > /dev/null
      
      Performance counter stats for 'git grep qwerty' (5 runs):
      
              2,989,918 cache-misses             #      3.166 M/sec   ( +-   5.013% )
             10,986,041 cache-references         #     11.633 M/sec   ( +-   4.899% )  (scaled from 95.06%)
              3,511,993 branch-misses            #      1.422 %       ( +-   0.785% )
            246,893,561 branches                 #    261.433 M/sec   ( +-   3.967% )
          1,392,727,757 instructions             #      0.564 IPC     ( +-   0.040% )
          2,468,142,397 cycles                   #   2613.494 M/sec   ( +-   0.110% )
                  7,747 page-faults              #      0.008 M/sec   ( +-   3.995% )
                    897 CPU-migrations           #      0.001 M/sec   ( +-   2.383% )
                  6,535 context-switches         #      0.007 M/sec   ( +-   1.993% )
             944.384228 task-clock-msecs         #      3.177 CPUs    ( +-   0.268% )
      
             0.297257643  seconds time elapsed   ( +-   0.450% )
      
      So we gain about 35% by using the kwset code.
      
      As a side effect of using kwset two grep tests are fixed by this
      patch. The first is fixed because kwset can deal with case-insensitive
      search containing NULs, something strcasestr cannot do. The second one
      is fixed because we consider patterns containing NULs as fixed strings
      (regcomp cannot accept patterns with NULs).
      Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. 20 Aug, 2011 1 commit
    • color: delay auto-color decision until point of use · daa0c3d9
      Committed by Jeff King
      When we read a color value either from a config file or from
      the command line, we use git_config_colorbool to convert it
      from the tristate always/never/auto into a single yes/no
      boolean value.
      
      This has some timing implications with respect to starting
      a pager.
      
      If we start (or decide not to start) the pager before
      checking the colorbool, everything is fine. Either isatty(1)
      will give us the right information, or we will properly
      check for pager_in_use().
      
      However, if we decide to start a pager after we have checked
      the colorbool, things are not so simple. If stdout is a tty,
      then we will have already decided to use color. However, the
      user may also have configured color.pager not to use color
      with the pager. In this case, we need to actually turn off
      color. Unfortunately, the pager code has no idea which color
      variables were turned on (and there are many of them
      throughout the code, and they may even have been manipulated
      after the colorbool selection by something like "--color" on
      the command line).
      
      This bug can be seen any time a pager is started after
      config and command line options are checked. This has
      affected "git diff" since 89d07f75 (diff: don't run pager if
      user asked for a diff style exit code, 2007-08-12). It has
      also affected the log family since 1fda91b5 (Fix 'git log'
      early pager startup error case, 2010-08-24).
      
      This patch splits the notion of parsing a colorbool and
      actually checking the configuration. The "use_color"
      variables now have an additional possible value,
      GIT_COLOR_AUTO. Users of the variable should use the new
      "want_color()" wrapper, which will lazily determine and
      cache the auto-color decision.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
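
      The lazy decision can be sketched roughly as follows. This is an illustration under assumptions, not the code from color.c: the constants, the two stub variables standing in for pager state and the color.pager setting, and the function names are all hypothetical, and the real check may consider more than isatty(1).

```c
#include <unistd.h>

/* Hypothetical tristate values; only "auto" needs a deferred decision. */
#define COLOR_NEVER  0
#define COLOR_ALWAYS 1
#define COLOR_AUTO   2

/* Stubs standing in for pager state and the color.pager setting. */
int pager_in_use_stub;
int color_pager_stub = 1;

static int auto_decision = -1;   /* -1 means "not decided yet" */

static int check_auto_color(void)
{
	if (pager_in_use_stub)
		return color_pager_stub;
	return isatty(1) ? 1 : 0;
}

/* Resolve the tristate at the point of use, caching the auto result. */
int want_color_sketch(int var)
{
	if (var == COLOR_AUTO) {
		if (auto_decision < 0)
			auto_decision = check_auto_color();
		return auto_decision;
	}
	return var;
}
```

      Because the variable keeps the value "auto" until output time, a pager started after option parsing is still taken into account the first time the wrapper is called.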
  6. 02 Aug, 2011 1 commit
  7. 06 Jun, 2011 3 commits
    • grep: add --heading · 1d84f72e
      Committed by René Scharfe
      With --heading, the filename is printed once before matches from that
      file instead of at the start of each line, giving more screen space to
      the actual search results.
      
      This option is taken from ack (http://betterthangrep.com/).  And now
      git grep can dress up like it:
      
      	$ git config alias.ack "grep --break --heading --line-number"
      
      	$ git ack -e --heading
      	Documentation/git-grep.txt
      	154:--heading::
      
      	t/t7810-grep.sh
      	785:test_expect_success 'grep --heading' '
      	786:    git grep --heading -e char -e lo_w hello.c hello_world >actual &&
      	808:    git grep --break --heading -n --color \
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: add --break · a8f0e764
      Committed by René Scharfe
      With --break, an empty line is printed between matches from different
      files, increasing readability.  This option is taken from ack
      (http://betterthangrep.com/).
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: fix coloring of hunk marks between files · 08303c36
      Committed by René Scharfe
      Commit 431d6e7b (grep: enable threading for context line printing)
      split the printing of the "--\n" mark between results from different
      files out into two places: show_line() in grep.c for the non-threaded
      case and work_done() in builtin/grep.c for the threaded case.  Commit
      55f638bd (grep: Colorize filename, line number, and separator) updated
      the former, but not the latter, so the separators between files are
      not colored if threads are used.
      
      This patch merges the two.  In the threaded case, hunk marks are now
      printed by show_line() for every file, including the first one, and the
      very first mark is simply skipped in work_done().  This ensures that the
      output is properly colored and works just as well.
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  8. 10 May, 2011 3 commits
  9. 05 May, 2011 1 commit
  10. 13 Sep, 2010 2 commits
    • log --author: take union of multiple "author" requests · 5aaeb733
      Committed by Junio C Hamano
      In the olden days,
      
          log --author=me --committer=him --grep=this --grep=that
      
      used to be turned into:
      
          (OR (HEADER-AUTHOR me)
              (HEADER-COMMITTER him)
              (PATTERN this)
              (PATTERN that))
      
      showing my patches that do not have any "this" nor "that", which was
      totally useless.
      
      80235ba7 ("log --author=me --grep=it" should find intersection, not union,
      2010-01-17) improved it greatly to turn the same into:
      
          (ALL-MATCH
            (HEADER-AUTHOR me)
            (HEADER-COMMITTER him)
            (OR (PATTERN this) (PATTERN that)))
      
      That is, "show only patches by me and committed by him, that have either
      this or that", which is a lot more natural thing to ask.
      
      We however need to be a bit more clever when the user asks more than one
      "author" (or "committer"); because a commit has only one author (and one
      committer), they ought to be interpreted as asking for union to be useful.
      The current implementation simply added another author/committer pattern
      at the same top-level for ALL-MATCH to insist on matching all, finding
      nothing.
      
      Turn
      
          log --author=me --author=her \
          	--committer=him --committer=you \
      	--grep=this --grep=that
      
      into
      
          (ALL-MATCH
            (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her))
            (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you))
            (OR (PATTERN this) (PATTERN that)))
      
      instead.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
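
      The resulting expression shape, an ALL-MATCH over three OR groups, can be sketched as below. This is not the grep expression tree from grep.c: the struct, the function names, and the use of strstr() in place of real pattern matching are assumptions made for illustration. An empty group is treated as "no restriction", which is also an assumption of this sketch.

```c
#include <string.h>

/* Hypothetical flattened view of a commit. */
struct commit_sketch {
	const char *author;
	const char *committer;
	const char *body;
};

/* OR group: true if any pattern hits; an empty group does not restrict. */
static int any_match(const char *haystack, const char **pats, int n)
{
	int i;

	if (!n)
		return 1;
	for (i = 0; i < n; i++)
		if (strstr(haystack, pats[i]))
			return 1;
	return 0;
}

/* ALL-MATCH: every OR group must be satisfied. */
int commit_matches(const struct commit_sketch *c,
		   const char **authors, int na,
		   const char **committers, int nc,
		   const char **greps, int ng)
{
	return any_match(c->author, authors, na) &&
	       any_match(c->committer, committers, nc) &&
	       any_match(c->body, greps, ng);
}
```

      With authors {"me", "her"}, a commit authored by "me" passes the author group, whereas placing both author patterns directly under ALL-MATCH would require a single author line to match both.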
    • grep: move logic to compile header pattern into a separate helper · 95ce9ce2
      Committed by Junio C Hamano
      The callers should be queuing only GREP_PATTERN_HEAD elements to the
      header_list queue; simplify the switch and guard it with an assert.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  11. 25 May, 2010 7 commits
  12. 16 Mar, 2010 1 commit
    • grep: enable threading for context line printing · 431d6e7b
      Committed by René Scharfe
      If context lines are to be printed, grep separates them with hunk marks
      ("--\n").  These marks are printed between matches from different files,
      too.  They are not printed before the first file, though.
      
      Threading was disabled when context line printing was enabled because
      avoiding printing the mark before the first line was an unsolved
      synchronisation problem.  This patch separates the code for printing
      hunk marks for the threaded and the unthreaded case, allowing threading
      to be turned on together with the common -ABC options.
      
      ->show_hunk_mark, which controls printing of hunk marks between files in
      show_line(), is now set in grep_buffer_1(), but only if some results
      have already been printed and threading is disabled.  The threaded case
      is handled in work_done().
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  13. 08 Mar, 2010 2 commits
    • grep: Colorize selected, context, and function lines · 00588bb5
      Committed by Mark Lodato
      Colorize non-matching text of selected lines, context lines, and
      function name lines.  The default for all three is no color, but they
      can be configured using color.grep.<slot>.  The first two are similar
      to the corresponding options in GNU grep, except that GNU grep applies
      the color to the entire line, not just non-matching text.
      Signed-off-by: Mark Lodato <lodatom@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • grep: Colorize filename, line number, and separator · 55f638bd
      Committed by Mark Lodato
      Colorize the filename, line number, and separator in git grep output, as
      GNU grep does.  The colors are customizable through color.grep.<slot>.
      The default is to only color the separator (in cyan), since this gives
      the biggest legibility increase without overwhelming the user with
      colors.  GNU grep also defaults cyan for the separator, but defaults to
      magenta for the filename and to green for the line number, as well.
      
      There is one difference from GNU grep: When a binary file matches
      without -a, GNU grep does not color the <file> in "Binary file <file>
      matches", but we do.
      
      Like GNU grep, if --null is given, the null separators are not colored.
      
      For config.txt, use a sub-list to describe the slots, rather than
      a single paragraph with parentheses, since this is much more readable.
      
      Remove the cast to int for `rm_eo - rm_so` since it is not necessary.
      Signed-off-by: Mark Lodato <lodatom@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  14. 04 Feb, 2010 1 commit
  15. 27 Jan, 2010 2 commits
    • grep: use REG_STARTEND (if available) to speed up regexec · 24072c02
      Committed by Benjamin Kramer
      BSD and glibc have an extension to regexec which takes a buffer + length pair
      instead of a NUL-terminated string. Since we already have the length computed
      this can save us a strlen call inside regexec.
      Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
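
      A minimal sketch of the technique, assuming a regex.h that may or may not define REG_STARTEND. The helper name regmatch_buf is hypothetical, and the fallback branch (copying into a NUL-terminated string) is only one possible choice for systems without the extension; it is not taken from the patch.

```c
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Match re against the first len bytes of buf, which need not be
 * NUL-terminated. Returns what regexec() returns.
 */
int regmatch_buf(const regex_t *re, const char *buf, size_t len,
		 regmatch_t *m)
{
#ifdef REG_STARTEND
	/* With REG_STARTEND, the input range is passed in via m[0]. */
	m->rm_so = 0;
	m->rm_eo = (regoff_t)len;
	return regexec(re, buf, 1, m, REG_STARTEND);
#else
	/* Fallback for this sketch: make a NUL-terminated copy. */
	char *copy = malloc(len + 1);
	int ret;

	if (!copy)
		return REG_ESPACE;
	memcpy(copy, buf, len);
	copy[len] = '\0';
	ret = regexec(re, copy, 1, m, 0);
	free(copy);
	return ret;
#endif
}
```

      On success, m->rm_so and m->rm_eo hold the match offsets relative to buf, since the range passed in starts at offset 0.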
    • Threaded grep · 5b594f45
      Committed by Fredrik Kuivinen
      Make git grep use threads when they are available.
      
      The results below are best of five runs in the Linux repository (on a
      box with two cores).
      
      With the patch:
      
      git grep qwerty
      1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+800outputs (0major+5774minor)pagefaults 0swaps
      
      Without:
      
      git grep qwerty
      1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+800outputs (0major+3716minor)pagefaults 0swaps
      
      And with a pattern with quite a few matches:
      
      With the patch:
      
      $ /usr/bin/time git grep void
      5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+800outputs (0major+5587minor)pagefaults 0swaps
      
      Without:
      
      $ /usr/bin/time git grep void
      5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+800outputs (0major+3693minor)pagefaults 0swaps
      
      In either case we gain about 40% by the threading.
      Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  16. 26 Jan, 2010 1 commit
    • "log --author=me --grep=it" should find intersection, not union · 80235ba7
      Committed by Junio C Hamano
      Historically, any grep filter in the "git log" family of commands was taken
      as restricting to commits with any of the words in the commit log message.
      However, the user almost always wants to find commits "done by this person
      on that topic".  With "--all-match" option, a series of grep patterns can
      be turned into a requirement that all of them must produce a match, but
      that makes it impossible to ask for "done by me, on either this or that"
      with:
      
      	log --author=me --committer=him --grep=this --grep=that
      
      because it will require both "this" and "that" to appear.
      
      Change the "header" parser of grep library to treat the headers specially,
      and parse it as:
      
      	(all-match-OR (HEADER-AUTHOR me)
      		      (HEADER-COMMITTER him)
      		      (OR
      		      	(PATTERN this)
      			(PATTERN that) ) )
      
      Even though the "log" command line parser doesn't give direct access to
      the extended grep syntax to group terms with parentheses, this change will
      cover the majority of the cases the users would want.
      
      This incidentally revealed that one test in t7002 was bogus.  It ran:
      
      	log --author=Thor --grep=Thu --format='%s'
      
      and expected (wrongly) "Thu" to match "Thursday" in the author/committer
      date, but that would never match, as the timestamp in raw commit buffer
      does not have the name of the day-of-the-week.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  17. 13 Jan, 2010 1 commit
    • grep: rip out pessimization to use fixmatch() · 885d211e
      Committed by Junio C Hamano
      Even when running without the -F (--fixed-strings) option, we checked the
      pattern and used fixmatch() codepath when it does not contain any regex
      magic.  Finding fixed strings with strstr() surely must be faster than
      running the regular expression crud.
      
      Not so.  It turns out that on some libc implementations, using the
      regcomp()/regexec() pair is a lot faster than running strstr() and
      strcasestr() the fixmatch() codepath uses.  Drop the optimization and use
      the fixmatch() codepath only when the user explicitly asked for it with
      the -F option.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  18. 12 Jan, 2010 1 commit
    • grep: optimize built-in grep by skipping lines that do not hit · a26345b6
      Committed by Junio C Hamano
      The internal "grep" engine we use checks for hits line-by-line, instead of
      letting the underlying regexec()/fixmatch() routines scan for the first
      match from the rest of the buffer.  This was a major source of overhead
      compared to the external grep.
      
      Introduce a "look-ahead" mechanism to find the next line that would
      potentially match by using regexec()/fixmatch() in the remainder of the
      text to skip unmatching lines, and use it when the query criteria is
      simple enough (i.e. punt for an advanced grep boolean expression like
      "lines that have both X and Y but not Z" for now) and we are not running
      under "-v" (aka "--invert-match") option.
      
      Note that "-L" (aka "--files-without-match") is not a reason to disable
      this optimization.  Under the option, we are interested if the file has
      any hit at all, and that is what we determine reliably with or without the
      optimization.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
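
      The look-ahead idea can be sketched for the simplest case, a single fixed string searched in a NUL-terminated buffer. The function names are hypothetical, strstr() stands in for the regexec()/fixmatch() call, and the needle is assumed to be non-empty and free of newlines. Instead of testing each line in turn, the search runs over the rest of the buffer and then backs up to the beginning of the line that contains the hit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the start of the next line at or after bol that contains
 * needle, or NULL if no remaining line does. bol must point at the
 * beginning of a line.
 */
static const char *next_hit_line(const char *bol, const char *needle)
{
	const char *hit = strstr(bol, needle);

	if (!hit)
		return NULL;
	/* Back up to the beginning of the line holding the hit. */
	while (hit > bol && hit[-1] != '\n')
		hit--;
	return hit;
}

/* Count matching lines, skipping runs of non-matching lines in one step. */
int count_matching_lines(const char *buf, const char *needle)
{
	int count = 0;
	const char *bol = buf;

	while ((bol = next_hit_line(bol, needle)) != NULL) {
		const char *eol = strchr(bol, '\n');

		count++;
		if (!eol)
			break;
		bol = eol + 1;
	}
	return count;
}
```

      A shortcut like this does not fit inverted matching, where the lines of interest are exactly the ones the search skips over, which is consistent with the commit disabling the optimization under "-v".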
  19. 17 Nov, 2009 1 commit
  20. 03 Jul, 2009 1 commit
  21. 02 Jul, 2009 4 commits