  1. 09 Mar, 2010: 1 commit
    • revision: introduce setup_revision_opt · 32962c9b
      Junio C Hamano committed
      So far the last parameter to setup_revisions() was to specify the default
      ref when the command line did not give any (typically "HEAD").  This changes
      it to take a pointer to a structure so that we can add other information without
      touching too many codepaths in later patches.
      
      There is no functionality change.
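A minimal sketch of the calling convention this enables. It assumes an options struct whose only field is the default ref. The names `setup_revision_opt_sketch` and `pick_start_ref()` are hypothetical stand-ins, not git's actual declarations, and the argument handling is deliberately simplified.

```c
#include <stddef.h>

/* Hypothetical options struct: new fields can be appended later without
 * changing the signature that every caller uses. */
struct setup_revision_opt_sketch {
    const char *def; /* ref to use when the command line names none */
};

/* Hypothetical stand-in for setup_revisions(): returns the first
 * non-option argument, or opt->def when there is none. */
const char *pick_start_ref(int argc, const char **argv,
                           const struct setup_revision_opt_sketch *opt)
{
    int i;

    for (i = 1; i < argc; i++)
        if (argv[i][0] != '-')
            return argv[i];
    return opt ? opt->def : NULL;
}
```

A caller fills in `.def = "HEAD"` once. Adding a field later changes the struct, not every call site.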
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. 25 Jan, 2010: 1 commit
  3. 24 Jan, 2010: 1 commit
    • Make ce_uptodate() trustworthy again · 125fd984
      Junio C Hamano committed
      The rule has always been that a cache entry that is ce_uptodate(ce)
      means that we already have checked the work tree entity and we know
      there is no change in the work tree compared to the index, and nobody
      should have to double check.  Note that false ce_uptodate(ce) does not
      mean it is known to be dirty---it only means we don't know if it is
      clean.
      
      There are a few codepaths (refresh-index and preload-index are among
      them) that mark a cache entry as up-to-date based solely on the return
      value from ie_match_stat(); this function uses lstat() to see if the
      work tree entity has been touched, and for a submodule entry, if its
      HEAD points at the same commit as the commit recorded in the index of
      the superproject (a submodule that is not even cloned is considered
      clean).
      
      A submodule is no longer considered unmodified merely because its HEAD
      matches the index of the superproject these days, in order to prevent
      people from forgetting to commit in the submodule and updating the
      superproject index with the new submodule commit, before committing the
      state in the superproject.  However, the patch to do so didn't update
      the codepath that marks cache entries up-to-date based on the updated
      definition and instead worked it around by saying "we don't trust the
      return value of ce_uptodate() for submodules."
      
      This makes ce_uptodate() trustworthy again by not marking submodule
      entries up-to-date.
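The rule can be sketched as a refresh helper that refuses to set the up-to-date bit on gitlink entries. The struct, flag bit and function names are hypothetical. The mode 0160000 is the one git records for submodule entries.

```c
#define MODE_TYPE_MASK  0170000
#define GITLINK_MODE    0160000   /* mode recorded for a submodule entry */
#define CE_UPTODATE_BIT (1u << 0) /* hypothetical in-core flag */

struct entry_sketch {
    unsigned int mode;
    unsigned int flags;
};

static int is_gitlink(unsigned int mode)
{
    return (mode & MODE_TYPE_MASK) == GITLINK_MODE;
}

/* Set the up-to-date bit only when the stat check says "clean" AND the
 * entry is not a submodule: a matching submodule HEAD alone does not prove
 * that the submodule's work tree is clean. */
void mark_if_uptodate(struct entry_sketch *ce, int stat_matches)
{
    if (stat_matches && !is_gitlink(ce->mode))
        ce->flags |= CE_UPTODATE_BIT;
}

int entry_uptodate(const struct entry_sketch *ce)
{
    return (ce->flags & CE_UPTODATE_BIT) != 0;
}
```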
      
      The next step _could_ be to introduce a few "in-core" flag bits to
      cache_entry structure to record "this entry is _known_ to be dirty",
      call is_submodule_modified() from ie_match_stat(), and use these new
      bits to avoid running this rather expensive check more than once, but
      that can be a separate patch.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  4. 19 Jan, 2010: 1 commit
    • Performance optimization for detection of modified submodules · e3d42c47
      Jens Lehmann committed
      In the worst case is_submodule_modified() got called three times for
      each submodule. The information we got from scanning the whole
      submodule tree the first time can be reused instead.
      
      New parameters have been added to diff_change() and diff_addremove();
      the information is stored in a new member of struct diff_filespec. Its
      value is then reused instead of calling is_submodule_modified() again.
      
      When no explicit "-dirty" is needed in the output the call to
      is_submodule_modified() is not necessary when the submodule's HEAD
      already disagrees with the ref of the superproject, as this alone
      marks it as modified. To achieve that, get_stat_data() got an extra
      argument.
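A rough sketch of the reuse. It assumes the cached answer lives in a tri-state field, standing in for the new diff_filespec member, and that a counter can stand in for the expensive scan. All names are hypothetical.

```c
/* Hypothetical tri-state cache standing in for the new filespec member. */
enum dirty_state { DIRTY_UNKNOWN = -1, DIRTY_NO = 0, DIRTY_YES = 1 };

struct filespec_sketch {
    enum dirty_state dirty_submodule;
};

static int scan_count; /* how many times the expensive scan ran */

/* Stand-in for is_submodule_modified(): pretend the work tree is dirty. */
static int expensive_submodule_scan(void)
{
    scan_count++;
    return 1;
}

/* head_differs: the submodule HEAD disagrees with the superproject's record.
 * want_dirty_marker: the output must say "-dirty" explicitly. */
int submodule_modified(struct filespec_sketch *spec, int head_differs,
                       int want_dirty_marker)
{
    if (head_differs && !want_dirty_marker)
        return 1; /* already modified; no need to scan the work tree */
    if (spec->dirty_submodule == DIRTY_UNKNOWN) /* scan at most once */
        spec->dirty_submodule = expensive_submodule_scan() ? DIRTY_YES : DIRTY_NO;
    return head_differs || spec->dirty_submodule == DIRTY_YES;
}
```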
      Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. 17 Jan, 2010: 1 commit
  6. 08 Jan, 2010: 2 commits
    • unpack-trees.c: look ahead in the index · 730f7284
      Junio C Hamano committed
      This keeps the traversal of the index in sync with the tree traversal.
      When unpack_callback() is fed a set of tree entries from trees, it
      inspects the name of the entry and checks if an index entry with
      the same name could be hiding behind the current index entry, and
      
       (1) if the name appears in the index as a leaf node, it is also
           fed to the n_way_merge() callback function;
      
       (2) if the name is a directory in the index, i.e. there are entries
           that are underneath it, then nothing is fed to the n_way_merge()
           callback function;
      
       (3) otherwise, if the name comes before the first eligible entry in the
           index, the index entry is first unpacked alone.
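The lookup behind cases (1) and (2) can be sketched over a plain array of sorted index paths. This is a hypothetical helper, not unpack-trees.c code, and it scans linearly purely for clarity. "t-i" sorts before "t/1" because '-' precedes '/' in ASCII, so a directory's entries can hide behind other entries.

```c
#include <string.h>

enum lookup_result { NOT_IN_INDEX = -1, IS_DIRECTORY = -2 };

/* Search the sorted index paths from `bottom` onward for `name`.
 * Returns the position of a leaf entry with exactly that name (case 1),
 * IS_DIRECTORY if some entry lives under "name/" (case 2), or
 * NOT_IN_INDEX when neither holds. */
int look_ahead(const char *const *names, int nr, int bottom, const char *name)
{
    size_t len = strlen(name);
    int i;

    for (i = bottom; i < nr; i++) {
        if (strcmp(names[i], name) == 0)
            return i;
        if (strncmp(names[i], name, len) == 0 && names[i][len] == '/')
            return IS_DIRECTORY;
    }
    return NOT_IN_INDEX;
}
```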
      
      When traverse_trees_recursive() descends into a subdirectory, the
      cache_bottom pointer is moved to walk index entries within that directory.
      
      All of these are omitted for diff-index, which does not even want to be
      fed an index entry and a tree entry with D/F conflicts.
      
      This fixes 3-way read-tree and exposes a bug in other parts of the system
      in t6035, test #5.  The test prepares these three trees:
      
       O = HEAD^
          100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a/b-2/c/d
          100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a/b/c/d
          100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a/x
      
       A = HEAD
          100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a/b-2/c/d
          100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a/b/c/d
          100644 blob 587be6b4c3f93f93c489c0111bba5596147a26cb    a/x
      
       B = master
          120000 blob a36b77384451ea1de7bd340ffca868249626bc52    a/b
          100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a/b-2/c/d
          100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a/x
      
      With a clean index that matches HEAD, running
      
          git read-tree -m -u --aggressive $O $A $B
      
      now yields
      
          120000 a36b77384451ea1de7bd340ffca868249626bc52 3       a/b
          100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       a/b-2/c/d
          100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 1       a/b/c/d
          100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 2       a/b/c/d
          100644 587be6b4c3f93f93c489c0111bba5596147a26cb 0       a/x
      
      which is correct.  "master" created the "a/b" symlink that did not exist,
      and removed "a/b/c/d" while HEAD did not touch either path.
      
      Before this series, read-tree did not notice the situation and resolved
      addition of "a/b" and removal of "a/b/c/d" independently.  If A = HEAD had
      another path "a/b/c/e" added, this merge should conflict but instead it
      silently resolved "a/b" and then immediately overwrote it to add
      "a/b/c/e", which was quite bogus.
      
      Tests in t1012 start to work with this.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • unpack-trees.c: prepare for looking ahead in the index · da165f47
      Junio C Hamano committed
      This prepares but does not yet implement a look-ahead in the index entries
      when traverse-trees.c decides to give us tree entries in an order that
      does not match what is in the index.
      
      A case where a look-ahead in the index is necessary happens when merging
      branch B into branch A while the index matches the current branch A, using
      a tree O as their common ancestor, and these three trees look like this:
      
         O        A       B
         t                t
         t-i      t-i     t-i
         t-j      t-j
                  t/1
                  t/2
      
      The traverse_trees() function gets "t", "t-i" and "t" from trees O, A and
      B first, and notices that A may have a matching "t" behind "t-i" and "t-j"
      (indeed it does), and tells A to give that entry instead.  After unpacking
      blob "t" from tree B (as it hasn't changed since O in B and A removed it,
      it will result in its removal), it descends into directory "t/".
      
      The side that walked index in parallel to the tree traversal used to be
      implemented with one pointer, o->pos, that points at the next index entry
      to be processed.  When this happens, the pointer o->pos still points at
      "t-i" that is the first entry.  We should be able to skip "t-i" and "t-j"
      and locate "t/1" from the index while the recursive invocation of
      traverse_trees() walks and matches entries found there, and later come back
      to process "t-i".
      
      While that look-ahead is not implemented yet, this adds a flag bit,
      CE_UNPACKED, to mark the entries in the index that have already been
      processed.  The o->pos pointer has been renamed to o->cache_bottom and it
      points at the first entry that may still need to be processed.
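The following sketch shows how the flag bit and the bottom pointer cooperate, using the index from the example above. The struct and function names are hypothetical. The sketch shows the intended look-ahead behavior, not anything this preparatory commit already does.

```c
#define CE_UNPACKED_BIT (1u << 0) /* hypothetical stand-in for CE_UNPACKED */

struct idx_entry {
    const char *name;
    unsigned int flags;
};

/* Mark entry `pos` as processed, then move *cache_bottom past every leading
 * entry that is already marked, so that it always points at the first entry
 * that may still need to be processed. */
void mark_unpacked(struct idx_entry *entries, int nr, int pos, int *cache_bottom)
{
    entries[pos].flags |= CE_UNPACKED_BIT;
    while (*cache_bottom < nr &&
           (entries[*cache_bottom].flags & CE_UNPACKED_BIT))
        (*cache_bottom)++;
}
```

Processing "t/1" and "t/2" ahead of time leaves the bottom at "t-i". The pointer only advances once the skipped entries are handled too.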
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  7. 12 Oct, 2009: 1 commit
    • diff-lib.c: fix misleading comments on oneway_diff() · da8ba5e7
      Junio C Hamano committed
      20a16eb3 (unpack_trees(): fix diff-index regression., 2008-03-10) adjusted
      diff-index to the new world order since 34110cd4 (Make 'unpack_trees()'
      have a separate source and destination index, 2008-03-06).  Callbacks are
      expected to return anything non-negative as "success", and instead of
      reporting how many index entries they have processed, they are expected to
      advance o->pos themselves.  The code did so, but a stale comment was left
      behind.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  8. 24 Aug, 2009: 1 commit
  9. 12 Aug, 2009: 1 commit
  10. 05 Aug, 2009: 2 commits
    • diff-index: keep the original index intact · 26da1d78
      Junio C Hamano committed
      When comparing the index and a tree, we used to read the contents of the
      tree into stage #1 of the index and compared them with stage #0.  In order
      not to lose sight of entries originally unmerged in the index, we hoisted
      them to stage #3 before reading the tree.
      
      Commit d1f2d7e8 (Make run_diff_index() use unpack_trees(), not read_tree(),
      2008-01-19) changed all this.  These days, we instead use unpack_trees()
      API to traverse the tree and compare the contents with the index, without
      modifying the index at all.  There is no reason to hoist the unmerged
      entries to stage #3 anymore.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff-index: report unmerged new entries · 29796c6c
      Junio C Hamano committed
      Since an earlier change to diff-index by d1f2d7e8 (Make run_diff_index()
      use unpack_trees(), not read_tree(), 2008-01-19), we stopped reporting an
      unmerged path that does not exist in the tree, but we should.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  11. 30 Jul, 2009: 1 commit
    • diff: Rename QUIET internal option to QUICK · 90b19941
      Junio C Hamano committed
      The option "QUIET" primarily meant "find if we have _any_ difference as
      quickly as possible and report", which means we often do not even have to
      look at blobs if we know the trees are different by looking at the higher
      level (e.g. "diff-tree A B").  As a side effect, because there is no point
      showing one change that we happened to have found first, it also enables
      NO_OUTPUT and EXIT_WITH_STATUS options, making the end result look quiet.
      
      Rename the internal option to QUICK to reflect this better; it also makes
      grepping the source tree much easier, as there are other kinds of QUIET
      option everywhere.
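The implication described above can be sketched with hypothetical flag bits named after the options in the text.

```c
/* Hypothetical flag bits mirroring the option names in the text. */
#define OPT_QUICK            (1u << 0)
#define OPT_NO_OUTPUT        (1u << 1)
#define OPT_EXIT_WITH_STATUS (1u << 2)

/* QUICK means "stop at the first difference".  Showing that single change
 * would be pointless, so QUICK also turns on NO_OUTPUT and EXIT_WITH_STATUS. */
unsigned int apply_quick_implications(unsigned int flags)
{
    if (flags & OPT_QUICK)
        flags |= OPT_NO_OUTPUT | OPT_EXIT_WITH_STATUS;
    return flags;
}
```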
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  12. 26 May, 2009: 2 commits
    • Avoid "diff-index --cached" optimization under --find-copies-harder · a0919ced
      Junio C Hamano committed
      When find-copies-harder is in effect, the diff frontends are expected to
      feed all paths, not just changed paths, to the diffcore, so that copy
      sources can be picked up.  In such a case, not descending into subtrees
      using the cache-tree information is simply wrong.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • Optimize "diff-index --cached" using cache-tree · b65982b6
      Junio C Hamano committed
      When running "diff-index --cached" after making a change to only a small
      portion of the index, there is no point unpacking unchanged subtrees into
      the index recursively, only to find that all entries match anyway.  Tweak
      unpack_trees() logic that is used to read in the tree object to catch the
      case where the tree entry we are looking at matches the index as a whole
      by looking at the cache-tree.
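The check can be sketched as a predicate over a hypothetical cache-tree node. The struct layout, the 20-byte SHA-1 object name and the use of a negative entry_count to mark an invalidated node are assumptions of this sketch. The find_copies_harder guard reflects the follow-up fix a0919ced listed above.

```c
#include <string.h>

/* Hypothetical cache-tree node; a negative entry_count marks it invalid. */
struct cache_tree_sketch {
    int entry_count;
    unsigned char oid[20];
};

/* A subtree can be treated as "everything under it matches the index",
 * without descending into it, when the cache-tree node is valid and records
 * the same tree object name.  Under --find-copies-harder every path must
 * still reach diffcore as a copy-source candidate, so never skip then. */
int can_skip_subtree(const unsigned char tree_oid[20],
                     const struct cache_tree_sketch *ct,
                     int find_copies_harder)
{
    if (find_copies_harder || !ct || ct->entry_count < 0)
        return 0;
    return memcmp(tree_oid, ct->oid, 20) == 0;
}
```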
      
      As an exercise, after modifying a few paths in the kernel tree, here are
      a few numbers on my Athlon 64X2 3800+:
      
          (without patch, hot cache)
          $ /usr/bin/time git diff --cached --raw
          :100644 100644 b57e1f5... e69de29... M  Makefile
          :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
          :000000 100644 0000000... e69de29... A  arche
          0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (0major+9407minor)pagefaults 0swaps
      
          (with patch, hot cache)
          $ /usr/bin/time ../git.git/git-diff --cached --raw
          :100644 100644 b57e1f5... e69de29... M  Makefile
          :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
          :000000 100644 0000000... e69de29... A  arche
          0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+0outputs (0major+2446minor)pagefaults 0swaps
      
      Cold cache numbers are very impressive, but it does not matter very much
      in practice:
      
          (without patch, cold cache)
          $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
          $ /usr/bin/time git diff --cached --raw
          :100644 100644 b57e1f5... e69de29... M  Makefile
          :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
          :000000 100644 0000000... e69de29... A  arche
          0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
          247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps
      
          (with patch, cold cache)
          $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
          $ /usr/bin/time ../git.git/git-diff --cached --raw
          :100644 100644 b57e1f5... e69de29... M  Makefile
          :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
          :000000 100644 0000000... e69de29... A  arche
          0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
          18440inputs+0outputs (79major+2369minor)pagefaults 0swaps
      
      This of course helps "git status" as well.
      
          (without patch, hot cache)
          $ /usr/bin/time ../git.git/git-status >/dev/null
          0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+5336outputs (0major+10970minor)pagefaults 0swaps
      
          (with patch, hot cache)
          $ /usr/bin/time ../git.git/git-status >/dev/null
          0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
          0inputs+5336outputs (0major+3921minor)pagefaults 0swaps
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  13. 10 May, 2009: 1 commit
    • Avoid unnecessary 'lstat()' calls in 'get_stat_data()' · 658dd48c
      Linus Torvalds committed
      When we ask get_stat_data() to get the mode and size of an index entry,
      we can avoid the lstat() call if we have marked the index entry as being
      uptodate due to earlier lstat() calls.
      
      This avoids a lot of unnecessary lstat() calls in eg 'git checkout',
      where the last phase shows the differences to the working tree
      (requiring a diff), but earlier phases have already verified the index.
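A sketch of the shortcut, with a counting stand-in for lstat() so that the saving is visible. The struct, flag and function names are hypothetical.

```c
#define CE_UPTODATE_FLAG (1u << 0) /* hypothetical in-core flag */

struct stat_entry {
    unsigned int flags;
    unsigned int mode;  /* cached from the index */
    unsigned long size; /* cached from the index */
};

static int lstat_calls; /* counts calls to the stand-in below */

/* Stand-in for lstat(): pretend the work tree file is a 12-byte blob. */
static int fake_lstat(unsigned int *mode, unsigned long *size)
{
    lstat_calls++;
    *mode = 0100644;
    *size = 12;
    return 0;
}

/* If an earlier phase already verified the entry (up-to-date flag set),
 * the cached mode and size are trustworthy and no new lstat() is needed. */
int get_stat_data_sketch(const struct stat_entry *ce,
                         unsigned int *mode, unsigned long *size)
{
    if (ce->flags & CE_UPTODATE_FLAG) {
        *mode = ce->mode;
        *size = ce->size;
        return 0;
    }
    return fake_lstat(mode, size);
}
```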
      
      On the kernel repo (with a fast machine and everything cached), this
      changes timings of a null 'git checkout' from
      
       - Before (best of ten):
      
      	0.14user 0.05system 0:00.19elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
      	0inputs+0outputs (0major+13237minor)pagefaults 0swaps
      
       - After:

      	0.11user 0.03system 0:00.15elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
      	0inputs+0outputs (0major+13235minor)pagefaults 0swaps
      
      so it can obviously be noticeable, although equally obviously it's not a
      show-stopper on this particular machine. The difference is likely larger
      on slower machines, or with operating systems that don't do as good a job
      of name caching.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  14. 11 Feb, 2009: 1 commit
    • Generalize and libify index_is_dirty() to index_differs_from(...) · 75f3ff2e
      Stephan Beyer committed
      index_is_dirty() in builtin-revert.c checks if the index is dirty.
      This patch generalizes this function to check if the index differs
      from a revision, i.e. the former index_is_dirty() behavior can now be
      achieved by index_differs_from("HEAD", 0).
      
      The second argument "diff_flags" allows setting further diff option
      flags like DIFF_OPT_IGNORE_SUBMODULES. See DIFF_OPT_* macros in diff.h
      for a list.
      
      index_differs_from() seems to be useful for more than builtin-revert.c,
      so it is moved into diff-lib.c and also used in builtin-commit.c.
      
      Yet to mention:
      
       - "rev.abbrev = 0;" can be safely removed.
         This has no impact on the performance or functioning of either
         setup_revisions() or run_diff_index().
      
       - rev.pending.objects is free()d because this fixes a leak.
         (Also see 295dd2ad "Fix memory leak in traverse_commit_list")
      Mentored-by: Daniel Barkalow <barkalow@iabervon.org>
      Mentored-by: Christian Couder <chriscool@tuxfamily.org>
      Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  15. 10 Feb, 2009: 1 commit
  16. 12 Jan, 2009: 1 commit
    • Cleanup of unused symcache variable inside diff-lib.c · ff7e6aad
      Kjetil Barvik committed
      Commit c40641b7, 'Optimize
      symlink/directory detection' by Linus Torvalds, removed the 'char
      *symcache' parameter to the has_symlink_leading_path() function.  This
      made all variables currently named 'symcache' inside diff-lib.c
      unnecessary.
      
      This also lets us throw away the 'struct oneway_unpack_data', and
      instead directly use the 'struct rev_info *revs' member, which
      was the only member left after removal of the 'symcache[] array'
      member.  The 'struct oneway_unpack_data' was introduced by the
      following commit:
      
        948dd346  "diff-files: careful when inspecting work tree items"
      
      Impact: cleanup
              PATH_MAX bytes less stack usage in some cases
      Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  17. 31 Aug, 2008: 1 commit
    • diff: vary default prefix depending on what are compared · a5a818ee
      Junio C Hamano committed
      With a new configuration "diff.mnemonicprefix", "git diff" shows the
      differences between various combinations of preimage and postimage trees
      with prefixes different from the standard "a/" and "b/".  Hopefully this
      will make the distinction stand out for some people.
      
          "git diff" compares the (i)ndex and the (w)ork tree;
          "git diff HEAD" compares a (c)ommit and the (w)ork tree;
          "git diff --cached" compares a (c)ommit and the (i)ndex;
          "git-diff HEAD:file1 file2" compares an (o)bject and a (w)ork tree entity;
          "git diff --no-index a b" compares two non-git things (1) and (2).
      
      Because these mnemonics now have meanings, they are swapped when reverse
      diff is in effect and this feature is enabled.
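The prefix table above can be sketched as a lookup with a swap for reverse diffs. The enum and function are hypothetical; only the prefixes come from the text.

```c
/* Hypothetical enumeration of the comparisons listed above. */
enum diff_kind {
    INDEX_VS_WORKTREE,  /* git diff */
    COMMIT_VS_WORKTREE, /* git diff HEAD */
    COMMIT_VS_INDEX,    /* git diff --cached */
    OBJECT_VS_WORKTREE, /* git diff HEAD:file1 file2 */
    NOINDEX             /* git diff --no-index a b */
};

/* Pick the (old, new) prefixes for a comparison.  In a reverse diff the two
 * sides trade places, so the mnemonics trade places too. */
void mnemonic_prefixes(enum diff_kind kind, int reverse,
                       const char **a, const char **b)
{
    static const char *const pairs[][2] = {
        { "i/", "w/" }, { "c/", "w/" }, { "c/", "i/" },
        { "o/", "w/" }, { "1/", "2/" },
    };

    *a = pairs[kind][reverse ? 1 : 0];
    *b = pairs[kind][reverse ? 0 : 1];
}
```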
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  18. 17 Jul, 2008: 1 commit
  19. 24 May, 2008: 1 commit
    • "git diff": do not ignore index without --no-index · 0569e9b8
      Junio C Hamano committed
      Even if "foo" and/or "bar" do not exist in the index, "git diff foo bar"
      should not change behaviour drastically from "git diff foo bar baz" or
      "git diff foo".  A feature that "sometimes works and is handy" is an
      unreliable cute hack.
      
      "git diff foo bar" outside a git repository continues to work as a more
      colourful alternative to "diff -u" as before.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  20. 11 May, 2008: 1 commit
    • Optimize symlink/directory detection · c40641b7
      Linus Torvalds committed
      This is the base for making symlink detection in the middle of a pathname
      saner and (much) more efficient.
      
      Under various loads, we want to verify that the full path leading up to a
      filename is a real directory tree, and that when we successfully do an
      'lstat()' on a filename, we don't get a false positive due to a symlink in
      the middle of the path that git should have seen as a symlink, not as a
      normal path component.
      
      The 'has_symlink_leading_path()' function already did this, and cached
      a single level of symlink information, but didn't cache the _lack_ of a
      symlink, so the normal behaviour was actually the wrong way around, and we
      ended up doing an 'lstat()' on each path component to check that it was a
      real directory.
      
      This caches the last detected full directory and symlink entries, and
      speeds up especially deep directory structures a lot by not having to
      lstat() all the directories leading up to each entry in the index.
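A simplified sketch of the caching idea. It remembers the last prefix found to be a real directory and the last one found to be a symlink, and only asks the stubbed filesystem about prefixes the cache cannot answer. Every name here is hypothetical and a callback stands in for lstat(). As the note below says of the real code, cache invalidation is not handled.

```c
#include <string.h>

#define PATH_CACHE_MAX 256

/* Hypothetical stand-in for lstat(): 1 = symlink, 0 = real directory. */
typedef int (*classify_fn)(const char *prefix);

struct path_cache {
    char dir[PATH_CACHE_MAX];     /* last prefix found to be a real directory */
    char symlink[PATH_CACHE_MAX]; /* last prefix found to be a symlink */
};

/* Is `prefix` (of length len) the cached directory or one of its parents? */
static int dir_covers(const char *dir, const char *prefix, size_t len)
{
    return strlen(dir) >= len && strncmp(dir, prefix, len) == 0 &&
           (dir[len] == '\0' || dir[len] == '/');
}

/* Returns 1 if a leading directory of `path` is a symlink, else 0. */
int leading_symlink_sketch(struct path_cache *c, const char *path,
                           classify_fn classify)
{
    char prefix[PATH_CACHE_MAX];
    const char *slash;

    for (slash = strchr(path, '/'); slash; slash = strchr(slash + 1, '/')) {
        size_t len = (size_t)(slash - path);

        if (len >= PATH_CACHE_MAX)
            return 0; /* sketch only: overly long paths are not checked */
        memcpy(prefix, path, len);
        prefix[len] = '\0';

        if (dir_covers(c->dir, prefix, len))
            continue;                 /* known directory: no lstat() */
        if (strcmp(c->symlink, prefix) == 0)
            return 1;                 /* known symlink: no lstat() */
        if (classify(prefix)) {
            strcpy(c->symlink, prefix);
            return 1;
        }
        strcpy(c->dir, prefix);       /* remember the newly seen directory */
    }
    return 0;
}

/* Demo classifier: only "a/link" is a symlink; counts how often it runs. */
static int classify_calls;
static int demo_classify(const char *prefix)
{
    classify_calls++;
    return strcmp(prefix, "a/link") == 0;
}
```

Checking "a/b/c/file" costs three lookups. A following "a/b/c/other" costs none, because every leading directory is answered from the cache.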
      
      [ This can - and should - probably be extended upon so that we eventually
        never do a bare 'lstat()' on any path entries at *all* when checking the
        index, but always check the full path carefully. Right now we do not
        generally check the whole path for all our normal quick index
        revalidation.
      
        We should also make sure that we're careful about all the invalidation,
        i.e. when we remove a link and replace it with a directory, we should
        invalidate the symlink cache if it matches (and vice versa for the
        directory cache).
      
        But regardless, the basic function needs to be sane to do that. The old
        'has_symlink_leading_path()' was not capable enough - or indeed the code
        readable enough - to really do that sanely. So I'm pushing this as not
        just an optimization, but as a base for further work. ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  21. 05 May, 2008: 2 commits
    • diff-lib.c: rename check_work_tree_entity() · 451244d7
      Junio C Hamano committed
      The function is about checking for removed work tree item, so name it
      accordingly to avoid future confusion.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff: a submodule not checked out is not modified · 1392a377
      Junio C Hamano committed
      948dd346 (diff-index: careful when inspecting work tree items, 2008-03-30)
      made the work tree check careful not to be fooled by a new directory that
      exists at a place the index expects a blob.  For such a change to be a
      typechange from blob to submodule, the new directory has to be a
      repository.
      
      However, if the index expects a submodule there, we should not insist the
      work tree entity to be a repository --- a simple directory that is not a
      full-fledged repository (even an empty directory would do) should be
      considered an unmodified subproject, because that is how a superproject
      with a submodule is checked out sparsely by default.
      
      This makes the function check_work_tree_entity() even more careful not to
      report a submodule that is not checked out as removed.  It fixes the
      recently added test in t4027.
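The decision rules from this commit and from 948dd346 (listed further down) can be sketched as a pure function over facts the caller has already gathered. The enum and parameter names are hypothetical. Git itself consults resolve_gitlink_ref() to decide whether a directory is a repository.

```c
/* Hypothetical outcome of inspecting the work tree path of an index entry. */
enum wt_state { WT_PRESENT, WT_REMOVED, WT_NOW_GITLINK };

enum wt_state inspect_work_tree(int lstat_ok, int leading_symlink, int is_dir,
                                int index_expects_gitlink, int dir_is_repo)
{
    if (!lstat_ok || leading_symlink)
        return WT_REMOVED;     /* the path is gone as far as the index cares */
    if (!is_dir)
        return WT_PRESENT;     /* not a directory: compare as usual */
    if (index_expects_gitlink)
        return WT_PRESENT;     /* even an empty dir is an unmodified submodule */
    return dir_is_repo ? WT_NOW_GITLINK  /* blob turned into a submodule */
                       : WT_REMOVED;     /* plain directory: the blob is gone */
}
```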
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  22. 30 Apr, 2008: 1 commit
    • git-svn: detect and fail gracefully when dcommitting to a void · 59b0c24d
      Matthieu Moy committed
      The command
      
        git svn clone (URL of an empty SVN repo here)
      
      works and creates an empty git repository. I can perform the initial
      commit there, but then "git svn dcommit" says:
      
      Use of uninitialized value in concatenation (.) or string at .../git-svn line 414.
      Committing to  ...
      Unable to determine upstream SVN information from HEAD history
      
      I guess correct handling of the initial commit in git-svn would be
      hard to implement, but at least, the error message can be improved.
      First step is something like the patch below, and better would be for
      "git svn clone" to warn that it won't be able to do much with the
      cloned repo.
      Acked-by: Eric Wong <normalperson@yhbt.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  23. 13 Apr, 2008: 1 commit
  24. 31 Mar, 2008: 2 commits
    • diff-files: careful when inspecting work tree items · f58dbf23
      Junio C Hamano committed
      This fixes the same breakage in diff-files.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff-index: careful when inspecting work tree items · 948dd346
      Junio C Hamano committed
      Earlier, if you changed a staged path into a directory in the work tree,
      we happily ran lstat(2) on it and found that it exists, and declared that
      the user changed it to a gitlink.
      
      This is wrong for two reasons:
      
       (1) It may be a directory, but it may not be a submodule, and in the
           latter case, the change we need to report is "the blob at the path
           has disappeared".  We need to check with resolve_gitlink_ref() to be
           consistent with what "git add" and "git update-index --add" do.
      
       (2) lstat(2) may have succeeded only because a leading component of the
           path was turned into a symbolic link that points at something that
           exists in the work tree.  In such a case, the path itself does not
           exist anymore, as far as the index is concerned.
      
      This fixes these breakages in diff-index that the previous patch has
      exposed.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  25. 11 Mar, 2008: 1 commit
  26. 09 Mar, 2008: 2 commits
  27. 02 Mar, 2008: 1 commit
    • diff-lib.c: constness strengthening · c8c16f28
      Junio C Hamano committed
      The internal implementation of the diff-index codepath used to use non-const
      pointers to pass sha1 around, but it did not have to.  With this, we can
      also lose the private no_sha1[] array, as we can use the public null_sha1[]
      array that exists exactly for the same purpose.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  28. 10 Feb, 2008: 1 commit
  29. 22 Jan, 2008: 3 commits
    • Also use unpack_trees() in do_diff_cache() · 204ce979
      Johannes Schindelin committed
      As in run_diff_index(), we call unpack_trees() with the oneway_diff()
      function in do_diff_cache() now.  This makes the function diff_cache()
      obsolete.
      Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Make run_diff_index() use unpack_trees(), not read_tree() · d1f2d7e8
      Linus Torvalds committed
      A plain "git commit" would still run lstat() a lot more than necessary,
      because wt_status_print() would cause the index to be repeatedly flushed
      and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit
      to be lost, resulting in the files in the index being lstat'ed three
      times each.
      
      The reason why wt-status.c ended up invalidating and re-reading the
      cache multiple times was that it uses "run_diff_index()", which in turn
      uses "read_tree()" to populate the index with *both* the old index and
      the tree we want to compare against.
      
      So this patch re-writes run_diff_index() to not use read_tree(), but
      instead use "unpack_trees()" to diff the index to a tree.  That, in
      turn, means that we don't need to modify the index itself, which then
      means that we don't need to invalidate it and re-read it!
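The essence of comparing a tree with the index without touching the index is a lockstep walk over two path-sorted lists. This is a generic two-pointer merge, not the unpack_trees() API. The integer "object names" and the record format are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

struct path_oid {
    const char *path;
    int oid; /* stand-in for an object name */
};

/* Walk the tree (old side) and the index (new side), both sorted by path,
 * without modifying either.  Appends one "X path;" record per difference to
 * out, where X is M (modified), A (added) or D (deleted). */
void oneway_diff_sketch(const struct path_oid *tree, int nt,
                        const struct path_oid *idx, int ni,
                        char *out, size_t outsz)
{
    int i = 0, j = 0;
    size_t used = 0;

    out[0] = '\0';
    while (i < nt || j < ni) {
        int cmp;
        char kind;
        const char *path;

        if (i >= nt)
            cmp = 1;
        else if (j >= ni)
            cmp = -1;
        else
            cmp = strcmp(tree[i].path, idx[j].path);

        if (cmp < 0) {          /* only in the tree: deleted */
            kind = 'D';
            path = tree[i++].path;
        } else if (cmp > 0) {   /* only in the index: added */
            kind = 'A';
            path = idx[j++].path;
        } else {                /* in both: modified if the objects differ */
            int same = tree[i].oid == idx[j].oid;
            path = idx[j].path;
            i++;
            j++;
            if (same)
                continue;
            kind = 'M';
        }
        if (used < outsz)
            used += (size_t)snprintf(out + used, outsz - used, "%c %s;", kind, path);
    }
}
```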
      
      This, together with the lstat() optimizations, means that "git commit"
      on the kernel tree really only needs to lstat() the index entries once.
      That noticeably cuts down on the cached timings.
      
      Best time before:
      
      	[torvalds@woody linux]$ time git commit > /dev/null
      	real    0m0.399s
      	user    0m0.232s
      	sys     0m0.164s
      
      Best time after:
      
      	[torvalds@woody linux]$ time git commit > /dev/null
      	real    0m0.254s
      	user    0m0.140s
      	sys     0m0.112s
      
      so it's a noticeable improvement in addition to being a nice conceptual
      cleanup (it's really not that pretty that "run_diff_index()" dirties the
      index!)
      
      Doing an "strace -c" on it also shows that as it cuts the number of
      lstat() calls by two thirds, it goes from being lstat()-limited to being
      limited by getdents() (which is the readdir system call):
      
      Before:
      	% time     seconds  usecs/call     calls    errors syscall
      	------ ----------- ----------- --------- --------- ----------------
      	 60.69    0.000704           0     69230        31 lstat
      	 23.62    0.000274           0      5522           getdents
      	  8.36    0.000097           0      5508      2638 open
      	  2.59    0.000030           0      2869           close
      	  2.50    0.000029           0       274           write
      	  1.47    0.000017           0      2844           fstat
      
      After:
      	% time     seconds  usecs/call     calls    errors syscall
      	------ ----------- ----------- --------- --------- ----------------
      	 45.17    0.000276           0      5522           getdents
      	 26.51    0.000162           0     23112        31 lstat
      	 19.80    0.000121           0      5503      2638 open
      	  4.91    0.000030           0      2864           close
      	  1.48    0.000020           0       274           write
      	  1.34    0.000018           0      2844           fstat
      	...
      
      It passes the test-suite for me, but this is another one of those
      really core functions, and certainly pretty subtle, so...
      
      NOTE! The Linux lstat() system call is really quite cheap when everything
      is cached, so the fact that this is quite noticeable on Linux is likely to
      mean that it is *much* more noticeable on other operating systems. I bet
      you'll see a much bigger performance improvement from this on Windows in
      particular.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Make on-disk index representation separate from in-core one · 7a51ed66
      Linus Torvalds committed
      This converts the index explicitly on read and write to its on-disk
      format, allowing the in-core format to contain more flags, and be
      simpler.
      
      In particular, the in-core format is now host-endian (as opposed to the
      on-disk one that is network-endian so that it can be shared
      across machines) and as a result we can dispense with all the
      htonl/ntohl on accesses to the cache_entry fields.
      
      This will make it easier to make use of various temporary flags that do
      not exist in the on-disk format.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7a51ed66
  30. 27 Nov, 2007: 1 commit
  31. 12 Nov, 2007: 1 commit
  32. 10 Nov, 2007: 1 commit
    • J
      git-add: make the entry stat-clean after re-adding the same contents · fb63d7f8
      Junio C Hamano committed
      Earlier in commit 0781b8a9
      (add_file_to_index: skip rehashing if the cached stat already
      matches), add_file_to_index() was taught not to re-add the path
      if it already matches the index.
      
      The change meant well, but was not executed quite right.  It
      used ie_modified() to see if the file on the work tree is really
      different from the index, and skipped adding the contents if the
      function says "not modified".
      
      This was wrong.  There are three possible comparison results
      between the index and the file in the work tree:
      
       - with lstat(2) we _know_ they are different.  E.g. if the
         length or the owner in the cached stat information differs
         from what we just obtained from lstat(2), we can tell the
         file is modified without looking at the actual contents.
      
       - with lstat(2) we _know_ they are the same.  The same length,
         the same owner, the same everything (but this has a twist, as
         described below).
      
       - we cannot tell from lstat(2) information alone and need to go
         to the filesystem to actually compare.
      
      The last case arises from what we call the 'racy git' situation,
      which can be caused by this sequence:
      
          $ echo hello >file
          $ git add file
          $ echo aeiou >file ;# the same length
      
      If the second "echo" is done within the same filesystem
      timestamp granularity as the first "echo", then the timestamp
      recorded by "git add" and the timestamp we get from lstat(2)
      will be the same, and we can mistakenly say the file is not
      modified.  Such a path is called 'racily clean'.  We need to
      reliably detect that racily clean paths are in fact modified.
      
      To solve this problem, when we write out the index, we mark the
      index entry that has the same timestamp as the index file itself
      (that is the time from the point of view of the filesystem) to
      tell any later code that does the lstat(2) comparison not to
      trust the cached stat info, and ie_modified() then actually goes
      to the filesystem to compare the contents for such a path.
      
      That's all good, but it should not be used for this "git add"
      optimization, as the goal of "git add" is to actually update the
      path in the index and make it stat-clean.  With the false
      optimization, we did _not_ cause any data loss (after all, what
      we failed to do was only to update the cached stat information),
      but it made the following sequence leave the file stat dirty:
      
          $ echo hello >file
          $ git add file
          $ echo hello >file ;# the same contents
          $ git add file
      
      The solution is not to use ie_modified() which goes to the
      filesystem to see if it is really clean, but instead use
      ie_match_stat() with "assume racily clean paths are dirty"
      option, to force re-adding of such a path.
      
      There was another problem with "git add -u".  The codepath
      shares the same issue when adding the paths that are found to be
      modified, but in addition, it asked the run_diff_files() function
      (the "git diff-files" machinery) to list the paths that are
      modified.  But the "git diff-files" machinery
      uses the same ie_modified() call so that it does not report
      racily clean _and_ actually clean paths as modified, which is
      not what we want.
      
      The patch allows the callers of run_diff_files() to pass the
      same "assume racily clean paths are dirty" option, and makes the
      "git-add -u" codepath use that option to discover and re-add
      racily clean _and_ actually clean paths.
      
      We could further optimize on top of this patch to differentiate
      the case where the path really needs re-adding (i.e. the content
      of the racily clean entry was indeed different) and the case
      where only the cached stat information needs to be refreshed
      (i.e. the racily clean entry was actually clean), but I do not
      think it is worth it.
      
      This patch applies to maint and all the way up.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      fb63d7f8