提交 · 72441af7c4e3bde33cdf7edafcf09c227d5d5296 · 李少辉-开发者 / git

08 4月, 2014 1 次提交

tree-diff: rework diff_tree() to generate diffs for multiparent cases as well · 72441af7

由 Kirill Smelkov 提交于 4月 07, 2014

Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.

In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.

For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .

There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .

That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.

That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).

That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:

D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)      (w.r.t. paths)

where

D(A,P1...Pn) is combined diff between commit A, and parents Pi

and

D(A,Pi) is usual two-tree diff Pi..A

So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.

And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.

The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.

Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:

D(T,P1...Pn) calculation scheme
-------------------------------

D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn)	(regarding resulting paths set)

    D(T,Pj)		- diff between T..Pj
    D(T,P1...Pn)	- combined diff from T to parents P1,...,Pn

We start from all trees, which are sorted, and compare their entries in
lock-step:

     T     P1       Pn
     -     -        -
    |t|   |p1|     |pn|
    |-|   |--| ... |--|      imin = argmin(p1...pn)
    | |   |  |     |  |
    |-|   |--|     |--|
    |.|   |. |     |. |
     .     .        .
     .     .        .

at any time there could be 3 cases:

    1)  t < p[imin];
    2)  t > p[imin];
    3)  t = p[imin].

Schematic deduction of what every case means, and what to do, follows:

1)  t < p[imin]  ->  ∀j t ∉ Pj  ->  "+t" ∈ D(T,Pj)  ->  D += "+t";  t↓

2)  t > p[imin]

    2.1) ∃j: pj > p[imin]  ->  "-p[imin]" ∉ D(T,Pj)  ->  D += ø;  ∀ pi=p[imin]  pi↓
    2.2) ∀i  pi = p[imin]  ->  pi ∉ T  ->  "-pi" ∈ D(T,Pi)  ->  D += "-p[imin]";  ∀i pi↓

3)  t = p[imin]

    3.1) ∃j: pj > p[imin]  ->  "+t" ∈ D(T,Pj)  ->  only pi=p[imin] remains to investigate
    3.2) pi = p[imin]  ->  investigate δ(t,pi)
     |
     |
     v

    3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø  ->

                      ⎧δ(t,pi)  - if pi=p[imin]
             ->  D += ⎨
                      ⎩"+t"     - if pi>p[imin]

    in any case t↓  ∀ pi=p[imin]  pi↓

~

For comparison, here is how diff_tree() works:

D(A,B) calculation scheme
-------------------------

    A     B
    -     -
   |a|   |b|    a < b   ->  a ∉ B   ->   D(A,B) +=  +a    a↓
   |-|   |-|    a > b   ->  b ∉ A   ->   D(A,B) +=  -b    b↓
   | |   | |    a = b   ->  investigate δ(a,b)            a↓ b↓
   |-|   |-|
   |.|   |.|
    .     .
    .     .

~~~~~~~~

This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff

D(A,B)

is by definition the same as combined diff

D(A,[B]),

so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.

What we do is as follows:

1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
   a paths generator (new name diff_tree_paths()), with each generated path
   being `struct combine_diff_path` with info for path, new sha1,mode and for
   every parent which sha1,mode it was in it.

2) From that info, we can still generate usual diff queue with
   struct diff_filepairs, via "exporting" generated
   combine_diff_path, if we know we run for nparent=1 case.
   (see emit_diff() which is now named emit_diff_first_parent_only())

3) In order for diff_can_quit_early(), which checks

       DIFF_OPT_TST(opt, HAS_CHANGES))

   to work, that exporting have to be happening not in bulk, but
   incrementally, one diff path at a time.

   For such consumers, there is a new callback in diff_options
   introduced:

       ->pathchange(opt, struct combine_diff_path *)

   which, if set to !NULL, is called for every generated path.

   (see new compat ll_diff_tree_sha1() wrapper around new paths
    generator for setup)

4) The paths generation itself, is reworked from previous
   ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
   scheme" provided above:

   On the start we allocate [nparent] arrays in place what was
   earlier just for one parent tree.

   then we just generalize loops, and comparison according to the
   algorithm.

Some notes(*):

1) alloca(), for small arrays, is used for "runs not slower for
   nparent=1 case than before" goal - if we change it to xmalloc()/free()
   the timings get ~1% worse. For alloca() we use just-introduced
   xalloca/xalloca_free compatibility wrappers, so it should not be a
   portability problem.

2) For every parent tree, we need to keep a tag, whether entry from that
   parent equals to entry from minimal parent. For performance reasons I'm
   keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
   Not doing so, we'd need to alloca another [nparent] array, which hurts
   performance.

3) For emitted paths, memory could be reused, if we know the path was
   processed via callback and will not be needed later. We use efficient
   hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
   of potential additional slowdown.

4) goto(s) are used in several places, as the code executes a little bit
   faster with lowered register pressure.

Also

- we should now check for FIND_COPIES_HARDER not only when two entries
  names are the same, and their hashes are equal, but also for a case,
  when a path was removed from some of all parents having it.

  The reason is, if we don't, that path won't be emitted at all (see
  "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
  all paths - with diff or without - to be emitted, to be later analyzed
  for being copies sources.

  The new check is only necessary for nparent >1, as for nparent=1 case
  xmin_eqtotal always =1 =nparent, and a path is always added to diff as
  removal.

~~~~~~~~

Timings for

    # without -c, i.e. testing only nparent=1 case
    `git log --raw --no-abbrev --no-renames`

before and after the patch are as follows:

                navy.git        linux.git v3.10..v3.11

    before      0.611s          1.889s
    after       0.619s          1.907s
    slowdown    1.3%            0.9%

This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.

The output also stayed the same.

(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
    we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
   "do no harm for nparent=1 case" rule.

For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.

P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with

    `git log --reverse --raw --no-abbrev --no-renames -c`

to extract log of what was created/changed when, as a result building a
map

    {}  sha1    ->  in which commit (and date) a content was added

that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.

That case was my initial motivation for combined diffs speedup.
Signed-off-by: NKirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

72441af7

15 1月, 2014 1 次提交

refname_match(): always use the rules in ref_rev_parse_rules · 54457fe5

由 Michael Haggerty 提交于 1月 14, 2014

We used to use two separate rules for the normal ref resolution
dwimming and dwimming done to decide which remote ref to grab.  The
third parameter to refname_match() selected which rules to use.

When these two rules were harmonized in

    2011-11-04 dd621df9 refs DWIMmery: use the same rule for both "git fetch" and others

, ref_fetch_rules was #defined to avoid potential breakages for
in-flight topics.

It is now safe to remove the backwards-compatibility code, so remove
refname_match()'s third parameter, make ref_rev_parse_rules private to
refs.c, and remove ref_fetch_rules entirely.
Suggested-by: NJunio C Hamano <gitster@pobox.com>
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

54457fe5

07 1月, 2014 2 次提交

safe_create_leading_directories(): add new error value SCLD_VANISHED · 18d37e86

由 Michael Haggerty 提交于 1月 06, 2014

Add a new possible error result that can be returned by
safe_create_leading_directories() and
safe_create_leading_directories_const(): SCLD_VANISHED.  This value
indicates that a file or directory on the path existed at one point
(either it already existed or the function created it), but then it
disappeared.  This probably indicates that another process deleted the
directory while we were working.  If SCLD_VANISHED is returned, the
caller might want to retry the function call, as there is a chance
that a new attempt will succeed.

Why doesn't safe_create_leading_directories() do the retrying
internally?  Because an empty directory isn't really ever safe until
it holds a file.  So even if safe_create_leading_directories() were
absolutely sure that the directory existed before it returned, there
would be no guarantee that the directory still existed when the caller
tried to write something in it.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

18d37e86

safe_create_leading_directories(): introduce enum for return values · 0be0521b

由 Michael Haggerty 提交于 1月 06, 2014

Instead of returning magic integer values (which a couple of callers
go to the trouble of distinguishing), return values from an enum.  Add
a docstring.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

0be0521b

27 12月, 2013 1 次提交

sha1_object_info_extended: provide delta base sha1s · 5d642e75

由 Jeff King 提交于 12月 21, 2013

A caller of sha1_object_info_extended technically has enough
information to determine the base sha1 from the results of
the call. It knows the pack, offset, and delta type of the
object, which is sufficient to find the base.

However, the functions to do so are not publicly available,
and the code itself is intimate enough with the pack details
that it should be abstracted away. We could add a public
helper to allow callers to query the delta base separately,
but it is simpler and slightly more efficient to optionally
grab it along with the rest of the object_info data.

For cases where the object is not stored as a delta, we
write the null sha1 into the query field. A careful caller
could check "oi.whence == OI_PACKED && oi.u.packed.is_delta"
before looking at the base sha1, but using the null sha1
provides a simple alternative (and gives a better sanity
check for a non-careful caller than simply returning random
bytes).
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

5d642e75

13 12月, 2013 3 次提交

sha1_object_info_extended(): add an "unsigned flags" parameter · de7b5d62

由 Christian Couder 提交于 12月 11, 2013

This parameter is not used yet, but it will be used to tell
sha1_object_info_extended() if it should perform object
replacement or not.
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

de7b5d62

sha1_file.c: add lookup_replace_object_extended() to pass flags · bf93eea0

由 Christian Couder 提交于 12月 11, 2013

Currently, there is only one caller to lookup_replace_object()
that can benefit from passing it some flags, but we expect
that there could be more.
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

bf93eea0

rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT · ffe68cf9

由 Christian Couder 提交于 12月 11, 2013

The READ_SHA1_FILE_REPLACE flag is more related to using the
lookup_replace_object() function rather than the
read_sha1_file() function.

We also need such a flag to be used with sha1_object_info()
instead of read_sha1_file().

The name LOOKUP_REPLACE_OBJECT is therefore better for this
flag.
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ffe68cf9

11 12月, 2013 2 次提交

add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses · 069c0532

由 Nguyễn Thái Ngọc Duy 提交于 12月 05, 2013

This may be needed when a hook is run after a new shallow pack is
received, but .git/shallow is not settled yet. A temporary shallow
file to plug all loose ends should be used instead. GIT_SHALLOW_FILE
is overriden by --shallow-file.

--shallow-file does not work in this case because the hook may spawn
many git subprocesses and the launch commands do not have
--shallow-file as it's a recent addition.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

069c0532

shallow.c: the 8 steps to select new commits for .git/shallow · 58babfff

由 Nguyễn Thái Ngọc Duy 提交于 12月 05, 2013

Suppose a fetch or push is requested between two shallow repositories
(with no history deepening or shortening). A pack that contains
necessary objects is transferred over together with .git/shallow of
the sender. The receiver has to determine whether it needs to update
.git/shallow if new refs needs new shallow comits.

The rule here is avoid updating .git/shallow by default. But we don't
want to waste the received pack. If the pack contains two refs, one
needs new shallow commits installed in .git/shallow and one does not,
we keep the latter and reject/warn about the former.

Even if .git/shallow update is allowed, we only add shallow commits
strictly necessary for the former ref (remember the sender can send
more shallow commits than necessary) and pay attention not to
accidentally cut the receiver history short (no history shortening is
asked for)

So the steps to figure out what ref need what new shallow commits are:

1. Split the sender shallow commit list into "ours" and "theirs" list
   by has_sha1_file. Those that exist in current repo in "ours", the
   remaining in "theirs".

2. Check the receiver .git/shallow, remove from "ours" the ones that
   also exist in .git/shallow.

3. Fetch the new pack. Either install or unpack it.

4. Do has_sha1_file on "theirs" list again. Drop the ones that fail
   has_sha1_file. Obviously the new pack does not need them.

5. If the pack is kept, remove from "ours" the ones that do not exist
   in the new pack.

6. Walk the new refs to answer the question "what shallow commits,
   both ours and theirs, are required in .git/shallow in order to add
   this ref?". Shallow commits not associated to any refs are removed
   from their respective list.

7. (*) Check reachability (from the current refs) of all remaining
   commits in "ours". Those reachable are removed. We do not want to
   cut any part of our (reachable) history. We only check up
   commits. True reachability test is done by
   check_everything_connected() at the end as usual.

8. Combine the final "ours" and "theirs" and add them all to
   .git/shallow. Install new refs. The case where some hook rejects
   some refs on a push is explained in more detail in the push
   patches.

Of these steps, #6 and #7 are expensive. Both require walking through
some commits, or in the worst case all commits. And we rather avoid
them in at least common case, where the transferred pack does not
contain any shallow commits that the sender advertises. Let's look at
each scenario:

1) the sender has longer history than the receiver

   All shallow commits from the sender will be put into "theirs" list
   at step 1 because none of them exists in current repo. In the
   common case, "theirs" becomes empty at step 4 and exit early.

2) the sender has shorter history than the receiver

   All shallow commits from the sender are likely in "ours" list at
   step 1. In the common case, if the new pack is kept, we could empty
   "ours" and exit early at step 5.

   If the pack is not kept, we hit the expensive step 6 then exit
   after "ours" is emptied. There'll be only a handful of objects to
   walk in fast-forward case. If it's forced update, we may need to
   walk to the bottom.

3) the sender has same .git/shallow as the receiver

   This is similar to case 2 except that "ours" should be emptied at
   step 2 and exit early.

A fetch after "clone --depth=X" is case 1. A fetch after "clone" (from
a shallow repo) is case 3. Luckily they're cheap for the common case.

A push from "clone --depth=X" falls into case 2, which is expensive.
Some more work may be done at the sender/client side to avoid more
work on the server side: if the transferred pack does not contain any
shallow commits, send-pack should not send any shallow commits to the
receive-pack, effectively turning it into a normal push and avoid all
steps.

This patch implements all steps except #3, already handled by
fetch-pack and receive-pack, #6 and #7, which has their own patch due
to their size.

(*) in previous versions step 7 was put before step 3. I reorder it so
    that the common case that keeps the pack does not need to walk
    commits at all. In future if we implement faster commit
    reachability check (maybe with the help of pack bitmaps or commit
    cache), step 7 could become cheap and be moved up before 6 again.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

58babfff

28 10月, 2013 1 次提交

cache: remove unused function 'have_git_dir' · 84471a12

由 Stefan Beller 提交于 10月 26, 2013

This function was added in d2b0708e (2008-09-27, add have_git_dir()
function) as a preparation for adbc0b6b (2008-09-30, cygwin: Use native
Win32 API for stat).

However the second referenced commit was reverted in f66450ae (2013-06-22,
cygwin: Remove the Win32 l/stat() implementation), so we don't need to
expose this wrapper function any more as a public API.
Signed-off-by: NStefan Beller <stefanbeller@googlemail.com>
Acked-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

84471a12

25 10月, 2013 1 次提交

checkout_entry(): clarify the use of topath[] parameter · af2a651d

由 Junio C Hamano 提交于 10月 23, 2013

The said function has this signature:

	extern int checkout_entry(struct cache_entry *ce,
				  const struct checkout *state,
				  char *topath);

At first glance, it might appear that the caller of checkout_entry()
can specify to which path the contents are written out by the last
parameter, and it is tempting to add "const" in front of its type.

In reality, however, topath[] is to point at a buffer to store the
temporary path generated by the callchain originating from this
function, and the temporary path is always short, much shorter than
the buffer prepared by its only caller in builtin/checkout-index.c.

Document the code a bit to clarify so that future callers know how
to use the function better.
Noticed-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

af2a651d

14 10月, 2013 1 次提交

Use simpler relative_path when set_git_dir · 41894ae3

由 Jiang Xin 提交于 10月 14, 2013

Using a relative_path as git_dir first appears in v1.5.6-1-g044bbbcb.
It will make git_dir shorter only if git_dir is inside work_tree,
and this will increase performance. But my last refactor effort on
relative_path function (commit v1.8.3-rc2-12-ge02ca72f) changed that.
Always use relative_path as git_dir may bring troubles like
$gmane/234434.

Because new relative_path is a combination of original relative_path
from path.c and original path_relative from quote.c, so in order to
restore the origin implementation, save the original relative_path
as remove_leading_path, and call it in setup.c.
Suggested-by: NKarsten Blees <karsten.blees@gmail.com>
Signed-off-by: NJiang Xin <worldhello.net@gmail.com>
Signed-off-by: NJonathan Nieder <jrnieder@gmail.com>

41894ae3

21 9月, 2013 1 次提交

format-patch: print in-body "From" only when needed · 662cc30c

由 Jeff King 提交于 9月 20, 2013

Commit a9080475 taught format-patch the "--from" option,
which places the author ident into an in-body from header,
and uses the committer ident in the rfc822 from header.  The
documentation claims that it will omit the in-body header
when it is the same as the rfc822 header, but the code never
implemented that behavior.

This patch completes the feature by comparing the two idents
and doing nothing when they are the same (this is the same
as simply omitting the in-body header, as the two are by
definition indistinguishable in this case). This makes it
reasonable to turn on "--from" all the time (if it matches
your particular workflow), rather than only using it when
exporting other people's patches.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

662cc30c

18 9月, 2013 2 次提交

J
connect.c: make parse_feature_value() static · 5d54cffc
由 Junio C Hamano 提交于 9月 17, 2013
```
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
5d54cffc

name-hash: refactor polymorphic index_name_exists() · db5360f3

由 Eric Sunshine 提交于 9月 17, 2013

Depending upon the absence or presence of a trailing '/' on the incoming
pathname, index_name_exists() checks either if a file is present in the
index or if a directory is represented within the index. Each caller
explicitly chooses the mode of operation by adding or removing a
trailing '/' before invoking index_name_exists().

Since these two modes of operations are disjoint and have no code in
common (one searches index_state.name_hash; the other dir_hash), they
can be represented more naturally as distinct functions: one to search
for a file, and one for a directory.

Splitting index searching into two functions relieves callers of the
artificial burden of having to add or remove a slash to select the mode
of operation; instead they just call the desired function. A subsequent
patch will take advantage of this benefit in order to eliminate the
requirement that the incoming pathname for a directory search must have
a trailing slash.

(In order to avoid disturbing in-flight topics, index_name_exists() is
retained as a thin wrapper dispatching either to index_dir_exists() or
index_file_exists().)
Signed-off-by: NEric Sunshine <sunshine@sunshineco.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

db5360f3

10 9月, 2013 1 次提交

git-config: always treat --int as 64-bit internally · 00160242

由 Jeff King 提交于 9月 08, 2013

When you run "git config --int", the maximum size of integer
you get depends on how git was compiled, and what it
considers to be an "int".

This is almost useful, because your scripts calling "git
config" will behave similarly to git internally. But relying
on this is dubious; you have to actually know how git treats
each value internally (e.g., int versus unsigned long),
which is not documented and is subject to change. And even
if you know it is "unsigned long", we do not have a
git-config option to match that behavior.

Furthermore, you may simply be asking git to store a value
on your behalf (e.g., configuration for a hook). In that
case, the relevant range check has nothing at all to do with
git, but rather with whatever scripting tools you are using
(and git has no way of knowing what the appropriate range is
there).

Not only is the range check useless, but it is actively
harmful, as there is no way at all for scripts to look
at config variables with large values. For instance, one
cannot reliably get the value of pack.packSizeLimit via
git-config. On an LP64 system, git happily uses a 64-bit
"unsigned long" internally to represent the value, but the
script cannot read any value over 2G.

Ideally, the "--int" option would simply represent an
arbitrarily large integer. For practical purposes, however,
a 64-bit integer is large enough, and is much easier to
implement (and if somebody overflows it, we will still
notice the problem, and not simply return garbage).
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

00160242

04 9月, 2013 1 次提交

sha1-name: pass len argument to interpret_branch_name() · cf99a761

由 Felipe Contreras 提交于 9月 02, 2013

This is useful to make sure we don't step outside the boundaries of what
we are interpreting at the moment. For example while interpreting
foobar@{u}~1, the job of interpret_branch_name() ends right before ~1,
but there's no way to figure that out inside the function, unless the
len argument is passed.

So let's do that.
Signed-off-by: NFelipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

cf99a761

21 8月, 2013 1 次提交

read-cache: use fixed width integer types · 7800c1eb

由 Thomas Gummerer 提交于 8月 18, 2013

Use the fixed width integer types uint16_t and uint32_t for on-disk
structures; unsigned short and unsigned int do not have a guaranteed
size.
Signed-off-by: NThomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

7800c1eb

30 7月, 2013 1 次提交

many small typofixes · 98e023de

由 Ondřej Bílka 提交于 7月 29, 2013

Signed-off-by: NOndřej Bílka <neleai@seznam.cz>
Reviewed-by: NMarc Branchaud <marcnarc@xiplink.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

98e023de

18 7月, 2013 1 次提交

daemon/shell: refactor redirection of 0/1/2 from /dev/null · 1d999ddd

由 Thomas Rast 提交于 7月 16, 2013

Both daemon.c and shell.c contain logic to open FDs 0/1/2 from
/dev/null if they are not already open.  Move the function in daemon.c
to setup.c and use it in shell.c, too.

While there, remove a 'not' that inverted the meaning of the comment.
The point is indeed to *avoid* messing up.
Signed-off-by: NThomas Rast <trast@inf.ethz.ch>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

1d999ddd

16 7月, 2013 8 次提交

parse_pathspec: accept :(icase)path syntax · 93d93537

由 Nguyễn Thái Ngọc Duy 提交于 7月 14, 2013

Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

93d93537

pathspec: support :(glob) syntax · bd30c2e4

由 Nguyễn Thái Ngọc Duy 提交于 7月 14, 2013

:(glob)path differs from plain pathspec that it uses wildmatch with
WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The
difference lies in how '*' (and '**') is processed.

With the introduction of :(glob) and :(literal) and their global
options --[no]glob-pathspecs, the user can:

 - make everything literal by default via --noglob-pathspecs
   --literal-pathspecs cannot be used for this purpose as it
   disables _all_ pathspec magic.

 - individually turn on globbing with :(glob)

 - make everything globbing by default via --glob-pathspecs

 - individually turn off globbing with :(literal)

The implication behind this is, there is no way to gain the default
matching behavior (i.e. fnmatch without FNM_PATHNAME). You either get
new globbing or literal. The old fnmatch behavior is considered
deprecated and discouraged to use.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

bd30c2e4

parse_pathspec: make sure the prefix part is wildcard-free · 645a29c4

由 Nguyễn Thái Ngọc Duy 提交于 7月 14, 2013

Prepending prefix to pathspec is a trick to workaround the fact that
commands can be executed in a subdirectory, but all git commands run
at worktree's root. The prefix part should always be treated as
literal string. Make it so.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

645a29c4

N
convert add_files_to_cache to take struct pathspec · 3efe8e43
由 Nguyễn Thái Ngọc Duy 提交于 7月 14, 2013
```
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
3efe8e43

convert refresh_index to take struct pathspec · 9b2d6149

由 Nguyễn Thái Ngọc Duy 提交于 7月 14, 2013

Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

9b2d6149

N
convert report_path_error to take struct pathspec · 17ddc66e
由 Nguyễn Thái Ngọc Duy 提交于 7月 14, 2013
```
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
17ddc66e
N
convert read_cache_preload() to take struct pathspec · 5ab2a2da
由 Nguyễn Thái Ngọc Duy 提交于 7月 14, 2013
```
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
5ab2a2da
N
move struct pathspec and related functions to pathspec.[ch] · 64acde94
由 Nguyễn Thái Ngọc Duy 提交于 7月 14, 2013
```
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
64acde94

13 7月, 2013 3 次提交

sha1_object_info_extended: make type calculation optional · 5b086407

由 Jeff King 提交于 7月 12, 2013

Each caller of sha1_object_info_extended sets up an
object_info struct to tell the function which elements of
the object it wants to get. Until now, getting the type of
the object has always been required (and it is returned via
the return type rather than a pointer in object_info).

This can involve actually opening a loose object file to
determine its type, or following delta chains to determine a
packed file's base type. These effects produce a measurable
slow-down when doing a "cat-file --batch-check" that does
not include %(objecttype).

This patch adds a "typep" query to struct object_info, so
that it can be optionally queried just like size and
disk_size. As a result, the return type of the function is
no longer the object type, but rather 0/-1 for success/error.

As there are only three callers total, we just fix up each
caller rather than keep a compatibility wrapper:

  1. The simpler sha1_object_info wrapper continues to
     always ask for and return the type field.

  2. The istream_source function wants to know the type, and
     so always asks for it.

  3. The cat-file batch code asks for the type only when
     %(objecttype) is part of the format string.

On linux.git, the best-of-five for running:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectsize:disk)'

on a fully packed repository goes from:

  real    0m8.680s
  user    0m8.160s
  sys     0m0.512s

to:

  real    0m7.205s
  user    0m6.580s
  sys     0m0.608s
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

5b086407

cat-file: disable object/refname ambiguity check for batch mode · 25fba78d

由 Jeff King 提交于 7月 12, 2013

A common use of "cat-file --batch-check" is to feed a list
of objects from "rev-list --objects" or a similar command.
In this instance, all of our input objects are 40-byte sha1
ids. However, cat-file has always allowed arbitrary revision
specifiers, and feeds the result to get_sha1().

Fortunately, get_sha1() recognizes a 40-byte sha1 before
doing any hard work trying to look up refs, meaning this
scenario should end up spending very little time converting
the input into an object sha1. However, since 798c35fc
(get_sha1: warn about full or short object names that look
like refs, 2013-05-29), when we encounter this case, we
spend the extra effort to do a refname lookup anyway, just
to print a warning. This is further exacerbated by ca919930
(get_packed_ref_cache: reload packed-refs file when it
changes, 2013-06-20), which makes individual ref lookup more
expensive by requiring a stat() of the packed-refs file for
each missing ref.

With no patches, this is the time it takes to run:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectname)' <objects

on the linux.git repository:

  real    1m13.494s
  user    0m25.924s
  sys     0m47.532s

If we revert ca919930, the packed-refs up-to-date check, it
gets a little better:

  real    0m54.697s
  user    0m21.692s
  sys     0m32.916s

but we are still spending quite a bit of time on ref lookup
(and we would not want to revert that patch, anyway, which
has correctness issues).  If we revert 798c35fc, disabling
the warning entirely, we get a much more reasonable time:

  real    0m7.452s
  user    0m6.836s
  sys     0m0.608s

This patch does the moral equivalent of this final case (and
gets similar speedups). We introduce a global flag that
callers of get_sha1() can use to avoid paying the price for
the warning.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

25fba78d

teach config --blob option to parse config from database · 1bc88819

由 Heiko Voigt 提交于 7月 12, 2013

This can be used to read configuration values directly from git's
database. For example it is useful for reading to be checked out
.gitmodules files directly from the database.
Signed-off-by: NHeiko Voigt <hvoigt@hvoigt.net>
Acked-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

1bc88819

10 7月, 2013 1 次提交

Convert "struct cache_entry *" to "const ..." wherever possible · 9c5e6c80

由 Nguyễn Thái Ngọc Duy 提交于 7月 09, 2013

I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

9c5e6c80

09 7月, 2013 1 次提交

cache.h: move remote/connect API out of it · 47a59185

由 Junio C Hamano 提交于 7月 08, 2013

The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me.  This structure is not
about the local ref used by sha1-name API to name local objects.

It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values.  It belongs to "remote.h" together
with "struct refspec".

While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

47a59185

08 7月, 2013 1 次提交

teach sha1_object_info_extended a "disk_size" query · 161f00e7

由 Jeff King 提交于 7月 07, 2013

Using sha1_object_info_extended, a caller can find out the
type of an object, its size, and information about where it
is stored. In addition to the object's "true" size, it can
also be useful to know the size that the object takes on
disk (e.g., to generate statistics about which refs consume
space).

This patch adds a "disk_sizep" field to "struct object_info",
and fills it in during sha1_object_info_extended if it is
non-NULL.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

161f00e7

27 6月, 2013 1 次提交

path.c: refactor relative_path(), not only strip prefix · e02ca72f

由 Jiang Xin 提交于 6月 25, 2013

Original design of relative_path() is simple, just strip the prefix
(*base) from the absolute path (*abs).

In most cases, we need a real relative path, such as: ../foo,
../../bar.  That's why there is another reimplementation
(path_relative()) in quote.c.

Borrow some codes from path_relative() in quote.c to refactor
relative_path() in path.c, so that it could return real relative
path, and user can reuse this function without reimplementing
his/her own.  The function path_relative() in quote.c will be
substituted, and I would use the new relative_path() function when
implementing the interactive git-clean later.

Different results for relative_path() before and after this refactor:

    abs path  base path  relative (original)  relative (refactor)
    ========  =========  ===================  ===================
    /a/b      /a/b       .                    ./
    /a/b/     /a/b       .                    ./
    /a        /a/b/      /a                   ../
    /         /a/b/      /                    ../../
    /a/c      /a/b/      /a/c                 ../c
    /x/y      /a/b/      /x/y                 ../../x/y

    a/b/      a/b/       .                    ./
    a/b/      a/b        .                    ./
    a         a/b        a                    ../
    x/y       a/b/       x/y                  ../../x/y
    a/c       a/b        a/c                  ../c

    (empty)   (null)     (empty)              ./
    (empty)   (empty)    (empty)              ./
    (empty)   /a/b       (empty)              ./
    (null)    (null)     (null)               ./
    (null)    (empty)    (null)               ./
    (null)    /a/b       (segfault)           ./

You may notice that return value "." has been changed to "./".
It is because:

 * Function quote_path_relative() in quote.c will show the relative
   path as "./" if abs(in) and base(prefix) are the same.

 * Function relative_path() is called only once (in setup.c), and
   it will be OK for the return value as "./" instead of ".".
Signed-off-by: NJiang Xin <worldhello.net@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

e02ca72f

21 6月, 2013 2 次提交

add a stat_validity struct · 38612532

由 Michael Haggerty 提交于 6月 20, 2013

It can sometimes be useful to know whether a path in the
filesystem has been updated without going to the work of
opening and re-reading its content. We trust the stat()
information on disk already to handle index updates, and we
can use the same trick here.

This patch introduces a "stat_validity" struct which
encapsulates the concept of checking the stat-freshness of a
file. It is implemented on top of "struct stat_data" to
reuse the logic about which stat entries to trust for a
particular platform, but hides the complexity behind two
simple functions: check and update.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

38612532

Extract a struct stat_data from cache_entry · c21d39d7

由 Michael Haggerty 提交于 6月 20, 2013

Add public functions fill_stat_data() and match_stat_data() to work
with it.  This infrastructure will later be used to check the validity
of other types of file.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

c21d39d7

10 6月, 2013 1 次提交

core: use env variable instead of config var to turn on logging pack access · b12ca963

由 Nguyễn Thái Ngọc Duy 提交于 6月 09, 2013

5f44324d (core: log offset pack data accesses happened - 2011-07-06)
provides a way to observe pack access patterns via a config
switch. Setting an environment variable looks more obvious than a
config var, especially when you just need to _observe_, and more
inline with other tracing knobs we have.

Document it as it may be useful for remote troubleshooting.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

b12ca963

03 6月, 2013 1 次提交

read-cache: mark cache_entry pointers const · 21a6b9fa

由 René Scharfe 提交于 6月 02, 2013

ie_match_stat and ie_modified only derefence their struct cache_entry
pointers for reading. Add const to the parameter declaration here and
do the same for the static helper function used by them, as it's the
same there as well. This allows callers to pass in const pointers.
Signed-off-by: NRené Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

21a6b9fa

李少辉-开发者 / git 与 Fork 源项目一致

李少辉-开发者 / git
与 Fork 源项目一致