提交 · 7999b2cf772956466baa8925491d6fb1b0963292 · 李少辉-开发者 / git

20 11月, 2015 1 次提交

Add several uses of get_object_hash. · 7999b2cf

由 brian m. carlson 提交于 11月 10, 2015

Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: Nbrian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: NJeff King <peff@peff.net>

7999b2cf

18 4月, 2015 1 次提交

type_from_string_gently: make sure length matches · b7994af0

由 Jeff King 提交于 4月 17, 2015

When commit fe8e3b71 refactored type_from_string to allow
input that was not NUL-terminated, it switched to using
strncmp instead of strcmp. But this means we check only the
first "len" bytes of the strings, and ignore any remaining
bytes in the object_type_string. We should make sure that it
is also "len" bytes, or else we would accept "comm" as
"commit", and so forth.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

b7994af0

20 10月, 2014 1 次提交

drop add_object_array_with_mode · 189a1222

由 Jeff King 提交于 10月 18, 2014

This is a thin compatibility wrapper around
add_pending_object_with_path. But the only caller is
add_object_array, which is itself just a thin compatibility
wrapper. There are no external callers, so we can just
remove this middle wrapper.
Noticed-by: NRamsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

189a1222

17 10月, 2014 3 次提交

make add_object_array_with_context interface more sane · 9e0c3c4f

由 Jeff King 提交于 10月 15, 2014

When you resolve a sha1, you can optionally keep any context
found during the resolution, including the path and mode of
a tree entry (e.g., when looking up "HEAD:subdir/file.c").

The add_object_array_with_context function lets you then
attach that context to an entry in a list. Unfortunately,
the interface for doing so is horrible. The object_context
structure is large and most object_array users do not use
it. Therefore we keep a pointer to the structure to avoid
burdening other users too much. But that means when we do
use it that we must allocate the struct ourselves. And the
struct contains a fixed PATH_MAX-sized buffer, which makes
this wholly unsuitable for any large arrays.

We can observe that there is only a single user of the
"with_context" variant: builtin/grep.c. And in that use
case, the only element we care about is the path. We can
therefore store only the path as a pointer (the context's
mode field was redundant with the object_array_entry itself,
and nobody actually cared about the surrounding tree). This
still requires a strdup of the pathname, but at least we are
only consuming the minimum amount of memory for each string.

We can also handle the copying ourselves in
add_object_array_*, and free it as appropriate in
object_array_release_entry.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

9e0c3c4f

object_array: add a "clear" function · 46be8231

由 Jeff King 提交于 10月 15, 2014

There's currently no easy way to free the memory associated
with an object_array (and in most cases, we simply leak the
memory in a rev_info's pending array). Let's provide a
helper to make this easier to handle.

We can make use of it in list-objects.c, which does the same
thing by hand (but fails to free the "name" field of each
entry, potentially leaking memory).
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

46be8231

object_array: factor out slopbuf-freeing logic · 68f49235

由 Jeff King 提交于 10月 15, 2014

This is not a lot of code, but it's a logical construct that
should not need to be repeated (and we are about to add a
third repetition).
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

68f49235

19 9月, 2014 1 次提交
- R
  use REALLOC_ARRAY for changing the allocation size of arrays · 2756ca43
  由 René Scharfe 提交于 9月 16, 2014
```
Signed-off-by: NRene Scharfe <l.s.r@web.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
  2756ca43
11 9月, 2014 1 次提交

Refactor type_from_string() to allow continuing after detecting an error · fe8e3b71

由 Johannes Schindelin 提交于 9月 10, 2014

In the next commits, we will enhance the fsck_tag() function to check
tag objects more thoroughly. To this end, we need a function to verify
that a given string is a valid object type, but that does not die() in
the negative case.

While at it, prepare type_from_string() for counted strings, i.e. strings
with an explicitly specified length rather than a NUL termination.
Signed-off-by: NJohannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

fe8e3b71

29 7月, 2014 4 次提交

object_as_type: set commit index · 34dfe197

由 Jeff King 提交于 7月 13, 2014

The point of the "index" field of struct commit is that
every allocated commit would have one. It is supposed to be
an invariant that whenever object->type is set to
OBJ_COMMIT, we have a unique index.

Commit 969eba63 (commit: push commit_index update into
alloc_commit_node, 2014-06-10) covered this case for
newly-allocated commits. However, we may also allocate an
"unknown" object via lookup_unknown_object, and only later
convert it to a commit. We must make sure that we set the
commit index when we switch the type field.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

34dfe197

add object_as_type helper for casting objects · c4ad00f8

由 Jeff King 提交于 7月 13, 2014

When we call lookup_commit, lookup_tree, etc, the logic goes
something like:

  1. Look for an existing object struct. If we don't have
     one, allocate and return a new one.

  2. Double check that any object we have is the expected
     type (and complain and return NULL otherwise).

  3. Convert an object with type OBJ_NONE (from a prior
     call to lookup_unknown_object) to the expected type.

We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.

Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.

Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:

  - for lookup_{commit,tree,etc} functions, we are just
    pulling steps 2 and 3 into a function that does the same
    thing.

  - for the call in peel_object, we currently only do step 3
    (but we want to consolidate it with the others, as
    mentioned above). However, step 2 is a noop here, as the
    surrounding conditional makes sure we have OBJ_NONE
    (which we want to keep to avoid an extraneous call to
    sha1_object_info).

  - for the call in lookup_commit_reference_gently, we are
    currently doing step 2 but not step 3. However, step 3
    is a noop here. The object we got will have just come
    from deref_tag, which must have figured out the type for
    each object in order to know when to stop peeling.
    Therefore the type will never be OBJ_NONE.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

c4ad00f8

parse_object_buffer: do not set object type · fe0444b5

由 Jeff King 提交于 7月 13, 2014

The only way that "obj" can be non-NULL is if it came from
one of the lookup_* functions. These functions always ensure
that the object has the expected type (and return NULL
otherwise), so there is no need for us to set the type.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

fe0444b5

move setting of object->type to alloc_* functions · fe24d396

由 Jeff King 提交于 7月 13, 2014

The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

fe24d396

14 7月, 2014 4 次提交

object_as_type: set commit index · d66bebcb

由 Jeff King 提交于 7月 13, 2014

The point of the "index" field of struct commit is that
every allocated commit would have one. It is supposed to be
an invariant that whenever object->type is set to
OBJ_COMMIT, we have a unique index.

Commit 969eba63 (commit: push commit_index update into
alloc_commit_node, 2014-06-10) covered this case for
newly-allocated commits. However, we may also allocate an
"unknown" object via lookup_unknown_object, and only later
convert it to a commit. We must make sure that we set the
commit index when we switch the type field.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

d66bebcb

add object_as_type helper for casting objects · 8ff226a9

由 Jeff King 提交于 7月 13, 2014

When we call lookup_commit, lookup_tree, etc, the logic goes
something like:

  1. Look for an existing object struct. If we don't have
     one, allocate and return a new one.

  2. Double check that any object we have is the expected
     type (and complain and return NULL otherwise).

  3. Convert an object with type OBJ_NONE (from a prior
     call to lookup_unknown_object) to the expected type.

We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.

Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.

Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:

  - for lookup_{commit,tree,etc} functions, we are just
    pulling steps 2 and 3 into a function that does the same
    thing.

  - for the call in peel_object, we currently only do step 3
    (but we want to consolidate it with the others, as
    mentioned above). However, step 2 is a noop here, as the
    surrounding conditional makes sure we have OBJ_NONE
    (which we want to keep to avoid an extraneous call to
    sha1_object_info).

  - for the call in lookup_commit_reference_gently, we are
    currently doing step 2 but not step 3. However, step 3
    is a noop here. The object we got will have just come
    from deref_tag, which must have figured out the type for
    each object in order to know when to stop peeling.
    Therefore the type will never be OBJ_NONE.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

8ff226a9

parse_object_buffer: do not set object type · 5af01caa

由 Jeff King 提交于 7月 13, 2014

The only way that "obj" can be non-NULL is if it came from
one of the lookup_* functions. These functions always ensure
that the object has the expected type (and return NULL
otherwise), so there is no need for us to set the type.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

5af01caa

move setting of object->type to alloc_* functions · d36f51c1

由 Jeff King 提交于 7月 13, 2014

The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

d36f51c1

08 7月, 2014 1 次提交

hashmap: factor out getting a hash code from a SHA1 · 039dc71a

由 Karsten Blees 提交于 7月 03, 2014

Copying the first bytes of a SHA1 is duplicated in six places,
however, the implications (the actual value would depend on the
endianness of the platform) is documented only once.

Add a properly documented API for this.
Signed-off-by: NKarsten Blees <blees@dcon.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

039dc71a

14 6月, 2014 3 次提交

commit: record buffer length in cache · 8597ea3a

由 Jeff King 提交于 6月 10, 2014

Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

8597ea3a

use get_cached_commit_buffer where appropriate · a97934d8

由 Jeff King 提交于 6月 10, 2014

Some call sites check commit->buffer to see whether we have
a cached buffer, and if so, do some work with it. In the
long run we may want to switch these code paths to make
their decision on a different boolean flag (because checking
the cache may get a little more expensive in the future).
But for now, we can easily support them by converting the
calls to use get_cached_commit_buffer.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

a97934d8

provide a helper to set the commit buffer · 66c2827e

由 Jeff King 提交于 6月 10, 2014

Right now this is just a one-liner, but abstracting it will
make it easier to change later.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

66c2827e

01 3月, 2014 1 次提交

Document some functions defined in object.c · 33bef7ea

由 Michael Haggerty 提交于 2月 28, 2014

Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Acked-by: NNicolas Pitre <nico@fluxnic.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

33bef7ea

12 9月, 2013 1 次提交

lookup_object: remove hashtable_index() and optimize hash_obj() · 9f36c9b7

由 Nicolas Pitre 提交于 9月 10, 2013

hashtable_index() appears to be a close duplicate of hash_obj().
Keep only the later and make it usable for all cases.

Also remove the modulus as this is an expensive operation.
The size argument is always a power of 2 anyway, so a simple
mask operation provides the same result.

On a 'git rev-list --all --objects' run this decreased the time spent
in lookup_object from 27.5% to 24.1%.

[jc: with a few comments on "modulus turned into mask" by Peff]
Signed-off-by: NNicolas Pitre <nico@fluxnic.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

9f36c9b7

18 7月, 2013 1 次提交

parse_object_buffer: correct freeing the buffer · 8e92e8f2

由 Stefan Beller 提交于 7月 18, 2013

If we exit early in the function parse_object_buffer, we did not
write to *eaten_p. Then the calling function parse_object, which looks
like the following with respect to the eaten variable, cannot rely on a
proper value set in eaten, hence the freeing of the buffer depends
on random values in memory.

	struct object *parse_object(const unsigned char *sha1)
	{
		int eaten;
		...
		obj = parse_object_buffer(sha1, type, size, buffer, &eaten);
		if (!eaten)
			free(buffer);
	}

This change makes sure, the buffer freeing condition is deterministic.
Signed-off-by: NStefan Beller <stefanbeller@googlemail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

8e92e8f2

03 6月, 2013 1 次提交

object_array_entry: fix memory handling of the name field · 31faeb20

由 Michael Haggerty 提交于 5月 25, 2013

Previously, the memory management of the object_array_entry::name
field was inconsistent and undocumented.  object_array_entries are
ultimately created by a single function, add_object_array_with_mode(),
which has an argument "const char *name".  This function used to
simply set the name field to reference the string pointed to by the
name parameter, and nobody on the object_array side ever freed the
memory.  Thus, it assumed that the memory for the name field would be
managed by the caller, and that the lifetime of that string would be
at least as long as the lifetime of the object_array_entry.  But
callers were inconsistent:

* Some passed pointers to constant strings or argv entries, which was
  OK.

* Some passed pointers to newly-allocated memory, but didn't arrange
  for the memory ever to be freed.

* Some passed the return value of sha1_to_hex(), which is a pointer to
  a statically-allocated buffer that can be overwritten at any time.

* Some passed pointers to refnames that they received from a
  for_each_ref()-type iteration, but the lifetimes of such refnames is
  not guaranteed by the refs API.

Bring consistency to this mess by changing object_array to make its
own copy for the object_array_entry::name field and free this memory
when an object_array_entry is deleted from the array.

Many callers were passing the empty string as the name parameter, so
as a performance optimization, treat the empty string specially.
Instead of making a copy, store a pointer to a statically-allocated
empty string to object_array_entry::name.  When deleting such an
entry, skip the free().

Change the callers that were already passing copies to
add_object_array_with_mode() to either skip the copy, or (if the
memory needed to be allocated anyway) freeing the memory itself.

A part of this commit effectively reverts

    70d26c6e read_revisions_from_stdin: make copies for handle_revision_arg

because the copying introduced by that commit (which is still
necessary) is now done at a deeper level.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

31faeb20

29 5月, 2013 2 次提交

object_array_remove_duplicates(): rewrite to reduce copying · 1506510c

由 Michael Haggerty 提交于 5月 25, 2013

The old version copied one entry to its destination position, then
deleted any matching entries from the tail of the array.  This
required the tail of the array to be copied multiple times.  It didn't
affect the complexity of the algorithm because the whole tail has to
be searched through anyway.  But all the copying was unnecessary.

Instead, check for the existence of an entry with the same name in the
*head* of the list before copying an entry to its final position.
This way each entry has to be copied at most one time.

Extract a helper function contains_name() to do a bit of the work.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

1506510c

object_array: add function object_array_filter() · aeb4a51e

由 Michael Haggerty 提交于 5月 25, 2013

Add a function that allows unwanted entries in an object_array to be
removed.  This encapsulation is a step towards giving object_array
ownership of its entries' name memory.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

aeb4a51e

11 5月, 2013 1 次提交

grep: honor --textconv for the case rev:path · afa15f3c

由 Michael J Gruber 提交于 5月 10, 2013

Make "grep" honor the "--textconv" option also for the object case, i.e.
when used with an argument "rev:path".
Signed-off-by: NMichael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

afa15f3c

02 5月, 2013 1 次提交

lookup_object: prioritize recently found objects · 9a414486

由 Jeff King 提交于 5月 01, 2013

The lookup_object function is backed by a hash table of all
objects we have seen in the program. We manage collisions
with a linear walk over the colliding entries, checking each
with hashcmp(). The main cost of lookup is in these
hashcmp() calls; finding our item in the first slot is
cheaper than finding it in the second slot, which is cheaper
than the third, and so on.

If we assume that there is some locality to the object
lookups (e.g., if X and Y collide, and we have just looked
up X, the next lookup is more likely to be for X than for
Y), then we can improve our average lookup speed by checking
X before Y.

This patch does so by swapping a found item to the front of
the collision chain. The p0001 perf test reveals that this
does indeed exploit locality in the case of "rev-list --all
--objects":

Test                               origin          this tree
-------------------------------------------------------------------------
0001.1: rev-list --all             0.40(0.38+0.02) 0.40(0.36+0.03) +0.0%
0001.2: rev-list --all --objects   2.24(2.17+0.05) 1.86(1.79+0.05) -17.0%

This is not surprising, as the full object traversal will
hit the same tree entries over and over (e.g., for every
commit that doesn't change "Documentation/", we will have to
look up the same sha1 just to find out that we already
processed it).

The reason why this technique works (and does not violate
any properties of the hash table) is subtle and bears some
explanation. Let's imagine we get a lookup for sha1 `X`, and
it hashes to bucket `i` in our table. That stretch of the
table may look like:

index       | i-1 |  i  | i+1 | i+2 |
       -----------------------------------
entry   ... |  A  |  B  |  C  |  X  | ...
       -----------------------------------

We start our probe at i, see that B does not match, nor does
C, and finally find X. There may be multiple C's in the
middle, but we know that there are no empty slots (or else
we would not find X at all).

We do not know the original index of B; it may be `i`, or it
may be less than i (e.g., if it were `i-1`, it would collide
with A and spill over into the `i` bucket). So it is
acceptable for us to move it to the right of a contiguous
stretch of entries (because we will find it from a linear
walk starting anywhere at `i` or before), but never to the
left (if we moved it to `i-1`, we would miss it when
starting our walk at `i`).

We do know the original index of X; it is `i`, so it is safe
to place it anywhere in the contiguous stretch between `i`
and where we found it (`i+2` in the this case).

This patch does a pure swap; after finding X in the
situation above, we would end with:

index       | i-1 |  i  | i+1 | i+2 |
       -----------------------------------
entry   ... |  A  |  X  |  C  |  B  | ...
       -----------------------------------

We could instead bump X into the `i` slot, and then shift
the whole contiguous chain down by one, resulting in:

index       | i-1 |  i  | i+1 | i+2 |
       -----------------------------------
entry   ... |  A  |  X  |  B  |  C  | ...
       -----------------------------------

That puts our chain in true most-recently-used order.
However, experiments show that it is not any faster (and in
fact, is slightly slower due to the extra manipulation).
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

9a414486

18 3月, 2013 1 次提交

avoid segfaults on parse_object failure · 75a95490

由 Jeff King 提交于 3月 17, 2013

Many call-sites of parse_object assume that they will get a
non-NULL return value; this is not the case if we encounter
an error while parsing the object.

This patch adds a wrapper function around parse_object that
handles dying automatically, and uses it anywhere we
immediately try to access the return value as a non-NULL
pointer (i.e., anywhere that we would currently segfault).

This wrapper may also be useful in other places. The most
obvious one is code like:

  o = parse_object(sha1);
  if (!o)
	  die(...);

However, these should not be mechanically converted to
parse_object_or_die, as the die message is sometimes
customized. Later patches can address these sites on a
case-by-case basis.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

75a95490

01 5月, 2012 1 次提交

remove superfluous newlines in error messages · 82247e9b

由 Pete Wyckoff 提交于 4月 29, 2012

The error handling routines add a newline.  Remove
the duplicate ones in error messages.
Signed-off-by: NPete Wyckoff <pw@padd.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

82247e9b

30 3月, 2012 1 次提交

Teach revision walking machinery to walk multiple times sequencially · bcc0a3ea

由 Heiko Voigt 提交于 3月 29, 2012

Previously it was not possible to iterate revisions twice using the
revision walking api. We add a reset_revision_walk() which clears the
used flags. This allows us to do multiple sequencial revision walks.

We add the appropriate calls to the existing submodule machinery doing
revision walks. This is done to avoid surprises if future code wants to
call these functions more than once during the processes lifetime.
Signed-off-by: NHeiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

bcc0a3ea

08 3月, 2012 1 次提交

parse_object: avoid putting whole blob in core · 090ea126

由 Nguyễn Thái Ngọc Duy 提交于 3月 07, 2012

Traditionally, all the callers of check_sha1_signature() first
called read_sha1_file() to prepare the whole object data in core,
and called this function. The function is used to revalidate what
we read from the object database actually matches the object name we
used to ask for the data from the object database.

Update the API to allow callers to pass NULL as the object data, and
have the function read and hash the object data using streaming API
to recompute the object name, without having to hold everything in
core at the same time. This is most useful in parse_object() that
parses a blob object, because this caller does not have to keep the
actual blob data around in memory after a "struct blob" is returned.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

090ea126

06 1月, 2012 1 次提交

parse_object: try internal cache before reading object db · ccdc6037

由 Jeff King 提交于 1月 05, 2012

When parse_object is called, we do the following:

  1. read the object data into a buffer via read_sha1_file

  2. call parse_object_buffer, which then:

     a. calls the appropriate lookup_{commit,tree,blob,tag}
	to either create a new "struct object", or to find
	an existing one. We know the appropriate type from
	the lookup in step 1.

     b. calls the appropriate parse_{commit,tree,blob,tag}
        to parse the buffer for the new (or existing) object

In step 2b, all of the called functions are no-ops for
object "X" if "X->object.parsed" is set. I.e., when we have
already parsed an object, we end up going to a lot of work
just to find out at a low level that there is nothing left
for us to do (and we throw away the data from read_sha1_file
unread).

We can optimize this by moving the check for "do we have an
in-memory object" from 2a before the expensive call to
read_sha1_file in step 1.

This might seem circular, since step 2a uses the type
information determined in step 1 to call the appropriate
lookup function. However, we can notice that all of the
lookup_* functions are backed by lookup_object. In other
words, all of the objects are kept in a master hash table,
and we don't actually need the type to do the "do we have
it" part of the lookup, only to do the "and create it if it
doesn't exist" part.

This can save time whenever we call parse_object on the same
sha1 twice in a single program. Some code paths already
perform this optimization manually, with either:

  if (!obj->parsed)
	  obj = parse_object(obj->sha1);

if you already have a "struct object", or:

  struct object *obj = lookup_unknown_object(sha1);
  if (!obj || !obj->parsed)
	  obj = parse_object(sha1);

if you don't.  This patch moves the optimization into
parse_object itself.

Most git operations won't notice any impact. Either they
don't parse a lot of duplicate sha1s, or the calling code
takes special care not to re-parse objects. I timed two
code paths that do benefit (there may be more, but these two
were immediately obvious and easy to time).

The first is fast-export, which calls parse_object on each
object it outputs, like this:

  object = parse_object(sha1);
  if (!object)
	  die(...);
  if (object->flags & SHOWN)
	  return;

which means that just to realize we have already shown an
object, we will read the whole object from disk!

With this patch, my best-of-five time for "fast-export --all" on
git.git dropped from 26.3s to 21.3s.

The second case is upload-pack, which will call parse_object
for each advertised ref (because it needs to peel tags to
show "^{}" entries). This doesn't matter for most
repositories, because they don't have a lot of refs pointing
to the same objects. However, if you have a big alternates
repository with a shared object db for a number of child
repositories, then the alternates repository will have
duplicated refs representing each of its children.

For example, GitHub's alternates repository for git.git has
~120,000 refs, of which only ~3200 are unique. The time for
upload-pack to print its list of advertised refs dropped
from 3.4s to 0.76s.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ccdc6037

17 11月, 2011 1 次提交

receive-pack, fetch-pack: reject bogus pack that records objects twice · 68be2fea

由 Junio C Hamano 提交于 11月 16, 2011

When receive-pack & fetch-pack are run and store the pack obtained over
the wire to a local repository, they internally run the index-pack command
with the --strict option. Make sure that we reject incoming packfile that
records objects twice to avoid spreading such a damage.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

68be2fea

16 5月, 2011 1 次提交

read_sha1_file(): get rid of read_sha1_file_repl() madness · 4bbf5a26

由 Junio C Hamano 提交于 5月 15, 2011

Most callers want to silently get a replacement object, and they do not
care what the real name of the replacement object is. Worse yet, no sane
interface to return the underlying object without replacement is provided.

Remove the function and make only the few callers that want the name of
the replacement object find it themselves.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

4bbf5a26

06 9月, 2010 1 次提交

Fix whitespace issue in object.c · 55b4e9e4

由 Jared Hance 提交于 9月 05, 2010

Change some expanded tabs (spaces) to tabs in object.c.
Signed-off-by: NJared Hance <jaredhance@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

55b4e9e4

04 9月, 2010 1 次提交

parse_object: pass on the original sha1, not the replaced one · 2e3400c0

由 Nguyễn Thái Ngọc Duy 提交于 9月 03, 2010

Commit 0e87c367 (object: call "check_sha1_signature" with the
replacement sha1) changed the first argument passed to
parse_object_buffer() from "sha1" to "repl". With that change,
the returned obj pointer has the replacement SHA1 in obj->sha1,
not the original one.

But when using lookup_commit() and then parse_commit() on a
commit, we get an object pointer with the original sha1, but
the commit content comes from the replacement commit.

So the result we get from using parse_object() is different
from the we get from using lookup_commit() followed by
parse_commit().

It looks much simpler and safer to fix this inconsistency by
passing "sha1" to parse_object_bufer() instead of "repl".

The commit comment should be used to tell the the replacement
commit is replacing another commit and why. So it should be
easy to see that we have a replacement commit instead of an
original one.

And it is not a problem if the content of the commit is not
consistent with the sha1 as cat-file piped to hash-object can
be used to see the difference.
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

2e3400c0

20 4月, 2010 1 次提交

fix "bundle --stdin" segfault · 97a20eea

由 Jonathan Nieder 提交于 4月 19, 2010

When passed an empty list, objects_array_remove_duplicates() corrupts it
by changing the number of entries from 0 to 1.

The problem lies in the condition of its main loop:

	for (ref = 0; ref < array->nr - 1; ref++) {

The loop body manipulates the supplied object array.  In the case of an
empty array, it should not be doing anything at all.  But array->nr is an
unsigned quantity, so the code enters the loop, in particular increasing
array->nr.  Fix this by comparing (ref + 1 < array->nr) instead.

This bug can be triggered by git bundle --stdin:

	$ echo HEAD | git bundle create some.bundle --stdin’
	Segmentation fault (core dumped)

The list of commits to bundle appears to be empty because of another bug:
by the time the revision-walking machinery gets to look at it, standard
input has already been consumed by rev-list, so this function gets an
empty list of revisions.

After this patch, git bundle --stdin still does not work; it just doesn’t
segfault any more.
Reported-by: NJoey Hess <joey@kitenet.net>
Signed-off-by: NJonathan Nieder <jrnieder@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

97a20eea

18 1月, 2010 1 次提交

object.c: remove unused functions · c7618987

由 Junio C Hamano 提交于 1月 11, 2010

object_list_append() and object_list_length}() are not used anywhere.
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

c7618987

01 6月, 2009 1 次提交

object: call "check_sha1_signature" with the replacement sha1 · 0e87c367

由 Christian Couder 提交于 1月 23, 2009

Otherwise we get a "sha1 mismatch" error for replaced objects.
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

0e87c367

李少辉-开发者 / git 与 Fork 源项目一致

李少辉-开发者 / git
与 Fork 源项目一致