1. 08 Jul 2013, 1 commit
    • zero-initialize object_info structs · 7c07385d
      Committed by Jeff King
      The sha1_object_info_extended function expects the caller to
      provide a "struct object_info" which contains pointers to
      "query" items that will be filled in. The purpose of
      providing pointers rather than storing the response directly
      in the struct is so that callers can choose not to incur the
      expense of finding particular fields that they do not care
      about.
      
      Right now the only query item is "sizep", and all callers
      set it explicitly to choose whether or not to query it; they
      can then leave the rest of the struct uninitialized.
      
      However, as we add new query items, each caller will have to
      be updated to explicitly turn off the new ones (by setting
      them to NULL).  Instead, let's teach each caller to
      zero-initialize the struct, so that they do not have to
      learn about each new query item added.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      7c07385d
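      As an illustration of the calling convention described above, here is a
      minimal caller-side sketch (the struct and prototype are paraphrased from
      the description, not copied from git's headers of that era):

        /* query items are pointers: NULL means "don't bother computing this" */
        struct object_info {
                unsigned long *sizep;
                /* future query items get added here */
        };

        int sha1_object_info_extended(const unsigned char *sha1,
                                      struct object_info *oi);

        static int peek_size(const unsigned char *sha1, unsigned long *size)
        {
                struct object_info oi = { NULL };

                /* zero/NULL-initialize, then opt in to just the size query;
                 * query items added later stay switched off automatically */
                oi.sizep = size;
                return sha1_object_info_extended(sha1, &oi);
        }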
  2. 01 May 2013, 1 commit
    • unpack_entry: avoid freeing objects in base cache · 756a0426
      Committed by Thomas Rast
      In the !delta_data error path of unpack_entry(), we run free(base).
      This became a window for a use-after-free in abe601bb (sha1_file:
      remove recursion in unpack_entry, 2013-03-27), as follows:
      
      Before abe601bb, we got the 'base' from cache_or_unpack_entry(..., 0);
      keep_cache=0 tells it to also remove that entry.  So the 'base' is at
      this point not cached, and freeing it in the error path is the right
      thing.
      
      After abe601bb, the structure changed: we use a three-phase approach
      where phase 1 finds the innermost base or a base that is already in
      the cache.  In phase 3 we therefore know that all bases we unpack are
      not part of the delta cache yet.  (Observe that we pop from the cache
      in phase 1, so this is also true for the very first base.)  So we make
      no further attempts to look up the bases in the cache, and just call
      add_delta_base_cache() on every base object we have assembled.
      
      But the !delta_data error path remained unchanged, and now calls
      free() on a base that has already been entered in the cache.  This
      means that there is a use-after-free if we later use the same base
      again.
      
      So remove that free(); we are still going to use that data.
      Reported-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      756a0426
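      A self-contained toy illustrating the ownership rule the fix restores
      (this is not git's code; a one-slot cache stands in for the delta base
      cache):

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        static void *cache_slot;                 /* toy "delta base cache" */
        static void cache_add(void *buf) { cache_slot = buf; }

        int main(void)
        {
                char *base = malloc(16);
                int delta_data_missing = 1;      /* simulate the !delta_data path */

                strcpy(base, "base object");
                cache_add(base);                 /* the cache now holds 'base' */

                if (delta_data_missing) {
                        /* Pre-fix shape: free(base) here would leave the cache
                         * with a dangling pointer, and the next user of the
                         * cached base would read freed memory.  The fix is to
                         * simply not free: the cached copy is still wanted. */
                }

                printf("cached base: %s\n", (char *)cache_slot);
                free(cache_slot);                /* freed once, at end of life */
                return 0;
        }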
  3. 28 Mar 2013, 3 commits
  4. 27 Mar 2013, 1 commit
  5. 26 Mar 2013, 1 commit
    • sha1_file: remove recursion in packed_object_info · 790d96c0
      Committed by Thomas Rast
      packed_object_info() and packed_delta_info() were mutually recursive.
      The former would handle ordinary types and defer deltas to the latter;
      the latter would use the former to resolve the delta base.
      
      This arrangement, however, leads to trouble with threaded index-pack
      and long delta chains on platforms where thread stacks are small, as
      happened on OS X (512kB thread stacks by default) with the chromium
      repo.
      
      The task of the two functions is not all that hard to describe without
      any recursion, however.  It proceeds in three steps:
      
      - determine the representation type and size, based on the outermost
        object (delta or not)
      
      - follow through the delta chain, if any
      
      - determine the object type from what is found at the end of the delta
        chain
      
      The only complication stems from the error recovery.  If parsing fails
      at any step, we want to mark that object (within the pack) as bad and
      try getting the corresponding SHA1 from elsewhere.  If that also
      fails, we want to repeat this process back up the delta chain until we
      find a reasonable solution or conclude that there is no way to
      reconstruct the object.  (This is conveniently checked by t5303.)
      
      To achieve that within the pack, we keep track of the entire delta
      chain in a stack.  When things go sour, we process that stack from the
      top, marking entries as bad and attempting to re-resolve by sha1.  To
      avoid excessive malloc(), the stack starts out with a small
      stack-allocated array.  The choice of 64 is based on the default of
      pack.depth, which is 50, in the hope that it covers "most" delta
      chains without any need for malloc().
      
      It's much harder to make the actual re-resolving by sha1 nonrecursive,
      so we skip that.  If you can't afford *that* recursion, your
      corruption problems are more serious than your stack size problems.
      Reported-by: Stefan Zager <szager@google.com>
      Signed-off-by: Thomas Rast <trast@student.ethz.ch>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      790d96c0
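      A sketch of the small-array-first stack described above (field and
      function names are illustrative, not the ones used in sha1_file.c):

        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>

        struct chain_entry {
                off_t obj_offset;               /* one delta-chain link */
        };

        struct chain_stack {
                struct chain_entry small[64];   /* covers pack.depth=50 chains */
                struct chain_entry *items;      /* points at small[] or heap */
                size_t nr, alloc;
        };

        static void chain_init(struct chain_stack *s)
        {
                s->items = s->small;
                s->nr = 0;
                s->alloc = 64;
        }

        static void chain_push(struct chain_stack *s, off_t obj_offset)
        {
                if (s->nr == s->alloc) {        /* rare: fall back to the heap */
                        size_t new_alloc = s->alloc * 2;
                        if (s->items == s->small) {
                                s->items = malloc(new_alloc * sizeof(*s->items));
                                memcpy(s->items, s->small, sizeof(s->small));
                        } else {
                                s->items = realloc(s->items,
                                                   new_alloc * sizeof(*s->items));
                        }
                        s->alloc = new_alloc;   /* allocation errors omitted */
                }
                s->items[s->nr++].obj_offset = obj_offset;
        }

        static void chain_clear(struct chain_stack *s)
        {
                if (s->items != s->small)
                        free(s->items);
        }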
  6. 16 Feb 2013, 1 commit
  7. 13 Feb 2013, 1 commit
  8. 09 Nov 2012, 2 commits
  9. 25 Aug 2012, 1 commit
  10. 30 Jul 2012, 1 commit
    • link_alt_odb_entry: fix read over array bounds reported by valgrind · cb2912c3
      Committed by Heiko Voigt
      pfxlen can be longer than the path in objdir when relative_base
      contains the path to git's object directory.  Here we are interested
      in checking if ent->base[] (the part that corresponds to .git/objects)
      is the same string as objdir, and the code NUL-terminated ent->base[]
      to
      
      	LEADING PATH\0XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\0
      
      in preparation for this "duplicate check" step (before we return
      from the function, the first NUL is turned into '/' so that we can
      fill XX when probing for loose objects).  All we need to do is
      compare the string with the path to our object directory.
      Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      cb2912c3
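      A hedged sketch of the comparison the fix boils down to (simplified;
      ent->base[] is NUL-terminated at the end of the leading path, as shown
      above):

        #include <string.h>

        /* Buggy shape: memcmp over pfxlen bytes can read past the end of
         * objdir whenever pfxlen > strlen(objdir).  Safe shape: compare the
         * NUL-terminated leading path as a string. */
        static int is_our_objdir(const char *ent_base, const char *objdir)
        {
                return !strcmp(ent_base, objdir);
        }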
  11. 15 May 2012, 1 commit
  12. 01 May 2012, 2 commits
  13. 08 Mar 2012, 1 commit
    • parse_object: avoid putting whole blob in core · 090ea126
      Committed by Nguyễn Thái Ngọc Duy
      Traditionally, all the callers of check_sha1_signature() first
      called read_sha1_file() to prepare the whole object data in core,
      and then called this function.  The function is used to revalidate that
      what we read from the object database actually matches the object name
      we used to ask for the data.
      
      Update the API to allow callers to pass NULL as the object data, and
      have the function read and hash the object data using streaming API
      to recompute the object name, without having to hold everything in
      core at the same time.  This is most useful in parse_object() that
      parses a blob object, because this caller does not have to keep the
      actual blob data around in memory after a "struct blob" is returned.
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      090ea126
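      Caller-side sketch of the new convention (the prototype is approximated
      from the description; the exact arguments in that era's cache.h may
      differ):

        /* Passing a NULL buffer asks the function to stream the object out of
         * the object database and hash it, instead of requiring the caller to
         * hold the whole thing in core. */
        int check_sha1_signature(const unsigned char *sha1, void *buf,
                                 unsigned long size, const char *type);

        static int verify_blob_without_loading(const unsigned char *sha1)
        {
                return check_sha1_signature(sha1, NULL, 0, "blob");
        }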
  14. 25 Feb 2012, 1 commit
    • do not stream large files to pack when filters are in use · 4f22b101
      Committed by Jeff King
      Because git's object format requires us to specify the
      number of bytes in the object in its header, we must know
      the size before streaming a blob into the object database.
      This is not a problem when adding a regular file, as we can
      get the size from stat(). However, when filters are in use
      (such as autocrlf, or the ident, filter, or eol
      gitattributes), we have no idea what the ultimate size will
      be.
      
      The current code just punts on the whole issue and ignores
      filter configuration entirely for files larger than
      core.bigfilethreshold. This can generate confusing results
      if you use filters for large binary files, as the filter
      will suddenly stop working as the file goes over a certain
      size.  Rather than try to handle unknown input sizes with
      streaming, this patch just turns off the streaming
      optimization when filters are in use.
      
      This causes a slight performance regression in a very specific
      case: if you have autocrlf on, but no gitattributes, a large
      binary file will avoid the streaming code path because we
      don't know beforehand whether it will need conversion or
      not. But if you are handling large binary files, you should
      be marking them as such via attributes (or at least not
      using autocrlf, and instead marking your text files as
      such). And the flip side is that if you have a large
      _non_-binary file, there is a correctness improvement;
      before we did not apply the conversion at all.
      
      The first half of the new t1051 script covers these failures
      on input. The second half tests the matching output code
      paths. These already work correctly, and do not need any
      adjustment.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      4f22b101
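      The decision being described reduces to a small predicate; a hedged
      sketch (parameter names are illustrative, not git's):

        /* Stream a blob straight into the object database only when we can
         * know its final size up front, i.e. when no filter can rewrite it. */
        static int can_stream_blob(unsigned long size,
                                   unsigned long big_file_threshold,
                                   int filters_apply)
        {
                if (size < big_file_threshold)
                        return 0;        /* small: the in-core path is fine */
                if (filters_apply)
                        return 0;        /* converted size unknown: no streaming */
                return 1;
        }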
  15. 07 Feb 2012, 1 commit
    • fsck: give accurate error message on empty loose object files · 33e42de0
      Committed by Matthieu Moy
      Since 3ba7a065 (A loose object is not corrupt if it
      cannot be read due to EMFILE), "git fsck" on a repository with an empty
      loose object file complains with the error message
      
        fatal: failed to read object <sha1>: Invalid argument
      
      This comes from a failure of mmap on this empty file, which sets errno to
      EINVAL. Instead of calling xmmap on an empty file, we display a clean
      error message ourselves and return a NULL pointer. The new message is
      
        error: object file .git/objects/09/<rest-of-sha1> is empty
        fatal: loose object <sha1> (stored in .git/objects/09/<rest-of-sha1>) is corrupt
      
      The second line was already there before the regression in 3ba7a065,
      and the first is an additional message that should help the user
      diagnose the problem.
      Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      33e42de0
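      A sketch of the added guard (simplified; git's error reporting and xmmap
      wrapper are reduced to their plain libc equivalents):

        #include <stdio.h>
        #include <sys/mman.h>
        #include <sys/stat.h>

        static void *map_loose_object(const char *path, int fd, unsigned long *size)
        {
                struct stat st;
                void *map;

                if (fstat(fd, &st) < 0)
                        return NULL;
                if (!st.st_size) {
                        /* mmap(2) on a zero-length file fails with EINVAL; say
                         * something useful instead of "Invalid argument". */
                        fprintf(stderr, "error: object file %s is empty\n", path);
                        return NULL;
                }
                *size = st.st_size;
                map = mmap(NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0);
                return map == MAP_FAILED ? NULL : map;
        }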
  16. 02 Feb 2012, 2 commits
  17. 22 Dec 2011, 1 commit
    • Appease Sun Studio by renaming "tmpfile" · ab1900a3
      Committed by Ævar Arnfjörð Bjarmason
      On Solaris the system headers define the "tmpfile" name, which'll
      cause Git compiled with Sun Studio 12 Update 1 to whine about us
      redefining the name:
      
          "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile     (E_PRAGMA_REDEFINE_STATIC)
          "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
          "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile   (E_PRAGMA_REDEFINE_STATIC)
          "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
      
      Just renaming the "tmpfile" variable to "tmp_file" in the relevant
      places is the easiest way to fix this.
      Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      ab1900a3
  18. 02 Dec 2011, 1 commit
  19. 16 Nov 2011, 1 commit
  20. 28 Oct 2011, 1 commit
  21. 15 Oct 2011, 2 commits
    • downgrade "packfile cannot be accessed" errors to warnings · 58a6a9cc
      Committed by Jeff King
      These can happen if another process simultaneously prunes a
      pack. But that is not usually an error condition, because a
      properly-running prune should have repacked the object into
      a new pack. So we will notice that the pack has disappeared
      unexpectedly, print a message, try other packs (possibly
      after re-scanning the list of packs), and find it in the new
      pack.
      Acked-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      58a6a9cc
    • pack-objects: protect against disappearing packs · 4c080182
      Committed by Jeff King
      It's possible that while pack-objects is running, a
      simultaneously running prune process might delete a pack
      that we are interested in. Because we load the pack indices
      early on, we know that the pack contains our item, but by
      the time we try to open and map it, it is gone.
      
      Since c715f783, we already protect against this in the normal
      object access code path, but pack-objects accesses the packs
      at a lower level.  In the normal access path, we call
      find_pack_entry, which will call find_pack_entry_one on each
      pack index, which does the actual lookup. If it gets a hit,
      we will actually open and verify the validity of the
      matching packfile (using c715f783's is_pack_valid). If we
      can't open it, we'll issue a warning and pretend that we
      didn't find it, causing us to go on to the next pack (or on
      to loose objects).
      
      Furthermore, we will cache the descriptor to the opened
      packfile. Which means that later, when we actually try to
      access the object, we are likely to still have that packfile
      opened, and won't care if it has been unlinked from the
      filesystem.
      
      Notice the "likely" above. If there is another pack access
      in the interim, and we run out of descriptors, we could
      close the pack. And then a later attempt to access the
      closed pack could fail (we'll try to re-open it, of course,
      but it may have been deleted). In practice, this doesn't
      happen because we tend to look up items and then access them
      immediately.
      
      Pack-objects does not follow this code path. Instead, it
      accesses the packs at a much lower level, using
      find_pack_entry_one directly. This means we skip the
      is_pack_valid check, and may end up with the name of a
      packfile, but no open descriptor.
      
      We can add the same is_pack_valid check here. Unfortunately,
      the access patterns of pack-objects are not quite as nice
      for keeping lookup and object access together. We look up
      each object as we find out about it, and only later, when
      writing the packfile, do we necessarily access it. Which
      means that the opened packfile may be closed in the interim.
      
      In practice, however, adding this check still has value, for
      three reasons.
      
        1. If you have a reasonable number of packs and/or a
           reasonable file descriptor limit, you can keep all of
           your packs open simultaneously. If this is the case,
           then the race is impossible to trigger.
      
        2. Even if you can't keep all packs open at once, you
           may end up keeping the deleted one open (i.e., you may
           get lucky).
      
        3. The race window is shortened. You may notice early that
           the pack is gone, and not try to access it. Triggering
           the problem without this check means deleting the pack
           any time after we read the list of index files, but
           before we access the looked-up objects.  Triggering it
           with this check means deleting the pack after we do a
           lookup (and successfully access the packfile), but
           before we access the object. Which is a smaller window.
      Acked-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      4c080182
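      A sketch of where the is_pack_valid() check slots in (the struct and
      prototypes below are trimmed stand-ins for the real declarations):

        #include <sys/types.h>

        struct packed_git {
                struct packed_git *next;
                /* ... the real struct has many more fields ... */
        };

        off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *p);
        int is_pack_valid(struct packed_git *p);

        static struct packed_git *find_pack_for(const unsigned char *sha1,
                                                struct packed_git *packs)
        {
                struct packed_git *p;

                for (p = packs; p; p = p->next) {
                        if (!find_pack_entry_one(sha1, p))
                                continue;       /* not in this pack's index */
                        if (!is_pack_valid(p))
                                continue;       /* pack vanished or cannot be
                                                   opened: pretend we missed */
                        return p;
                }
                return NULL;    /* fall back to loose objects, or report */
        }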
  22. 08 Sep 2011, 1 commit
  23. 24 Aug 2011, 1 commit
    • clone: clone from a repository with relative alternates · e6baf4a1
      Committed by Junio C Hamano
      Cloning from a local repository blindly copies or hardlinks all the files
      under the objects/ hierarchy. This results in two issues:
      
       - If the repository cloned has an "objects/info/alternates" file, and the
         command line of clone specifies --reference, the ones specified on the
         command line get overwritten by the copy from the original repository.
      
       - An entry in an "objects/info/alternates" file can specify the object
         stores it borrows objects from as a path relative to the "objects/"
         directory. When cloning a repository with such an alternates file, if
         the new repository is not sitting next to the original repository, such
         relative paths need to be adjusted so that they can be used in the new
         repository.
      
      This updates add_to_alternates_file() to take the path to the alternate
      object store, including the "/objects" part at the end (earlier, it was
      taking the path to $GIT_DIR and was adding "/objects" itself), as it is
      technically possible to specify in objects/info/alternates file the path
      of a directory whose name does not end with "/objects".
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      e6baf4a1
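      Caller-side sketch of the updated convention (the path below is purely
      hypothetical, and the prototype is abbreviated):

        void add_to_alternates_file(const char *objects_dir);

        static void borrow_from_reference(void)
        {
                /* pass the object store itself, "/objects" included, rather
                 * than a $GIT_DIR to which "/objects" gets appended */
                add_to_alternates_file("/srv/repos/shared.git/objects");
        }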
  24. 12 Aug 2011, 1 commit
    • Tolerate zlib deflation with window size < 32Kb · 7f684a2a
      Committed by Roberto Tyley
      Git currently reports loose objects as 'corrupt' if they've been
      deflated using a window size less than 32Kb, because the
      experimental_loose_object() function doesn't recognise the header
      byte as a zlib header. This patch makes the function tolerant of
      all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice
      its accuracy in distinguishing the standard loose-object format
      from the experimental (now abandoned) format.
      
      On memory-constrained systems zlib may use a much smaller window
      size - working on Agit, I found that Android uses a 4KB window;
      giving a header byte of 0x48, not 0x78. Consequently all loose
      objects generated appear 'corrupt', which is why Agit is a read-only
      Git client at this time - I don't want my client to generate Git
      repos that other clients treat as broken :(
      
      This patch makes Git tolerant of different deflate settings - it
      might appear that it changes experimental_loose_object() to the point
      where it could incorrectly identify the experimental format as the
      standard one, but the two criteria (bitmask & checksum) can only
      give a false result for an experimental object where both of the
      following are true:
      
      1) object size is exactly 8 bytes when uncompressed (bitmask)
      2) [single-byte in-pack git type&size header] * 256
         + [1st byte of the following zlib header] % 31 = 0 (checksum)
      
      As it happens, for all possible combinations of valid object type
      (1-4) and window bits (0-7), the only time when the checksum will be
      divisible by 31 is for 0x1838 - i.e. object type *1*, a Commit - which,
      due to the fields all Commit objects must contain, could never be as
      small as 8 bytes in size.
      
      Given this, the combination of the two criteria (bitmask & checksum)
      always correctly determines the buffer format, and is more tolerant
      than the previous version.
      
      The alternative to this patch is simply removing support for the
      experimental format, which I am also totally cool with.
      
      References:
      
      Android uses a 4KB window for deflation:
      http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53
      
      Code snippet searching for false positives with the zlib checksum:
      https://gist.github.com/1118177
      Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      7f684a2a
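      The two criteria combine into a small test; a self-contained sketch per
      RFC 1950 (this mirrors the idea, not the exact experimental_loose_object()
      code):

        static int looks_like_zlib_header(unsigned char b0, unsigned char b1)
        {
                if ((b0 & 0x0f) != 0x08)
                        return 0;       /* CM must be 8: deflate */
                if ((b0 & 0xf0) > 0x70)
                        return 0;       /* CINFO 0..7: 256-byte..32KB window */
                return ((b0 << 8) | b1) % 31 == 0;  /* FCHECK: multiple of 31 */
        }

      With this, the common 0x78 first byte (32KB window) and the Android-style
      0x48 byte (4KB window) are both accepted as candidate zlib headers, while
      the FCHECK test on the second byte still has to pass.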
  25. 07 Jul 2011, 1 commit
    • core: log offset pack data accesses happened · 5f44324d
      Committed by Junio C Hamano
      In a workload other than "git log" (without a pathspec or any option that
      causes us to inspect trees and blobs), the recency pack order is said to
      cause the access to jump around quite a bit. Add a hook to allow us to
      observe how bad it is.
      
      "git config core.logpackaccess /var/tmp/pal.txt" will give you the log
      in the specified file.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      5f44324d
  26. 11 Jun 2011, 2 commits
    • zlib: zlib can only process 4GB at a time · ef49a7a0
      Committed by Junio C Hamano
      The size of objects we read from the repository and data we try to put
      into the repository are represented in "unsigned long", so that on larger
      architectures we can handle objects that weigh more than 4GB.
      
      But the interface defined in zlib.h to communicate with inflate/deflate
      limits avail_in (how many bytes of input are we calling zlib with) and
      avail_out (how many bytes of output from zlib are we ready to accept)
      fields effectively to 4GB by defining their type to be uInt.
      
      In many places in our code, we allocate a large buffer (e.g. mmap'ing a
      large loose object file) and tell zlib its size by assigning the size to
      avail_in field of the stream, but that will truncate the high octets of
      the real size. The worst part of this story is that we often pass around
      z_stream (the state object used by zlib) to keep track of the number of
      used bytes in input/output buffer by inspecting these two fields, which
      practically limits our callchain to the same 4GB limit.
      
      Wrap z_stream in another structure git_zstream that can express avail_in
      and avail_out in unsigned long. For now, just die() when the caller gives
      a size that cannot be given to a single zlib call. In later patches in the
      series, we would make git_inflate() and git_deflate() internally loop to
      give callers an illusion that our "improved" version of zlib interface can
      operate on a buffer larger than 4GB in one go.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      ef49a7a0
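      A sketch of the wrapping idea (the real git_zstream carries the same
      widened counters; the helper name and clamp value here are illustrative):

        #include <limits.h>
        #include <zlib.h>

        struct git_zstream_sketch {
                z_stream z;                     /* what zlib actually sees */
                unsigned long avail_in;         /* widened caller-visible sizes */
                unsigned long avail_out;
                unsigned long total_in;
                unsigned long total_out;
                unsigned char *next_in;
                unsigned char *next_out;
        };

        /* feed zlib at most what a uInt can express in one call */
        static unsigned int zlib_buf_cap(unsigned long len)
        {
                return len > UINT_MAX ? UINT_MAX : (unsigned int)len;
        }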
    • zlib: wrap deflate side of the API · 55bb5c91
      Committed by Junio C Hamano
      Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
      of deflateInit2 in remote-curl.c to tell the library to use gzip header
      and trailer in git_deflate_init_gzip().
      
      There is only one caller that cares about the status from deflateEnd().
      Introduce git_deflate_end_gently() to let that sole caller retrieve the
      status and act on it (i.e. die) for now, but we would probably want to
      make inflate_end/deflate_end die when they run out of memory and get
      rid of the _gently() kind.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      55bb5c91
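      Caller-side sketch of the wrapped deflate calls (prototypes abbreviated
      from the description; git_zstream is the shape sketched under ef49a7a0,
      redefined here only so the snippet stands alone):

        #include <zlib.h>

        typedef struct {
                z_stream z;
                unsigned long avail_in, avail_out, total_in, total_out;
                unsigned char *next_in, *next_out;
        } git_zstream;

        void git_deflate_init(git_zstream *, int level);
        int git_deflate(git_zstream *, int flush);       /* returns zlib status */
        void git_deflate_end(git_zstream *);

        static unsigned long deflate_all(unsigned char *in, unsigned long in_len,
                                         unsigned char *out, unsigned long out_len)
        {
                git_zstream s;

                git_deflate_init(&s, Z_DEFAULT_COMPRESSION);
                s.next_in = in;   s.avail_in = in_len;
                s.next_out = out; s.avail_out = out_len;
                while (git_deflate(&s, Z_FINISH) == Z_OK)
                        ;       /* keep going until Z_STREAM_END (or an error) */
                git_deflate_end(&s);
                return s.total_out;
        }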
  27. 09 Jun 2011, 1 commit
    • sha1_file.c: "legacy" is really the current format · cc5c54e7
      Committed by Junio C Hamano
      Every time I look at the read-loose-object codepath, legacy_loose_object()
      function makes my brain go through mental contortion. When we were playing
      with the experimental loose object format, it may have made sense to call
      the traditional format "legacy", in the hope that the experimental one
      will some day replace it to become official, but it never happened.
      
      This renames the function (and negates its return value) to detect if we
      are looking at the experimental format, and moves the code around in its
      caller, which used to do "if we are looking at legacy, do this special case,
      otherwise the normal case is this". The codepath to read from the loose
      objects in experimental format is the "unlikely" case.
      
      Someday after Git 2.0, we should drop the support of this format.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      cc5c54e7
  28. 06 Jun 2011, 1 commit
    • verify-pack: use index-pack --verify · 3de89c9d
      Committed by Junio C Hamano
      This finally gets rid of the inefficient verify-pack implementation that
      walks objects in the packfile in their object name order and replaces it
      with a call to index-pack --verify. As a side effect, it also removes
      packed_object_info_detail() API which is rather expensive.
      
      As this changes the way errors are reported (verify-pack used to rely on
      the usual runtime error detection routine unpack_entry() to diagnose the
      CRC errors in an entry in the *.idx file; index-pack --verify checks the
      whole *.idx file in one go), update a test that expected the string "CRC"
      to appear in the error message.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      3de89c9d
  29. 27 May 2011, 1 commit
  30. 21 May 2011, 3 commits
  31. 20 May 2011, 1 commit
    • sha1_object_info_extended(): expose a bit more info · 9a490590
      Committed by Junio C Hamano
      The original interface for sha1_object_info() takes an object name and
      gives back a type and its size (the latter is given only when it was
      asked).  The new interface wraps its implementation and exposes a few
      more pieces of information that the interface used to discard, namely:
      
       - where the object is stored (loose? cached? packed?)
       - if packed, in which packfile, and where within it?
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
      ---
      
       * In the earlier round, this used u.pack.delta to record the length of
         the delta chain, but the caller is not necessarily interested in the
         length of the delta chain per se, and may only want to know if it is a
         delta against another object or is stored as deflated data. Calling
         packed_object_info_detail() involves walking the reverse index chain to
         compute the stored size of the object and is unnecessarily expensive.
      
         We could resurrect the code if a new caller wants to know, but I doubt
         it.
      9a490590
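      A sketch of the extra fields this exposes (names approximate the
      interface described above; the real definitions live in that era's
      cache.h):

        #include <sys/types.h>

        struct packed_git;      /* opaque here; defined elsewhere in git */

        struct object_info_sketch {
                unsigned long *sizep;           /* pre-existing query item */

                /* new: where the answer came from */
                enum { OI_LOOSE, OI_CACHED, OI_PACKED } whence;
                union {
                        struct {
                                struct packed_git *pack;   /* which packfile */
                                off_t offset;              /* where inside it */
                                unsigned int is_delta : 1; /* delta vs. deflated */
                        } packed;
                } u;
        };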