Commits · 03edb0a753fbdbfd14ae42a26ffd1e7608919c45 · 李少辉-开发者 / git

06 Nov, 2007 2 commits

remove dead code from the csum-file interface · ec640ed1

Committed by Nicolas Pitre on Nov 04, 2007

The provided name argument is always constant and valid in every
caller's context, so no need to have an array of PATH_MAX chars to copy
it into when a simple pointer will do.  Unfortunately that means getting
rid of wascally wabbits too.

The 'error' field is also unused.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

make display of total transferred more accurate · 218558af

Committed by Nicolas Pitre on Nov 04, 2007

The throughput display needs a delay period before accounting and
displaying anything.  Yet it might be called after some amount of data
has already been transferred.  The display of total data is therefore
accounted late and therefore smaller than the reality.

Let's call display_throughput() with an absolute amount of transferred
data instead of a relative number, and let the throughput code find the
relative amount of data by itself as needed.  This way the displayed
total is always exact.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
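
As a rough illustration of the change described above, the sketch below feeds a meter with absolute byte totals and lets it derive the per-interval difference itself. The class `ThroughputMeter`, its `update()` method, and the explicit `now` argument are hypothetical names chosen for this example; this is not git's `display_throughput()` code.

```python
class ThroughputMeter:
    """Toy meter fed with absolute byte totals (hypothetical, not git's code)."""

    def __init__(self, delay=0.5):
        self.delay = delay      # seconds to wait between rate samples
        self.last_time = None   # time of the previous sample
        self.last_total = 0     # absolute total seen at the previous sample
        self.rate = None        # bytes per second, once a sample exists

    def update(self, total, now):
        """Record the absolute total; return (total, rate) for display."""
        if self.last_time is None:
            # First call: start timing. Data sent before this call is still
            # part of `total`, so the displayed total is never short.
            self.last_time, self.last_total = now, total
        elif now - self.last_time >= self.delay:
            # The meter works out the relative amount on its own.
            self.rate = (total - self.last_total) / (now - self.last_time)
            self.last_time, self.last_total = now, total
        return total, self.rate
```

Passing the clock value in explicitly is only a convenience for the sketch; it keeps the example deterministic. A caller that had already sent 4096 bytes before the first `update()` would still see 4096 as the total, while the rate is computed only from later differences.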

31 Oct, 2007 1 commit

add throughput display to git-push · 2a128d63

Committed by Nicolas Pitre on Oct 30, 2007

This one triggers only when git-pack-objects is called with
--all-progress and --stdout which is the combination used by
git-push.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

13 Jun, 2007 1 commit

More static · 4175e9e3

Committed by Junio C Hamano on Jun 13, 2007

There still are quite a few symbols that ought to be static.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

11 May, 2007 1 commit

Custom compression levels for objects and packs · 960ccca6

Committed by Dana How on May 09, 2007

Add config variables pack.compression and core.loosecompression,
and switch --compression=level to pack-objects.

Loose objects will be compressed using core.loosecompression if set,
else core.compression if set, else Z_BEST_SPEED.
Packed objects will be compressed using --compression=level if seen,
else pack.compression if set, else core.compression if set,
else Z_DEFAULT_COMPRESSION.  This is the "pack compression level".

Loose objects added to a pack undeltified will be recompressed
to the pack compression level if it is unequal to the current
loose compression level by the preceding rules,  or if the loose
object was written while core.legacyheaders = true.  Newly
deltified loose objects are always compressed to the current
pack compression level.

Previously packed objects added to a pack are recompressed
to the current pack compression level exactly when their
deltification status changes,  since the previous pack data
cannot be reused.

In either case,  the --no-reuse-object switch from the first
patch below will always force recompression to the current pack
compression level,  instead of assuming the pack compression level
hasn't changed and pack data can be reused when possible.

This applies on top of the following patches from Nicolas Pitre:
[PATCH] allow for undeltified objects not to be reused
[PATCH] make "repack -f" imply "pack-objects --no-reuse-object"

Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
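
The fallback order described in this commit message can be sketched as two small lookups. This is only an illustration in Python: the function names and the dictionary standing in for git's configuration are assumptions, and the constants come from Python's `zlib` module rather than from git's C code.

```python
import zlib

def loose_level(config):
    """Loose objects: core.loosecompression, else core.compression, else Z_BEST_SPEED."""
    for key in ("core.loosecompression", "core.compression"):
        if key in config:
            return config[key]
    return zlib.Z_BEST_SPEED

def pack_level(config, cli_level=None):
    """Packs: --compression=level, else pack.compression, else core.compression,
    else Z_DEFAULT_COMPRESSION."""
    if cli_level is not None:
        return cli_level
    for key in ("pack.compression", "core.compression"):
        if key in config:
            return config[key]
    return zlib.Z_DEFAULT_COMPRESSION

def loose_needs_recompress(config, cli_level=None):
    """An undeltified loose object is recompressed when the two levels differ
    (the core.legacyheaders case from the message is left out of this sketch)."""
    return loose_level(config) != pack_level(config, cli_level)
```

For example, with only `core.compression` set to 5, both lookups return 5 and `loose_needs_recompress()` is false; adding `pack.compression = 9` makes the pack level 9 and the check true.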

11 Apr, 2007 1 commit

compute a CRC32 for each object as stored in a pack · 78d1e84f

Committed by Nicolas Pitre on Apr 09, 2007

The most important optimization for performance when repacking is the
ability to reuse data from a previous pack as is and bypass any delta
or even SHA1 computation by simply copying the raw data from one pack
to another directly.

The problem with this is that any data corruption within a copied object
would go unnoticed and the new (repacked) pack would be self-consistent
with its own checksum despite containing a corrupted object. This is a
real issue that already happened at least once in the past.

In some attempt to prevent this, we validate the copied data by inflating
it and making sure no error is signaled by zlib. But this is still not
perfect as a significant portion of a pack content is made of object
headers and references to delta base objects, which are not deflated and
therefore not validated when repacking. Pack data reuse is thus
still not as safe as it could be.

Of course a full SHA1 validation could be performed, but that implies
full data inflating and delta replaying, which is extremely costly; that
cost is what the data reuse optimization was designed to avoid in the first place.

So the best solution to this is simply to store a CRC32 of the raw pack
data for each object in the pack index. This way any object in a pack can
be validated before being copied as is in another pack, including header
and any other non deflated data.

Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia:

Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very
short messages. He wrote "Briefly, the problem is that, for very short
packets, Adler32 is guaranteed to give poor coverage of the available
bits. Don't take my word for it, ask Mark Adler. :-)" The problem is
that sum A does not wrap for short messages. The maximum value of A for
a 128-byte message is 32640, which is below the value 65521 used by the
modulo operation. An extended explanation can be found in RFC 3309,
which mandates the use of CRC32 instead of Adler-32 for SCTP, the
Stream Control Transmission Protocol.

In the context of a GIT pack, we have lots of small objects, especially
deltas, which are likely to be quite small and in a size range for which
Adler32 is deemed not to be sufficient. Another advantage of CRC32 is the
possibility for recovery from certain types of small corruptions like
single bit errors which are the most probable type of corruptions.
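
The bound quoted from Wikipedia can be checked with Python's `zlib.adler32`. The helper name `adler_parts` is made up for this sketch. Note that the quoted 32640 is the plain byte sum (128 × 255); Adler-32 starts sum A at 1, so the value zlib reports is 32641, which is still far below the modulus 65521.

```python
import zlib

ADLER_MODULUS = 65521  # largest prime below 2**16, used by Adler-32

def adler_parts(data):
    """Split zlib's Adler-32 into its two 16-bit sums: (A, B)."""
    value = zlib.adler32(data)
    return value & 0xFFFF, value >> 16

# Largest possible A for a 128-byte message: every byte is 0xFF.
a_max, _ = adler_parts(b"\xff" * 128)

# A never wraps for such short inputs, so its upper bits go unused.
assert a_max == 1 + 128 * 255
assert a_max < ADLER_MODULUS
```

Since A stays below half of the modulus for any message of 128 bytes or fewer, part of the 32-bit checksum space is never reached for such inputs, which is the "poor coverage" the quote refers to.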

OK what this patch does is to compute the CRC32 of each object written to
a pack within pack-objects. It is not written to the index yet and it is
obviously not validated when reusing pack data yet either.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
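
To make the argument above concrete, here is a small sketch of why a CRC32 over the raw entry catches damage that the inflate check misses. The entry layout (two made-up header bytes followed by a deflated payload) and the function names are assumptions for illustration only; this is not the real pack entry encoding.

```python
import zlib

def make_entry(header, payload):
    """Return the raw entry (uncompressed header + deflated payload) and its CRC32."""
    raw = header + zlib.compress(payload)
    return raw, zlib.crc32(raw)

def inflate_ok(raw, header_len):
    """The older check: does the deflated part inflate without a zlib error?"""
    try:
        zlib.decompress(raw[header_len:])
        return True
    except zlib.error:
        return False

header = b"\x95\x0a"  # made-up header bytes, not a real pack header
raw, crc = make_entry(header, b"hello pack\n" * 4)

# Flip one bit inside the header, which zlib never looks at.
damaged = bytes([raw[0] ^ 0x01]) + raw[1:]

assert inflate_ok(damaged, len(header))   # the inflate check passes anyway
assert zlib.crc32(damaged) != crc         # the CRC32 of the raw bytes does not match
```

The CRC covers the header and the deflated bytes together, so it can be compared before copying an entry without inflating anything.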

10 Aug, 2005 1 commit

[PATCH] -Werror fixes · 4ec99bf0

Committed by Timo Sirainen on Aug 09, 2005

GCC's format __attribute__ is good for checking errors, especially
with -Wformat=2 parameter. This fixes most of the reported problems
against 2005-08-09 snapshot.

29 Jun, 2005 1 commit

csum-file: add "sha1fd()" to create a SHA1 csum file from an existing file descriptor · 4397f014

Committed by Linus Torvalds on Jun 28, 2005

We'll use this soon to write pack-files to stdout.

27 Jun, 2005 2 commits

csum-file interface updates: return resulting SHA1 · e1808845

Committed by Linus Torvalds on Jun 26, 2005

Also, make the writing of the SHA1 as an end-header be conditional: not
every user will necessarily want to write the SHA1 to the file itself,
even though current users do (but we might end up using the same helper
functions for the object files themselves, that don't do this).

This also makes the packed index file contain the SHA1 of the packed
data file at the end (just before its own SHA1). That way you can
validate the pairing of the two if you want to.
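
A minimal sketch of the interface described here, written in Python rather than git's C: a writer that hashes everything passing through it, returns the resulting SHA1, and appends it only on request. The class `Sha1Writer` and its method names are hypothetical, and the byte strings are placeholders rather than real pack or index contents.

```python
import hashlib
import io

class Sha1Writer:
    """Write-through wrapper that keeps a running SHA1 of the data (hypothetical)."""

    def __init__(self, stream):
        self.stream = stream
        self.hash = hashlib.sha1()

    def write(self, data):
        self.hash.update(data)
        self.stream.write(data)

    def finish(self, write_trailer=True):
        """Return the 20-byte SHA1; append it to the stream only if asked."""
        digest = self.hash.digest()
        if write_trailer:
            self.stream.write(digest)
        return digest

# Pack file: data followed by its own SHA1.
pack_buf = io.BytesIO()
pack = Sha1Writer(pack_buf)
pack.write(b"pack data")
pack_sha1 = pack.finish()

# Index file: entries, then the pack's SHA1, then the index's own SHA1.
idx_buf = io.BytesIO()
idx = Sha1Writer(idx_buf)
idx.write(b"index entries")
idx.write(pack_sha1)
idx_sha1 = idx.finish()
```

With this layout, bytes `[-40:-20]` of the index equal the pack's trailer, which is what allows the pairing of the two files to be checked.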

git-pack-objects: write the pack files with a SHA1 csum · c38138cd

Committed by Linus Torvalds on Jun 26, 2005

We want to be able to check their integrity later, and putting the
sha1-sum of the contents at the end is a good thing.  The writing
routines are generic, so we could try to re-use them for the index file,
instead of having the same logic duplicated.

Update unpack-objects to know about the extra 20 bytes at the end
of the index.
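
Checking such a file later amounts to hashing everything except the last 20 bytes and comparing the result with the trailer. The function below is a sketch under that assumption; `verify_trailer` is a hypothetical name, not a git function.

```python
import hashlib

SHA1_LEN = 20  # size of a raw SHA1 digest in bytes

def verify_trailer(blob):
    """True if the last 20 bytes are the SHA1 of everything before them."""
    if len(blob) < SHA1_LEN:
        return False
    body, trailer = blob[:-SHA1_LEN], blob[-SHA1_LEN:]
    return hashlib.sha1(body).digest() == trailer
```

A reader that parses the body must also stop 20 bytes before the end of the file, which is the adjustment the message mentions for unpack-objects.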
