提交 · f8bb1d94311b7438e7e9831b51ffb047b7ae65d7 · 李少辉-开发者 / git

16 7月, 2014 1 次提交

sha1_file: do not add own object directory as alternate · 539e7506

由 Ephrim Khong 提交于 7月 15, 2014

When adding alternate object directories, we try not to add the
directory of the current repository to avoid cycles.  Unfortunately,
that test was broken, since it compared an absolute with a relative
path.
Signed-off-by: NEphrim Khong <dr.khong@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

539e7506

14 7月, 2014 1 次提交

sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API · 67dc598e

由 Karsten Blees 提交于 7月 12, 2014

This changes GIT_TRACE_PACK_ACCESS functionality as follows:
 * supports the same options as GIT_TRACE (e.g. printing to stderr)
 * no longer supports relative paths
 * appends to the trace file rather than overwriting
Signed-off-by: NKarsten Blees <blees@dcon.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

67dc598e

02 7月, 2014 1 次提交

sha1_file: avoid overrunning alternate object base string · 80b47854

由 René Scharfe 提交于 7月 01, 2014

While checking if a new alternate object database is a duplicate make
sure that old and new base paths have the same length before comparing
them with memcmp.  This avoids overrunning the buffer of the existing
entry if the new one is longer and it stops rejecting foobar/ after
foo/ was already added.
Signed-off-by: NRene Scharfe <ls.r@web.de>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

80b47854

01 7月, 2014 3 次提交

prepare_packed_git_one: refactor duplicate-pack check · 47bf4b0f

由 Jeff King 提交于 6月 30, 2014

When we are reloading the list of packs, we check whether a
particular pack has been loaded. This is slightly tricky,
because we load packs based on the presence of their ".idx"
files, but record the name of the matching ".pack" file.
Therefore we want to compare their bases.

The existing code stripped off ".idx" from a file we found,
then compared that whole base length to strings containing
the ".pack" version. This meant we could end up comparing
bytes past what the ".pack" string contained, if the ".idx"
file name was much longer.

In practice, it worked OK because memcmp would end up seeing
a difference in the two strings and would return early
before hitting the full length. However, memcmp may
sometimes read extra bytes past a difference (e.g., because
it is comparing 64-bit words), or is even free to compare in
reverse order.

Furthermore, our memcmp made no guarantees that we matched
the whole pack name, up to ".pack". So "foo.idx" would match
"foo-bar.pack", which is wrong (but does not typically
happen, because our pack names have a fixed size).

We can fix both issues, avoid magic numbers, and document
that we expect to compare against a string with ".pack" by
using strip_suffix.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

47bf4b0f

replace has_extension with ends_with · 2975c770

由 Jeff King 提交于 6月 30, 2014

These two are almost the same function, with the exception
that has_extension only matches if there is content before
the suffix. So ends_with(".exe", ".exe") is true, but
has_extension would not be.

This distinction does not matter to any of the callers,
though, and we can just replace uses of has_extension with
ends_with. We prefer the "ends_with" name because it is more
generic, and there is nothing about the function that
requires it to be used for file extensions.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

2975c770

sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one() · 880fb8de

由 René Scharfe 提交于 6月 30, 2014

Instead of using strbuf to create a message string in case a path is
too long for our fixed-size buffer, replace that buffer with a strbuf
and thus get rid of the limitation.
Helped-by: NDuy Nguyen <pclouds@gmail.com>
Signed-off-by: NRene Scharfe <l.s.r@web.de>
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

880fb8de

16 5月, 2014 1 次提交

open_sha1_file: report "most interesting" errno · d6c8a05b

由 Jeff King 提交于 5月 15, 2014

When we try to open a loose object file, we first attempt to
open in the local object database, and then try any
alternates. This means that the errno value when we return
will be from the last place we looked (and due to the way
the code is structured, simply ENOENT if we do not have have
any alternates).

This can cause confusing error messages, as read_sha1_file
checks for ENOENT when reporting a missing object. If errno
is something else, we report that. If it is ENOENT, but
has_loose_object reports that we have it, then we claim the
object is corrupted. For example:

    $ chmod 0 .git/objects/??/*
    $ git rev-list --all
    fatal: loose object b2d6fab18b92d49eac46dc3c5a0bcafabda20131 (stored in .git/objects/b2/d6fab18b92d49eac46dc3c5a0bcafabda20131) is corrupt

This patch instead keeps track of the "most interesting"
errno we receive during our search. We consider ENOENT to be
the least interesting of all, and otherwise report the first
error found (so problems in the object database take
precedence over ones in alternates). Here it is with this
patch:

    $ git rev-list --all
    fatal: failed to read object b2d6fab18b92d49eac46dc3c5a0bcafabda20131: Permission denied
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

d6c8a05b

01 4月, 2014 2 次提交

code and test: fix misuses of "nor" · 235e8d59

由 Justin Lebar 提交于 3月 31, 2014

Signed-off-by: NJustin Lebar <jlebar@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

235e8d59

comments: fix misuses of "nor" · 01689909

由 Justin Lebar 提交于 3月 31, 2014

Signed-off-by: NJustin Lebar <jlebar@google.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

01689909

04 3月, 2014 1 次提交

sha1_file.c: use ALLOC_GROW() in pretend_sha1_file() · c7353967

由 Dmitry S. Dolzhenko 提交于 3月 04, 2014

Helped-by: NHe Sun <sunheehnus@gmail.com>
Signed-off-by: NDmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

c7353967

25 2月, 2014 4 次提交

sha1_file.c: document a bunch of functions defined in the file · d40d535b

由 Michael Haggerty 提交于 2月 21, 2014

Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Acked-by: NNicolas Pitre <nico@fluxnic.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

d40d535b

sha1_file_name(): declare to return a const string · 30d6c6ea

由 Michael Haggerty 提交于 2月 21, 2014

Change the return value of sha1_file_name() to (const char *).
(Callers have no business mucking about here.)  Change callers
accordingly, deleting a few superfluous temporary variables along the
way.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

30d6c6ea

find_pack_entry(): document last_found_pack · 1b1005d1

由 Michael Haggerty 提交于 2月 21, 2014

Add a comment at the declaration of last_found_pack and where it is
used in find_pack_entry().  In the latter, separate the cases (1) to
make a place for the new comment and (2) to turn the success case into
affirmative logic.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: NNicolas Pitre <nico@fluxnic.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

1b1005d1

sha1_file: fix delta_stack memory leak in unpack_entry · 019d1e65

由 Nguyễn Thái Ngọc Duy 提交于 2月 21, 2014

This delta_stack array can grow to any length depending on the actual
delta chain, but we forget to free it. Normally it does not matter
because we use small_delta_stack[] from stack and small_delta_stack
can hold 64-delta chains, more than standard --depth=50 in pack-objects.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

019d1e65

23 1月, 2014 1 次提交

safe_create_leading_directories(): on Windows, \ can separate path components · 0f527403

由 Michael Haggerty 提交于 1月 19, 2014

When cloning to a directory "C:\foo\bar" from Windows' cmd.exe where
"foo" does not exist yet, Git would throw an error like

    fatal: could not create work tree dir 'c:\foo\bar'.: No such file or directory

Fix this by not hard-coding a platform specific directory separator
into safe_create_leading_directories().

This patch, including its entire commit message, is derived from a
patch by Sebastian Schuberth.
Signed-off-by: NJohannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: NSebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

0f527403

17 1月, 2014 1 次提交

do not discard revindex when re-preparing packfiles · 1a6d8b91

由 Jeff King 提交于 1月 15, 2014

When an object lookup fails, we re-read the objects/pack
directory to pick up any new packfiles that may have been
created since our last read. We also discard any pack
revindex structs we've allocated.

The discarding is a problem for the pack-bitmap code, which keeps
a pointer to the revindex for the bitmapped pack. After the
discard, the pointer is invalid, and we may read free()d
memory.

Other revindex users do not keep a bare pointer to the
revindex; instead, they always access it through
revindex_for_pack(), which lazily builds the revindex. So
one solution is to teach the pack-bitmap code a similar
trick. It would be slightly less efficient, but probably not
all that noticeable.

However, it turns out this discarding is not actually
necessary. When we call reprepare_packed_git, we do not
throw away our old pack list. We keep the existing entries,
and only add in new ones. So there is no safety problem; we
will still have the pack struct that matches each revindex.
The packfile itself may go away, of course, but we are
already prepared to handle that, and it may happen outside
of reprepare_packed_git anyway.

Throwing away the revindex may save some RAM if the pack
never gets reused (about 12 bytes per object). But it also
wastes some CPU time (to regenerate the index) if the pack
does get reused. It's hard to say which is more valuable,
but in either case, it happens very rarely (only when we
race with a simultaneous repack). Just leaving the revindex
in place is simple and safe both for current and future
code.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

1a6d8b91

07 1月, 2014 8 次提交

safe_create_leading_directories(): add new error value SCLD_VANISHED · 18d37e86

由 Michael Haggerty 提交于 1月 06, 2014

Add a new possible error result that can be returned by
safe_create_leading_directories() and
safe_create_leading_directories_const(): SCLD_VANISHED.  This value
indicates that a file or directory on the path existed at one point
(either it already existed or the function created it), but then it
disappeared.  This probably indicates that another process deleted the
directory while we were working.  If SCLD_VANISHED is returned, the
caller might want to retry the function call, as there is a chance
that a new attempt will succeed.

Why doesn't safe_create_leading_directories() do the retrying
internally?  Because an empty directory isn't really ever safe until
it holds a file.  So even if safe_create_leading_directories() were
absolutely sure that the directory existed before it returned, there
would be no guarantee that the directory still existed when the caller
tried to write something in it.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

18d37e86

safe_create_leading_directories(): introduce enum for return values · 0be0521b

由 Michael Haggerty 提交于 1月 06, 2014

Instead of returning magic integer values (which a couple of callers
go to the trouble of distinguishing), return values from an enum.  Add
a docstring.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

0be0521b

safe_create_leading_directories(): always restore slash at end of loop · 9e6f885d

由 Michael Haggerty 提交于 1月 06, 2014

Always restore the slash that we scribbled over at the end of the
loop, rather than also fixing it up at each premature exit from the
loop.  This makes it harder to forget to do the cleanup as new paths
are added to the code.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

9e6f885d

safe_create_leading_directories(): split on first of multiple slashes · bf10cf70

由 Michael Haggerty 提交于 1月 06, 2014

If the input path has multiple slashes between path components (e.g.,
"foo//bar"), then the old code was breaking the path at the last
slash, not the first one.  So in the above example, the second slash
was overwritten with NUL, resulting in the parent directory being
sought as "foo/".

When stat() is called on "foo/", it fails with ENOTDIR if "foo" exists
but is not a directory.  This caused the wrong path to be taken in the
subsequent logic.

So instead, split path components at the first intercomponent slash
rather than the last one.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

bf10cf70

safe_create_leading_directories(): rename local variable · 26c8ae2a

由 Michael Haggerty 提交于 1月 06, 2014

Rename "pos" to "next_component", because now it always points at the
next component of the path name that has to be processed.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

26c8ae2a

safe_create_leading_directories(): add explicit "slash" pointer · 831651fd

由 Michael Haggerty 提交于 1月 06, 2014

Keep track of the position of the slash character independently of
"pos", thereby making the purpose of each variable clearer and
working towards other upcoming changes.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

831651fd

safe_create_leading_directories(): reduce scope of local variable · f0502332

由 Michael Haggerty 提交于 1月 06, 2014

This makes it more obvious that values of "st" don't persist across
loop iterations.
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

f0502332

M
safe_create_leading_directories(): fix format of "if" chaining · 53a39721
由 Michael Haggerty 提交于 1月 06, 2014
```
Signed-off-by: NMichael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>
```
53a39721

31 12月, 2013 1 次提交

count-objects: recognize .bitmap in garbage-checking · d3d3e4c4

由 Nguyễn Thái Ngọc Duy 提交于 12月 21, 2013

Count-objects will report any "garbage" files in the packs
directory, including files whose extensions it does not
know (case 1), and files whose matching ".pack" file is
missing (case 2).  Without having learned about ".bitmap"
files, the current code reports all such files as garbage
(case 1), even if their pack exists. Instead, they should be
treated as case 2.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

d3d3e4c4

27 12月, 2013 1 次提交

sha1_object_info_extended: provide delta base sha1s · 5d642e75

由 Jeff King 提交于 12月 21, 2013

A caller of sha1_object_info_extended technically has enough
information to determine the base sha1 from the results of
the call. It knows the pack, offset, and delta type of the
object, which is sufficient to find the base.

However, the functions to do so are not publicly available,
and the code itself is intimate enough with the pack details
that it should be abstracted away. We could add a public
helper to allow callers to query the delta base separately,
but it is simpler and slightly more efficient to optionally
grab it along with the rest of the object_info data.

For cases where the object is not stored as a delta, we
write the null sha1 into the query field. A careful caller
could check "oi.whence == OI_PACKED && oi.u.packed.is_delta"
before looking at the base sha1, but using the null sha1
provides a simple alternative (and gives a better sanity
check for a non-careful caller than simply returning random
bytes).
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

5d642e75

19 12月, 2013 1 次提交

get_max_fd_limit(): fall back to OPEN_MAX upon getrlimit/sysconf failure · 491a8dec

由 Junio C Hamano 提交于 12月 18, 2013

On broken systems where RLIMIT_NOFILE is visible by the compliers
but underlying getrlimit() system call does not behave, we used to
simply die() when we are trying to decide how many file descriptors
to allocate for keeping packfiles open.  Instead, allow the fallback
codepath to take over when we get such a failure from getrlimit().

The same issue exists with _SC_OPEN_MAX and sysconf(); restructure
the code in a similar way to prepare for a broken sysconf() as well.
Noticed-by: NJoey Hess <joey@kitenet.net>
Helped-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

491a8dec

13 12月, 2013 4 次提交

sha1_file: perform object replacement in sha1_object_info_extended() · 1f7117ef

由 Christian Couder 提交于 12月 11, 2013

sha1_object_info_extended() should perform object replacement
if it is needed.

The simplest way to do that is to make it call
lookup_replace_object_extended().

And now its "unsigned flags" parameter is used as it is passed
to lookup_replace_object_extended().
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

1f7117ef

sha1_object_info_extended(): add an "unsigned flags" parameter · de7b5d62

由 Christian Couder 提交于 12月 11, 2013

This parameter is not used yet, but it will be used to tell
sha1_object_info_extended() if it should perform object
replacement or not.
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

de7b5d62

sha1_file.c: add lookup_replace_object_extended() to pass flags · bf93eea0

由 Christian Couder 提交于 12月 11, 2013

Currently, there is only one caller to lookup_replace_object()
that can benefit from passing it some flags, but we expect
that there could be more.
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

bf93eea0

rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT · ffe68cf9

由 Christian Couder 提交于 12月 11, 2013

The READ_SHA1_FILE_REPLACE flag is more related to using the
lookup_replace_object() function rather than the
read_sha1_file() function.

We also need such a flag to be used with sha1_object_info()
instead of read_sha1_file().

The name LOOKUP_REPLACE_OBJECT is therefore better for this
flag.
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ffe68cf9

22 11月, 2013 1 次提交

drop support for "experimental" loose objects · b039718d

由 Jeff King 提交于 11月 21, 2013

In git v1.4.3, we introduced a new loose object format that
encoded some object information outside of the zlib stream.
Ultimately the format was dropped in v1.5.3, but we kept the
reading side around to help people migrate objects. Each
time we open a loose object, we use a heuristic to check
whether it is in the normal loose format, or the
experimental one.

This heuristic is robust in the face of valid data, but it
tends to treat corrupted or garbage data as an experimental
object. With the regular format, we would notice quickly
that zlib's crc does not check out and complain. With the
experimental object, we are likely to extract a nonsensical
object size and try to allocate a huge buffer, resulting in
xmalloc calling "die".

This latter behavior is much worse, for two reasons. One,
git reports an allocation error when the real error is
corruption. And two, the program dies unconditionally, so
you cannot even run fsck (which would otherwise ignore the
broken object and keep going).

We could try to improve the heuristic to err on the side of
normal objects in the face of corruption, but there is
really little point. The experimental format is long-dead,
and was never enabled by default to begin with. We can
instead simply remove it. The only affected repository would
be one that explicitly set core.legacyheaders in 2007, and
then never repacked in the intervening 6 years.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

b039718d

07 11月, 2013 1 次提交

sha1_loose_object_info(): do not return success on missing object · 4ef8d1dd

由 Junio C Hamano 提交于 11月 06, 2013

Since 052fe5ea (sha1_loose_object_info: make type lookup optional,
2013-07-12), sha1_loose_object_info() returns happily without
checking if the object in question exists, which is not what the the
caller sha1_object_info_extended() expects; the caller does not even
bother checking the existence of the object itself.
Noticed-by: NSven Brauch <svenbrauch@googlemail.com>
Helped-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

4ef8d1dd

29 10月, 2013 2 次提交

sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs · b2476a60

由 Johan Herland 提交于 10月 27, 2013

There are cases (e.g. when running concurrent fetches in a repo) where
multiple Git processes concurrently attempt to create loose objects
within the same objects/XX/ dir. The creation of the loose object files
is (AFAICS) safe from races, but the creation of the objects/XX/ dir in
which the loose objects reside is unsafe, for example:

Two concurrent fetches - A and B. As part of its fetch, A needs to store
12aaaaa as a loose object. B, on the other hand, needs to store 12bbbbb
as a loose object. The objects/12 directory does not already exist.
Concurrently, both A and B determine that they need to create the
objects/12 directory (because their first call to git_mkstemp_mode()
within create_tmpfile() fails witn ENOENT). One of them - let's say A -
executes the following mkdir() call before the other. This first call
returns success, and A moves on. When B gets around to calling mkdir(),
it fails with EEXIST, because A won the race. The mkdir() error causes B
to return -1 from create_tmpfile(), which propagates all the way,
resulting in the fetch failing with:

error: unable to create temporary file: File exists
fatal: failed to write object
fatal: unpack-objects failed

Although it's hard to add a testcase reproducing this issue, it's easy
to provoke if we insert a sleep after the

if (mkdir(buffer, 0777) || adjust_shared_perm(buffer))
return -1;

block, and then run two concurrent "git fetch"es against the same repo.

The fix is to simply handle mkdir() failing with EEXIST as a success.
If EEXIST is somehow returned for the wrong reasons (because the relevant
objects/XX is not a directory, or is otherwise unsuitable for object
storage), the following call to adjust_shared_perm(), or ultimately the
retried call to git_mkstemp_mode() will fail, and we end up returning
error from create_tmpfile() in any case.

Note that there are still cases where two users with unsuitable umasks
in a shared repo can end up in two races where one user first wins the
mkdir() race to create an objects/XX/ directory, and then the other user
wins the adjust_shared_perms() race to chmod() that directory, but fails
because it is (transiently, until the first users completes its chmod())
unwriteable to the other user. However, (an equivalent of) this race also
exists before this patch, and is made no worse by this patch.
Signed-off-by: NJohan Herland <johan@herland.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

b2476a60

sha1_file: move comment about return value where it belongs · 3fc0dca9

由 Christian Couder 提交于 10月 27, 2013

Commit 5b086407 (sha1_object_info_extended: make type calculation
optional, Jul 12 2013) changed the return value of the
sha1_object_info_extended function to 0/-1 for success/error.

Previously this function returned the object type for success or
-1 for error. But unfortunately the above commit forgot to change
or move the comment above this function that says "returns enum
object_type or negative".

To fix this inconsistency, let's move the comment above the
sha1_object_info function where it is still true.
Signed-off-by: NChristian Couder <chriscool@tuxfamily.org>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

3fc0dca9

25 10月, 2013 1 次提交

sha1_file: export `git_open_noatime` · ec73f580

由 Vicent Marti 提交于 10月 24, 2013

The `git_open_noatime` helper can be of general interest for other
consumers of git's different on-disk formats.
Signed-off-by: NVicent Marti <tanoku@gmail.com>
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

ec73f580

14 9月, 2013 1 次提交

pack-objects: no crc check when the cached version is used · 77965f8b

由 Nguyễn Thái Ngọc Duy 提交于 9月 13, 2013

Current code makes pack-objects always do check_pack_crc() in
unpack_entry() even if right after that we find out there's a cached
version and pack access is not needed. Swap two code blocks, search
for cached version first, then check crc.
Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

77965f8b

31 8月, 2013 1 次提交

has_sha1_file: re-check pack directory before giving up · 45e8a748

由 Jeff King 提交于 8月 30, 2013

When we read a sha1 file, we first look for a packed
version, then a loose version, and then re-check the pack
directory again before concluding that we cannot find it.
This lets us handle a process that is writing to the
repository simultaneously (e.g., receive-pack writing a new
pack followed by a ref update, or git-repack packing
existing loose objects into a new pack).

However, we do not do the same trick with has_sha1_file; we
only check the packed objects once, followed by loose
objects. This means that we might incorrectly report that we
do not have an object, even though we could find it if we
simply re-checked the pack directory.

By itself, this is usually not a big deal. The other process
is running simultaneously, so we may run has_sha1_file
before it writes, anyway. It is a race whether we see the
object or not.  However, we may also see other things
the writing process has done (like updating refs); and in
that case, we must be able to also see the new objects.

For example, imagine we are doing a for_each_ref iteration,
and somebody simultaneously pushes. Receive-pack may write
the pack and update a ref after we have examined the
objects/pack directory, but before the iteration gets to the
updated ref. When we do finally see the updated ref,
for_each_ref will call has_sha1_file to check whether the
ref is broken. If has_sha1_file returns the wrong answer, we
erroneously will think that the ref is broken.

For a normal iteration without DO_FOR_EACH_INCLUDE_BROKEN,
this means that the caller does not see the ref at all
(neither the old nor the new value).  So not only will we
fail to see the new value of the ref (which is acceptable,
since we are running simultaneously with the writer, and we
might well read the ref before the writer commits its
write), but we will not see the old value either. For
programs that act on reachability like pack-objects or
prune, this can cause data loss, as we may see the objects
referenced by the original ref value as dangling (and either
omit them from the pack, or delete them via prune).

There's no test included here, because the success case is
two processes running simultaneously forever. But you can
replicate the issue with:

  # base.sh
  # run this in one terminal; it creates and pushes
  # repeatedly to a repository
  git init parent &&
  (cd parent &&

    # create a base commit that will trigger us looking at
    # the objects/pack directory before we hit the updated ref
    echo content >file &&
    git add file &&
    git commit -m base &&

    # set the unpack limit abnormally low, which
    # lets us simulate full-size pushes using tiny ones
    git config receive.unpackLimit 1
  ) &&
  git clone parent child &&
  cd child &&
  n=0 &&
  while true; do
    echo $n >file && git add file && git commit -m $n &&
    git push origin HEAD:refs/remotes/child/master &&
    n=$(($n + 1))
  done

  # fsck.sh
  # now run this simultaneously in another terminal; it
  # repeatedly fscks, looking for us to consider the
  # newly-pushed ref broken. We cannot use for-each-ref
  # here, as it uses DO_FOR_EACH_INCLUDE_BROKEN, which
  # skips the has_sha1_file check (and if it wants
  # more information on the object, it will actually read
  # the object, which does the proper two-step lookup)
  cd parent &&
  while true; do
    broken=`git fsck 2>&1 | grep remotes/child`
    if test -n "$broken"; then
      echo $broken
      exit 1
    fi
  done

Without this patch, the fsck loop fails within a few seconds
(and almost instantly if the test repository actually has a
large number of refs). With it, the two can run
indefinitely.
Signed-off-by: NJeff King <peff@peff.net>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

45e8a748

03 8月, 2013 1 次提交

Don't close pack fd when free'ing pack windows · 7c3ecb32

由 Brandon Casey 提交于 7月 31, 2013

Now that close_one_pack() has been introduced to handle file
descriptor pressure, it is not strictly necessary to close the
pack file descriptor in unuse_one_window() when we're under memory
pressure.

Jeff King provided a justification for leaving the pack file open:

   If you close packfile descriptors, you can run into racy situations
   where somebody else is repacking and deleting packs, and they go away
   while you are trying to access them. If you keep a descriptor open,
   you're fine; they last to the end of the process. If you don't, then
   they disappear from under you.

   For normal object access, this isn't that big a deal; we just rescan
   the packs and retry. But if you are packing yourself (e.g., because
   you are a pack-objects started by upload-pack for a clone or fetch),
   it's much harder to recover (and we print some warnings).

Let's do so (or uh, not do so).
Signed-off-by: NBrandon Casey <drafnel@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

7c3ecb32

02 8月, 2013 1 次提交

sha1_file: introduce close_one_pack() to close packs on fd pressure · 88d0db55

由 Brandon Casey 提交于 8月 01, 2013

When the number of open packs exceeds pack_max_fds, unuse_one_window()
is called repeatedly to attempt to release the least-recently-used
pack windows, which, as a side-effect, will also close a pack file
after closing its last open window.  If a pack file has been opened,
but no windows have been allocated into it, it will never be selected
by unuse_one_window() and hence its file descriptor will not be
closed.  When this happens, git may exceed the number of file
descriptors permitted by the system.

This latter situation can occur in show-ref or receive-pack during ref
advertisement.  During ref advertisement, receive-pack will iterate
over every ref in the repository and advertise it to the client after
ensuring that the ref exists in the local repository.  If the ref is
located inside a pack, then the pack is opened to ensure that it
exists, but since the object is not actually read from the pack, no
mmap windows are allocated.  When the number of open packs exceeds
pack_max_fds, unuse_one_window() will not be able to find any windows to
free and will not be able to close any packs.  Once the per-process
file descriptor limit is exceeded, receive-pack will produce a warning,
not an error, for each pack it cannot open, and will then most likely
fail with an error to spawn rev-list or index-pack like:

   error: cannot create standard input pipe for rev-list: Too many open files
   error: Could not run 'git rev-list'

This may also occur during upload-pack when refs are packed (in the
packed-refs file) and the number of packs that must be opened to
verify that these packed refs exist exceeds the file descriptor
limit.  If the refs are loose, then upload-pack will read each ref
from the object database (if the object is in a pack, allocating one
or more mmap windows for it) in order to peel tags and advertise the
underlying object.  But when the refs are packed and peeled,
upload-pack will use the peeled sha1 in the packed-refs file and
will not need to read from the pack files, so no mmap windows will
be allocated and just like with receive-pack, unuse_one_window()
will never select these opened packs to close.

When we have file descriptor pressure, we just need to find an open
pack to close.  We can leave the existing mmap windows open.  If
additional windows need to be mapped into the pack file, it will be
reopened when necessary.  If the pack file has been rewritten in the
mean time, open_packed_git_1() should notice when it compares the file
size or the pack's sha1 checksum to what was previously read from the
pack index, and reject it.

Let's introduce a new function close_one_pack() designed specifically
for this purpose to search for and close the least-recently-used pack,
where LRU is defined as (in order of preference):

   * pack with oldest mtime and no allocated mmap windows
   * pack with the least-recently-used windows, i.e. the pack
     with the oldest most-recently-used window, where none of
     the windows are in use
   * pack with the least-recently-used windows
Signed-off-by: NBrandon Casey <drafnel@gmail.com>
Signed-off-by: NJunio C Hamano <gitster@pobox.com>

88d0db55

李少辉-开发者 / git 与 Fork 源项目一致

李少辉-开发者 / git
与 Fork 源项目一致