1. 04 April 2007, 1 commit
    • _GIT_INDEX_OUTPUT: allow plumbing to output to an alternative index file. · 30ca07a2
      Authored by Junio C Hamano
      When defined, this allows plumbing commands that update the
      index (add, apply, checkout-index, merge-recursive, mv,
      read-tree, rm, update-index, and write-tree) to write their
      resulting index to an alternative index file while holding a
      lock to the original index file.  With this, git-commit that
      jumps the index does not have to make an extra copy of the index
      file, and more importantly, it can do the update while holding
      the lock on the index.
      
      However, I think the interface to let an environment variable
      specify the output is a mistake, as shown in the documentation.
      If a curious user has the environment variable set to something
      other than the file GIT_INDEX_FILE points at, almost everything
      will break.  This should instead be a command line parameter to
      tell these plumbing commands to write the result in the named
      file, to prevent stupid mistakes.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      30ca07a2
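      A minimal sketch, not git's actual code, of how a plumbing command could honor
      such an output variable: the lock is still taken on the file named by
      GIT_INDEX_FILE, and only the path the result is written to changes.  The helper
      name below is hypothetical.

        #include <stdio.h>
        #include <stdlib.h>

        /* Hypothetical helper: decide where the updated index should be written. */
        static const char *index_output_path(void)
        {
            const char *out = getenv("_GIT_INDEX_OUTPUT"); /* alternative output, if set */
            if (out && *out)
                return out;
            const char *idx = getenv("GIT_INDEX_FILE");    /* normal case: update in place */
            return (idx && *idx) ? idx : ".git/index";
        }

        int main(void)
        {
            printf("resulting index would be written to: %s\n", index_output_path());
            return 0;
        }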
  2. 21 March 2007, 1 commit
  3. 19 March 2007, 1 commit
    • Limit the size of the new delta_base_cache · 18bdec11
      Authored by Shawn O. Pearce
      The new configuration variable core.deltaBaseCacheLimit allows the
      user to control how much memory they are willing to give to Git for
      caching base objects of deltas.  This is not normally meant to be
      a user tweakable knob; the "out of the box" settings are meant to
      be suitable for almost all workloads.
      
      We default to 16 MiB under the assumption that the cache is not
      meant to consume all of the user's available memory, and that the
      cache's main purpose was to cache trees, for faster path limiters
      during revision traversal.  Since trees tend to be relatively small
      objects, this relatively small limit should still allow a large
      number of objects.
      
      On the other hand we don't want the cache to start storing 200
      different versions of a 200 MiB blob, as this could easily blow
      the entire address space of a 32 bit process.
      
      We evict OBJ_BLOB from the cache first (credit goes to Junio) as
      we want to favor OBJ_TREE within the cache.  These are the objects
      that have the highest inflate() startup penalty, as they tend to
      be small and thus don't have that much of a chance to amortize
      that penalty over the entire data.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      18bdec11
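      A hedged illustration of the eviction policy described above (not the actual
      delta_base_cache code): entries are dropped until the cache fits under the byte
      limit, and blob entries are considered for eviction before tree entries.

        #include <stddef.h>

        enum obj_type { OBJ_TREE, OBJ_BLOB };   /* simplified for this sketch */

        struct cache_ent {
            enum obj_type type;
            size_t size;
            int in_cache;
        };

        /* Evict blobs first, so that trees -- small objects that cannot amortize
         * their inflate() startup cost -- are the last to leave the cache. */
        static void trim_cache(struct cache_ent *ent, size_t n,
                               size_t *total, size_t limit)
        {
            for (int pass = 0; pass < 2 && *total > limit; pass++) {
                enum obj_type victim = (pass == 0) ? OBJ_BLOB : OBJ_TREE;
                for (size_t i = 0; i < n && *total > limit; i++) {
                    if (ent[i].in_cache && ent[i].type == victim) {
                        ent[i].in_cache = 0;
                        *total -= ent[i].size;
                    }
                }
            }
        }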
  4. 17 March 2007, 1 commit
    • [PATCH] clean up pack index handling a bit · 42873078
      Authored by Nicolas Pitre
      Especially with the new index format to come, it is more appropriate
      to encapsulate more into check_packed_git_idx() and assume less of the
      index format in struct packed_git.
      
      To that effect, the index_base is renamed to index_data with void * type
      so it is not used directly but other pointers initialized with it. This
      allows for a couple of pointer cast removals, as well as providing a better
      generic name to grep for when adding support for new index versions or
      formats.
      
      And index_data is declared const too while at it.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      42873078
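      A rough sketch of the shape described above: the mapped index is kept behind a
      const void *, and format-specific pointers are derived from it only where
      needed.  Field and function names here are illustrative, not the exact git
      structures.

        #include <stdint.h>
        #include <stddef.h>

        struct packed_git_sketch {
            const void *index_data;   /* mmap'd index file, format not assumed here */
            size_t index_size;
            uint32_t num_objects;
        };

        /* Format-aware code derives typed pointers locally instead of the struct
         * exposing the index layout directly. */
        static const uint32_t *fanout_table(const struct packed_git_sketch *p)
        {
            return (const uint32_t *)p->index_data;  /* v1 layout: fanout at offset 0 */
        }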
  5. 13 March 2007, 1 commit
  6. 11 March 2007, 2 commits
    • prepare_packed_git(): sort packs by age and localness. · b867092f
      Authored by Junio C Hamano
      When accessing objects, we first look for them in packs that
      are linked together in the reverse order of discovery.
      
      Since younger packs tend to contain more recent objects, which
      are more likely to be accessed often, and local packs tend to
      contain objects more relevant to our specific projects, sort the
      list of packs before starting to access them.  In addition,
      favoring local packs over the ones borrowed from alternates can
      be a win when alternates are mounted on network file systems.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      b867092f
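      A hedged sketch of such an ordering: a comparison function that puts local
      packs before borrowed ones and, within each group, newer packs (by mtime)
      first.  Field names are illustrative.

        #include <time.h>

        struct pack_sketch {
            int pack_local;   /* 1 if the pack lives in our own objects/pack */
            time_t mtime;     /* pack file modification time, a proxy for age */
        };

        /* Order: local before borrowed, then newer (larger mtime) before older. */
        static int compare_packs(const struct pack_sketch *a,
                                 const struct pack_sketch *b)
        {
            if (a->pack_local != b->pack_local)
                return b->pack_local - a->pack_local;
            if (a->mtime != b->mtime)
                return (a->mtime > b->mtime) ? -1 : 1;
            return 0;
        }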
    • git-branch, git-checkout: autosetup for remote branch tracking · 0746d19a
      Authored by Paolo Bonzini
      In order to track and build on top of a branch 'topic' you track from
      your upstream repository, you often would end up doing this sequence:
      
        git checkout -b mytopic origin/topic
        git config --add branch.mytopic.remote origin
        git config --add branch.mytopic.merge refs/heads/topic
      
      This would first fork your own 'mytopic' branch from the 'topic'
      branch you track from the 'origin' repository; then it would set up two
      configuration variables so that 'git pull' without parameters does the
      right thing while you are on your own 'mytopic' branch.
      
      This commit adds a --track option to git-branch, so that "git
      branch --track mytopic origin/topic" performs the latter two actions
      when creating your 'mytopic' branch.
      
      If the configuration variable branch.autosetupmerge is set to true, you
      do not have to pass the --track option explicitly; further patches in
      this series allow setting the variable with a "git remote add" option.
      The configuration variable is off by default, and there is a --no-track
      option to countermand it even if the variable is set.
      Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      0746d19a
  7. 08 March 2007, 3 commits
    • Use off_t when we really mean a file offset. · c4001d92
      Authored by Shawn O. Pearce
      Not all platforms have declared 'unsigned long' to be a 64 bit value,
      but we want to support a 64 bit packfile (or close enough anyway)
      in the near future as some projects are getting large enough that
      their packed size exceeds 4 GiB.
      
      By using off_t, the POSIX type that is declared to mean an offset
      within a file, we support whatever maximum file size the underlying
      operating system will handle.  For most modern systems this is up
      around 2^60 or higher.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      c4001d92
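      A small illustration of the point: on an ILP32 system unsigned long stays at
      32 bits, while off_t (with large-file support requested) can address offsets
      past 4 GiB.  This is a standalone demonstration, not git code.

        #define _FILE_OFFSET_BITS 64   /* ask for a 64-bit off_t on 32-bit platforms */

        #include <stdio.h>
        #include <sys/types.h>

        int main(void)
        {
            printf("unsigned long: %zu bytes, off_t: %zu bytes\n",
                   sizeof(unsigned long), sizeof(off_t));
            return 0;
        }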
    • Use uint32_t for all packed object counts. · 326bf396
      Authored by Shawn O. Pearce
      As we permit up to 2^32-1 objects in a single packfile we cannot
      use a signed int to represent the object offset within a packfile;
      after 2^31-1 objects we will start seeing negative indexes and
      error out or compute bad addresses within the mmap'd index.
      
      This is a minor cleanup that does not introduce any significant
      logic changes.  It is roach free.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      326bf396
    • General const correctness fixes · 3a55602e
      Authored by Shawn O. Pearce
      We shouldn't attempt to assign constant strings into char*, as the
      string is not writable at runtime.  Likewise we should always be
      treating unsigned values as unsigned values, not as signed values.
      
      Most of these are very straightforward.  The only exception is the
      (unnecessary) xstrdup/free in builtin-branch.c for the detached
      head case.  Since this is a user-level interactive type program
      and that particular code path is executed no more than once, I feel
      that the extra xstrdup call is well worth the easy elimination of
      this warning.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      3a55602e
  8. 05 March 2007, 1 commit
  9. 03 March 2007, 1 commit
    • Add core.symlinks to mark filesystems that do not support symbolic links. · 78a8d641
      Authored by Johannes Sixt
      Some file systems that can host git repositories and their working copies
      do not support symbolic links. But then if the repository contains a symbolic
      link, it is impossible to check out the working copy.
      
      This patch enables partial support of symbolic links so that it is possible
      to check out a working copy on such a file system.  A new flag
      core.symlinks (which is true by default) can be set to false to indicate
      that the filesystem does not support symbolic links. In this case, symbolic
      links that exist in the trees are checked out as small plain files, and
      checking in modifications of these files preserves the symlink property in
      the database (as long as an entry exists in the index).
      
      Of course, this does not magically make symbolic links work on such defective
      file systems; hence, this solution does not help if the working copy relies
      on an entry being a real symbolic link.
      Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      78a8d641
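      A hedged sketch of the checkout-side behavior described: when the filesystem
      supports symlinks the entry is created with symlink(2); otherwise the link
      target is written out as the contents of a small regular file.  Illustrative
      only, not the actual change.

        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        /* target is the blob content, i.e. the path the link should point to. */
        static int write_symlink_entry(const char *path, const char *target,
                                       int has_symlinks)
        {
            if (has_symlinks)                     /* core.symlinks = true (default) */
                return symlink(target, path);

            /* core.symlinks = false: record the target in a plain file instead. */
            FILE *f = fopen(path, "w");
            if (!f)
                return -1;
            fwrite(target, 1, strlen(target), f);
            return fclose(f);
        }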
  10. 01 March 2007, 2 commits
  11. 28 February 2007, 2 commits
  12. 27 February 2007, 1 commit
    • convert object type handling from a string to a number · 21666f1a
      Authored by Nicolas Pitre
      We currently have two parallel notations for dealing with object types
      in the code: a string and a numerical value.  One of them is obviously
      redundant, and the most used one requires more stack space and a bunch
      of strcmp() all over the place.
      
      This is an initial step for the removal of the version using a char array
      found in object reading code paths.  The patch is unfortunately large but
      there is no sane way to split it in smaller parts without breaking the
      system.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      21666f1a
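      The direction of the change, sketched: a small enum stands in for the type
      strings, and a lookup table maps back to the canonical names only where a
      string is actually needed.  Written out as an illustration rather than copied
      from the patch.

        #include <string.h>

        enum object_type {
            OBJ_BAD = -1, OBJ_NONE = 0,
            OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4,
        };

        static const char *object_type_strings[] = {
            NULL, "commit", "tree", "blob", "tag",
        };

        /* Compare against the table once, instead of strcmp() at every use site. */
        static enum object_type type_from_string_sketch(const char *str)
        {
            for (int i = OBJ_COMMIT; i <= OBJ_TAG; i++)
                if (!strcmp(str, object_type_strings[i]))
                    return (enum object_type)i;
            return OBJ_BAD;
        }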
  13. 17 February 2007, 1 commit
    • Do not take mode bits from index after type change. · 185c975f
      Authored by Junio C Hamano
      When we do not trust the executable bit from lstat(2), we copied
      the existing ce_mode bits without checking whether the filesystem
      object is a regular file (the only kind of entry to which the
      "trust executable bit" business applies) or whether the blob in
      the index is a regular file (otherwise, we should do the same as
      when registering a new regular file, which is to default to
      non-executable).
      
      Noticed by Johannes Sixt.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      185c975f
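      A hedged reconstruction of the rule being described, not the literal patch:
      the old ce_mode may only be carried over when both the on-disk file and the
      indexed entry are regular files; otherwise the mode is computed as for a newly
      added file.

        #include <sys/stat.h>

        /* Sketch: pick the mode to record when core.filemode says the executable
         * bit from lstat(2) cannot be trusted.  old_mode is the current index mode. */
        static unsigned int choose_ce_mode(unsigned int old_mode,
                                           const struct stat *st,
                                           int trust_executable_bit)
        {
            if (!trust_executable_bit &&
                S_ISREG(st->st_mode) && S_ISREG(old_mode))
                return old_mode;              /* keep the recorded +x / -x bit */

            /* Type changed, or the bit is trusted: behave as for a new file,
             * defaulting to non-executable when the bit cannot be trusted. */
            if (S_ISREG(st->st_mode))
                return (trust_executable_bit && (st->st_mode & 0111))
                       ? 0100755 : 0100644;
            if (S_ISLNK(st->st_mode))
                return 0120000;               /* symbolic link */
            return 0160000;                   /* anything else, e.g. a gitlink */
        }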
  14. 15 February 2007, 1 commit
    • Lazy man's auto-CRLF · 6c510bee
      Authored by Linus Torvalds
      It currently does NOT know about file attributes, so it does its
      conversion purely based on content. Maybe that is more in the "git
      philosophy" anyway, since content is king, but I think we should try to do
      the file attributes to turn it off on demand.
      
      Anyway, BY DEFAULT it is off regardless, because it requires a
      
      	[core]
      		AutoCRLF = true
      
      in your config file to be enabled. We could make that the default for
      Windows, of course, the same way we do some other things (filemode etc).
      
      But you can actually enable it on UNIX, and it will cause:
      
       - "git update-index" will write blobs without CRLF
       - "git diff" will diff working tree files without CRLF
       - "git checkout" will write files to the working tree _with_ CRLF
      
      and things work fine.
      
      Funnily, it actually shows an odd file in git itself:
      
      	git clone -n git test-crlf
      	cd test-crlf
      	git config core.autocrlf true
      	git checkout
      	git diff
      
      shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
      actually checked in that file *with* CRLF! So when "core.autocrlf" is
      true, we'll always generate a *different* hash for it in the index,
      because the index hash will be for the content _without_ CRLF.
      
      Is this complete? I dunno. It seems to work for me. It doesn't use the
      filename at all right now, and that's probably a deficiency (we could
      certainly make the "is_binary()" heuristics also take standard filename
      heuristics into account).
      
      I don't pass in the filename at all for the "index_fd()" case
      (git-update-index), so that would need to be passed around, but this
      actually works fine.
      
      NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
      truly. I will not guarantee that they work at all reasonably. Caveat
      emptor. But it _is_ simple, and it _is_ safe, since it's all off by
      default.
      
      The patch is pretty simple - the biggest part is the new "convert.c" file,
      but even that is really just basic stuff that anybody can write in
      "Teaching C 101" as a final project for their first class in programming.
      Not to say that it's bug-free, of course - but at least we're not talking
      about rocket surgery here.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      6c510bee
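      A minimal sketch of the checkin-direction conversion (CRLF in the working tree
      becomes LF in the blob); the real convert.c also runs its is_binary() heuristic
      first, which is omitted here.

        #include <stddef.h>

        /* Copy len bytes from src to dst, dropping any CR that is immediately
         * followed by LF.  Returns the converted length; dst needs len bytes. */
        static size_t crlf_to_lf(char *dst, const char *src, size_t len)
        {
            size_t out = 0;
            for (size_t i = 0; i < len; i++) {
                if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
                    continue;            /* skip the CR, keep the LF behind it */
                dst[out++] = src[i];
            }
            return out;
        }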
  15. 09 February 2007, 1 commit
  16. 06 February 2007, 1 commit
  17. 05 February 2007, 1 commit
  18. 28 January 2007, 2 commits
  19. 26 January 2007, 1 commit
    • Allow non-developer to clone, checkout and fetch more easily. · cb280e10
      Authored by Junio C Hamano
      The code that uses committer_info() in reflog can barf and die
      whenever it is asked to update a ref.  And I do not think
      calling ignore_missing_committer_name() upfront like recent
      receive-pack did in the application is a reasonable workaround.
      
      What the patch does.
      
       - git_committer_info() takes one parameter.  It used to be "if
         this is true, then die() if the name is not available due to
         bad GECOS, otherwise issue a warning once but leave the name
         empty".  The reason was because we wanted to prevent bad
         commits from being made by git-commit-tree (and its
         callers).  The value 0 is only used by "git var -l".
      
         Now it takes -1, 0 or 1.  When set to -1, it does not
         complain but uses the pw->pw_name when name is not
         available.  Existing 0 and 1 values mean the same thing as
         they used to mean before.  0 means issue warnings and leave
         it empty, 1 means barf and die.
      
       - ignore_missing_committer_name() and its existing caller
         (receive-pack, to set the reflog) have been removed.
      
       - git-format-patch, to come up with the phoney message ID when
         asked to thread, now passes -1 to git_committer_info().  This
         codepath uses only the e-mail part, ignoring the name.  It
         used to barf and die.  The other call in the same program
         when asked to add signed-off-by line based on committer
         identity still passes 1 to make sure it barfs instead of
         adding a bogus s-o-b line.
      
       - log_ref_write in refs.c, to come up with the name to record
         who initiated the ref update in the reflog, passes -1.  It
         used to barf and die.
      
      The last change means that git-update-ref, git-branch, and
      commit walker backends can now be used in a repository with
      reflog by somebody who does not have the user identity required
      to make a commit.  They all used to barf and die.
      
      I've run tests and all of them seem to pass, and also tried "git
      clone" as a user whose GECOS is empty -- git clone works again
      now (it was broken when reflog was enabled by default).
      
      But this definitely needs extra sets of eyeballs.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      cb280e10
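      A hedged sketch of the three-way flag described above; the helper and parameter
      names are illustrative, not the actual interface.

        #include <stdio.h>
        #include <stdlib.h>

        /* What happens when the real name cannot be determined from GECOS,
         * depending on the -1 / 0 / 1 value passed to git_committer_info(). */
        static const char *resolve_name(const char *gecos_name,
                                        const char *pw_name, int flag)
        {
            if (gecos_name && *gecos_name)
                return gecos_name;                   /* normal case */

            if (flag > 0) {                          /* 1: commit-making paths */
                fprintf(stderr, "fatal: empty ident name not allowed\n");
                exit(128);
            }
            if (flag == 0) {                         /* 0: "git var -l" */
                fprintf(stderr, "warning: cannot determine your name\n");
                return "";
            }
            return pw_name;                          /* -1: reflog, format-patch */
        }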
  20. 21 January 2007, 1 commit
  21. 20 January 2007, 1 commit
  22. 19 January 2007, 1 commit
  23. 17 January 2007, 1 commit
  24. 14 January 2007, 1 commit
  25. 09 January 2007, 2 commits
    • short i/o: fix calls to read to use xread or read_in_full · 93d26e4c
      Authored by Andy Whitcroft
      We have a number of badly checked read() calls.  Often we are
      expecting read() to read exactly the size we requested or fail, this
      fails to handle interrupts or short reads.  Add a read_in_full()
      providing those semantics.  Otherwise we at a minimum need to check
      for EINTR and EAGAIN; where this is appropriate, use xread().
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      93d26e4c
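      A sketch of a read_in_full()-style loop with the semantics described: retry on
      EINTR/EAGAIN, accumulate short reads, and return the total actually read (short
      only at EOF) or -1 on error.

        #include <errno.h>
        #include <unistd.h>

        static ssize_t read_in_full_sketch(int fd, void *buf, size_t count)
        {
            char *p = buf;
            size_t total = 0;

            while (total < count) {
                ssize_t n = read(fd, p + total, count - total);
                if (n < 0) {
                    if (errno == EINTR || errno == EAGAIN)
                        continue;     /* interrupted or would block: retry */
                    return -1;        /* real error */
                }
                if (n == 0)
                    break;            /* EOF: caller sees a short count */
                total += n;
            }
            return (ssize_t)total;
        }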
    • short i/o: clean up the naming for the write_{in,or}_xxx family · e0814056
      Authored by Andy Whitcroft
      We recently introduced a write_in_full() which would either write
      the specified object or emit an error message and fail.  In order
      to fix the read side we now want to introduce a read_in_full()
      but one that does not emit an error message.  This patch cleans up the naming
      of this family of calls:
      
      1) convert the existing write_or_whine() to write_or_whine_pipe()
         to better indicate its pipe specific nature,
      2) convert the existing write_in_full() calls to write_or_whine()
         to better indicate its nature,
      3) introduce a write_in_full() providing a write or fail semantic,
         and
      4) convert write_or_whine() and write_or_whine_pipe() to use
         write_in_full().
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      e0814056
  26. 08 January 2007, 2 commits
    • Detached HEAD (experimental) · c847f537
      Authored by Junio C Hamano
      This allows "git checkout v1.4.3" to dissociate the HEAD of
      repository from any branch.  After this point, "git branch"
      starts reporting that you are not on any branch.  You can go
      back to an existing branch by saying "git checkout master", for
      example.
      
      This is still experimental.  While I think it makes sense to
      allow commits on top of detached HEAD, it is rather dangerous
      unless you are careful in the current form.  Next "git checkout
      master" will obviously lose what you have done, so we might want
      to require "git checkout -f" out of a detached HEAD if we find
      that the HEAD commit is not an ancestor of any other branches.
      There is no such safety valve implemented right now.
      
      On the other hand, the reason the user did not start the ad-hoc
      work on a new branch with "git checkout -b" was probably because
      the work was of a throw-away nature, so the convenience of not
      having that safety valve might be even better.  The user, after
      accumulating some commits on top of a detached HEAD, can always
      create a new branch with "git checkout -b" not to lose useful
      work done while the HEAD was detached.
      
      We'll see.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      c847f537
    • Introduce is_bare_repository() and core.bare configuration variable · 7d1864ce
      Authored by Junio C Hamano
      This removes the old is_bare_git_dir(const char *) to ask if a
      directory, if it is a GIT_DIR, is a bare repository, and
      replaces it with is_bare_repository(void).  The function looks at
      the core.bare configuration variable if it exists, and otherwise
      falls back to the old heuristics: if the directory is ".git" or
      ends with "/.git", then it does not look like a bare repository;
      otherwise it does.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      7d1864ce
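      A sketch of the fallback heuristic spelled out above, used only when core.bare
      is not set in the configuration; the config lookup itself is omitted.

        #include <string.h>

        /* A GIT_DIR named ".git", or ending in "/.git", looks like the metadata
         * directory of a working tree, so it is not considered bare. */
        static int looks_bare(const char *git_dir)
        {
            size_t len = strlen(git_dir);

            if (!strcmp(git_dir, ".git"))
                return 0;
            if (len >= 5 && !strcmp(git_dir + len - 5, "/.git"))
                return 0;
            return 1;
        }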
  27. 03 January 2007, 2 commits
    • Fix infinite loop when deleting multiple packed refs. · 1084b845
      Authored by Junio C Hamano
      It was stupid to link the same element twice to lock_file_list
      and end up in a loop, so we certainly need a fix.
      
      But it is not like we are taking a lock on multiple files in
      this case.  It is just that we leave the linked element on the
      list even after commit_lock_file() successfully removes the
      cruft.
      
      We cannot remove the list element in commit_lock_file(); if we
      are interrupted in the middle of list manipulation, the call to
      remove_lock_file_on_signal() will happen with a broken list
      structure pointed by lock_file_list, which would cause the cruft
      to remain, so not removing the list element is the right thing
      to do.  Instead we should be reusing the element already on the
      list.
      
      There is already a code for that in lock_file() function in
      lockfile.c.  The code checks lk->next and the element is linked
      only when it is not already on the list -- which is incorrect
      for the last element on the list (which has NULL in its next
      field), but if you read the check as "is this element already on
      the list?" it actually makes sense.  We do not want to link it
      on the list again, nor would we want to set up signal/atexit
      over and over.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      1084b845
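      A hedged sketch of the "is this element already on the list?" guard discussed
      above: an element is linked (and the cleanup handlers installed) only the first
      time it is used, so reusing it later cannot create a cycle.  Names are
      illustrative.

        #include <stddef.h>

        struct lock_file_sketch {
            struct lock_file_sketch *next;
            int on_list;      /* explicit flag, clearer than testing next != NULL */
        };

        static struct lock_file_sketch *lock_file_list;

        static void register_lock_file(struct lock_file_sketch *lk)
        {
            if (lk->on_list)
                return;                       /* already registered: just reuse it */
            lk->next = lock_file_list;
            lock_file_list = lk;
            lk->on_list = 1;
            /* the real code would also install its signal/atexit handlers here,
             * exactly once */
        }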
    • send pack check for failure to send revisions list · 825cee7b
      Authored by Andy Whitcroft
      When passing the revisions list to pack-objects we do not check for
      errors nor short writes.  Introduce a new write_in_full which will
      handle short writes and report errors to the caller.  Use this to
      short cut the send on failure, allowing us to wait for and report
      the child in case the failure is its fault.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      825cee7b
  28. 30 December 2006, 4 commits
    • Create pack_report() as a debugging aid. · a53128b6
      Authored by Shawn O. Pearce
      Much like the alloc_report() function can be useful to report on
      object allocation statistics while debugging, the new pack_report()
      function can be useful to report on the behavior of the mmap window
      code used for packfile access.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      a53128b6
    • Fully activate the sliding window pack access. · 60bb8b14
      Authored by Shawn O. Pearce
      This finally turns on the sliding window behavior for packfile data
      access by mapping limited size windows and chaining them under the
      packed_git->windows list.
      
      We consider a given byte offset to be within the window only if there
      would be at least 20 bytes (one hash worth of data) accessible after
      the requested offset.  This range selection relates to the contract
      that use_pack() makes with its callers, allowing them to access
      one hash or one object header without needing to call use_pack()
      for every byte of data obtained.
      
      In the worst case scenario we will map the same page of data twice
      into memory: once at the end of one window and once again at the
      start of the next window.  This duplicate page mapping will happen
      only when an object header or a delta base reference is spanned
      over the end of a window and is always limited to just one page of
      duplication, as no sane operating system will ever have a page size
      smaller than a hash.
      
      I am assuming that the possible wasted page of virtual address
      space is going to perform faster than the alternatives, which
      would be to copy the object header or ref delta into a temporary
      buffer prior to parsing, or to check the window range on every byte
      during header parsing.  We may decide to revisit this decision in
      the future since this is just a gut instinct decision and has not
      actually been proven out by experimental testing.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      60bb8b14
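      The "at least 20 bytes after the requested offset" contract can be pictured as
      a containment test roughly like the following; field names are illustrative.

        #include <stddef.h>

        struct pack_window_sketch {
            size_t offset;   /* where this window starts within the pack file */
            size_t len;      /* how many bytes of the file this window maps */
        };

        /* An offset is served from this window only if one full hash (20 bytes)
         * of data is still available after it, per the use_pack() contract. */
        static int offset_in_window(const struct pack_window_sketch *w, size_t offset)
        {
            return offset >= w->offset && offset + 20 <= w->offset + w->len;
        }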
    • Replace use_packed_git with window cursors. · 03e79c88
      Authored by Shawn O. Pearce
      Part of the implementation concept of the sliding mmap window for
      pack access is to permit multiple windows per pack to be mapped
      independently.  Since the inuse_cnt is associated with the mmap and
      not with the file, this value is in struct pack_window and needs to
      be incremented/decremented for each pack_window accessed by any code.
      
      To facilitate that implementation we need to replace all uses of
      use_packed_git() and unuse_packed_git() with a different API that
      follows struct pack_window objects rather than struct packed_git.
      
      The way this works is when we need to start accessing a pack for
      the first time we should setup a new window 'cursor' by declaring
      a local and setting it to NULL:
      
        struct pack_window *w_curs = NULL;
      
      To obtain the memory region which contains a specific section of
      the pack file we invoke use_pack(), supplying the address of our
      current window cursor:
      
        unsigned int len;
        unsigned char *addr = use_pack(p, &w_curs, offset, &len);
      
      the returned address `addr` will be the first byte at `offset`
      within the pack file.  The optional variable len will also be
      updated with the number of bytes remaining following the address.
      
      Multiple calls to use_pack() with the same window cursor will
      update the window cursor, moving it from one window to another
      when necessary.  In this way each window cursor variable maintains
      only one struct pack_window inuse at a time.
      
      Finally before exiting the scope which originally declared the window
      cursor we must invoke unuse_pack() to unuse the current window (which
      may be different from the one that was first obtained from use_pack):
      
        unuse_pack(&w_curs);
      
      This implementation is still not complete with regards to multiple
      windows, as only one window per pack file is supported right now.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      03e79c88
    • Refactor how we open pack files to prepare for multiple windows. · 9bc879c1
      Authored by Shawn O. Pearce
      To efficiently support mmaping of multiple regions of the same pack
      file we want to keep the pack's file descriptor open while we are
      actively working with that pack.  So we are now keeping that file
      descriptor in packed_git.pack_fd and closing it only after we unmap
      the last window.
      
      This is going to increase the number of file descriptors that are
      in use at once, however that will be bounded by the total number of
      pack files present and therefore should not be very high.  It is
      a small tradeoff which we may need to revisit after some testing
      can be done on various repositories and systems.
      
      For code clarity we also want to separate out the implementation
      of how we open a pack file from the implementation which locates
      a suitable window (or makes a new one) from the given pack file.
      Since this is a rather large delta I'm taking advantage of doing
      it now, in a fairly isolated change.
      
      When we open a pack file we need to examine the header and trailer
      without having a mmap in place, as we may only need to mmap
      the middle section of this particular pack.  Consequently the
      verification code has been refactored to make use of the new
      read_or_die function.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      9bc879c1
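      A hedged sketch of the header check mentioned above, done with a plain read on
      the already-open pack_fd rather than through a mapping.  The constants follow
      the on-disk pack format (4-byte "PACK" signature, 4-byte version, 4-byte object
      count); the function name is illustrative.

        #include <stdint.h>
        #include <string.h>
        #include <unistd.h>
        #include <arpa/inet.h>   /* ntohl */

        static int check_pack_header(int pack_fd, uint32_t *num_objects)
        {
            unsigned char hdr[12];
            uint32_t version;

            if (read(pack_fd, hdr, sizeof(hdr)) != (ssize_t)sizeof(hdr))
                return -1;                   /* short read: not a usable pack */
            if (memcmp(hdr, "PACK", 4))
                return -1;                   /* bad signature */

            memcpy(&version, hdr + 4, 4);
            version = ntohl(version);
            if (version != 2 && version != 3)
                return -1;                   /* version we do not understand */

            memcpy(num_objects, hdr + 8, 4);
            *num_objects = ntohl(*num_objects);
            return 0;
        }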