- 19 Jan 2007: 2 commits
Committed by Shawn O. Pearce

Currently the pack .idx file format uses 32-bit unsigned integers for the fan-out table and the object offsets. We had previously declared these as 'unsigned int', but not every system defines that type as a 32-bit value. To ensure maximum portability we should always use 'uint32_t'.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
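The fan-out table described above can be read with fixed-width types. This is a minimal sketch. It assumes a version 1 .idx file that is already mapped into memory, so the 256 big-endian 4-byte entries start at offset 0. `fanout_count` is a hypothetical helper, not a Git function.

```c
#include <arpa/inet.h> /* ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return fan-out entry `first_byte`, i.e. the number
 * of objects whose SHA-1 starts with a byte <= first_byte. The table is
 * assumed to begin at offset 0 of the mapped .idx file. */
static uint32_t fanout_count(const unsigned char *idx_map, unsigned char first_byte)
{
    uint32_t be;

    /* memcpy avoids unaligned loads; the on-disk value is big-endian. */
    memcpy(&be, idx_map + 4u * first_byte, sizeof(be));
    return ntohl(be);
}
```

Objects whose SHA-1 begins with byte `b` would then occupy index positions from `fanout_count(map, b - 1)` up to, but not including, `fanout_count(map, b)`. The lower bound is 0 when `b` is 0.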
Committed by Shawn O. Pearce

Previously we were using 'unsigned int' to update the hdr_entries field of the pack header after the file had been completed and was being hashed. This may not be 32 bits on all platforms; we always want uint32_t instead. I'm actually cheating here by just using struct pack_header like the rest of Git and letting the struct definition declare the correct type. Right now that field is still 'unsigned int' (wrong), but a pending change submitted by Simon 'corecode' Schubert changes it to uint32_t. After that change is merged, fast-import will do the right thing all of the time.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
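For illustration, the 12-byte pack header can be declared with `uint32_t` fields. That way its layout no longer depends on the width of `unsigned int`. This is a sketch. The field names mirror the `hdr_entries` name used above. `fill_pack_header` and the version number 2 are assumptions.

```c
#include <arpa/inet.h> /* htonl */
#include <stdint.h>

/* The 12-byte pack header: signature, version and entry count, each a
 * 4-byte big-endian value. Fixed-width fields keep the layout exact. */
struct pack_header {
    uint32_t hdr_signature; /* "PACK" */
    uint32_t hdr_version;
    uint32_t hdr_entries;
};

/* Hypothetical helper: fill in a header for a version 2 pack holding
 * nr_objects objects, e.g. when rewriting it once the pack is complete. */
static void fill_pack_header(struct pack_header *hdr, uint32_t nr_objects)
{
    hdr->hdr_signature = htonl(0x5041434bu); /* 'P' 'A' 'C' 'K' */
    hdr->hdr_version = htonl(2);
    hdr->hdr_entries = htonl(nr_objects);
}
```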
- 17 Jan 2007: 5 commits
Committed by Shawn O. Pearce

Branches are only contained by a packfile if the branch actually had its most recent commit in that packfile. So new branches are set to MAX_PACK_ID. This ensures they don't cause their commit to be listed as part of the first packfile when it closes out, if the commit actually existed before fast-import started. Also corrected the type of last_commit to be uintmax_t to prevent overflow and wraparound on very large imports. That is highly unlikely to occur, though: we're talking about 4 billion commits, which no real project has right now.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Apparently the Git convention is to declare any function that takes no arguments as taking void. I did not do this during early fast-import development, but should have.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
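In C before C23, an empty parameter list in a declaration means "unspecified arguments", not "no arguments". Here is a tiny illustration with a hypothetical function:

```c
/* Before C23, `unsigned int next_id();` declares a function with
 * unspecified parameters, so a call like next_id(42) need not be
 * diagnosed. Writing (void) states that it takes no arguments at all. */
static unsigned int id_counter;

static unsigned int next_id(void)
{
    return ++id_counter;
}
```

With `(void)`, a stray call such as `next_id(42)` is a compile-time error instead of silently compiling.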
Committed by Shawn O. Pearce

The length of an atom string cannot be negative, so make that explicit and declare it as an unsigned value. The shift width in a mark table node also cannot be negative. I'm also moving it after the pointer arrays to prevent any possible alignment problems on a 64-bit system.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
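Here is a sketch of the layout described above. It uses hypothetical structures and assumes a 1024-way mark table node. With `shift` placed after the union of pointer arrays, the arrays start at offset 0, and any padding the compiler needs lands at the end of the struct.

```c
#include <stddef.h>

/* Hypothetical atom: the length is unsigned because it cannot be negative. */
struct atom_str {
    struct atom_str *next_atom;
    unsigned int str_len;
    char str_dat[]; /* C99 flexible array member */
};

struct object_entry; /* opaque in this sketch */

/* Hypothetical mark table node: pointer arrays first, shift afterwards. */
struct mark_set {
    union {
        struct object_entry *marked[1024]; /* leaf level */
        struct mark_set *sets[1024];       /* inner levels */
    } data;
    unsigned int shift; /* never negative, so unsigned */
};
```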
Committed by Shawn O. Pearce

Now that fast-import uses uintmax_t (the largest available unsigned integer type) for marks, we don't want to say it's an unsigned 32-bit integer in ASCII base-10 notation. It could be much larger, especially on 64-bit systems, and especially if a frontend uses a very large number of marks (one per file revision on a very, very large import).

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
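When marks are `uintmax_t`, they should also be printed with a matching conversion rather than a 32-bit one. This is a minimal sketch using the standard `PRIuMAX` macro. `format_mark` is hypothetical.

```c
#include <inttypes.h> /* PRIuMAX, uintmax_t */
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a mark reference such as ":42". PRIuMAX
 * prints the full uintmax_t value, whatever its width on this platform. */
static int format_mark(char *buf, size_t len, uintmax_t idnum)
{
    return snprintf(buf, len, ":%" PRIuMAX, idnum);
}
```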
Committed by Shawn O. Pearce

To help callers repack very large repositories into a series of packfiles, fast-import now outputs the last commits/tags it wrote to a packfile when it prints out the packfile name. This information can be fed to pack-objects --revs to repack. For the first pack of an initial import this is pretty easy: just feed those SHA1s on stdin. For each subsequent pack, you want to feed that pack's final SHA1s. You also want to feed all prior packs' SHA1s, prefixed with the negation operator. This way the prior packs' data does not get included in the subsequent pack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
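Here is a sketch of building the standard input for `git pack-objects --revs` as described above. For simplicity, it assumes that each pack reported exactly one tip SHA1. `write_revs` and the `tips` array are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: tips[i] is the last commit/tag SHA1 reported for
 * pack i. To repack pack n, list its tip, then every earlier pack's tip
 * negated with '^' so that their objects are excluded. */
static void write_revs(FILE *out, const char *const *tips, size_t n)
{
    fprintf(out, "%s\n", tips[n]);
    for (size_t i = 0; i < n; i++)
        fprintf(out, "^%s\n", tips[i]);
}
```

The resulting lines could then be piped into something like `git pack-objects --revs <basename>`. A real caller would list every tip reported for each pack.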
- 16 Jan 2007: 9 commits
Committed by Shawn O. Pearce

Since object_count is limited to 'unsigned long' (really an unsigned 32-bit integer value) by the pack file format, we may as well use exactly that type here in fast-import for that counter. An earlier change by me incorrectly made it uintmax_t. But since object_count is a counter for the current packfile only, we don't want to output its value at the end. Instead we should sum up the individual type counters and report that total, as that will cover all of the packfiles.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Apparently amd64 defines 'unsigned long' to be a 64-bit value, which means -1 was way over the 4 GiB packfile limit. Whoops.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
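The pitfall is that `(unsigned long)-1` equals 2^32 - 1 only where `unsigned long` is 32 bits. On LP64 platforms such as Linux on amd64, it is 2^64 - 1. Here is a sketch of a width-independent default, with hypothetical names:

```c
#include <stdint.h>

/* Spelling the default as the 32-bit maximum keeps it just under 4 GiB
 * on every platform, unlike (unsigned long)-1. */
static const uint64_t default_max_packsize = UINT32_MAX;

/* Hypothetical check: would appending obj_len bytes exceed the limit? */
static int exceeds_packsize(uint64_t pack_size, uint64_t obj_len, uint64_t limit)
{
    return pack_size + obj_len > limit;
}
```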
Committed by Shawn O. Pearce

Much like pack_sha1, pack_fd is an unnecessary global variable. We already store the fd in our struct packed_git *pack_data, so that the core library functions in sha1_file.c can look up and decompress object data we have previously written. Keeping an extra copy of this value in our own variable is just a holdover from earlier versions of fast-import and is now completely unnecessary.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Because we are renaming the packfile to its final destination, we need to be sure it's not open when rename is called. Otherwise some operating systems (e.g. Windows) may prevent the rename from occurring.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
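Here is a minimal sketch of that ordering. It assumes the pack was written through a plain file descriptor. `finish_pack_file` is hypothetical. The descriptor is closed first, and only then is `rename()` called.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>  /* rename */
#include <unistd.h> /* close */

/* Hypothetical helper: close the temporary pack, then move it into place.
 * On some systems (e.g. Windows) an open file cannot be renamed. */
static int finish_pack_file(int fd, const char *tmp_name, const char *final_name)
{
    if (close(fd))
        return -1;
    return rename(tmp_name, final_name);
}
```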
Committed by Shawn O. Pearce

Because fast-import automatically updates all references (heads and tags) at the end of its run, the repository is corrupt unless the objects are available in the .git/objects/pack directory before the refs are modified. The easiest way to ensure this is to move the packfile and its associated index directly into the .git/objects/pack directory as soon as we have finished writing to it. But the only safe way to do this is to create a temporary .keep file for that pack. So we use the same tricks that index-pack uses when it is invoked by receive-pack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
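Here is a minimal sketch of the `.keep` step. It assumes the pack's final basename is already known. `create_keep_file` is hypothetical, and the exact contents index-pack writes are not reproduced here. Opening with `O_CREAT | O_EXCL` fails if a `.keep` file already exists. Git's repack leaves packs that have a `.keep` file alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create "<pack_base>.keep" exclusively, optionally
 * writing a short reason into it. Returns 0 on success and -1 if the file
 * already exists or any step fails. */
static int create_keep_file(const char *pack_base, const char *reason)
{
    char path[4096];
    int fd, n;

    n = snprintf(path, sizeof(path), "%s.keep", pack_base);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1; /* e.g. EEXIST: the pack is already being kept */
    if (reason) {
        size_t len = strlen(reason);
        if (write(fd, reason, len) != (ssize_t)len) {
            close(fd);
            unlink(path);
            return -1;
        }
    }
    return close(fd);
}
```

Once the refs have been updated, the `.keep` file would presumably be removed again, as receive-pack does.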
Committed by Shawn O. Pearce

Rather than maintaining our own packfile-level sha1 variable, we can make use of the one already available in struct packed_git. It's meant for the SHA1 of the index. However, it can also hold the SHA1 of the packfile itself between final checksumming of the packfile and creation of the index.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Before Git had read_in_full(), fast-import used its own private function yread() to read the header. There is no sense in keeping that around now that read_in_full() is a public, stable function.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
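`read_in_full()` is Git's own helper. The function below is not its source. It is only a minimal sketch, with the hypothetical name `read_all`, of the behavior a header reader relies on: loop over `read()` until the requested bytes arrive, EOF is reached, or an error occurs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical read_in_full()-style loop. Returns the number of bytes read
 * (less than count only at EOF), or -1 on error. */
static ssize_t read_all(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break; /* EOF: return what we have */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```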
Committed by Shawn O. Pearce

A frontend may want to use a mark per file revision and per commit while doing a truly huge import (such as a 32 GiB SVN repository). In that case we may need more than 2**32 unique mark values, especially if the frontend is unable (or unwilling) to recycle mark values. For mark idnums we should use the largest unsigned integer type available, hoping that it will be at least 64 bits when we are compiled as a 64-bit executable. This way we may consume huge amounts of memory storing our mark table, but we'll at least be able to process the entire import without failing.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
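Here is a sketch of a sparse, multi-level mark table keyed by `uintmax_t`. It assumes 1024 slots per node; the text does not give the fan-out, and all names are hypothetical. Memory is allocated only along the paths actually used, so a huge idnum costs a handful of nodes rather than a huge flat array. Freeing is omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MARK_BITS 10                     /* 1024 slots per node */
#define MARK_SLOTS (1u << MARK_BITS)
#define IDNUM_BITS (sizeof(uintmax_t) * 8)

/* Leaves (shift == 0) hold values; inner nodes hold children. */
struct mark_node {
    union {
        void *values[MARK_SLOTS];
        struct mark_node *children[MARK_SLOTS];
    } data;
    unsigned int shift;
};

static struct mark_node *mark_node_new(unsigned int shift)
{
    struct mark_node *n = calloc(1, sizeof(*n));
    if (n)
        n->shift = shift;
    return n;
}

/* True if idnum has bits above what a node at this shift can address. */
static int mark_too_big(const struct mark_node *s, uintmax_t idnum)
{
    return s->shift + MARK_BITS < IDNUM_BITS &&
           (idnum >> (s->shift + MARK_BITS)) != 0;
}

static int mark_insert(struct mark_node **root, uintmax_t idnum, void *value)
{
    struct mark_node *s;

    /* Grow the tree upward until the root can address idnum. */
    while (mark_too_big(*root, idnum)) {
        struct mark_node *top = mark_node_new((*root)->shift + MARK_BITS);
        if (!top)
            return -1;
        top->data.children[0] = *root; /* old root covers the low range */
        *root = top;
    }
    /* Walk down, creating missing inner nodes. */
    for (s = *root; s->shift; ) {
        size_t i = (size_t)((idnum >> s->shift) & (MARK_SLOTS - 1));
        if (!s->data.children[i]) {
            s->data.children[i] = mark_node_new(s->shift - MARK_BITS);
            if (!s->data.children[i])
                return -1;
        }
        s = s->data.children[i];
    }
    s->data.values[idnum & (MARK_SLOTS - 1)] = value;
    return 0;
}

static void *mark_find(const struct mark_node *s, uintmax_t idnum)
{
    if (mark_too_big(s, idnum))
        return NULL;
    while (s && s->shift)
        s = s->data.children[(idnum >> s->shift) & (MARK_SLOTS - 1)];
    return s ? s->data.values[idnum & (MARK_SLOTS - 1)] : NULL;
}
```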
Committed by Shawn O. Pearce

Suppose we were using a delta but then needed to checkpoint the current packfile and switch to a new one. In that case we must throw away the delta and compress the raw object by itself, as delta chains cannot span non-thin packfiles. Unfortunately the output buffer needs to grow in this case, as the compressed object may be quite a bit larger than the compressed delta. I've also avoided recompressing the object if we are checkpointing and didn't use a delta. In that case the output buffer is already the correct size and already holds the right data; we just need to close out the current packfile and open a new one.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
- 15 Jan 2007: 9 commits
Committed by Shawn O. Pearce

Caller scripts may want to know which packfiles the fast-import process just wrote out for them. This is now output to stdout, one packfile name per line, after we checkpoint each packfile.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

The number of objects or bytes may get close to the limit allowed by the packfile format, or to a limit configured on the command line by our caller. When that happens, we should automatically checkpoint the current packfile and start a new one before writing the object out. However, this requires that we abandon the delta (if we had one), as it's not valid in a new packfile. I also added a simple rule: if we got a delta back but it is the same size as or larger than the uncompressed object, we ignore the delta and just store the object data. This should avoid some really bad behavior caused by our current delta strategy.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
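The two rules above can be sketched as small predicates. The structure, field names and limit parameters are hypothetical; only the rules themselves come from the text.

```c
#include <stdint.h>

/* Hypothetical per-pack state. */
struct pack_state {
    uint32_t object_count; /* objects written to the current pack */
    uint64_t pack_size;    /* bytes written to the current pack */
};

/* Would writing one more object of obj_len bytes exceed either limit? */
static int needs_checkpoint(const struct pack_state *p, uint64_t obj_len,
                            uint64_t max_objects, uint64_t max_packsize)
{
    return (uint64_t)p->object_count + 1 > max_objects ||
           p->pack_size + obj_len > max_packsize;
}

/* Keep a delta only if it is strictly smaller than the raw object, and
 * never across a checkpoint (its base lives in the previous pack). */
static int keep_delta(uint64_t delta_len, uint64_t raw_len, int checkpointed)
{
    return !checkpointed && delta_len > 0 && delta_len < raw_len;
}
```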
Committed by Shawn O. Pearce

When we are generating multiple packfiles at once, we only need to scan the blocks of object_entry structs that contain objects for the current packfile. The most recent blocks are at the front of the linked list, and all new objects going into the current file are allocated from the front of that list. So we can stop scanning as soon as we find an object that doesn't belong to the current packfile.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
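Here is a simplified sketch of the early-exit scan. In these hypothetical structures, each block records its first entry and one past its last used entry. The sketch also assumes entries within a block are allocated in increasing order, so the newest entries are at the end.

```c
#include <stddef.h>

/* Hypothetical, simplified versions of the structures described above. */
struct object_entry {
    unsigned int pack_id;
};

struct object_entry_pool {
    struct object_entry_pool *next_pool; /* older pools follow */
    struct object_entry *entries;        /* first entry in this block */
    struct object_entry *next_free;      /* one past the last used entry */
};

/* Count objects belonging to pack_id, scanning newest to oldest and
 * stopping at the first object that belongs to an earlier pack. */
static size_t count_pack_objects(const struct object_entry_pool *pool,
                                 unsigned int pack_id)
{
    size_t n = 0;

    for (; pool; pool = pool->next_pool) {
        const struct object_entry *e = pool->next_free;
        while (e != pool->entries) {
            --e;
            if (e->pack_id != pack_id)
                return n;
            n++;
        }
    }
    return n;
}
```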
Committed by Shawn O. Pearce

If the last packfile is going to be empty (has 0 objects), it shouldn't be kept after the import has terminated, as there is no point to it. So rather than hashing it and creating the index file, just delete the packfile.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Some importers deal with massive amounts of data. To help them, fast-import needs to be able to close the packfile it is currently writing to and open a new packfile for any additional data it receives. A new 'checkpoint' command has been introduced, which the frontend import process can use to force this at any time. This may be useful to ensure a very long-running import doesn't lose any work due to unexpected failures.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

There is little reason to keep a global duplicate_count value when we also keep it per object type. The global counter can easily be computed at the end, once all processing has completed. This saves us a couple of machine instructions in an unimportant part of the code. It also looks slightly better to me not to keep two counters around.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

We are starting to see some really large projects (such as KDE or a fork of FreeBSD) imported into Git. As a result, we're running into the upper limits on packfile object count and overall byte length. The KDE and FreeBSD projects are both likely to require more than 4 GiB to store their current history, so we really need multiple packfiles to handle their content. This is a fairly simple restructuring of the internal code to support creating multiple packfiles from within fast-import. We now add a 5-digit incrementing suffix to the end of the basename supplied by the caller, permitting up to 99,999 packs to be generated in a single fast-import run.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
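Here is a sketch of the naming scheme. It assumes the suffix is a zero-padded decimal pack number placed between the caller's basename and the file extension. `pack_file_name` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build "<base><NNNNN>.<ext>" with a zero-padded
 * five-digit pack number. Returns 0 on success, -1 if the number does
 * not fit in five digits or the buffer is too small. */
static int pack_file_name(char *buf, size_t len, const char *base,
                          unsigned int pack_id, const char *ext)
{
    int n;

    if (pack_id > 99999)
        return -1;
    n = snprintf(buf, len, "%s%05u.%s", base, pack_id, ext);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, a basename of `import-` and pack number 7 would give `import-00007.pack`.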
Committed by Shawn O. Pearce

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

The sha1_file.c library routines now use the sliding mmap routines to access portions of a packfile efficiently. So I can remove that code from fast-import.c and just invoke the library. One benefit is that we now have reloading support for any packfile which uses OBJ_OFS_DELTA. Another is that we have significantly less code to maintain. This code reuse change *requires* that fast-import generate only OBJ_OFS_DELTA format packfiles. There is absolutely no index available to perform OBJ_REF_DELTA lookups in while unpacking an object. This is probably a reasonable requirement: delta offsets result in smaller packfiles and are faster to unpack, since no index searching is required. It's also only a temporary requirement, as users could always repack without offsets before making the import available to older versions of Git.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
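For reference, the value after an OBJ_OFS_DELTA object header is the distance back to the base object. It is stored big-endian in 7-bit groups, where each continuation byte implicitly adds one, so no value has two encodings. Here is a self-contained round-trip sketch of that encoding. The function names are hypothetical, and overflow checks on decode are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode the distance back to the delta's base: 7-bit groups, high bit
 * set on every byte except the last, and one subtracted from the
 * remaining value before each continuation byte is emitted. */
static size_t encode_ofs(uint64_t ofs, unsigned char out[10])
{
    unsigned char tmp[10];
    size_t pos = sizeof(tmp) - 1;
    size_t n, i;

    tmp[pos] = ofs & 127;
    while (ofs >>= 7)
        tmp[--pos] = 128 | (--ofs & 127);
    n = sizeof(tmp) - pos;
    for (i = 0; i < n; i++)
        out[i] = tmp[pos + i];
    return n;
}

/* Decode the same format; *used receives the number of bytes consumed.
 * A real decoder must also reject values that overflow. */
static uint64_t decode_ofs(const unsigned char *p, size_t *used)
{
    size_t i = 0;
    unsigned char c = p[i++];
    uint64_t ofs = c & 127;

    while (c & 128) {
        c = p[i++];
        ofs = ((ofs + 1) << 7) | (c & 127);
    }
    *used = i;
    return ofs;
}
```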
- 14 Jan 2007: 15 commits
Committed by Shawn O. Pearce

I'm bringing master in early so that the OBJ_OFS_DELTA implementation is available as part of the topic. This way git-fast-import can learn about this new, slightly smaller and faster packfile format. It can then generate such packs directly rather than needing them to be repacked with git-pack-objects. The API in master changed while git-fast-import was being developed, so a few minor tweaks to fast-import.c are needed to produce a working merge. I've done them here as part of the merge to ensure bisection always works.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Some importers may want to create a branch long before they actually commit to it. In some cases they may never commit to the branch, but still need the ref to be created in the repository after the import is complete. This extends the 'reset' command to automatically create a new branch if the supplied reference isn't already known as a branch. While I'm at it, I also changed the syntax of the reset command to terminate with an empty line, as commit and tag do. This just makes the command set more consistent.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
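Here is a sketch of how a frontend might emit the extended command, using a hypothetical `emit_reset` helper. The optional `from` line follows later git-fast-import documentation, and its presence at this point in history is an assumption.

```c
#include <stdio.h>

/* Hypothetical frontend helper: emit a 'reset' command that creates (or
 * repoints) ref, optionally starting it at from, and terminate it with
 * the empty line described above. */
static void emit_reset(FILE *out, const char *ref, const char *from)
{
    fprintf(out, "reset %s\n", ref);
    if (from)
        fprintf(out, "from %s\n", from);
    fputc('\n', out);
}
```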
Committed by Shawn O. Pearce

Some importers are able to determine when branch merges occurred within their source data. In these cases they will want to supply the correct commits to fast-import, so that a proper merge commit will exist in Git. This is now supported by supplying a 'merge' command after the commit message and the optional 'from' command. Fast-import does not actually perform a merge. It assumes the frontend has already done any merging activity and simply stores the result.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
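Here is a sketch of a frontend emitting a merge commit, using a hypothetical `emit_merge_commit` helper. The command order follows the description above and later fast-import documentation: commit, committer, data, optional from, then merge. The exact committer line format is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical frontend helper: emit a commit with two parents. */
static void emit_merge_commit(FILE *out, const char *ref, const char *committer,
                              const char *msg, const char *from, const char *merge)
{
    fprintf(out, "commit %s\n", ref);
    fprintf(out, "committer %s\n", committer);
    /* 'data' carries an exact byte count, then the raw message bytes. */
    fprintf(out, "data %zu\n%s", strlen(msg), msg);
    if (from)
        fprintf(out, "from %s\n", from);   /* first parent */
    fprintf(out, "merge %s\n", merge);     /* additional parent */
    fputc('\n', out);                      /* empty line ends the command */
}
```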
Committed by Shawn O. Pearce

Apparently we did not copy the blob SHA1 into the stack variable 'sha1' when a mark is used to refer to a prior blob. This code was not previously tested. The Mozilla CVS -> git-fast-import program always fed us full SHA1s for modified blobs and did not use the mark feature there.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

The new tree delta implementation caused blob SHA1s to be used instead of a tree SHA1 when a tree was written out. This only really appeared to happen when converting an existing file to a tree, but may have been possible in some other situations.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Most commit and tag objects are around the same size, and we only generate one at a time. So we can reuse the same buffer rather than xmalloc'ing and free'ing a buffer every time we generate a commit.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

We only ever generate at most two tree streams at a time. Since most trees are around the same size, we can simply recycle the buffers from one tree generation to the next rather than constantly xmalloc'ing and free'ing them. This should perform slightly better when handling a large number of trees, as malloc has less work to do.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
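Both buffer-recycling changes, for commit/tag buffers and for tree buffers, follow the same pattern. Here is a minimal sketch of a reusable growable buffer, with hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable buffer: resetting keeps the allocation, so the
 * next object of similar size needs no malloc/free at all. */
struct reuse_buf {
    char *data;
    size_t len;
    size_t alloc;
};

static int buf_grow(struct reuse_buf *b, size_t need)
{
    size_t n;
    char *p;

    if (need <= b->alloc)
        return 0;
    n = b->alloc ? b->alloc : 64;
    while (n < need)
        n *= 2; /* overflow checks omitted in this sketch */
    p = realloc(b->data, n);
    if (!p)
        return -1;
    b->data = p;
    b->alloc = n;
    return 0;
}

static int buf_append(struct reuse_buf *b, const void *src, size_t n)
{
    if (buf_grow(b, b->len + n))
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

static void buf_reset(struct reuse_buf *b)
{
    b->len = 0; /* keep data/alloc for the next object */
}
```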
Committed by Shawn O. Pearce

We now store two modes and two SHA1 values for every tree entry: the base (aka "version 0") and the current/new (aka "version 1"). When we generate a tree object, we also regenerate the prior version object and use it as our base object for a delta. This strategy saves a significant amount of memory. We can continue to use the atom pool for file/directory names, and each tree entry grows by only 24 bytes. Branches should automatically delta against their ancestor tree, unless the ancestor tree is already at the delta chain limit.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
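Here is a sketch of a tree entry that carries both versions. It assumes a 4-byte `unsigned int` mode, which with a 20-byte SHA-1 gives the 24 extra bytes per entry mentioned above. All names are hypothetical.

```c
#include <string.h>

/* Hypothetical mode/SHA-1 pair: 4 + 20 = 24 bytes with a 4-byte int. */
struct entry_version {
    unsigned int mode;
    unsigned char sha1[20];
};

struct versioned_tree_entry {
    const char *name;                 /* shared name, e.g. from an atom pool */
    struct entry_version versions[2]; /* [0] = base, [1] = current */
};

/* An entry differs from its base if either the mode or the SHA-1 changed. */
static int entry_changed(const struct versioned_tree_entry *e)
{
    return e->versions[0].mode != e->versions[1].mode ||
           memcmp(e->versions[0].sha1, e->versions[1].sha1, 20) != 0;
}
```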
Committed by Shawn O. Pearce

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

Sometimes an import frontend may need to work with a temporary branch that will actually contain many different branches over the life of the import. This is especially useful when the frontend needs to create a tag from a set of file versions that are otherwise never a commit.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Committed by Shawn O. Pearce

When generating a very large pack file (for example, close to 1 GB in size), the kernel may be unable to find a contiguous free range for the mapping within a 32-bit address space. This is especially problematic on large imports with a lot of malloc activity in the same process. The malloc'd regions may straddle the previously mapped regions, creating large holes in the address space. So instead we map only 128 MB of the pack at any given time. This will likely increase the number of times the file gets mapped, with additional system time spent updating the page tables more frequently. But it will allow the program to handle packs up to 4 GB in size.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
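Here is a sketch of choosing the 128 MB window for a given pack offset. `choose_window` is hypothetical. mmap() requires a page-aligned file offset, so the start is rounded down to a page boundary. The length is capped at the window size or at the end of the file.

```c
#include <stdint.h>

#define WINDOW_SIZE ((uint64_t)128 * 1024 * 1024) /* 128 MB per mapping */

struct window {
    uint64_t start; /* page-aligned file offset to pass to mmap() */
    uint64_t len;   /* bytes to map */
};

/* Hypothetical helper: page_size must be a power of two (e.g. from
 * sysconf(_SC_PAGESIZE)); offset is assumed to be < file_size. */
static struct window choose_window(uint64_t offset, uint64_t file_size,
                                   uint64_t page_size)
{
    struct window w;

    w.start = offset & ~(page_size - 1); /* round down to a page boundary */
    w.len = file_size - w.start;
    if (w.len > WINDOW_SIZE)
        w.len = WINDOW_SIZE;
    return w;
}
```

The caller would then pass `w.start` and `w.len` to `mmap()`. A complete implementation would also need to handle objects that straddle the end of a window.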
Committed by Shawn O. Pearce

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>