diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
new file mode 100644
index 0000000000000000000000000000000000000000..16308731fb8e134699ef0b59ec5505bb7329e5ca
--- /dev/null
+++ b/Documentation/git-fast-import.txt
@@ -0,0 +1,655 @@
+git-fast-import(1)
+==================
+
+NAME
+----
+git-fast-import - Backend for fast Git data importers.
+
+
+SYNOPSIS
+--------
+frontend | 'git-fast-import' [options]
+
+DESCRIPTION
+-----------
+This program is usually not what the end user wants to run directly.
+Most end users want to use one of the existing frontend programs,
+which parses a specific type of foreign source and feeds the contents
+stored there to git-fast-import (gfi).
+
+gfi reads a mixed command/data stream from standard input and
+writes one or more packfiles directly into the current repository.
+When EOF is received on standard input, fast import writes out
+updated branch and tag refs, fully updating the current repository
+with the newly imported data.
+
+The gfi backend itself can import into an empty repository (one that
+has already been initialized by gitlink:git-init[1]) or incrementally
+update an existing populated repository. Whether or not incremental
+imports are supported from a particular foreign source depends on
+the frontend program in use.
+
+
+OPTIONS
+-------
+--max-pack-size=::
+	Maximum size of each output packfile, expressed in MiB.
+	The default is 4096 (4 GiB) as that is the maximum allowed
+	packfile size (due to file format limitations). Some
+	importers may wish to lower this, such as to ensure the
+	resulting packfiles fit on CDs.
+
+--depth=::
+	Maximum delta depth, for blob and tree deltification.
+	Default is 10.
+
+--active-branches=::
+	Maximum number of branches to maintain active at once.
+	See ``Memory Utilization'' below for details. Default is 5.
+
+--export-marks=::
+	Dumps the internal marks table to when complete.
+	Marks are written one per line as `:markid SHA-1`.
+	Frontends can use this file to validate imports after they
+	have been completed.
+
+--branch-log=::
+	Records every tag and commit made to a log file. (This file
+	can be quite verbose on large imports.) This option is
+	intended primarily to facilitate debugging gfi and has
+	limited usefulness in other contexts. It may be removed in
+	future versions.
+
+
+Performance
+-----------
+The design of gfi allows it to import large projects with minimal
+memory usage and processing time. Assuming the frontend is able to
+keep up with gfi and feed it a constant stream of data, imports of
+projects holding 10+ years of history and containing 100,000+
+individual commits generally complete in just 1-2 hours on quite
+modest (~$2,000 USD) hardware.
+
+Most bottlenecks appear to be in foreign source data access (the
+source just cannot extract revisions fast enough) or disk IO (gfi
+writes as fast as the disk will take the data). Imports will run
+faster if the source data is stored on a different drive than the
+destination Git repository (due to less IO contention).
+
+
+Development Cost
+----------------
+A typical frontend for gfi tends to weigh in at approximately 200
+lines of Perl/Python/Ruby code. Most developers have been able to
+create working importers in just a couple of hours, even though it
+is their first exposure to gfi, and sometimes even to Git. This is
+an ideal situation, given that most conversion tools are throw-away
+(use once, and never look back).
+
+
+Parallel Operation
+------------------
+Like `git-push` or `git-fetch`, imports handled by gfi are safe to
+run alongside parallel `git repack -a -d` or `git gc` invocations,
+or any other Git operation (including `git prune`, as loose objects
+are never used by gfi).
+
+However, gfi does not lock the branch or tag refs it is actively
+importing. After EOF, during its ref update phase, gfi blindly
+overwrites each imported branch or tag ref. Consequently it is not
+safe to modify refs that are currently being used by a running gfi
+instance, as work could be lost when gfi overwrites the refs.
+
+
+Technical Discussion
+--------------------
+gfi tracks a set of branches in memory. Any branch can be created
+or modified at any point during the import process by sending a
+`commit` command on the input stream. This design allows a frontend
+program to process an unlimited number of branches simultaneously,
+generating commits in the order they are available from the source
+data. It also simplifies the frontend programs considerably.
+
+gfi does not use or alter the current working directory, or any
+file within it. (It does however update the current Git repository,
+as referenced by `GIT_DIR`.) Therefore an import frontend may use
+the working directory for its own purposes, such as extracting file
+revisions from the foreign source. This ignorance of the working
+directory also allows gfi to run very quickly, as it does not
+need to perform any costly file update operations when switching
+between branches.
+
+Input Format
+------------
+With the exception of raw file data (which Git does not interpret)
+the gfi input format is text (ASCII) based. This text based
+format simplifies development and debugging of frontend programs,
+especially when a higher level language such as Perl, Python or
+Ruby is being used.
+
+gfi is very strict about its input. Where we say SP below we mean
+*exactly* one space. Likewise LF means one (and only one) linefeed.
+Supplying additional whitespace characters will cause unexpected
+results, such as branch names or file names with leading or trailing
+spaces in their name, or early termination of gfi when it encounters
+unexpected input.
+
+Commands
+~~~~~~~~
+gfi accepts several commands to update the current repository
+and control the current import process. More detailed discussion
+(with examples) of each command follows later.
+
+`commit`::
+	Creates a new branch or updates an existing branch by
+	creating a new commit and updating the branch to point at
+	the newly created commit.
+
+`tag`::
+	Creates an annotated tag object from an existing commit or
+	branch. Lightweight tags are not supported by this command,
+	as they are not recommended for recording meaningful points
+	in time.
+
+`reset`::
+	Reset an existing branch (or a new branch) to a specific
+	revision. This command must be used to change a branch to
+	a specific revision without making a commit on it.
+
+`blob`::
+	Convert raw file data into a blob, for future use in a
+	`commit` command. This command is optional and is not
+	needed to perform an import.
+
+`checkpoint`::
+	Forces gfi to close the current packfile, generate its
+	unique SHA-1 checksum and index, and start a new packfile.
+	This command is optional and is not needed to perform
+	an import.
+
+`commit`
+~~~~~~~~
+Create or update a branch with a new commit, recording one logical
+change to the project.
+
+....
+	'commit' SP LF
+	mark?
+	('author' SP SP LT GT SP