    fetch-pack: new --stdin option to read refs from stdin · 078b895f
    Committed by Ivan Todoroski
    If a remote repo has too many tags (or branches), cloning it over the
    smart HTTP transport can fail because remote-curl.c puts all the refs
    from the remote repo on the fetch-pack command line. This can make the
    command line longer than the operating system's command line length limit, causing
    fetch-pack to fail.
    
    This is especially a problem on Windows where the command line limit is
    orders of magnitude shorter than on Linux. There are already real repos out
    there that msysGit cannot clone over smart HTTP due to this problem.
    
    Here is an easy way to trigger this problem:
    
    	git init too-many-refs
    	cd too-many-refs
    	echo bla > bla.txt
    	git add .
    	git commit -m test
    	sha=$(git rev-parse HEAD)
    	tag=$(perl -e 'print "bla" x 30')
    	for i in `seq 50000`; do
    		echo $sha refs/tags/$tag-$i >> .git/packed-refs
    	done
    
    Then share this repo over the smart HTTP protocol and try cloning it:
    
    	$ git clone http://localhost/.../too-many-refs/.git
    	Cloning into 'too-many-refs'...
    	fatal: cannot exec 'fetch-pack': Argument list too long
    
    50k tags is obviously an absurd number, but it is required to
    demonstrate the problem on Linux because it has a much more generous
    command line limit. On Windows the clone fails with as few as 500
    tags in the above loop, which is getting uncomfortably close to the
    number of tags you might see in real long-lived repos.
    
    This is not just theoretical: msysGit is already failing to clone our
    company repo because of this. It's a large repo converted from CVS, with
    nearly 10 years of history.
    
    Four possible solutions were discussed on the Git mailing list (in no
    particular order):
    
    1) Call fetch-pack multiple times with smaller batches of refs.
    
    This was dismissed as inefficient and inelegant.
    
    2) Add option --refs-fd=$n to pass an fd from which to read the refs.
    
    This was rejected because inheriting descriptors other than
    stdin/stdout/stderr through exec() is apparently problematic on Windows,
    plus it would require changes to the run-command API to open extra
    pipes.
    
    3) Add option --refs-from=$tmpfile to pass the refs using a temp file.
    
    This was not favored because of the temp file requirement.
    
    4) Add option --stdin to pass the refs on stdin, one per line.
    
    In the end this option was chosen as the most efficient and most
    desirable from a scripting perspective.
    
    There was, however, a small complication when using stdin to pass refs to
    fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
    communication with the remote server.
    
    If we are going to sneak refs on stdin line by line, it would have to be
    done very carefully in the presence of --stateless-rpc, because when
    reading refs line by line we might read ahead too much data into our
    buffer and eat some of the remote protocol data which is also coming on
    stdin.
    
    One way to solve this would be to refactor get_remote_heads() in
    fetch-pack.c to accept a residual buffer from our stdin line parsing
    above, but this function is used in several places, so other callers
    would be burdened by this residual buffer interface even when most of
    them don't need it.
    
    In the end we settled on the following solution:
    
    If --stdin is specified without --stateless-rpc, fetch-pack would read
    the refs from stdin one per line, in a script friendly format.
    
    However if --stdin is specified together with --stateless-rpc,
    fetch-pack would read the refs from stdin in packetized format
    (pkt-line) with a flush packet terminating the list of refs. This way we
    can read the exact number of bytes that we need from stdin, and then
    get_remote_heads() can continue reading from the same fd without losing
    a single byte of remote protocol data.
    
    As a result, the --stdin option loses generality and scriptability only when
    used together with --stateless-rpc, which is not easily scriptable
    anyway because it also uses pkt-line when talking to the remote server.

    Signed-off-by: Ivan Todoroski <grnch@gmx.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>