1. 21 Feb, 2013: 1 commit
    • pkt-line: teach packet_read_line to chomp newlines · 819b929d
      Committed by Jeff King
      The packets sent during ref negotiation are all terminated
      by newline; even though the code to chomp these newlines is
      short, we end up doing it in a lot of places.
      
      This patch teaches packet_read_line to auto-chomp the
      trailing newline; this lets us get rid of a lot of inline
      chomping code.
      
      As a result, some call-sites which are not reading
      line-oriented data (e.g., when reading chunks of packfiles
      alongside sideband) transition away from packet_read_line to
      the generic packet_read interface. This patch converts all
      of the existing callsites.
      
      Since the function signature of packet_read_line does not
      change (but its behavior does), there is a possibility of
      new callsites being introduced in later commits, silently
      introducing an incompatibility.  However, since a later
      patch in this series will change the signature, such a
      commit would have to be merged directly into this commit,
      not to the tip of the series; we can therefore ignore the
      issue.
      
      This is an internal cleanup and should produce no change of
      behavior in the normal case. However, there is one corner
      case to note. Callers of packet_read_line have never been
      able to tell the difference between a flush packet ("0000")
      and an empty packet ("0004"), as both cause packet_read_line
      to return a length of 0. Readers treat them identically,
      even though Documentation/technical/protocol-common.txt says
      we must not; it also says that implementations should not
      send an empty pkt-line.
      
      By stripping out the newline before the result gets to the
      caller, we will now treat the newline-only packet ("0005\n")
      the same as an empty packet, which in turn gets treated like
      a flush packet. In practice this doesn't matter, as neither
      empty nor newline-only packets are part of git's protocols
      (at least not for the line-oriented bits, and readers who
      are not expecting line-oriented packets will be calling
      packet_read directly, anyway). But even if we do decide to
      care about the distinction later, it is orthogonal to this
      patch.  The right place to tighten would be to stop treating
      empty packets as flush packets, and this change does not
      make doing so any harder.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
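The chomping behavior in 819b929d, and the corner case it describes, can be seen in a minimal C sketch. read_pkt and pkt_hex_len are hypothetical helpers written for illustration, not git's packet_read_line. They assume the whole stream is already in memory and handle only the four-hex-digit length header, the flush packet, and optional newline chomping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode a 4-hex-digit pkt-line length header; returns -1 on bad input. */
static int pkt_hex_len(const char *hdr)
{
    int len = 0;
    for (int i = 0; i < 4; i++) {
        char c = hdr[i];
        int v;
        if (c >= '0' && c <= '9')
            v = c - '0';
        else if (c >= 'a' && c <= 'f')
            v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            v = c - 'A' + 10;
        else
            return -1;          /* also stops at a premature '\0' */
        len = (len << 4) | v;
    }
    return len;
}

/*
 * Hypothetical reader (not git's packet_read_line): copies one packet's
 * payload from *src into out, advances *src past the packet, and returns
 * the payload length, or -1 if the packet is malformed or too big.
 * A flush packet ("0000") returns 0. When chomp is non-zero, a single
 * trailing '\n' is dropped.
 */
static int read_pkt(const char **src, char *out, size_t outsz, int chomp)
{
    assert(outsz > 0);
    int len = pkt_hex_len(*src);
    if (len < 0 || (len > 0 && len < 4))
        return -1;
    *src += 4;
    if (len == 0) {             /* flush packet: no payload */
        out[0] = '\0';
        return 0;
    }
    len -= 4;                   /* the length counts the 4 header bytes */
    if ((size_t)len >= outsz)
        return -1;
    memcpy(out, *src, (size_t)len);
    *src += len;
    if (chomp && len > 0 && out[len - 1] == '\n')
        len--;
    out[len] = '\0';
    return len;
}
```

Fed the stream `0009want\n`, `0004`, `0005\n`, `0000` with chomping enabled, read_pkt returns 4 (payload "want") and then 0 three times, so the empty, newline-only and flush packets all look the same to the caller.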
  2. 27 Jan, 2013: 1 commit
    • fetch-pack: avoid repeatedly re-scanning pack directory · b495697b
      Committed by Jeff King
      When we look up a sha1 object for reading via the parse_object() =>
      read_sha1_file() => read_object() call path, we first check
      packfiles, and then loose objects. If we still haven't found it, we
      re-scan the list of packfiles in `objects/pack`. This final step
      ensures that we can co-exist with a simultaneous repack process
      which creates a new pack and then prunes the old object.
      
      This extra re-scan usually does not have a performance impact for
      two reasons:
      
        1. If an object is missing, then typically the re-scan will find a
           new pack, and no more misses will occur after that. Or if it truly
           is missing, then our next step is usually to die().
      
        2. Re-scanning is cheap enough that we do not even notice.
      
      However, these do not always hold. The assumption in (1) is that the
      caller is expecting to find the object. This is usually the case,
      but the call to `parse_object` in `everything_local` does not follow
      this pattern. It is looking to see whether we have objects that the
      remote side is advertising, not something we expect to
      have. Therefore if we are fetching from a remote which has many refs
      pointing to objects we do not have, we may end up re-scanning the
      pack directory many times.
      
      Even with this extra re-scanning, the impact is often not noticeable
      due to (2); we just readdir() the packs directory and skip any packs
      that are already loaded. However, if there are a large number of
      packs, even enumerating the directory can be expensive, especially
      if we do it repeatedly.
      
      Having this many packs is a good sign the user should run `git gc`,
      but it would still be nice to avoid having to scan the directory at
      all.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
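The lookup order described in b495697b (packs, then loose objects, then a pack-directory re-scan on a miss) can be modeled with a toy object store. Everything here is hypothetical: struct toy_odb, toy_has_object, the string object IDs, and the counter that stands in for readdir(). It only illustrates why probing for objects we expect to be missing multiplies the number of re-scans; it does not show the fix the commit makes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model (all names hypothetical): object IDs are strings, loaded
 * packs and loose objects are fixed arrays, and re-scanning the pack
 * directory only bumps a counter so we can see how often it happens.
 */
struct toy_odb {
    const char **packed;  size_t npacked;
    const char **loose;   size_t nloose;
    unsigned rescans;     /* number of simulated readdir() passes */
};

static bool in_list(const char **list, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], id) == 0)
            return true;
    return false;
}

/* Lookup order from the commit message: packs, loose, then re-scan. */
static bool toy_has_object(struct toy_odb *odb, const char *id)
{
    assert(odb != NULL && id != NULL);
    if (in_list(odb->packed, odb->npacked, id))
        return true;
    if (in_list(odb->loose, odb->nloose, id))
        return true;
    odb->rescans++;  /* a concurrent repack may have added a new pack */
    /* In this toy no new pack ever appears, so re-check the same list. */
    return in_list(odb->packed, odb->npacked, id);
}
```

Probing five advertised objects of which three are absent locally leaves rescans at 3: one directory scan per miss, which is the pattern everything_local triggers when a remote advertises many refs we do not have.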
  3. 20 Dec, 2012: 1 commit
  4. 29 Oct, 2012: 2 commits
  5. 13 Sep, 2012: 11 commits
  6. 14 Aug, 2012: 1 commit
  7. 11 Aug, 2012: 2 commits
    • fetch-pack: do not ask for unadvertised capabilities · 74991a98
      Committed by Junio C Hamano
      In the same spirit as the previous fix, stop asking for thin-pack, no-progress
      and include-tag capabilities when the other end does not claim to support them.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • do not send client agent unless server does first · d50c3871
      Committed by Jeff King
      Commit ff5effdf taught both clients and servers of the git protocol
      to send an "agent" capability that just advertises their version for
      statistics and debugging purposes.  The protocol-capabilities.txt
      document however indicates that the client's advertisement is
      actually a response, and should never include capabilities not
      mentioned in the server's advertisement.
      
      Adding the unconditional advertisement in the server programs was
      OK, then, but the clients broke the protocol.  The server
      implementation of git-core itself does not care, but at least one
      does: the Google Code git server (or any server using Dulwich) will
      hang up with an internal error upon seeing an unknown capability.
      
      Instead, each client must record whether it saw an agent string from
      the server, and respond with its own agent only if the server
      mentioned it first.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
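The negotiation rule in d50c3871 (and in 74991a98 above it) amounts to this: the client echoes back only capabilities the server listed. Below is a minimal sketch, assuming the server's capabilities arrive as one space-separated string. has_cap and build_client_caps are hypothetical names, the agent value is just an example, and the fixed order of the requested capabilities is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: does the space-separated capability list `caps`
 * contain `name`, either alone or as "name=value"?
 */
static bool has_cap(const char *caps, const char *name)
{
    size_t n = strlen(name);
    const char *p = caps;
    while (*p) {
        while (*p == ' ')
            p++;
        const char *end = p;
        while (*end && *end != ' ')
            end++;
        size_t w = (size_t)(end - p);
        if (w >= n && strncmp(p, name, n) == 0 && (w == n || p[n] == '='))
            return true;
        p = end;
    }
    return false;
}

/*
 * Build the client's capability list: request only what the server
 * advertised, and send our agent string only if the server sent one
 * first. Returns 0 on success, -1 if `out` is too small.
 */
static int build_client_caps(const char *server_caps, const char *agent,
                             char *out, size_t outsz)
{
    static const char *const wanted[] = {
        "thin-pack", "no-progress", "include-tag"
    };
    size_t used = 0;
    assert(outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof(wanted) / sizeof(wanted[0]); i++) {
        if (!has_cap(server_caps, wanted[i]))
            continue;           /* never ask for what was not advertised */
        int k = snprintf(out + used, outsz - used, "%s%s",
                         used ? " " : "", wanted[i]);
        if (k < 0 || (size_t)k >= outsz - used)
            return -1;
        used += (size_t)k;
    }
    if (has_cap(server_caps, "agent")) {
        int k = snprintf(out + used, outsz - used, "%sagent=%s",
                         used ? " " : "", agent);
        if (k < 0 || (size_t)k >= outsz - used)
            return -1;
        used += (size_t)k;
    }
    return 0;
}
```

With server capabilities `multi_ack thin-pack side-band agent=git/1.7.12`, the client list comes out as `thin-pack agent=git/1.7.12`. Against a server that advertises only `multi_ack include-tag`, it is just `include-tag`, with no agent string.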
  8. 04 Aug, 2012: 1 commit
    • include agent identifier in capability string · ff5effdf
      Committed by Jeff King
      Instead of having the client advertise a particular version
      number in the git protocol, we have managed extensions and
      backwards compatibility by having clients and servers
      advertise capabilities that they support. This is far more
      robust than having each side consult a table of
      known versions, and provides sufficient information for the
      protocol interaction to complete.
      
      However, it does not allow servers to keep statistics on
      which client versions are being used. This information is
      not necessary to complete the network request (the
      capabilities provide enough information for that), but it
      may be helpful to conduct a general survey of client
      versions in use.
      
      We already send the client version in the user-agent header
      for http requests; adding it here allows us to gather
      similar statistics for non-http requests.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  9. 25 May, 2012: 1 commit
  10. 23 May, 2012: 8 commits
  11. 03 Apr, 2012: 1 commit
    • fetch-pack: new --stdin option to read refs from stdin · 078b895f
      Committed by Ivan Todoroski
      If a remote repo has too many tags (or branches), cloning it over the
      smart HTTP transport can fail because remote-curl.c puts all the refs
      from the remote repo on the fetch-pack command line. This can make the
      command line longer than the global OS command line limit, causing
      fetch-pack to fail.
      
      This is especially a problem on Windows where the command line limit is
      orders of magnitude shorter than Linux. There are already real repos out
      there that msysGit cannot clone over smart HTTP due to this problem.
      
      Here is an easy way to trigger this problem:
      
      	git init too-many-refs
      	cd too-many-refs
      	echo bla > bla.txt
      	git add .
      	git commit -m test
      	sha=$(git rev-parse HEAD)
      	tag=$(perl -e 'print "bla" x 30')
      	for i in `seq 50000`; do
      		echo $sha refs/tags/$tag-$i >> .git/packed-refs
      	done
      
      Then share this repo over the smart HTTP protocol and try cloning it:
      
      	$ git clone http://localhost/.../too-many-refs/.git
      	Cloning into 'too-many-refs'...
      	fatal: cannot exec 'fetch-pack': Argument list too long
      
      50k tags is obviously an absurd number, but it is required to
      demonstrate the problem on Linux because it has a much more generous
      command line limit. On Windows the clone fails with as few as 500
      tags in the above loop, which is getting uncomfortably close to the
      number of tags you might see in real long-lived repos.
      
      This is not just theoretical; msysGit is already failing to clone our
      company repo due to this. It's a large repo converted from CVS, nearly
      10 years of history.
      
      Four possible solutions were discussed on the Git mailing list (in no
      particular order):
      
      1) Call fetch-pack multiple times with smaller batches of refs.
      
      This was dismissed as inefficient and inelegant.
      
      2) Add option --refs-fd=$n to pass an fd from which to read the refs.
      
      This was rejected because inheriting descriptors other than
      stdin/stdout/stderr through exec() is apparently problematic on Windows,
      plus it would require changes to the run-command API to open extra
      pipes.
      
      3) Add option --refs-from=$tmpfile to pass the refs using a temp file.
      
      This was not favored because of the temp file requirement.
      
      4) Add option --stdin to pass the refs on stdin, one per line.
      
      In the end this option was chosen as the most efficient and the most
      desirable from a scripting perspective.
      
      There was however a small complication when using stdin to pass refs to
      fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
      communication with the remote server.
      
      If we are going to sneak refs on stdin line by line, it would have to be
      done very carefully in the presence of --stateless-rpc, because when
      reading refs line by line we might read ahead too much data into our
      buffer and eat some of the remote protocol data which is also coming on
      stdin.
      
      One way to solve this would be to refactor get_remote_heads() in
      fetch-pack.c to accept a residual buffer from our stdin line parsing
      above, but this function is used in several places so other callers
      would be burdened by this residual buffer interface even when most of
      them don't need it.
      
      In the end we settled on the following solution:
      
      If --stdin is specified without --stateless-rpc, fetch-pack would read
      the refs from stdin one per line, in a script friendly format.
      
      However if --stdin is specified together with --stateless-rpc,
      fetch-pack would read the refs from stdin in packetized format
      (pkt-line) with a flush packet terminating the list of refs. This way we
      can read the exact number of bytes that we need from stdin, and then
      get_remote_heads() can continue reading from the same fd without losing
      a single byte of remote protocol data.
      
      This way the --stdin option only loses generality and scriptability when
      used together with --stateless-rpc, which is not easily scriptable
      anyway because it also uses pkt-line when talking to the remote server.
      Signed-off-by: Ivan Todoroski <grnch@gmx.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
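When --stdin is combined with --stateless-rpc, 078b895f has fetch-pack read the refs as pkt-lines terminated by a flush packet. The sketch below shows how a caller might encode such a list. write_pkt_line and encode_refs are hypothetical helpers, and appending a newline to each ref is an assumption made to mirror the line-oriented packets used elsewhere in the protocol.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical writer: emit one pkt-line carrying `payload` plus a
 * trailing newline. The 4-hex-digit length counts itself, so a ref name
 * of n bytes becomes a packet of length n + 5. Returns bytes written
 * (excluding the NUL) or -1 if it does not fit.
 */
static int write_pkt_line(char *out, size_t outsz, const char *payload)
{
    assert(out != NULL && payload != NULL);
    size_t n = strlen(payload) + 5;     /* 4 header bytes + payload + '\n' */
    if (n > 0xffff || n >= outsz)       /* must fit in 4 hex digits + NUL */
        return -1;
    snprintf(out, outsz, "%04zx%s\n", n, payload);
    return (int)n;
}

/* Encode a list of refs followed by a flush packet ("0000"). */
static int encode_refs(const char *const *refs, size_t nrefs,
                       char *out, size_t outsz)
{
    size_t used = 0;
    for (size_t i = 0; i < nrefs; i++) {
        int k = write_pkt_line(out + used, outsz - used, refs[i]);
        if (k < 0)
            return -1;
        used += (size_t)k;
    }
    if (outsz - used < 5)               /* room for "0000" and the NUL */
        return -1;
    memcpy(out + used, "0000", 5);
    return (int)(used + 4);
}
```

Encoding `refs/heads/master` and `refs/tags/v1.0` produces `0016refs/heads/master\n0013refs/tags/v1.0\n0000`. Each length covers its own four header bytes plus the trailing newline, so a reader knows exactly how many bytes to consume and leaves the remote's protocol data on stdin untouched.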
  12. 14 Feb, 2012: 1 commit
  13. 13 Feb, 2012: 3 commits
  14. 14 Dec, 2011: 2 commits
    • fetch-pack: match refs exactly · 1e7ba0f9
      Committed by Jeff King
      When we are determining the list of refs to fetch via
      fetch-pack, we have two sets of refs to compare: those on
      the remote side, and a "match" list of things we want to
      fetch. We iterate through the remote refs alphabetically,
      seeing if each one is wanted by the "match" list.
      
      Since def88e9a (Commit first cut at "git-fetch-pack",
      2005-07-04), we have used the "path_match" function to do a
      suffix match, where a remote ref is considered wanted if
      any of the "match" elements is a suffix of the remote
      refname.
      
      This enables callers of fetch-pack to specify unqualified
      refs and have them matched up with remote refs (e.g., ask
      for "A" and get remote's "refs/heads/A"). However, if you
      provide a fully qualified ref, then there are corner cases
      where we provide the wrong answer. For example, given a
      remote with two refs:
      
         refs/foo/refs/heads/master
         refs/heads/master
      
      asking for "refs/heads/master" will first match
      "refs/foo/refs/heads/master" by the suffix rule, and we will
      erroneously fetch it instead of refs/heads/master.
      
      As it turns out, all callers of fetch_pack do provide
      fully-qualified refs for the match list. There are two ways
      fetch_pack can get match lists:
      
        1. Through the transport code (i.e., via git-fetch)
      
        2. On the command-line of git-fetch-pack
      
      In the first case, we will always be providing the names of
      fully-qualified refs from "struct ref" objects. We will have
      pre-matched those ref objects already (since we have to
      handle more advanced matching, like wildcard refspecs), and
      are just providing a list of the refs whose objects we need.
      
      In the second case, users could in theory be providing
      non-qualified refs on the command-line. However, the
      fetch-pack documentation claims that refs should be fully
      qualified (and has always done so since it was written in
      2005).
      
      Let's change this path_match call to simply check for string
      equality, matching what the callers of fetch_pack are
      expecting.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
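The corner case from 1e7ba0f9 can be reproduced with two small predicates. suffix_match is a hypothetical re-creation of the old suffix rule (a match must begin right after a '/'), not git's actual path_match. exact_match is the plain string comparison the commit switches to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical suffix rule in the spirit of the old path_match: `want`
 * matches `ref` if it equals ref, or is a suffix of it that starts
 * right after a '/'.
 */
static bool suffix_match(const char *ref, const char *want)
{
    size_t rl = strlen(ref), wl = strlen(want);
    if (wl > rl)
        return false;
    if (strcmp(ref + rl - wl, want) != 0)
        return false;
    return wl == rl || ref[rl - wl - 1] == '/';
}

/* The new rule: plain string equality. */
static bool exact_match(const char *ref, const char *want)
{
    return strcmp(ref, want) == 0;
}

/* Return the first remote ref (in iteration order) that `match` accepts. */
static const char *first_wanted(const char *const *remote, size_t n,
                                const char *want,
                                bool (*match)(const char *, const char *))
{
    assert(match != NULL);
    for (size_t i = 0; i < n; i++)
        if (match(remote[i], want))
            return remote[i];
    return NULL;
}
```

With remote refs `refs/foo/refs/heads/master` and `refs/heads/master` in that alphabetical order, first_wanted with suffix_match returns `refs/foo/refs/heads/master`, while exact_match returns `refs/heads/master`. The trade-off is also visible: suffix_match accepts the unqualified `A` for `refs/heads/A`, and exact_match does not.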
    • drop "match" parameter from get_remote_heads · afe7c5ff
      Committed by Jeff King
      The get_remote_heads function reads the list of remote refs
      during a git protocol session. It dates all the way back to
      def88e9a (Commit first cut at "git-fetch-pack", 2005-07-04).
      At that time, the idea was to come up with a list of refs we
      were interested in, and then filter the list as we got it
      from the remote side.
      
      Later, 1baaae5e (Make maximal use of the remote refs,
      2005-10-28) stopped filtering at the get_remote_heads layer,
      letting us use the non-matching refs to find common history.
      
      As a result, all callers now simply pass an empty match
      list (and any future callers will want to do the same). So
      let's drop these now-useless parameters.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  15. 06 Oct, 2011: 1 commit
    • Change check_ref_format() to take a flags argument · 8d9c5010
      Committed by Michael Haggerty
      Change check_ref_format() to take a flags argument that indicates what
      is acceptable in the reference name (analogous to "git
      check-ref-format"'s "--allow-onelevel" and "--refspec-pattern").  This
      is more convenient for callers and also fixes a failure in the test
      suite (and likely elsewhere in the code) by enabling "onelevel" and
      "refspec-pattern" to be allowed independently of each other.
      
      Also rename check_ref_format() to check_refname_format() to make it
      obvious that it deals with refnames rather than references themselves.
      Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
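Passing a bit mask, as 8d9c5010 does, lets each relaxation be switched on independently. The sketch below uses hypothetical TOY_* flags modeled on git's REFNAME_ALLOW_ONELEVEL and REFNAME_REFSPEC_PATTERN. Its toy checker enforces only the two rules those flags control, so it is not a substitute for check_refname_format().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits, modeled on git's REFNAME_* flags. */
enum {
    TOY_ALLOW_ONELEVEL  = 1 << 0,  /* accept names without any '/' */
    TOY_REFSPEC_PATTERN = 1 << 1   /* accept a single '*' wildcard */
};

/*
 * Toy checker covering only the two rules the flags control; git's real
 * check_refname_format() enforces many more (no "..", no trailing
 * ".lock", and so on). Returns 0 if acceptable, -1 otherwise.
 */
static int toy_check_refname(const char *name, unsigned flags)
{
    assert(name != NULL);
    int stars = 0;
    if (!*name)
        return -1;
    for (const char *p = name; *p; p++)
        if (*p == '*')
            stars++;
    /* each flag is tested on its own, so they combine freely */
    if (stars > ((flags & TOY_REFSPEC_PATTERN) ? 1 : 0))
        return -1;
    if (!(flags & TOY_ALLOW_ONELEVEL) && !strchr(name, '/'))
        return -1;
    return 0;
}
```

`refs/heads/*` passes only with TOY_REFSPEC_PATTERN, `HEAD` passes only with TOY_ALLOW_ONELEVEL, and a bare `*` needs both bits, which is the independence the commit message asks for.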
  16. 05 Sep, 2011: 2 commits
  17. 19 Aug, 2011: 1 commit