1. 23 Feb, 2006 4 commits
    • nicer eye candies for pack-objects · b2504a0d
      Nicolas Pitre committed
      This provides a stable and simpler progress reporting mechanism that
      updates progress as often as possible, but never more than once a
      second.  The deltification phase is also made more
      interesting to watch (since repacking a big repository and only seeing a
      dot appear once every many seconds is rather boring and doesn't provide
      much food for anticipation).
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
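The throttling idea in this message ("as often as possible, but never more than once a second") can be sketched as follows. This is only an illustration: `ThrottledProgress` and its parameters are hypothetical names, and it polls a clock on every update, which is an assumption and not necessarily how the commit itself schedules its reports.

```python
import time


class ThrottledProgress:
    """Report progress at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, total, interval=1.0, clock=time.monotonic, emit=print):
        self.total = total
        self.interval = interval
        self.clock = clock          # injectable so the behaviour can be tested
        self.emit = emit            # where progress lines go
        self.last_emit = None       # time of the most recent report

    def update(self, done):
        now = self.clock()
        finished = done >= self.total
        due = self.last_emit is None or now - self.last_emit >= self.interval
        # Always report the final state, otherwise only when the interval elapsed.
        if due or finished:
            percent = 100 * done // self.total
            self.emit(f"{percent:3d}% ({done}/{self.total})")
            self.last_emit = now
```

Calling `update()` from a tight loop is cheap because most calls return without printing anything.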
    • pack-objects: avoid delta chains that are too long. · 15b4d577
      Junio C Hamano committed
      This tries to rework the solution for the excess delta chain
      problem. An earlier commit worked around it ``cheaply'', but
      repeated repacking risks unbounded growth of delta chains.
      
      This version counts the length of the delta chain we are reusing
      from the existing pack, and makes sure a base object that has a
      sufficiently long delta chain does not get deltified.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
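A minimal sketch of the bookkeeping this message describes, assuming the reused deltas are available as a simple mapping from each delta object to its base. The function names and the exact depth rule are illustrative assumptions, not the code from the commit.

```python
def chain_length_above(reused_base):
    """For every base, the longest reused delta chain stacked on top of it.

    `reused_base` maps a delta object to the base it is stored against.
    """
    above = {}
    for obj in reused_base:
        # Walk from obj down to its root, recording how far above each
        # ancestor this object sits.
        hops = 0
        cur = obj
        while cur in reused_base:
            cur = reused_base[cur]
            hops += 1
            above[cur] = max(above.get(cur, 0), hops)
    return above


def may_deltify(obj, new_base_depth, above, max_depth):
    """True if storing obj as a delta keeps the chain on top of it within max_depth."""
    obj_depth = new_base_depth + 1
    return obj_depth + above.get(obj, 0) <= max_depth
```

With a reused chain of three deltas on top of an object, that object is refused as a delta whenever its own new depth plus three would exceed the limit.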
    • pack-objects: finishing touches. · ab7cd7bb
      Junio C Hamano committed
      This introduces the --no-reuse-delta option to disable reuse of
      existing deltas, which is a large part of the optimization
      introduced by this series.  This may become necessary if
      repeated repacking makes delta chains too long.  With this, the
      output of the command becomes identical to that of the older
      implementation.  But the performance suffers greatly.
      
      It still allows reusing non-deltified representations; there is
      no point uncompressing and recompressing the whole text.
      
      It also adds a couple more statistics to the output, while squelching
      them under the -q flag, which the last round forgot to do.
      
        $ time old-git-pack-objects --stdout >/dev/null <RL
        Generating pack...
        Done counting 184141 objects.
        Packing 184141 objects....................
        real    12m8.530s       user    11m1.450s       sys     0m57.920s
        $ time git-pack-objects --stdout >/dev/null <RL
        Generating pack...
        Done counting 184141 objects.
        Packing 184141 objects.....................
        Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
        real    0m59.549s       user    0m56.670s       sys     0m2.400s
        $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
        Generating pack...
        Done counting 184141 objects.
        Packing 184141 objects.....................
        Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
        real    11m13.830s      user    9m45.240s       sys     0m44.330s
      
      There is one remaining issue when --no-reuse-delta option is not
      used.  It can create delta chains that are deeper than specified.
      
          A<--B<--C<--D   E   F   G
      
      Suppose we have a delta chain A to D (A is stored in full either
      in a pack or as a loose object. B is depth1 delta relative to A,
      C is depth2 delta relative to B...) with loose objects E, F, G.
      And we are going to pack all of them.
      
      B, C and D are left as delta against A, B and C respectively.
      So A, E, F, and G are examined for deltification, and let's say
      we decided to keep E expanded, and store the rest as deltas like
      this:
      
          E<--F<--G<--A
      
      Oops.  We ended up making D a bit too deep, didn't we?  B, C and
      D form a chain on top of A!
      
      This is because we did not know what the final depth of A would
      be, when we checked objects and decided to keep the existing
      delta.  Unfortunately, deferring the decision until just before
      the deltification is not an option.  To be able to make B, C,
      and D candidates for deltification with the rest, we need to
      know the type and final unexpanded size of them, but the major
      part of the optimization comes from the fact that we do not read
      the delta data to do so -- getting the final size is quite an
      expensive operation.
      
      To prevent this from happening, we should keep A from being
      deltified.  But how would we tell that, cheaply?
      
      To do this most precisely, after check_object() runs, each
      object that is used as the base object of some existing delta
      needs to be marked with the maximum depth of the objects we
      decided to keep deltified (in this case, D is depth 3 relative
      to A, so if no other delta chain that is longer than 3 based on
      A exists, mark A with 3).  Then when attempting to deltify A, we
      would take that number into account to see if the final delta
      chain that leads to D becomes too deep.
      
      However, this is a bit cumbersome to compute, so we would cheat
      and reduce the maximum depth for A arbitrarily to depth/4 in
      this implementation.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
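The A..G example from this message can be replayed in a few lines. The sketch below only models "delta object -> base" links. `depth` and `reduced_budget` are hypothetical helper names, and `max_depth` stands for whatever depth limit is in effect.

```python
def depth(obj, base_of):
    """Number of delta hops from obj down to an object stored in full."""
    hops = 0
    while obj in base_of:
        obj = base_of[obj]
        hops += 1
    return hops


# Deltas kept from the existing pack: A<--B<--C<--D
reused = {"B": "A", "C": "B", "D": "C"}
# Deltas chosen while packing:       E<--F<--G<--A
fresh = {"F": "E", "G": "F", "A": "G"}
final = {**reused, **fresh}


def reduced_budget(max_depth):
    """The cheat from the message: a base of reused deltas gets depth/4."""
    return max_depth // 4
```

Here `depth("D", reused)` is 3 but `depth("D", final)` is 6, which is the "oops" in the message. The reduced budget shortens how deep A itself may go, but it is a heuristic, not a strict bound on D's final depth.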
    • pack-objects: reuse data from existing packs. · 3f9ac8d2
      Junio C Hamano committed
      When generating a new pack, notice if objects we need are already
      in existing packs.  If an object is stored deltified,
      and its base object is also what we are going to pack, then
      reuse the existing deltified representation unconditionally,
      bypassing all the expensive find_deltas() and try_deltas()
      calls.
      
      Also, notice if what we are going to write out exactly matches
      what is already in an existing pack (either deltified or just
      compressed).  In such a case, we can just copy it instead of
      going through the usual uncompressing & recompressing cycle.
      
      Without this patch, in a linux-2.6 repository with about 1500
      loose objects and a single mega pack:
      
          $ git-rev-list --objects v2.6.16-rc3 >RL
          $ wc -l RL
          184141 RL
          $ time git-pack-objects p <RL
          Generating pack...
          Done counting 184141 objects.
          Packing 184141 objects....................
          a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
      
          real    12m4.323s
          user    11m2.560s
          sys     0m55.950s
      
      With this patch, the same input:
      
          $ time ../git.junio/git-pack-objects q <RL
          Generating pack...
          Done counting 184141 objects.
          Packing 184141 objects.....................
          a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
          Total 184141, written 184141, reused 182441
      
          real    1m2.608s
          user    0m55.090s
          sys     0m1.830s
      Signed-off-by: Junio C Hamano <junkio@cox.net>
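The per-object decision described in this message can be sketched as a small classifier. `Stored`, `plan_object` and the three result strings are hypothetical names chosen for illustration. Only the rules themselves come from the message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stored:
    """How an object is currently stored (illustrative model)."""
    in_pack: bool
    delta_base: Optional[str] = None  # base name if stored as a delta in a pack


def plan_object(name, stored, to_pack):
    """Decide how to write `name` into the new pack."""
    info = stored.get(name)
    if info is None or not info.in_pack:
        return "compute"              # loose object: go through the usual path
    if info.delta_base is not None:
        if info.delta_base in to_pack:
            return "reuse-delta"      # base is also being packed: copy the delta
        return "compute"              # base is missing from the new pack
    return "copy-compressed"          # stored in full: copy without recompressing
```

Everything that lands in "reuse-delta" or "copy-compressed" skips the delta search and the uncompress/recompress cycle, which is where the timing difference reported above comes from.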
  2. 13 Feb, 2006 1 commit
  3. 12 Feb, 2006 1 commit
  4. 29 Dec, 2005 1 commit
  5. 09 Dec, 2005 1 commit
  6. 29 Nov, 2005 1 commit
  7. 22 Nov, 2005 1 commit
  8. 27 Oct, 2005 1 commit
    • pack-objects: Allow use of pre-generated pack. · f3123c4a
      Junio C Hamano committed
      git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache
      directory, when a necessary pack is found.  This is hopefully useful
      when upload-pack (called from git-daemon) is expected to receive
      requests for the same set of objects many times (e.g. a full clone
      request of any project, or updates from the previous day's set of
      heads to the latest for a slow-moving project).
      
      Currently git-pack-objects does *not* keep pack files it creates for
      reuse.  It might be useful to add an --update-cache option to it,
      which would allow it to store pack files it created in the pack-cache
      directory, and prune rarely used ones from it.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
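A rough sketch of a cache lookup in the spirit of this message. The message only names the `$GIT_DIR/pack-cache` directory, so the `pack-<hash>.pack` / `pack-<hash>.idx` file naming and the `find_cached_pack` helper are assumptions made for illustration.

```python
import os


def find_cached_pack(git_dir, pack_hash):
    """Return (pack_path, idx_path) if both files exist in the cache, else None.

    The file naming scheme here is an assumption, not taken from the commit.
    """
    cache = os.path.join(git_dir, "pack-cache")
    pack = os.path.join(cache, f"pack-{pack_hash}.pack")
    idx = os.path.join(cache, f"pack-{pack_hash}.idx")
    if os.path.isfile(pack) and os.path.isfile(idx):
        return pack, idx
    return None
```

A caller would fall back to generating the pack from scratch whenever this returns `None`.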
  9. 15 Oct, 2005 1 commit
  10. 14 Oct, 2005 1 commit
    • Add support for "local" packing · 64560374
      Linus Torvalds committed
      This adds the "--local" flag to git-pack-objects, which acts like
      "--incremental", except that instead of ignoring all packed objects, it
      only ignores objects that are packed and in an alternate object tree.
      
      As a result, it effectively only does a local re-pack: any remote-packed
      objects will stay in the alternate object directories.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
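The difference between the two flags can be written down as a small filter. This is a sketch of the rule as stated in the message. The `Where` enum and `should_pack` function are hypothetical names, and the model only distinguishes the three storage cases the message talks about.

```python
from enum import Enum


class Where(Enum):
    LOOSE = "loose"
    LOCAL_PACK = "local pack"
    ALTERNATE_PACK = "alternate pack"


def should_pack(where, incremental=False, local=False):
    """Whether an object stored at `where` goes into the new pack."""
    if incremental and where is not Where.LOOSE:
        return False   # --incremental: ignore everything that is already packed
    if local and where is Where.ALTERNATE_PACK:
        return False   # --local: ignore only packed objects in alternates
    return True
```

With `local=True`, objects in local packs are repacked while packed objects in alternate object directories stay where they are.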
  11. 13 Oct, 2005 1 commit
    • Fix packname hash generation. · 84c8d8ae
      Junio C Hamano committed
      This changes how the hash that packfiles have in their names is generated, from
      "hash of object names as fed to us" to "hash of object names in the
      resulting pack, in the order they appear in the index file".  The new
      "git-index-pack" command is taught to output the computed hash value
      to its standard output.
      
      With this, we can store a downloaded pack in a temporary file without
      knowing its final name, run git-index-pack to generate idx for it
      while finding out its final name, and then rename the pack and idx to
      their final names.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
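Why hashing "in index order" gives a name that no longer depends on the input order can be shown with a short sketch. It assumes that the index lists objects sorted by name and that the hash is taken over the raw 20-byte names. Both are assumptions for this illustration, and `pack_name` is a hypothetical helper.

```python
import hashlib


def pack_name(object_names_hex):
    """SHA-1 over the object names, taken in sorted (index) order."""
    ordered = sorted(bytes.fromhex(name) for name in object_names_hex)
    ctx = hashlib.sha1()
    for raw in ordered:
        ctx.update(raw)   # assumption: raw binary names, not hex text
    return ctx.hexdigest()
```

Because the names are sorted before hashing, a receiver that only has the pack contents can recompute the same value, which is what lets a downloaded pack be renamed after indexing.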
  12. 09 Aug, 2005 1 commit
  13. 04 Jul, 2005 3 commits
  14. 30 Jun, 2005 1 commit
  15. 29 Jun, 2005 6 commits
  16. 28 Jun, 2005 2 commits
  17. 27 Jun, 2005 4 commits
    • csum-file interface updates: return resulting SHA1 · e1808845
      Linus Torvalds committed
      Also, make the writing of the SHA1 as an end-header conditional: not
      every user will necessarily want to write the SHA1 to the file itself,
      even though current users do (but we might end up using the same helper
      functions for the object files themselves, which don't do this).
      
      This also makes the packed index file contain the SHA1 of the packed
      data file at the end (just before its own SHA1).  That way you can
      validate the pairing of the two if you want to.
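The interface change described here (return the SHA1, write it only on request, and store the pack's SHA1 in the index just before the index's own) might look roughly like this. `Sha1File` and `write_pack_and_index` are hypothetical names, and having the index checksum cover the embedded pack SHA1 is an assumption of this sketch.

```python
import hashlib
import io


class Sha1File:
    """Write-through wrapper that hashes everything it writes."""

    def __init__(self, raw):
        self.raw = raw
        self.ctx = hashlib.sha1()

    def write(self, data):
        self.ctx.update(data)
        self.raw.write(data)

    def finish(self, write_trailer=True):
        """Return the SHA-1 of the data; optionally append it to the file."""
        digest = self.ctx.digest()
        if write_trailer:
            self.raw.write(digest)
        return digest


def write_pack_and_index(pack_body, index_body):
    pack_raw, idx_raw = io.BytesIO(), io.BytesIO()
    pack = Sha1File(pack_raw)
    pack.write(pack_body)
    pack_sha1 = pack.finish()

    idx = Sha1File(idx_raw)
    idx.write(index_body)
    idx.write(pack_sha1)      # pairing: the pack's SHA-1 goes before the idx's own
    idx.finish()
    return pack_raw.getvalue(), idx_raw.getvalue()
```

Comparing the 20 bytes before the index trailer with the pack trailer is then enough to check that the two files belong together.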
    • git-pack-objects: write the pack files with a SHA1 csum · c38138cd
      Linus Torvalds committed
      We want to be able to check their integrity later, and putting the
      sha1-sum of the contents at the end is a good thing.  The writing
      routines are generic, so we could try to re-use them for the index file,
      instead of having the same logic duplicated.
      
      Update unpack-objects to know about the extra 20 bytes at the end
      of the index.
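Checking integrity "later" then comes down to recomputing the hash over everything except the trailing 20 bytes. A minimal sketch, with `has_valid_trailer` as a hypothetical helper name:

```python
import hashlib

SHA1_LEN = 20  # size of a raw SHA-1 digest in bytes


def has_valid_trailer(data):
    """True if the last 20 bytes are the SHA-1 of everything before them."""
    if len(data) < SHA1_LEN:
        return False
    body, trailer = data[:-SHA1_LEN], data[-SHA1_LEN:]
    return hashlib.sha1(body).digest() == trailer
```

Any reader of such a file also has to stop parsing 20 bytes before the end, which is the kind of adjustment the message mentions for unpack-objects.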
    • git-pack-objects: use name information (if any) to sort objects for packing. · 27225f2e
      Linus Torvalds committed
      This is incredibly cheesy. But it's cheap, and it works pretty well.
    • git-pack-objects: do the delta search in reverse size order · 521a4f4c
      Linus Torvalds committed
      Starting from big objects and going backwards means that we end up
      picking a delta that goes from a bigger object to a smaller one.  That's
      advantageous for two reasons: the bigger object is likely the newer one
      (since things tend to grow, rather than shrink), and doing a delete
      tends to be smaller than doing an add.
      
      So the deltas don't tend to be top-of-tree, and the packed end result is
      just slightly smaller.
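The claim that "doing a delete tends to be smaller than doing an add" can be illustrated with a toy cost model. This is not git's delta format: it uses `difflib` to find matching runs and charges a hypothetical fixed `COPY_COST` per copied run plus one byte per literal byte that must be carried in the delta.

```python
from difflib import SequenceMatcher

COPY_COST = 2  # hypothetical fixed cost of one "copy from base" instruction


def toy_delta_size(base, target):
    """Approximate size of a delta that rebuilds `target` from `base`."""
    size = 0
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            size += COPY_COST            # reference into the base
        elif tag in ("insert", "replace"):
            size += 1 + (j2 - j1)        # literal bytes travel with the delta
        # "delete" costs nothing: those base bytes are simply not copied
    return size


newer = bytes(range(100))            # the bigger, probably newer object
older = newer[:40] + newer[60:]      # same content with 20 bytes missing
```

In this model, rebuilding `older` from `newer` needs only two copies, while rebuilding `newer` from `older` also has to carry the 20 added bytes. Sorting candidates by decreasing size makes the first direction the one that gets tried.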
  18. 26 Jun, 2005 9 commits