1. 27 Sep, 2006 1 commit
    • introduce delta objects with offset to base · eb32d236
      Nicolas Pitre committed
      This adds a new object type, namely OBJ_OFS_DELTA, renames OBJ_DELTA to
      OBJ_REF_DELTA to better distinguish between those two delta
      objects, and adds support for the handling of those new delta objects
      in sha1_file.c only.
      
      The OBJ_OFS_DELTA contains a relative offset from the delta object's
      position in a pack instead of the 20-byte SHA1 reference to identify
      the base object.  Since the base is likely to be not so far away, the
      relative offset is more likely to have a smaller encoding on average
      than an absolute offset.  And for those delta objects the base must
      always be stored first because there is no way to know the distance of
      later objects when streaming a pack.  Hence this relative offset is
      always meant to be negative.
      
      The offset encoding is slightly denser than the one used for object
      size -- credits to <linux@horizon.com> (whoever this is) for bringing
      it to my attention.
      
      This allows for pack size reduction from 3.2% (Linux-2.6) to over 5%
      (linux-historic).  Runtime pack access should be faster too since delta
      replay skips a search in the pack index for each delta in a chain.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
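The offset encoding described in this commit can be sketched as follows. This is a minimal illustration, not code taken from the commit: the function names `encode_ofs` and `decode_ofs` are hypothetical, and the byte layout (7 data bits per byte, most significant group first, a continuation flag in the high bit, and a bias of one per continuation byte) is stated here as an assumption about how the "slightly denser" encoding works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: encode a backward distance into out[].
 * out must have room for 10 bytes. Returns the number of bytes written.
 */
static size_t encode_ofs(uint64_t ofs, unsigned char *out)
{
	unsigned char tmp[10];
	size_t pos = sizeof(tmp) - 1;
	size_t n;

	assert(out != NULL);
	tmp[pos] = ofs & 127;                     /* last byte: no continuation bit */
	while (ofs >>= 7)
		tmp[--pos] = 128 | (--ofs & 127); /* bias by one: no redundant encodings */
	n = sizeof(tmp) - pos;
	memcpy(out, tmp + pos, n);
	return n;
}

/* Hypothetical helper: decode a distance and report how many bytes were read. */
static uint64_t decode_ofs(const unsigned char *in, size_t *used)
{
	size_t i = 0;
	unsigned char c = in[i++];
	uint64_t ofs = c & 127;

	while (c & 128) {
		ofs += 1;                         /* undo the bias applied by the encoder */
		c = in[i++];
		ofs = (ofs << 7) + (c & 127);
	}
	*used = i;
	return ofs;
}
```

Under these assumptions two bytes cover distances up to 16511 rather than 16383, which is where the extra density over a plain 7-bit varint comes from.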
  2. 23 Sep, 2006 1 commit
    • many cleanups to sha1_file.c · 43057304
      Nicolas Pitre committed
      Those cleanups are mainly to set the table for the support of deltas
      with base objects referenced by offsets instead of sha1.  This means
      that many pack lookup functions are converted to take a pack/offset
      tuple instead of a sha1.
      
      This eliminates many struct pack_entry usages since this structure
      carried redundant information in many cases, and it increased stack
      footprint needlessly for a couple of recursively called functions that
      used to declare a local copy of it for every recursion loop.
      
      In the process, packed_object_info_detail() has been reorganized as well
      so that it looks much saner and more amenable to deltas with offset support.
      
      Finally the appropriate adjustments have been made to functions that
      depend on the above changes.  There are no functional changes yet,
      only some code refactoring at this point.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  3. 13 Sep, 2006 1 commit
  4. 07 Sep, 2006 2 commits
  5. 04 Sep, 2006 2 commits
  6. 03 Sep, 2006 1 commit
    • pack-objects: re-validate data we copy from elsewhere. · df6d6101
      Junio C Hamano committed
      When reusing data from an existing pack and from new-style
      loose objects, we used to just copy it straight into the
      resulting pack.  Instead make sure they are not corrupt, but
      do so only when we are not streaming to stdout, in which case
      the receiving end will do the validation either by unpacking
      the stream or by constructing the .idx file.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  7. 24 Aug, 2006 1 commit
    • Convert memcpy(a,b,20) to hashcpy(a,b). · e702496e
      Shawn Pearce committed
      This abstracts away the size of the hash values when copying them
      from memory location to memory location, much as the introduction
      of hashcmp abstracted away hash value comparison.
      
      A few call sites were using char* rather than unsigned char* so
      I added the cast rather than open hashcpy to be void*.  This is a
      reasonable tradeoff as most call sites already use unsigned char*
      and the existing hashcmp is also declared to be unsigned char*.
      
      [jc: Split the patch into a "master" part, to be followed by a
       patch for merge-recursive.c which is not in "master" yet.
      
       Fixed the cast in the latter hunk to combine-diff.c which was
       wrong in the original.
      
       Also converted ones left-over in combine-diff.c, diff-lib.c and
       upload-pack.c ]
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
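The abstraction described in this commit can be sketched roughly as below. The constant name `HASH_RAWSZ` is hypothetical, and the function bodies are an assumption based on the commit title (`memcpy(a,b,20)` becoming `hashcpy(a,b)`); only the names `hashcpy` and `hashcmp` and the `unsigned char *` parameter type come from the message.

```c
#include <assert.h>
#include <string.h>

#define HASH_RAWSZ 20 /* hypothetical name for the raw SHA-1 size in bytes */

/* Copy one raw hash value; callers no longer spell out the size. */
static void hashcpy(unsigned char *dst, const unsigned char *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, HASH_RAWSZ);
}

/* Compare two raw hash values; zero means equal, as with memcmp(). */
static int hashcmp(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, HASH_RAWSZ);
}
```

A call site holding a `char *` would cast explicitly, for example `hashcpy(dst, (const unsigned char *)text)`, which matches the tradeoff the message describes.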
  8. 18 Aug, 2006 1 commit
  9. 16 Aug, 2006 1 commit
  10. 04 Aug, 2006 1 commit
  11. 26 Jul, 2006 1 commit
  12. 24 Jul, 2006 1 commit
  13. 10 Jul, 2006 1 commit
  14. 01 Jul, 2006 1 commit
  15. 30 Jun, 2006 2 commits
  16. 21 Jun, 2006 1 commit
  17. 20 Jun, 2006 1 commit
  18. 06 Jun, 2006 1 commit
    • pack-objects: improve path grouping heuristics. · ce0bd642
      Linus Torvalds committed
      This trivial patch not only simplifies the name hashing, it actually
      improves packing for both git and the kernel.
      
      The git archive pack shrinks from 6824090->6622627 bytes (a 3%
      improvement), and the kernel pack shrinks from 108756213 to 108219021 (a
      mere 0.5% improvement, but still, it's an improvement from making the
      hashing much simpler!)
      
      We just create a 32-bit hash, where we "age" previous characters by two
      bits, so the last characters in a filename count most. So when we then
      compare the hashes in the sort routine, filenames that end the same way
      sort the same way.
      
      It takes the subdirectory into account (unless the filename is > 16
      characters), but files with the same name within the same subdirectory
      will obviously sort closer than files in different subdirectories.
      
      And, incidentally (which is why I tried the hash change in the first
      place, of course) builtin-rev-list.c will sort fairly close to rev-list.c.
      
      And no, it's not a "good hash" in the sense of being secure or unique, but
      that's not what we're looking for. The whole "hash" thing is misnamed
      here. It's not so much a hash as a "sorting number".
      
      [jc: rolled in simplification for computing the sorting number
       for thin pack base objects]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  19. 31 May, 2006 1 commit
    • tree_entry(): new tree-walking helper function · 4c068a98
      Linus Torvalds committed
      This adds a "tree_entry()" function that combines the common operation of
      doing a "tree_entry_extract()" + "update_tree_entry()".
      
      It also has a simplified calling convention, designed for simple loops
      that traverse over a whole tree: the arguments are pointers to the tree
      descriptor and a name_entry structure to fill in, and it returns a boolean
      "true" if there was an entry left to be gotten in the tree.
      
      This allows tree traversal with
      
      	struct tree_desc desc;
      	struct name_entry entry;
      
      	desc.buf = tree->buffer;
      	desc.size = tree->size;
      	while (tree_entry(&desc, &entry)) {
      		... use "entry.{path, sha1, mode, pathlen}" ...
      	}
      
      which is not only shorter than writing it out in full, it's hopefully less
      error prone too.
      
      [ It's actually a tad faster too - we don't need to recalculate the entry
        pathlength in both extract and update, but need to do it only once.
        Also, some callers can avoid doing a "strlen()" on the result, since
        it's returned as part of the name_entry structure.
      
        However, by now we're talking just 1% speedup on "git-rev-list --objects
        --all", and we're definitely at the point where tree walking is no
        longer the issue any more. ]
      
      NOTE! Not everybody wants to use this new helper function, since some of
      the tree walkers very much on purpose do the descriptor update separately
      from the entry extraction. So the "extract + update" sequence still
      remains as the core sequence, this is just a simplified interface.
      
      We should probably add a silly two-line inline helper function for
      initializing the descriptor from the "struct tree" too, just to cut down
      on the noise from that common "desc" initializer.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
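The calling convention described above can be illustrated with a self-contained sketch. The struct fields follow the names used in the message (`buf`, `size`, `path`, `sha1`, `mode`, `pathlen`); the parsing logic is an assumption written for this example, based on tree entries being stored as an octal mode, a space, a NUL-terminated path, and a 20-byte raw SHA-1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct tree_desc {
	const void *buf;
	unsigned long size;
};

struct name_entry {
	const unsigned char *sha1;
	const char *path;
	unsigned int mode;
	int pathlen;
};

/*
 * Extract the entry at the front of desc and advance desc past it.
 * Returns 1 if an entry was produced, 0 when the buffer is exhausted
 * (or, in this sketch, when the remaining bytes are malformed).
 */
static int tree_entry(struct tree_desc *desc, struct name_entry *entry)
{
	const char *p, *end, *nul;
	unsigned int mode = 0;

	assert(desc != NULL && entry != NULL);
	if (!desc->size)
		return 0;
	p = desc->buf;
	end = p + desc->size;
	while (p < end && *p >= '0' && *p <= '7')
		mode = (mode << 3) + (unsigned int)(*p++ - '0');
	if (p == end || *p != ' ')
		return 0;
	p++;
	nul = memchr(p, '\0', (size_t)(end - p));
	if (!nul || end - nul - 1 < 20)
		return 0;
	entry->mode = mode;
	entry->path = p;
	entry->pathlen = (int)(nul - p);  /* computed once, reused by callers */
	entry->sha1 = (const unsigned char *)nul + 1;
	desc->buf = nul + 1 + 20;
	desc->size = (unsigned long)(end - (nul + 1 + 20));
	return 1;
}
```

A caller initializes `desc.buf` and `desc.size` from the tree object and loops with `while (tree_entry(&desc, &entry))`, exactly as in the message.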
  20. 17 May, 2006 1 commit
  21. 16 May, 2006 3 commits
    • Fix pack-index issue on 64-bit platforms a bit more portably. · 1b9bc5a7
      Junio C Hamano committed
      Apparently <stdint.h> is not enough for uint32_t on OpenBSD; use
      "unsigned int" -- hopefully that would stay 32-bit on every
      platform we care about, at least until we update the pack-index
      file format.
      
      Our sha1 routines optimized for architectures use uint32_t and
      expect '#include <stdint.h>' to be enough, so OpenBSD on arm or
      ppc might have similar issues down the road, I dunno.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • pack-object: slightly more efficient · ff45715c
      Nicolas Pitre committed
      Avoid creating a delta index for objects with maximum depth since they
      are not going to be used as delta base anyway.  This also reduces peak
      memory usage slightly as the current object's delta index is not useful
      until the next object in the loop is considered for deltification. This
      saves a bit more than 1% on CPU usage.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • simple euristic for further free packing improvements · 4e8da195
      Nicolas Pitre committed
      Given that the early eviction of objects with maximum delta depth
      may exhibit bad packing on its own, why not consider a bias against
      deep base objects in try_delta() to mitigate that bad behavior?
      
      This patch adjusts the MAX_size allowed for a delta based on the depth of
      the base object as well as enabling the early eviction of max depth
      objects from the object window.  When used separately, those two things
      produce slightly better and much worse results respectively.  But their
      combined effect is a surprisingly significant packing improvement.
      
      With this really simple patch the GIT repo gets nearly 15% smaller, and
      the Linux kernel repo about 5% smaller, with no significantly measurable
      CPU usage difference.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
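One way to picture the depth bias is a size limit that shrinks as the candidate base gets deeper. The sketch below is purely illustrative: the helper name `biased_max_size` and the linear scaling are assumptions, since the message only says that the allowed size is adjusted based on the depth of the base object and does not give the formula.

```c
#include <assert.h>

/*
 * Hypothetical helper: scale the largest acceptable delta size by the
 * fraction of the depth budget the base object still has left.
 * A base at depth 0 keeps the full limit; a base at max_depth gets 0.
 */
static unsigned long biased_max_size(unsigned long max_size,
				     unsigned int base_depth,
				     unsigned int max_depth)
{
	assert(max_depth > 0 && base_depth <= max_depth);
	return (unsigned long)((unsigned long long)max_size *
			       (max_depth - base_depth) / max_depth);
}
```

Under this assumption a delta against a deep base has to be proportionally smaller to be accepted, which nudges the packer toward shallower chains.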
  22. 15 May, 2006 1 commit
  23. 14 May, 2006 1 commit
    • Fix git-pack-objects for 64-bit platforms · 66561f5a
      Dennis Stosberg committed
      The offset of an object in the pack is recorded as a 4-byte integer
      in the index file.  When reading the offset from the mmap'ed index
      in prepare_pack_revindex(), the address is dereferenced as a long*.
      This works fine as long as the long type is four bytes wide.  On
      NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing
      the offset produces garbage.
      
      [jc: taking suggestion by Linus to use uint32_t]
      Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
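The failure mode described above comes from reading a fixed 4-byte field through a type whose width varies by platform. A minimal sketch of a width-independent read follows; it assembles the value byte by byte and assumes the field is stored in network (big-endian) byte order. The helper name `read_be32` is hypothetical, and this is an alternative illustration rather than the change made in the commit, which per the note switched to `uint32_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 4-byte big-endian integer from p. Works regardless of the
 * width of long and regardless of the alignment of p.
 */
static uint32_t read_be32(const unsigned char *p)
{
	assert(p != NULL);
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
}
```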
  24. 06 May, 2006 1 commit
    • pack-object: squelch eye-candy on non-tty · 86118bcb
      Junio C Hamano committed
      One of my post-update scripts runs a git-fetch into a separate
      repository and sends the results back to me (2>&1); I end up
      getting this in the mail:
      
          Generating pack...
          Done counting 180 objects.
          Result has 131 objects.
          Deltifying 131 objects.
             0% (0/131) done^M   1% (2/131) done^M...
      
      This makes the default not to show the progress report when not on a tty.
      
      You could give --progress to force the progress report, but
      let's not bother even documenting it nor mentioning it in the
      usage string.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
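The default described here can be sketched with the POSIX `isatty()` call. The helper name `want_progress` and its parameters are hypothetical; the message only states that progress is off by default when not on a tty and that `--progress` forces it on.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Decide whether to print the progress meter to the stream behind fd.
 * forced is nonzero when the user explicitly asked for progress.
 */
static int want_progress(int forced, int fd)
{
	assert(fd >= 0);
	return forced || isatty(fd) == 1;
}
```

A caller would typically pass the descriptor for standard error, for example `want_progress(progress_flag, STDERR_FILENO)`, where `progress_flag` is an assumed variable set by option parsing.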
  25. 28 Apr, 2006 1 commit
  26. 27 Apr, 2006 1 commit
  27. 21 Apr, 2006 2 commits
  28. 17 Apr, 2006 1 commit
  29. 07 Apr, 2006 1 commit
    • Thin pack generation: optimization. · 5379a5c5
      Junio C Hamano committed
      Jens Axboe noticed that recent "git push" has become very slow
      since we made --thin transfer the default.
      
      Thin pack generation to push a handful of revisions that touch a
      relatively small number of paths out of a huge tree was stupid; it
      registered _everything_ from the excluded revisions.  As a
      result, the "Counting objects" phase was unnecessarily expensive.
      
      This changes the logic to register the blobs and trees from
      excluded revisions only for paths we are actually going to send
      to the other end.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  30. 04 Apr, 2006 2 commits
  31. 03 Apr, 2006 2 commits
  32. 30 Mar, 2006 1 commit
    • tree/diff header cleanup. · 1b0c7174
      Junio C Hamano committed
      Introduce tree-walk.[ch] and move "struct tree_desc" and
      associated functions from various places.
      
      Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
      move it to cache.h.  This macro returns the canonicalized
      st_mode value in the host byte order for files, symlinks and
      directories -- to be compared with a tree_desc entry.
      create_ce_mode(mode) in cache.h is similar but is intended to be
      used for index entries (so it does not work for directories) and
      returns the value in the network byte order.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
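The canonicalization that `canon_mode(mode)` performs can be approximated as a function. This is a sketch under assumptions: the name `canon_mode_sketch` is hypothetical, the message describes a macro rather than a function, and the rule that regular files collapse to 0755 or 0644 depending on the owner-execute bit is an assumption about what "canonicalized" means here.

```c
#define _XOPEN_SOURCE 700 /* request the POSIX S_IF* and S_IS* macros */
#include <assert.h>
#include <sys/stat.h>

/*
 * Reduce an st_mode value to a canonical form, in host byte order:
 * regular files keep only "executable or not", symlinks and directories
 * keep only their type. Other file types return 0 in this sketch.
 */
static unsigned int canon_mode_sketch(unsigned int mode)
{
	if (S_ISREG(mode))
		return S_IFREG | ((mode & 0100) ? 0755 : 0644);
	if (S_ISLNK(mode))
		return S_IFLNK;
	if (S_ISDIR(mode))
		return S_IFDIR;
	return 0;
}
```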