1. 05 4月, 2014 1 次提交
    • V
      add `ignore_missing_links` mode to revwalk · 2db1a43f
      Vicent Marti 提交于
      When pack-objects is computing the reachability bitmap to
      serve a fetch request, it can erroneously die() if some of
      the UNINTERESTING objects are not present. Upload-pack
      throws away HAVE lines from the client for objects we do not
      have, but we may have a tip object without all of its
      ancestors (e.g., if the tip is no longer reachable and was
      new enough to survive a `git prune`, but some of its
      reachable objects did get pruned).
      
      In the non-bitmap case, we do a revision walk with the HAVE
      objects marked as UNINTERESTING. The revision walker
      explicitly ignores errors in accessing UNINTERESTING commits
      to handle this case (and we do not bother looking at
      UNINTERESTING trees or blobs at all).
      
      When we have bitmaps, however, the process is quite
      different.  The bitmap index for a pack-objects run is
      calculated in two separate steps:
      
      First, we perform an extensive walk from all the HAVEs to
      find the full set of objects reachable from them. This walk
      is usually optimized away because we are expected to hit an
      object with a bitmap during the traversal, which allows us
      to terminate early.
      
      Secondly, we perform an extensive walk from all the WANTs,
      which usually also terminates early because we hit a commit
      with an existing bitmap.
      
      Once we have the resulting bitmaps from the two walks, we
      AND-NOT them together to obtain the resulting set of objects
      we need to pack.
      
      When we are walking the HAVE objects, the revision walker
      does not know that we are walking it only to mark the
      results as uninteresting. We strip out the UNINTERESTING flag,
      because those objects _are_ interesting to us during the
      first walk. We want to keep going to get a complete set of
      reachable objects if we can.
      
      We need some way to tell the revision walker that it's OK to
      silently truncate the HAVE walk, just like it does for the
      UNINTERESTING case. This patch introduces a new
      `ignore_missing_links` flag to the `rev_info` struct, which
      we set only for the HAVE walk.
      
      It also adds tests to cover UNINTERESTING objects missing
      from several positions: a missing blob, a missing tree, and
      a missing parent commit. The missing blob already worked (as
      we do not care about its contents at all), but the other two
      cases caused us to die().
      
      Note that there are a few cases we do not need to test:
      
        1. We do not need to test a missing tree, with the blob
           still present. Without the tree that refers to it, we
           would not know that the blob is relevant to our walk.
      
        2. We do not need to test a tip commit that is missing.
           Upload-pack omits these for us (and in fact, we
           complain even in the non-bitmap case if it fails to do
           so).
      Reported-by: NSiddharth Agarwal <sid0@fb.com>
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      2db1a43f
  2. 18 3月, 2014 1 次提交
    • J
      pack-objects: turn off bitmaps when skipping objects · 373c67da
      Jeff King 提交于
      The pack bitmap format requires that we have a single bit
      for each object in the pack, and that each object's bitmap
      represents its complete set of reachable objects. Therefore
      we have no way to represent the bitmap of an object which
      references objects outside the pack.
      
      We notice this problem while generating the bitmaps, as we
      try to find the offset of a particular object and realize
      that we do not have it. In this case we die, and neither the
      bitmap nor the pack is generated. This is correct, but
      perhaps a little unfriendly. If you have bitmaps turned on
      in the config, many repacks will fail which would otherwise
      succeed. E.g., incremental repacks, repacks with "-l" when
      you have alternates, ".keep" files.
      
      Instead, this patch notices early that we are omitting some
      objects from the pack and turns off bitmaps (with a
      warning). Note that this is not strictly correct, as it's
      possible that the object being omitted is not reachable from
      any other object in the pack. In practice, this is almost
      never the case, and there are two advantages to doing it
      this way:
      
        1. The code is much simpler, as we do not have to cleanly
           abort the bitmap-generation process midway through.
      
        2. We do not waste time partially generating bitmaps only
           to find out that some object deep in the history is not
           being packed.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      373c67da
  3. 13 2月, 2014 1 次提交
    • J
      ewah: unconditionally ntohll ewah data · 6b5b3a27
      Jeff King 提交于
      Commit a201c20b tried to optimize out a loop like:
      
        for (i = 0; i < len; i++)
      	  data[i] = ntohll(data[i]);
      
      in the big-endian case, because we know that ntohll is a
      noop, and we do not need to pay the cost of the loop at all.
      However, it mistakenly assumed that __BYTE_ORDER was always
      defined, whereas it may not be on systems which do not
      define it by default, and where we did not need to define it
      to set up the ntohll macro. This includes OS X and Windows.
      
      We could muck with the ordering in compat/bswap.h to make
      sure it is defined unconditionally, but it is simpler to
      still to just execute the loop unconditionally. That avoids
      the application code knowing anything about these magic
      macros, and lets it depend only on having ntohll defined.
      
      And since the resulting loop looks like (on a big-endian
      system):
      
        for (i = 0; i < len; i++)
      	  data[i] = data[i];
      
      any decent compiler can probably optimize it out.
      
      Original report and analysis by Brian Gernhardt.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      6b5b3a27
  4. 24 1月, 2014 3 次提交
  5. 17 1月, 2014 1 次提交
    • J
      do not discard revindex when re-preparing packfiles · 1a6d8b91
      Jeff King 提交于
      When an object lookup fails, we re-read the objects/pack
      directory to pick up any new packfiles that may have been
      created since our last read. We also discard any pack
      revindex structs we've allocated.
      
      The discarding is a problem for the pack-bitmap code, which keeps
      a pointer to the revindex for the bitmapped pack. After the
      discard, the pointer is invalid, and we may read free()d
      memory.
      
      Other revindex users do not keep a bare pointer to the
      revindex; instead, they always access it through
      revindex_for_pack(), which lazily builds the revindex. So
      one solution is to teach the pack-bitmap code a similar
      trick. It would be slightly less efficient, but probably not
      all that noticeable.
      
      However, it turns out this discarding is not actually
      necessary. When we call reprepare_packed_git, we do not
      throw away our old pack list. We keep the existing entries,
      and only add in new ones. So there is no safety problem; we
      will still have the pack struct that matches each revindex.
      The packfile itself may go away, of course, but we are
      already prepared to handle that, and it may happen outside
      of reprepare_packed_git anyway.
      
      Throwing away the revindex may save some RAM if the pack
      never gets reused (about 12 bytes per object). But it also
      wastes some CPU time (to regenerate the index) if the pack
      does get reused. It's hard to say which is more valuable,
      but in either case, it happens very rarely (only when we
      race with a simultaneous repack). Just leaving the revindex
      in place is simple and safe both for current and future
      code.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      1a6d8b91
  6. 31 12月, 2013 15 次提交
    • V
      pack-bitmap: implement optional name_hash cache · ae4f07fb
      Vicent Marti 提交于
      When we use pack bitmaps rather than walking the object
      graph, we end up with the list of objects to include in the
      packfile, but we do not know the path at which any tree or
      blob objects would be found.
      
      In a recently packed repository, this is fine. A fetch would
      use the paths only as a heuristic in the delta compression
      phase, and a fully packed repository should not need to do
      much delta compression.
      
      As time passes, though, we may acquire more objects on top
      of our large bitmapped pack. If clients fetch frequently,
      then they never even look at the bitmapped history, and all
      works as usual. However, a client who has not fetched since
      the last bitmap repack will have "have" tips in the
      bitmapped history, but "want" newer objects.
      
      The bitmaps themselves degrade gracefully in this
      circumstance. We manually walk the more recent bits of
      history, and then use bitmaps when we hit them.
      
      But we would also like to perform delta compression between
      the newer objects and the bitmapped objects (both to delta
      against what we know the user already has, but also between
      "new" and "old" objects that the user is fetching). The lack
      of pathnames makes our delta heuristics much less effective.
      
      This patch adds an optional cache of the 32-bit name_hash
      values to the end of the bitmap file. If present, a reader
      can use it to match bitmapped and non-bitmapped names during
      delta compression.
      
      Here are perf results for p5310:
      
      Test                      origin/master       HEAD^                      HEAD
      -------------------------------------------------------------------------------------------------
      5310.2: repack to disk    36.81(37.82+1.43)   47.70(48.74+1.41) +29.6%   47.75(48.70+1.51) +29.7%
      5310.3: simulated clone   30.78(29.70+2.14)   1.08(0.97+0.10) -96.5%     1.07(0.94+0.12) -96.5%
      5310.4: simulated fetch   3.16(6.10+0.08)     3.54(10.65+0.06) +12.0%    1.70(3.07+0.06) -46.2%
      5310.6: partial bitmap    36.76(43.19+1.81)   6.71(11.25+0.76) -81.7%    4.08(6.26+0.46) -88.9%
      
      You can see that the time spent on an incremental fetch goes
      down, as our delta heuristics are able to do their work.
      And we save time on the partial bitmap clone for the same
      reason.
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      ae4f07fb
    • J
      t/perf: add tests for pack bitmaps · bbcefa1f
      Jeff King 提交于
      This adds a few basic perf tests for the pack bitmap code to
      show off its improvements. The tests are:
      
        1. How long does it take to do a repack (it gets slower
           with bitmaps, since we have to do extra work)?
      
        2. How long does it take to do a clone (it gets faster
           with bitmaps)?
      
        3. How does a small fetch perform when we've just
           repacked?
      
        4. How does a clone perform when we haven't repacked since
           a week of pushes?
      
      Here are results against linux.git:
      
      Test                      origin/master       this tree
      -----------------------------------------------------------------------
      5310.2: repack to disk    33.64(32.64+2.04)   67.67(66.75+1.84) +101.2%
      5310.3: simulated clone   30.49(29.47+2.05)   1.20(1.10+0.10) -96.1%
      5310.4: simulated fetch   3.49(6.79+0.06)     5.57(22.35+0.07) +59.6%
      5310.6: partial bitmap    36.70(43.87+1.81)   8.18(21.92+0.73) -77.7%
      
      You can see that we do take longer to repack, but we do way
      better for further clones. A small fetch performs a bit
      worse, as we spend way more time on delta compression (note
      the heavy user CPU time, as we have 8 threads) due to the
      lack of name hashes for the bitmapped objects.
      
      The final test shows how the bitmaps degrade over time
      between packs. There's still a significant speedup over the
      non-bitmap case, but we don't do quite as well (we have to
      spend time accessing the "new" objects the old fashioned
      way, including delta compression).
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      bbcefa1f
    • J
      t: add basic bitmap functionality tests · 212f2ffb
      Jeff King 提交于
      Now that we can read and write bitmaps, we can exercise them
      with some basic functionality tests. These tests aren't
      particularly useful for seeing the benefit, as the test
      repo is too small for it to make a difference. However, we
      can at least check that using bitmaps does not break anything.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      212f2ffb
    • N
      count-objects: recognize .bitmap in garbage-checking · d3d3e4c4
      Nguyễn Thái Ngọc Duy 提交于
      Count-objects will report any "garbage" files in the packs
      directory, including files whose extensions it does not
      know (case 1), and files whose matching ".pack" file is
      missing (case 2).  Without having learned about ".bitmap"
      files, the current code reports all such files as garbage
      (case 1), even if their pack exists. Instead, they should be
      treated as case 2.
      Signed-off-by: NNguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      d3d3e4c4
    • V
      repack: consider bitmaps when performing repacks · 5cf2741c
      Vicent Marti 提交于
      Since `pack-objects` will write a `.bitmap` file next to the `.pack` and
      `.idx` files, this commit teaches `git-repack` to consider the new
      bitmap indexes (if they exist) when performing repack operations.
      
      This implies moving old bitmap indexes out of the way if we are
      repacking a repository that already has them, and moving the newly
      generated bitmap indexes into the `objects/pack` directory, next to
      their corresponding packfiles.
      
      Since `git repack` is now capable of handling these `.bitmap` files,
      a normal `git gc` run on a repository that has `pack.writebitmaps` set
      to true in its config file will generate bitmap indexes as part of the
      garbage collection process.
      
      Alternatively, `git repack` can be called with the `-b` switch to
      explicitly generate bitmap indexes if you are experimenting
      and don't want them on all the time.
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      5cf2741c
    • J
      repack: handle optional files created by pack-objects · b77fcd1e
      Jeff King 提交于
      We ask pack-objects to pack to a set of temporary files, and
      then rename them into place. Some files that pack-objects
      creates may be optional (like a .bitmap file), in which case
      we would not want to call rename(). We already call stat()
      and make the chmod optional if the file cannot be accessed.
      We could simply skip the rename step in this case, but that
      would be a minor regression in noticing problems with
      non-optional files (like the .pack and .idx files).
      
      Instead, we can now annotate extensions as optional, and
      skip them if they don't exist (and otherwise rely on
      rename() to barf).
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      b77fcd1e
    • J
      repack: turn exts array into array-of-struct · 42a02d85
      Jeff King 提交于
      This is slightly more verbose, but will let us annotate the
      extensions with further options in future commits.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      42a02d85
    • J
      repack: stop using magic number for ARRAY_SIZE(exts) · b328c216
      Jeff King 提交于
      We have a static array of extensions, but hardcode the size
      of the array in our loops. Let's pull out this magic number,
      which will make it easier to change.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      b328c216
    • V
      pack-objects: implement bitmap writing · 7cc8f971
      Vicent Marti 提交于
      This commit extends more the functionality of `pack-objects` by allowing
      it to write out a `.bitmap` index next to any written packs, together
      with the `.idx` index that currently gets written.
      
      If bitmap writing is enabled for a given repository (either by calling
      `pack-objects` with the `--write-bitmap-index` flag or by having
      `pack.writebitmaps` set to `true` in the config) and pack-objects is
      writing a packfile that would normally be indexed (i.e. not piping to
      stdout), we will attempt to write the corresponding bitmap index for the
      packfile.
      
      Bitmap index writing happens after the packfile and its index has been
      successfully written to disk (`finish_tmp_packfile`). The process is
      performed in several steps:
      
          1. `bitmap_writer_set_checksum`: this call stores the partial
             checksum for the packfile being written; the checksum will be
             written in the resulting bitmap index to verify its integrity
      
          2. `bitmap_writer_build_type_index`: this call uses the array of
             `struct object_entry` that has just been sorted when writing out
             the actual packfile index to disk to generate 4 type-index bitmaps
             (one for each object type).
      
             These bitmaps have their nth bit set if the given object is of
             the bitmap's type. E.g. the nth bit of the Commits bitmap will be
             1 if the nth object in the packfile index is a commit.
      
             This is a very cheap operation because the bitmap writing code has
             access to the metadata stored in the `struct object_entry` array,
             and hence the real type for each object in the packfile.
      
          3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
             index for one of the packfiles we're trying to repack, this call
             will efficiently rebuild the existing bitmaps so they can be
             reused on the new index. All the existing bitmaps will be stored
             in a `reuse` hash table, and the commit selection phase will
             prioritize these when selecting, as they can be written directly
             to the new index without having to perform a revision walk to
             fill the bitmap. This can greatly speed up the repack of a
             repository that already has bitmaps.
      
          4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
             a given `pack-objects` run, the sequence of commits generated
             during the Counting Objects phase will be stored in an array.
      
             We then use that array to build up the list of selected commits.
             Writing a bitmap in the index for each object in the repository
             would be cost-prohibitive, so we use a simple heuristic to pick
             the commits that will be indexed with bitmaps.
      
             The current heuristics are a simplified version of JGit's
             original implementation. We select a higher density of commits
             depending on their age: the 100 most recent commits are always
             selected, after that we pick 1 commit of each 100, and the gap
             increases as the commits grow older. On top of that, we make sure
             that every single branch that has not been merged (all the tips
             that would be required from a clone) gets their own bitmap, and
             when selecting commits between a gap, we tend to prioritize the
             commit with the most parents.
      
             Do note that there is no right/wrong way to perform commit
             selection; different selection algorithms will result in
             different commits being selected, but there's no such thing as
             "missing a commit". The bitmap walker algorithm implemented in
             `prepare_bitmap_walk` is able to adapt to missing bitmaps by
             performing manual walks that complete the bitmap: the ideal
             selection algorithm, however, would select the commits that are
             more likely to be used as roots for a walk in the future (e.g.
             the tips of each branch, and so on) to ensure a bitmap for them
             is always available.
      
          5. `bitmap_writer_build`: this is the computationally expensive part
             of bitmap generation. Based on the list of commits that were
             selected in the previous step, we perform several incremental
             walks to generate the bitmap for each commit.
      
             The walks begin from the oldest commit, and are built up
             incrementally for each branch. E.g. consider this dag where A, B,
             C, D, E, F are the selected commits, and a, b, c, e are a chunk
             of simplified history that will not receive bitmaps.
      
                  A---a---B--b--C--c--D
                           \
                            E--e--F
      
             We start by building the bitmap for A, using A as the root for a
             revision walk and marking all the objects that are reachable
             until the walk is over. Once this bitmap is stored, we reuse the
             bitmap walker to perform the walk for B, assuming that once we
             reach A again, the walk will be terminated because A has already
             been SEEN on the previous walk.
      
             This process is repeated for C, and D, but when we try to
             generate the bitmaps for E, we can reuse neither the current walk
             nor the bitmap we have generated so far.
      
             What we do now is resetting both the walk and clearing the
             bitmap, and performing the walk from scratch using E as the
             origin. This new walk, however, does not need to be completed.
             Once we hit B, we can lookup the bitmap we have already stored
             for that commit and OR it with the existing bitmap we've composed
             so far, allowing us to limit the walk early.
      
             After all the bitmaps have been generated, another iteration
             through the list of commits is performed to find the best XOR
             offsets for compression before writing them to disk. Because of
             the incremental nature of these bitmaps, XORing one of them with
             its predecesor results in a minimal "bitmap delta" most of the
             time. We can write this delta to the on-disk bitmap index, and
             then re-compose the original bitmaps by XORing them again when
             loaded.
      
             This is a phase very similar to pack-object's `find_delta` (using
             bitmaps instead of objects, of course), except the heuristics
             have been greatly simplified: we only check the 10 bitmaps before
             any given one to find best compressing one. This gives good
             results in practice, because there is locality in the ordering of
             the objects (and therefore bitmaps) in the packfile.
      
           6. `bitmap_writer_finish`: the last step in the process is
      	serializing to disk all the bitmap data that has been generated
      	in the two previous steps.
      
      	The bitmap is written to a tmp file and then moved atomically to
      	its final destination, using the same process as
      	`pack-write.c:write_idx_file`.
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      7cc8f971
    • V
      rev-list: add bitmap mode to speed up object lists · aa32939f
      Vicent Marti 提交于
      The bitmap reachability index used to speed up the counting objects
      phase during `pack-objects` can also be used to optimize a normal
      rev-list if the only thing required are the SHA1s of the objects during
      the list (i.e., not the path names at which trees and blobs were found).
      
      Calling `git rev-list --objects --use-bitmap-index [committish]` will
      perform an object iteration based on a bitmap result instead of actually
      walking the object graph.
      
      These are some example timings for `torvalds/linux` (warm cache,
      best-of-five):
      
          $ time git rev-list --objects master > /dev/null
      
          real    0m34.191s
          user    0m33.904s
          sys     0m0.268s
      
          $ time git rev-list --objects --use-bitmap-index master > /dev/null
      
          real    0m1.041s
          user    0m0.976s
          sys     0m0.064s
      
      Likewise, using `git rev-list --count --use-bitmap-index` will speed up
      the counting operation by building the resulting bitmap and performing a
      fast popcount (number of bits set on the bitmap) on the result.
      
      Here are some sample timings of different ways to count commits in
      `torvalds/linux`:
      
          $ time git rev-list master | wc -l
              399882
      
              real    0m6.524s
              user    0m6.060s
              sys     0m3.284s
      
          $ time git rev-list --count master
              399882
      
              real    0m4.318s
              user    0m4.236s
              sys     0m0.076s
      
          $ time git rev-list --use-bitmap-index --count master
              399882
      
              real    0m0.217s
              user    0m0.176s
              sys     0m0.040s
      
      This also respects negative refs, so you can use it to count
      a slice of history:
      
              $ time git rev-list --count v3.0..master
              144843
      
              real    0m1.971s
              user    0m1.932s
              sys     0m0.036s
      
              $ time git rev-list --use-bitmap-index --count v3.0..master
              real    0m0.280s
              user    0m0.220s
              sys     0m0.056s
      
      Though note that the closer the endpoints, the less it helps. In the
      traversal case, we have fewer commits to cross, so we take less time.
      But the bitmap time is dominated by generating the pack revindex, which
      is constant with respect to the refs given.
      
      Note that you cannot yet get a fast --left-right count of a symmetric
      difference (e.g., "--count --left-right master...topic"). The slow part
      of that walk actually happens during the merge-base determination when
      we parse "master...topic". Even though a count does not actually need to
      know the real merge base (it only needs to take the symmetric difference
      of the bitmaps), the revision code would require some refactoring to
      handle this case.
      
      Additionally, a `--test-bitmap` flag has been added that will perform
      the same rev-list manually (i.e. using a normal revwalk) and using
      bitmaps, and verify that the results are the same. This can be used to
      exercise the bitmap code, and also to verify that the contents of the
      .bitmap file are sane.
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      aa32939f
    • V
      pack-objects: use bitmaps when packing objects · 6b8fda2d
      Vicent Marti 提交于
      In this patch, we use the bitmap API to perform the `Counting Objects`
      phase in pack-objects, rather than a traditional walk through the object
      graph. For a reasonably-packed large repo, the time to fetch and clone
      is often dominated by the full-object revision walk during the Counting
      Objects phase. Using bitmaps can reduce the CPU time required on the
      server (and therefore start sending the actual pack data with less
      delay).
      
      For bitmaps to be used, the following must be true:
      
        1. We must be packing to stdout (as a normal `pack-objects` from
           `upload-pack` would do).
      
        2. There must be a .bitmap index containing at least one of the
           "have" objects that the client is asking for.
      
        3. Bitmaps must be enabled (they are enabled by default, but can be
           disabled by setting `pack.usebitmaps` to false, or by using
           `--no-use-bitmap-index` on the command-line).
      
      If any of these is not true, we fall back to doing a normal walk of the
      object graph.
      
      Here are some sample timings from a full pack of `torvalds/linux` (i.e.
      something very similar to what would be generated for a clone of the
      repository) that show the speedup produced by various
      methods:
      
          [existing graph traversal]
          $ time git pack-objects --all --stdout --no-use-bitmap-index \
      			    </dev/null >/dev/null
          Counting objects: 3237103, done.
          Compressing objects: 100% (508752/508752), done.
          Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
      
          real    0m44.111s
          user    0m42.396s
          sys     0m3.544s
      
          [bitmaps only, without partial pack reuse; note that
           pack reuse is automatic, so timing this required a
           patch to disable it]
          $ time git pack-objects --all --stdout </dev/null >/dev/null
          Counting objects: 3237103, done.
          Compressing objects: 100% (508752/508752), done.
          Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
      
          real    0m5.413s
          user    0m5.604s
          sys     0m1.804s
      
          [bitmaps with pack reuse (what you get with this patch)]
          $ time git pack-objects --all --stdout </dev/null >/dev/null
          Reusing existing pack: 3237103, done.
          Total 3237103 (delta 0), reused 0 (delta 0)
      
          real    0m1.636s
          user    0m1.460s
          sys     0m0.172s
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      6b8fda2d
    • J
      pack-objects: split add_object_entry · ce2bc424
      Jeff King 提交于
      This function actually does three things:
      
        1. Check whether we've already added the object to our
           packing list.
      
        2. Check whether the object meets our criteria for adding.
      
        3. Actually add the object to our packing list.
      
      It's a little hard to see these three phases, because they
      happen linearly in the rather long function. Instead, this
      patch breaks them up into three separate helper functions.
      
      The result is a little easier to follow, though it
      unfortunately suffers from some optimization
      interdependencies between the stages (e.g., during step 3 we
      use the packing list index from step 1 and the packfile
      information from step 2).
      
      More importantly, though, the various parts can be
      composed differently, as they will be in the next patch.
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      ce2bc424
    • V
      pack-bitmap: add support for bitmap indexes · fff42755
      Vicent Marti 提交于
      A bitmap index is a `.bitmap` file that can be found inside
      `$GIT_DIR/objects/pack/`, next to its corresponding packfile, and
      contains precalculated reachability information for selected commits.
      The full specification of the format for these bitmap indexes can be found
      in `Documentation/technical/bitmap-format.txt`.
      
      For a given commit SHA1, if it happens to be available in the bitmap
      index, its bitmap will represent every single object that is reachable
      from the commit itself. The nth bit in the bitmap is the nth object in
      the packfile; if it's set to 1, the object is reachable.
      
      By using the bitmaps available in the index, this commit implements
      several new functions:
      
      	- `prepare_bitmap_git`
      	- `prepare_bitmap_walk`
      	- `traverse_bitmap_commit_list`
      	- `reuse_partial_packfile_from_bitmap`
      
      The `prepare_bitmap_walk` function tries to build a bitmap of all the
      objects that can be reached from the commit roots of a given `rev_info`
      struct by using the following algorithm:
      
      - If all the interesting commits for a revision walk are available in
      the index, the resulting reachability bitmap is the bitwise OR of all
      the individual bitmaps.
      
      - When the full set of WANTs is not available in the index, we perform a
      partial revision walk using the commits that don't have bitmaps as
      roots, and limiting the revision walk as soon as we reach a commit that
      has a corresponding bitmap. The earlier OR'ed bitmap with all the
      indexed commits can now be completed as this walk progresses, so the end
      result is the full reachability list.
      
      - For revision walks with a HAVEs set (a set of commits that are deemed
      uninteresting), first we perform the same method as for the WANTs, but
      using our HAVEs as roots, in order to obtain a full reachability bitmap
      of all the uninteresting commits. This bitmap then can be used to:
      
      	a) limit the subsequent walk when building the WANTs bitmap
      	b) finding the final set of interesting commits by performing an
      	   AND-NOT of the WANTs and the HAVEs.
      
      If `prepare_bitmap_walk` runs successfully, the resulting bitmap is
      stored and the equivalent of a `traverse_commit_list` call can be
      performed by using `traverse_bitmap_commit_list`; the bitmap version
      of this call yields the objects straight from the packfile index
      (without having to look them up or parse them) and hence is several
      orders of magnitude faster.
      
      As an extra optimization, when `prepare_bitmap_walk` succeeds, the
      `reuse_partial_packfile_from_bitmap` call can be attempted: it will find
      the amount of objects at the beginning of the on-disk packfile that can
      be reused as-is, and return an offset into the packfile. The source
      packfile can then be loaded and the bytes up to `offset` can be written
      directly to the result without having to consider the entires inside the
      packfile individually.
      
      If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files
      are available), the `rev_info` struct is left untouched, and can be used
      to perform a manual rev-walk using `traverse_commit_list`.
      
      Hence, this new set of functions are a generic API that allows to
      perform the equivalent of
      
      	git rev-list --objects [roots...] [^uninteresting...]
      
      for any set of commits, even if they don't have specific bitmaps
      generated for them.
      
      In further patches, we'll use this bitmap traversal optimization to
      speed up the `pack-objects` and `rev-list` commands.
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      fff42755
    • V
      documentation: add documentation for the bitmap format · 0d4455a3
      Vicent Marti 提交于
      This is the technical documentation for the JGit-compatible Bitmap v1
      on-disk format.
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      0d4455a3
    • V
      ewah: compressed bitmap implementation · e1273106
      Vicent Marti 提交于
      EWAH is a word-aligned compressed variant of a bitset (i.e. a data
      structure that acts as a 0-indexed boolean array for many entries).
      
      It uses a 64-bit run-length encoding (RLE) compression scheme,
      trading some compression for better processing speed.
      
      The goal of this word-aligned implementation is not to achieve
      the best compression, but rather to improve query processing time.
      As it stands right now, this EWAH implementation will always be more
      efficient storage-wise than its uncompressed alternative.
      
      EWAH arrays will be used as the on-disk format to store reachability
      bitmaps for all objects in a repository while keeping reasonable sizes,
      in the same way that JGit does.
      
      This EWAH implementation is a mostly straightforward port of the
      original `javaewah` library that JGit currently uses. The library is
      self-contained and has been embedded whole (4 files) inside the `ewah`
      folder to ease redistribution.
      
      The library is re-licensed under the GPLv2 with the permission of Daniel
      Lemire, the original author. The source code for the C version can
      be found on GitHub:
      
      	https://github.com/vmg/libewok
      
      The original Java implementation can also be found on GitHub:
      
      	https://github.com/lemire/javaewah
      
      [jc: stripped debug-only code per Peff's $gmane/239768]
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Helped-by: NRamsay Jones <ramsay@ramsay1.demon.co.uk>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      e1273106
  7. 19 11月, 2013 1 次提交
    • V
      compat: add endianness helpers · 7e3dae49
      Vicent Marti 提交于
      The POSIX standard doesn't currently define a `ntohll`/`htonll`
      function pair to perform network-to-host and host-to-network
      swaps of 64-bit data. These 64-bit swaps are necessary for the on-disk
      storage of EWAH bitmaps if they are not in native byte order.
      
      Many thanks to Ramsay Jones <ramsay@ramsay1.demon.co.uk> and
      Torsten Bögershausen <tboegi@web.de> for cygwin/mingw/msvc
      portability fixes.
      Signed-off-by: NVicent Marti <tanoku@gmail.com>
      Signed-off-by: NJeff King <peff@peff.net>
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      7e3dae49
  8. 25 10月, 2013 6 次提交
  9. 24 10月, 2013 11 次提交
    • J
      Update draft release notes to 1.8.5 · 3d092bfc
      Junio C Hamano 提交于
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      3d092bfc
    • J
      Sync with 'maint' · ea21efc7
      Junio C Hamano 提交于
      ea21efc7
    • J
      Almost 1.8.4.2 ;-) · ca462804
      Junio C Hamano 提交于
      Signed-off-by: NJunio C Hamano <gitster@pobox.com>
      ca462804
    • J
      Merge branch 'jc/ls-files-killed-optim' into maint · e03a5010
      Junio C Hamano 提交于
      "git ls-files -k" needs to crawl only the part of the working tree
      that may overlap the paths in the index to find killed files, but
      shared code with the logic to find all the untracked files, which
      made it unnecessarily inefficient.
      
      * jc/ls-files-killed-optim:
        dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage
        t3010: update to demonstrate "ls-files -k" optimization pitfalls
        ls-files -k: a directory only can be killed if the index has a non-directory
        dir.c: use the cache_* macro to access the current index
      e03a5010
    • J
      Merge branch 'jh/checkout-auto-tracking' into maint · 74051fa8
      Junio C Hamano 提交于
      "git branch --track" had a minor regression in v1.8.3.2 and later
      that made it impossible to base your local work on anything but a
      local branch of the upstream repository you are tracking from.
      
      * jh/checkout-auto-tracking:
        t3200: fix failure on case-insensitive filesystems
        branch.c: Relax unnecessary requirement on upstream's remote ref name
        t3200: Add test demonstrating minor regression in 41c21f22
        Refer to branch.<name>.remote/merge when documenting --track
        t3200: Minor fix when preparing for tracking failure
        t2024: Fix &&-chaining and a couple of typos
      74051fa8
    • J
      Merge branch 'nd/fetch-into-shallow' into maint · 6ba0d955
      Junio C Hamano 提交于
      When there is no sufficient overlap between old and new history
      during a "git fetch" into a shallow repository, objects that the
      sending side knows the receiving end has were unnecessarily sent.
      
      * nd/fetch-into-shallow:
        Add testcase for needless objects during a shallow fetch
        list-objects: mark more commits as edges in mark_edges_uninteresting
        list-objects: reduce one argument in mark_edges_uninteresting
        upload-pack: delegate rev walking in shallow fetch to pack-objects
        shallow: add setup_temporary_shallow()
        shallow: only add shallow graft points to new shallow file
        move setup_alternate_shallow and write_shallow_commits to shallow.c
      6ba0d955
    • J
      Merge branch 'bc/gnome-keyring' · 26145c9c
      Junio C Hamano 提交于
      Cleanups and tweaks for credential handling to work with ancient versions
      of the gnome-keyring library that are still in use.
      
      * bc/gnome-keyring:
        contrib/git-credential-gnome-keyring.c: support really ancient gnome-keyring
        contrib/git-credential-gnome-keyring.c: support ancient gnome-keyring
        contrib/git-credential-gnome-keyring.c: report failure to store password
        contrib/git-credential-gnome-keyring.c: use glib messaging functions
        contrib/git-credential-gnome-keyring.c: use glib memory allocation functions
        contrib/git-credential-gnome-keyring.c: use secure memory for reading passwords
        contrib/git-credential-gnome-keyring.c: use secure memory functions for passwds
        contrib/git-credential-gnome-keyring.c: use gnome helpers in keyring_object()
        contrib/git-credential-gnome-keyring.c: set Gnome application name
        contrib/git-credential-gnome-keyring.c: ensure buffer is non-empty before accessing
        contrib/git-credential-gnome-keyring.c: strlen() returns size_t, not ssize_t
        contrib/git-credential-gnome-keyring.c: exit non-zero when called incorrectly
        contrib/git-credential-gnome-keyring.c: add static where applicable
        contrib/git-credential-gnome-keyring.c: *style* use "if ()" not "if()" etc.
        contrib/git-credential-gnome-keyring.c: remove unused die() function
        contrib/git-credential-gnome-keyring.c: remove unnecessary pre-declarations
      26145c9c
    • J
      Merge branch 'po/dot-url' · f92f068e
      Junio C Hamano 提交于
      Explain how '.' can be used to refer to the "current repository"
      in the documentation.
      
      * po/dot-url:
        doc/cli: make "dot repository" an independent bullet point
        config doc: update dot-repository notes
        doc: command line interface (cli) dot-repository dwimmery
      f92f068e
    • J
      Merge branch 'jc/prompt-upstream' · 807c895f
      Junio C Hamano 提交于
      An enhancement to the GIT_PS1_SHOWUPSTREAM facility.
      
      * jc/prompt-upstream:
        git-prompt.sh: optionally show upstream branch name
      807c895f
    • J
      Merge branch 'hu/cherry-pick-previous-branch' · f2c1b01c
      Junio C Hamano 提交于
      "git cherry-pick" without further options would segfault.
      
      Could use a follow-up to handle '-' after argv[1] better.
      
      * hu/cherry-pick-previous-branch:
        cherry-pick: handle "-" after parsing options
      f2c1b01c
    • J
      Merge branch 'mg/more-textconv' · 4197361e
      Junio C Hamano 提交于
      Make "git grep" and "git show" pay attention to --textconv when
      dealing with blob objects.
      
      * mg/more-textconv:
        grep: honor --textconv for the case rev:path
        grep: allow to use textconv filters
        t7008: demonstrate behavior of grep with textconv
        cat-file: do not die on --textconv without textconv filters
        show: honor --textconv for blobs
        diff_opt: track whether flags have been set explicitly
        t4030: demonstrate behavior of show with textconv
      4197361e