1. 16 Sep, 2016 12 commits
  2. 14 Sep, 2016 6 commits
    • checkout: constify parameters of checkout_stage() and checkout_merged() · ce25e4c7
      René Scharfe committed
      Document the fact that checkout_stage() and checkout_merged() don't
      change the objects passed to them by adding the modifier const.
      Signed-off-by: Rene Scharfe <l.s.r@web.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • init: reset cached config when entering new repo · 4543926b
      Jeff King committed
      After we copy the templates into place, we re-read the
      config in case we copied in a default config file. But since
      git_config() is backed by a cache these days, it's possible
      that the call will not actually touch the filesystem at all;
      we need to tell it that something has changed behind the
      scenes.
      
      Note that we also need to reset the shared_repository
      config. At first glance, it seems like this should probably
      just be folded into git_config_clear(). But unfortunately
      that is not quite right. The shared repository value may
      come from config, _or_ it may have been set manually. So
      only the caller who knows whether or not they set it is the
      one who can clear it (and indeed, if you _do_ put it into
      git_config_clear(), then many tests fail, as we have to
      clear the config cache any time we set a new config
      variable).
      
      There are three tests here. The first two actually pass
      already, though it's largely luck: they just don't happen to
      actually read any config before we enter the new repo.
      
      But the third one does fail without this patch; we look at
      core.sharedrepository while creating the directory, but need
      to make sure the value from the template config overrides
      it.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
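The stale-cache problem this commit describes can be illustrated with a small model. This is only a sketch and not git's implementation: `CachedConfig`, its `get` and `clear` methods, and the simplified `key = value` file format are hypothetical stand-ins for git_config() and git_config_clear().

```python
import os
import tempfile


class CachedConfig:
    """Toy stand-in for a config reader backed by an in-memory cache."""

    def __init__(self, path):
        self.path = path
        self._cache = None  # filled lazily on the first read

    def _load(self):
        values = {}
        if os.path.exists(self.path):
            with open(self.path) as fh:
                for line in fh:
                    if "=" in line:
                        key, value = line.split("=", 1)
                        values[key.strip()] = value.strip()
        return values

    def get(self, key, default=None):
        # Only touches the filesystem while the cache is empty.
        if self._cache is None:
            self._cache = self._load()
        return self._cache.get(key, default)

    def clear(self):
        # Forget cached values so that the next get() re-reads the file.
        self._cache = None


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config")
        cfg = CachedConfig(path)
        before = cfg.get("core.sharedrepository")  # no file yet
        # Simulate copying a template config into place behind the cache's back.
        with open(path, "w") as fh:
            fh.write("core.sharedrepository = 0666\n")
        stale = cfg.get("core.sharedrepository")   # still answered from the cache
        cfg.clear()
        fresh = cfg.get("core.sharedrepository")   # re-read from disk
        return before, stale, fresh
```

In this model the second read still misses the new value until `clear()` is called, which mirrors why the re-read after copying templates needs an explicit cache reset.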
    • init: expand comments explaining config trickery · 7c0a842b
      Jeff King committed
      git-init may copy "config" from the templates directory and
      then re-read it. There are some comments explaining what's
      going on here, but they are not grouped very well with the
      matching code. Let's rearrange and expand them.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff: always try to set up the repository · 28a4e580
      Jeff King committed
      If we see an explicit "--no-index", we do not bother calling
      setup_git_directory_gently() at all. This means that we may
      miss out on reading repo-specific config.
      
      It's arguable whether this is correct or not. If we were
      designing from scratch, making "git diff --no-index"
      completely ignore the repository makes some sense. But we
      are nowhere near scratch, so let's look at the existing
      behavior:
      
        1. If you're in the top-level of a repository and run an
           explicit "diff --no-index", the config subsystem falls
           back to reading ".git/config", and we will respect repo
           config.
      
        2. If you're in a subdirectory of a repository, then we
           still try to read ".git/config", but it generally
           doesn't exist. So "diff --no-index" there does not
           respect repo config.
      
        3. If you have $GIT_DIR set in the environment, we read
           and respect $GIT_DIR/config.
      
        4. If you run "git diff /tmp/foo /tmp/bar" to get an
           implicit no-index, we _do_ run the repository setup,
           and set $GIT_DIR (or respect an existing $GIT_DIR
           variable). We find the repo config no matter where we
           started, and respect it.
      
      So we already respect the repository config in a number of
      common cases, and case (2) is the only one that does not.
      And at least one of our tests, t4034, depends on case (1)
      behaving as it does now (though it is just incidental, not
      an explicit test for this behavior).
      
      So let's bring case (2) in line with the others by always
      running the repository setup, even with an explicit
      "--no-index". We shouldn't need to change anything else, as the
      implicit case already handles the prefix.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
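The four cases in this commit message can be summarized as a small truth table. The function below is a hypothetical model written for illustration (its name and parameters do not exist in git), and it assumes the command is run somewhere inside a repository.

```python
def respects_repo_config(explicit_no_index, at_top_level, git_dir_set,
                         always_run_setup):
    """Model whether "git diff" run inside a repository sees repo config.

    always_run_setup=False models the behavior before the change,
    always_run_setup=True the behavior after it.
    """
    if not explicit_no_index or always_run_setup:
        # Case 4 (and, after the change, every case): repository setup
        # runs and finds the repo config from any directory.
        return True
    if git_dir_set:
        # Case 3: $GIT_DIR/config is read directly.
        return True
    # Cases 1 and 2: the fallback reads ".git/config" relative to the
    # current directory, which only exists at the top level.
    return at_top_level


CASES = {
    "1: explicit, top level": dict(
        explicit_no_index=True, at_top_level=True, git_dir_set=False),
    "2: explicit, subdirectory": dict(
        explicit_no_index=True, at_top_level=False, git_dir_set=False),
    "3: explicit, $GIT_DIR set": dict(
        explicit_no_index=True, at_top_level=False, git_dir_set=True),
    "4: implicit, subdirectory": dict(
        explicit_no_index=False, at_top_level=False, git_dir_set=False),
}


def summarize(always_run_setup):
    return {name: respects_repo_config(always_run_setup=always_run_setup, **args)
            for name, args in CASES.items()}
```

Under this model, `summarize(False)` marks only case 2 as ignoring the repository config, and `summarize(True)` brings it in line with the other three.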
    • diff: skip implicit no-index check when given --no-index · 475b362c
      Jeff King committed
      We can invoke no-index mode in two ways: by an explicit
      request from the user, or implicitly by noticing that we
      have two paths, and at least one is outside the repository.
      
      If the user already told us --no-index, there is no need for
      us to do the implicit test at all.  However, we currently
      do, and downgrade our "explicit" to DIFF_NO_INDEX_IMPLICIT.
      
      This has no user-visible effect, though it's not
      immediately obvious why. We only trigger the implicit check
      when we have exactly two non-option arguments. And the only
      code that cares about implicit versus explicit is an error
      message that we show when we _don't_ have two non-option
      arguments.
      
      However, it's worth fixing anyway. Besides being slightly
      more efficient, it makes the code easier to follow, which
      will help when we modify it in future patches.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
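The control-flow change described in this commit can be sketched as follows. This is a simplified model, not the code in git: the `NoIndex` enum, both functions, and the `inside_repo` callback are hypothetical names chosen for illustration.

```python
from enum import Enum


class NoIndex(Enum):
    NONE = 0
    EXPLICIT = 1
    IMPLICIT = 2


def _looks_implicit(paths, inside_repo):
    # Implicit no-index: exactly two paths, at least one outside the repo.
    return len(paths) == 2 and not all(inside_repo(p) for p in paths)


def mode_before(explicit_flag, paths, inside_repo):
    mode = NoIndex.EXPLICIT if explicit_flag else NoIndex.NONE
    if _looks_implicit(paths, inside_repo):
        mode = NoIndex.IMPLICIT  # overwrites an explicit request
    return mode


def mode_after(explicit_flag, paths, inside_repo):
    if explicit_flag:
        return NoIndex.EXPLICIT  # the implicit check is skipped entirely
    if _looks_implicit(paths, inside_repo):
        return NoIndex.IMPLICIT
    return NoIndex.NONE
```

With two paths outside the repository and an explicit flag, `mode_before` reports `IMPLICIT` while `mode_after` keeps `EXPLICIT` and never consults `inside_repo`.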
    • hash-object: always try to set up the git repository · 0e94ee94
      Jeff King committed
      When "hash-object" is run without "-w", we don't need to be
      in a git repository at all; we can just hash the object and
      write its sha1 to stdout. However, if we _are_ in a git
      repository, we would want to know that so we can follow the
      normal rules for respecting config, .gitattributes, etc.
      
      This happens to work at the top-level of a git repository
      because we blindly read ".git/config", but as the included
      test shows, it does not work when you are in a subdirectory.
      
      The solution is to just do a "gentle" setup in this case. We
      already take care to use prefix_filename() on any filename
      arguments we get (to handle the "-w" case), so we don't need
      to do anything extra to handle the side effects of repo
      setup.
      
      An alternative would be to specify RUN_SETUP_GENTLY for this
      command in git.c, and then die if "-w" is set but we are not
      in a repository. However, the error messages generated at
      the time of setup_git_directory() are more detailed, so it's
      better to find out which mode we are in, and then call the
      appropriate function.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
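As a reminder of why hashing needs no repository, a blob's SHA-1 is computed over a short header followed by the content. The helper below is a minimal sketch (`hash_blob` is an illustrative name). It deliberately ignores any content conversion driven by config or .gitattributes, which is exactly the part that depends on finding the repository.

```python
import hashlib


def hash_blob(data: bytes) -> str:
    """Return the SHA-1 object name git assigns to a blob with this content."""
    # The hashed stream is "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


empty_blob = hash_blob(b"")
sample_blob = hash_blob(b"test content\n")
```

The second value corresponds to what `echo 'test content' | git hash-object --stdin` prints when no conversion applies.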
  3. 13 Sep, 2016 2 commits
    • pack-objects: use reachability bitmap index when generating non-stdout pack · 645c432d
      Kirill Smelkov committed
      Since 6b8fda2d (pack-objects: use bitmaps when packing objects),
      pack-objects can speed up the "Counting objects" graph traversal phase
      when a repository has a bitmap index. However, that was done only for
      the case where the resulting pack is sent to stdout, not written to a file.

      The reason is that for an on-disk repack we want by default:

      - to produce a good pack (with a bitmap index, not-yet-packed objects
        are emitted to the pack in suboptimal order).

      - to use the more robust pack-generation codepath (avoiding possible
        bugs in bitmap code and possible bitmap index corruption).
      
      Jeff King further explains:
      
          The reason for this split is that pack-objects tries to determine how
          "careful" it should be based on whether we are packing to disk or to
          stdout. Packing to disk implies "git repack", and that we will likely
          delete the old packs after finishing. We want to be more careful (so
          as not to carry forward a corruption, and to generate a more optimal
          pack), and we presumably run less frequently and can afford extra CPU.
          Whereas packing to stdout implies serving a remote via "git fetch" or
          "git push". This happens more frequently (e.g., a server handling many
          fetching clients), and we assume the receiving end takes more
          responsibility for verifying the data.
      
          But this isn't always the case. One might want to generate on-disk
          packfiles for a specialized object transfer. Just using "--stdout" and
          writing to a file is not optimal, as it will not generate the matching
          pack index.
      
          So it would be useful to have some way of overriding this heuristic:
          to tell pack-objects that even though it should generate on-disk
          files, it is still OK to use the reachability bitmaps to do the
          traversal.
      
      So we can teach pack-objects to use the bitmap index for the initial
      object counting phase when generating the resulting pack file too:
      
      - if we take care to not let it be activated under git-repack:
      
        See above about repack robustness and not forward-carrying corruption.
      
      - if we know bitmap index generation is not enabled for resultant pack:
      
        The current code has a singleton bitmap_git, so it cannot work
        with two bitmap indices simultaneously.

        We also want to avoid (at least with the current implementation)
        generating bitmaps off of bitmaps. The reason is that when generating
        a pack, not-yet-packed objects are emitted into the pack in
        suboptimal order and added to the tail of the bitmap as "extended
        entries". When the resulting pack plus some new objects in the
        associated repository are in turn used to generate another pack with
        a bitmap, the situation repeats: new objects are again not emitted
        optimally and are just added to the bitmap tail, not in recency order.

        So the pack badness can grow over time when at each step we have a
        bitmapped pack plus some other objects. That's why we want to avoid
        generating bitmaps off of bitmaps: so as not to let pack badness grow.
      
      - if we keep pack reuse enabled still only for "send-to-stdout" case:
      
        Because pack-to-file needs to generate an index for the destination
        pack, and currently on pack reuse raw entries are written directly to
        the destination pack by write_reused_pack(), bypassing the bookkeeping
        needed for pack index generation that the regular codepath does in
        write_one() and friends.
      
        ( In the future we might teach pack-reuse code about cases when index
          also needs to be generated for resultant pack and remove
          pack-reuse-only-for-stdout limitation )
      
      This way for pack-objects -> file we get nice speedup:
      
          erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
          repository managed by git-backup[2] via
      
          time echo 0186ac99 | git pack-objects --revs erp5pack
      
      before:  37.2s
      after:   26.2s
      
      And for `git repack -adb` packed git.git
      
          time echo 5c589a73 | git pack-objects --revs gitpack
      
      before:   7.1s
      after:    3.6s
      
      i.e. it can be a 30% to 50% speedup for pack extraction.
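The quoted range follows directly from the before/after timings above. A quick check (the helper name `speedup_percent` is illustrative):

```python
def speedup_percent(before_s, after_s):
    """Relative reduction in wall-clock time, in percent."""
    return (before_s - after_s) / before_s * 100.0


erp5_speedup = speedup_percent(37.2, 26.2)   # erp5.git extraction
git_speedup = speedup_percent(7.1, 3.6)      # `git repack -adb` packed git.git
```

Rounded to one decimal place these come out at 29.6% and 49.3%, which matches the "30% to 50%" summary.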
      
      git-backup extracts many packs when restoring repositories. That was my
      initial motivation for the patch.
      
      [1] https://lab.nexedi.com/nexedi/erp5
      [2] https://lab.nexedi.com/kirr/git-backup
      
      NOTE
      
      Jeff also suggests that pack.useBitmaps was probably a mistake to
      introduce originally. So we do not add another config point; instead,
      to-file pack-objects always defaults to not using the bitmap index.
      Tools which need to generate on-disk packs using bitmaps can pass
      --use-bitmap-index explicitly. And git-repack never passes
      --use-bitmap-index, so we can be sure regular on-disk repacking
      remains robust.
      
      NOTE2
      
      `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower
      than `git pack-objects file.pack`. Extracting erp5.git pack from
      lab.nexedi.com backup repository:
      
          $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack
      
          real    0m22.309s
          user    0m21.148s
          sys     0m0.932s
      
          $ time git index-pack erp5pack-stdout.pack
      
          real    0m50.873s   <-- more than 2 times slower than time to generate pack itself!
          user    0m49.300s
          sys     0m1.360s
      
      So the time for
      
          `pack-objects --stdout >file.pack` + `index-pack file.pack` is  72s,
      
      while
      
          `pack-objects file.pack` which does both pack and index     is  27s.
      
      And even
      
          `pack-objects --no-use-bitmap-index file.pack`              is  37s.
      
      Jeff explains:
      
          The packfile does not carry the sha1 of the objects. A receiving
          index-pack has to compute them itself, including inflating and applying
          all of the deltas.
      
      that's why for `git-backup restore` we want to teach `git pack-objects
      file.pack` to use bitmaps instead of using `git pack-objects --stdout
      >file.pack` + `git index-pack file.pack`.
      
      NOTE3
      
      The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh
      
          Test                                    56dfeb62          this tree
          --------------------------------------------------------------------------------
          5310.2: repack to disk                  8.98(8.05+0.29)   9.05(8.08+0.33) +0.8%
          5310.3: simulated clone                 2.02(2.27+0.09)   2.01(2.25+0.08) -0.5%
          5310.4: simulated fetch                 0.81(1.07+0.02)   0.81(1.05+0.04) +0.0%
          5310.5: pack to file                    7.58(7.04+0.28)   7.60(7.04+0.30) +0.3%
          5310.6: pack to file (bitmap)           7.55(7.02+0.28)   3.25(2.82+0.18) -57.0%
          5310.8: clone (partial bitmap)          1.83(2.26+0.12)   1.82(2.22+0.14) -0.5%
          5310.9: pack to file (partial bitmap)   6.86(6.58+0.30)   2.87(2.74+0.20) -58.2%
      
      More context:
      
          http://marc.info/?t=146792101400001&r=1&w=2
          http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t
      
      Cc: Vicent Marti <tanoku@gmail.com>
      Helped-by: Jeff King <peff@peff.net>
      Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
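The conditions listed in this commit message can be condensed into a small decision function. This is a sketch of the described policy, not the code in pack-objects: `plan_bitmap_use` and its parameters are hypothetical, and it assumes bitmaps are enabled by default when packing to stdout.

```python
def plan_bitmap_use(pack_to_stdout, requested=None, write_bitmap_index=False):
    """Return (use_bitmaps, allow_pack_reuse) for one pack-objects run.

    requested is True for --use-bitmap-index, False for
    --no-use-bitmap-index, and None when neither option was given.
    """
    # Default: bitmaps only when streaming to stdout; on-disk packs
    # (e.g. from git-repack) stay on the more careful codepath.
    use_bitmaps = pack_to_stdout if requested is None else requested

    # Never read a bitmap while also writing one for an on-disk pack:
    # this avoids generating bitmaps off of bitmaps.
    if not pack_to_stdout and write_bitmap_index:
        use_bitmaps = False

    # Pack reuse bypasses the bookkeeping needed to write a pack index,
    # so it stays limited to the send-to-stdout case.
    allow_pack_reuse = use_bitmaps and pack_to_stdout
    return use_bitmaps, allow_pack_reuse
```

For example, `plan_bitmap_use(False, requested=True)` models a tool such as git-backup asking for bitmaps on an on-disk pack: bitmaps are used for counting, but pack reuse stays off.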
    • pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use · 702d1b95
      Kirill Smelkov committed
      Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
      are two codepaths in pack-objects: with & without using bitmap
      reachability index.
      
      However, add_object_entry_from_bitmap(), unlike its non-bitmapped
      counterpart add_object_entry(), does not check whether --local,
      --honor-pack-keep or --incremental should be respected. In the
      non-bitmapped codepath this is handled in want_object_in_pack(), but
      the bitmapped codepath has no such check at all.

      The bitmapped codepath nevertheless accepted all those options and
      kept using bitmap indices under such conditions, potentially giving
      wrong output (e.g. including objects from a non-local or .keep'ed
      pack).
      
      We can easily fix this by noting the following: an object can reach
      add_object_entry_from_bitmap() in one of two ways:

          1. as an entry from the main pack covered by the bitmap index, or
          2. as an object from a loose object or another pack, possibly in
             an alternate.

      "2" can already be handled by want_object_in_pack(), and to cover
      "1" we can teach want_object_in_pack() to expect that *found_pack can be
      non-NULL, meaning the caller has already found the object's pack entry.
      
      In want_object_in_pack() we take care to start the checks from the
      already found pack, if we have one, so the answer is determined right
      away when neither --local nor --honor-pack-keep is active. In
      particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do
      not harm the performance of clones served with bitmaps:
      
          Test                      56dfeb62          this tree
          -----------------------------------------------------------------
          5310.2: repack to disk    9.08(8.20+0.25)   9.09(8.14+0.32) +0.1%
          5310.3: simulated clone   1.92(2.12+0.08)   1.93(2.12+0.09) +0.5%
          5310.4: simulated fetch   0.82(1.07+0.04)   0.82(1.06+0.04) +0.0%
          5310.6: partial bitmap    1.96(2.42+0.13)   1.95(2.40+0.15) -0.5%
      
          Test                      56dfeb62          this tree
          -----------------------------------------------------------------
          5310.2: repack to disk    9.11(8.16+0.32)   9.11(8.19+0.28) +0.0%
          5310.3: simulated clone   1.93(2.14+0.07)   1.92(2.11+0.10) -0.5%
          5310.4: simulated fetch   0.82(1.06+0.04)   0.82(1.04+0.05) +0.0%
          5310.6: partial bitmap    1.95(2.38+0.16)   1.94(2.39+0.14) -0.5%
      
          Test                      56dfeb62          this tree
          -----------------------------------------------------------------
          5310.2: repack to disk    9.13(8.17+0.31)   9.07(8.13+0.28) -0.7%
          5310.3: simulated clone   1.92(2.13+0.07)   1.91(2.12+0.06) -0.5%
          5310.4: simulated fetch   0.82(1.08+0.03)   0.82(1.08+0.03) +0.0%
          5310.6: partial bitmap    1.96(2.43+0.14)   1.96(2.42+0.14) +0.0%
      
      with delta timings showing they are all within noise from run to run.
      
      In the general case we do not want to call find_pack_entry_one() more than
      once, because it is expensive. This patch splits the loop in
      want_object_in_pack() into two parts: finding the object and seeing if it
      impacts our choice to include it in the pack. We may call the inexpensive
      want_found_object() twice, but we will never call find_pack_entry_one() if we
      do not need to.
      
      I appreciate the help of Junio C Hamano and Jeff King in discussing
      this change.
      Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
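The two-part structure described in this commit (a cheap verdict on a pack that is already known, and a lookup loop only when that verdict is inconclusive) can be sketched as below. This is a simplified model under stated assumptions, not the C code: the `Pack` and `Options` classes, the `lookups` list used to count expensive lookups, and the set-membership test standing in for find_pack_entry_one() are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    name: str
    objects: frozenset
    local: bool = True
    keep: bool = False


@dataclass
class Options:
    local: bool = False
    honor_pack_keep: bool = False
    incremental: bool = False


def want_found_object(pack, opts):
    """Cheap check: True/False when decided, None when other packs matter."""
    if opts.incremental:
        return False  # object is already packed somewhere
    if not opts.local and not opts.honor_pack_keep:
        return True   # no option can veto the object
    if opts.local and not pack.local:
        return False
    if opts.honor_pack_keep and pack.local and pack.keep:
        return False
    return None       # undecided: another pack may still veto it


def want_object_in_pack(oid, packs, opts, found_pack=None, lookups=None):
    # Start from the pack the caller already found, if any.
    if found_pack is not None:
        verdict = want_found_object(found_pack, opts)
        if verdict is not None:
            return verdict
    for pack in packs:
        if pack is found_pack:
            present = True  # known already, no lookup needed
        else:
            if lookups is not None:
                lookups.append(pack.name)  # stands in for the costly lookup
            present = oid in pack.objects
        if present:
            verdict = want_found_object(pack, opts)
            if verdict is not None:
                return verdict
    return True
```

In this model the cheap check may run twice for `found_pack`, but the lookup is skipped for it, and no lookup happens at all when the first verdict is conclusive.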
  4. 12 Sep, 2016 3 commits
  5. 10 Sep, 2016 1 commit
  6. 09 Sep, 2016 2 commits
  7. 08 Sep, 2016 14 commits