1. 12 Aug 2016, 1 commit
      gc: default aggressive depth to 50 · 07e7dbf0
      Jeff King committed
      This commit message is long and has lots of background and
      numbers. The summary is: the current default of 250 doesn't
      save much space, and costs CPU. It's not a good tradeoff.
      Read on for details.
      
      The "--aggressive" flag to git-gc does three things:
      
        1. use "-f" to throw out existing deltas and recompute from
           scratch
      
        2. use "--window=250" to look harder for deltas
      
        3. use "--depth=250" to make longer delta chains
      
      Items (1) and (2) are good matches for an "aggressive"
      repack. They ask the repack to do more computation work in
      the hopes of getting a better pack. You pay the costs during
      the repack, and other operations see only the benefit.
      
      Item (3) is not so clear. Allowing longer chains means fewer
      restrictions on the deltas, which means potentially finding
      better ones and saving some space. But it also means that
      operations which access the deltas have to follow longer
      chains, which affects their performance. So it's a tradeoff,
      and it's not clear that the tradeoff is even a good one.
      
      The existing "250" numbers for "--aggressive" come
      originally from this thread:
      
        http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/
      
      where Linus says:
      
        So when I said "--depth=250 --window=250", I chose those
        numbers more as an example of extremely aggressive
        packing, and I'm not at all sure that the end result is
        necessarily wonderfully usable. It's going to save disk
        space (and network bandwidth - the delta's will be re-used
        for the network protocol too!), but there are definitely
        downsides too, and using long delta chains may
        simply not be worth it in practice.
      
      There are some numbers in that thread, but they're mostly
      focused on the improved window size, and measure the
      improvement from --depth=250 and --window=250 together.
      E.g.:
      
        http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/
      
      talks about the improved run-time of "git-blame", which
      comes from the reduced pack size. But most of that reduction
      is coming from --window=250, whereas most of the extra costs
      come from --depth=250. There's a link in that thread showing
      that increasing the depth beyond 50 doesn't seem to help
      much with the size:
      
        https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html
      
      but again, no discussion of the timing impact.
      
      In an earlier thread from Ted Ts'o which discussed setting
      the non-aggressive default (from 10 to 50):
      
        http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/
      
      we have more numbers, with the conclusion that going past 50
      does not help size much, and hurts the speed of normal
      operations.
      
      So from that, we might guess that 50 is actually a sweet
      spot, even for aggressive, if we interpret aggressive as
      "spend time now to make a better pack". It is not clear
      that "--depth=250" actually yields a better pack. It may
      be slightly _smaller_, but it carries a run-time penalty.
      
      Here are some more recent timings I did to verify that. They
      show three things:
      
        - the size of the resulting pack (so disk saved to store,
          bandwidth saved on clones/fetches)
      
        - the cost of "rev-list --objects --all", which shows the
          effect of the delta chains on trees (commits typically
          don't delta, and the command doesn't touch the blobs at
          all)
      
        - the cost of "log -Sfoo", which will additionally access
          each blob
      
      All cases were repacked with "git repack -adf --depth=$d
      --window=250" (so basically, what would happen if we tweaked
      the "gc --aggressive" default depth).
      
      The timings are all wall-clock best-of-3. The machine itself
      has plenty of RAM compared to the repositories (which is
      probably typical of most workstations these days), so we're
      really measuring CPU usage, as the whole thing will be in
      disk cache after the first run.
      
      The core.deltaBaseCacheLimit is at its default of 96MiB.
      It's possible that tweaking it would have some impact on the
      tests, as some of them (especially "log -S" on a large repo)
      are likely to overflow that. But bumping that carries a
      run-time memory cost, so for these tests, I focused on what
      we could do just with the on-disk pack tradeoffs.
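      
      For the record, that knob can also be raised per run; a hedged
      example using the -c form, not something measured here:
      
        $ git -c core.deltaBaseCacheLimit=512m log -Sfoo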
      
      Each test is done for four depths: 250 (the current value),
      50 (the current default that tested well previously), 100
      (to show something on the larger side, which previous tests
      showed was not a good tradeoff), and 10 (the very old
      default, which previous tests showed was worse than 50).
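      
      As a rough sketch, the benchmark loop looks like this (hedged:
      the best-of-3 timing wrapper and per-repo setup are elided):
      
        for d in 250 100 50 10; do
          git repack -adf --depth=$d --window=250
          du -sh .git/objects/pack                      # resulting pack size
          time git rev-list --objects --all >/dev/null  # tree-walking cost
          time git log -Sfoo >/dev/null                 # blob-accessing cost
        done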
      
      Here are the numbers for linux.git:
      
         depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
        -------+-------+-------+----------+--------+-----------+-------
          250  | 967MB |  n/a  | 48.159s  |   n/a  | 378.088s  |   n/a
          100  | 971MB | +0.4% | 41.471s  | -13.9% | 342.060s  |  -9.5%
           50  | 979MB | +1.2% | 37.778s  | -21.6% | 311.040s  | -17.7%
           10  | 1.1GB | +6.6% | 32.518s  | -32.5% | 279.890s  | -25.9%
      
      and for git.git:
      
         depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
        -------+-------+-------+----------+--------+-----------+-------
          250  |  48MB |  n/a  |  2.215s  |   n/a  |  20.922s  |   n/a
          100  |  49MB | +0.5% |  2.140s  |  -3.4% |  17.736s  | -15.2%
           50  |  49MB | +1.7% |  2.099s  |  -5.2% |  15.418s  | -26.3%
           10  |  53MB | +9.3% |  2.001s  |  -9.7% |  12.677s  | -39.4%
      
      You can see that the CPU savings for regular operations improve as we
      decrease the depth. The savings are less for "rev-list" on a smaller repository
      than they are for blob-accessing operations, or even rev-list on a larger
      repository. This may mean that a larger delta cache would help (though setting
      core.deltaBaseCacheLimit by itself doesn't).
      
      But we can also see that the space savings taper off as the depth goes
      higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff;
      saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250,
      probably is not.
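      
      Anyone who prefers the old behavior can still get it back through
      the gc.aggressiveDepth knob from 125f8146; a hedged example, where
      250 restores the previous default:
      
        $ git config gc.aggressiveDepth 250
        $ git gc --aggressive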
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. 27 Jun 2016, 1 commit
  3. 05 Nov 2015, 1 commit
  4. 27 Oct 2015, 1 commit
  5. 26 Sep 2015, 1 commit
      convert trivial sprintf / strcpy calls to xsnprintf · 5096d490
      Jeff King committed
      We sometimes sprintf into fixed-size buffers when we know
      that the buffer is large enough to fit the input (either
      because it's a constant, or because it's numeric input that
      is bounded in size). Likewise with strcpy of constant
      strings.
      
      However, these sites make it hard to audit sprintf and
      strcpy calls for buffer overflows, as a reader has to
      cross-reference the size of the array with the input. Let's
      use xsnprintf instead, which communicates to a reader that
      we don't expect this to overflow (and catches the mistake in
      case we do).
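      
      Auditing the remaining call sites then becomes a mechanical search;
      a hedged sketch using plain git grep:
      
        $ git grep -n -E '(sprintf|strcpy)\(' -- '*.c'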
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  6. 22 Sep 2015, 1 commit
      gc: save log from daemonized gc --auto and print it next time · 329e6e87
      Nguyễn Thái Ngọc Duy committed
      While commit 9f673f94 (gc: config option for running --auto in
      background - 2014-02-08) helps reduce some complaints about 'gc
      --auto' hogging the terminal, it creates another set of problems.
      
      The latest problem in this set: as a result of daemonizing, stderr is
      closed and all warnings are lost. The warning at the end of cmd_gc()
      is particularly important because it tells the user how to avoid "gc
      --auto" running repeatedly. Because stderr is closed, the user never
      sees it, and naturally complains about 'gc --auto' wasting CPU.
      
      Daemonized gc now saves stderr to $GIT_DIR/gc.log. Subsequent runs of
      gc --auto print the saved gc.log and refuse to run until the user
      removes gc.log.
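      
      From the user's side, recovery looks like this (a hedged
      illustration; the log content is whatever the daemonized gc wrote
      to its stderr):
      
        $ git gc --auto     # prints the saved gc.log and refuses to run
        $ cat .git/gc.log   # read the warning at leisure
        $ rm .git/gc.log    # clear it so gc --auto can run again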
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  7. 13 Aug 2015, 2 commits
  8. 21 Jul 2015, 1 commit
  9. 29 Jun 2015, 1 commit
  10. 25 Jun 2015, 1 commit
      introduce "preciousObjects" repository extension · 067fbd41
      Jeff King committed
      If this extension is used in a repository, then no
      operations should run which may drop objects from the object
      storage. This can be useful if you are sharing that storage
      with other repositories whose refs you cannot see.
      
      For instance, if you do:
      
        $ git clone -s parent child
        $ git -C parent config extensions.preciousObjects true
        $ git -C parent config core.repositoryformatversion 1
      
      you now have additional safety when running git in the
      parent repository. Prunes and repacks will bail with an
      error, and `git gc` will skip those operations (it will
      continue to pack refs and do other non-object operations).
      Older versions of git, when run in the repository, will
      fail on every operation.
      
      Note that we do not set the preciousObjects extension by
      default when doing a "clone -s", as doing so breaks
      backwards compatibility. It is a decision the user should
      make explicitly.
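      
      For illustration, the failure mode looks roughly like this (a
      hedged sketch; the exact error wording may differ by version):
      
        $ git -C parent prune
        fatal: cannot prune in a precious-objects repo
        $ git -C parent repack -ad
        fatal: cannot delete packs in a precious-objects repo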
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  11. 15 Jan 2015, 1 commit
  12. 02 Dec 2014, 3 commits
  13. 02 Oct 2014, 1 commit
  14. 08 Aug 2014, 1 commit
  15. 28 May 2014, 1 commit
  16. 01 Apr 2014, 1 commit
      gc --aggressive: make --depth configurable · 125f8146
      Nguyễn Thái Ngọc Duy committed
      When 1c192f34 (gc --aggressive: make it really aggressive - 2007-12-06)
      made --depth=250 the default value, it didn't really explain the
      reasoning behind it, especially the pros and cons of --depth=250.
      
      An old mail from Linus below explains it at length. Long story short,
      --depth=250 is a disk saver and a performance killer. Not everybody
      agrees on that aggressiveness. Let the user configure it.
      
          From: Linus Torvalds <torvalds@linux-foundation.org>
          Subject: Re: [PATCH] gc --aggressive: make it really aggressive
          Date: Thu, 6 Dec 2007 08:19:24 -0800 (PST)
          Message-ID: <alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org>
          Gmane-URL: http://article.gmane.org/gmane.comp.gcc.devel/94637
      
          On Thu, 6 Dec 2007, Harvey Harrison wrote:
          >
          > 7:41:25elapsed 86%CPU
      
          Heh. And this is why you want to do it exactly *once*, and then just
          export the end result for others ;)
      
          > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46...pack
      
          But yeah, especially if you allow longer delta chains, the end result can
          be much smaller (and what makes the one-time repack more expensive is the
          window size, not the delta chain - you could make the delta chains longer
          with no cost overhead at packing time)
      
          HOWEVER.
      
          The longer delta chains do make it potentially much more expensive to then
          use old history. So there's a trade-off. And quite frankly, a delta depth
          of 250 is likely going to cause overflows in the delta cache (which is
          only 256 entries in size *and* it's a hash, so it's going to start having
          hash conflicts long before hitting the 250 depth limit).
      
          So when I said "--depth=250 --window=250", I chose those numbers more as
          an example of extremely aggressive packing, and I'm not at all sure that
          the end result is necessarily wonderfully usable. It's going to save disk
          space (and network bandwidth - the delta's will be re-used for the network
          protocol too!), but there are definitely downsides too, and using long
          delta chains may simply not be worth it in practice.
      
          (And some of it might just want to have git tuning, ie if people think
          that long deltas are worth it, we could easily just expand on the delta
          hash, at the cost of some more memory used!)
      
          That said, the good news is that working with *new* history will not be
          affected negatively, and if you want to be _really_ sneaky, there are ways
          to say "create a pack that contains the history up to a version one year
          ago, and be very aggressive about those old versions that we still want to
          have around, but do a separate pack for newer stuff using less aggressive
          parameters"
      
          So this is something that can be tweaked, although we don't really have
          any really nice interfaces for stuff like that (ie the git delta cache
          size is hardcoded in the sources and cannot be set in the config file, and
          the "pack old history more aggressively" involves some manual scripting
          and knowing how "git pack-objects" works rather than any nice simple
          command line switch).
      
          So the thing to take away from this is:
           - git is certainly flexible as hell
           - .. but to get the full power you may need to tweak things
           - .. happily you really only need to have one person to do the tweaking,
             and the tweaked end results will be available to others that do not
             need to know/care.
      
          And whether the difference between 320MB and 500MB is worth any really
          involved tweaking (considering the potential downsides), I really don't
          know. Only testing will tell.
      
      			    Linus
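      
      With this change, the depth can be tuned without scripting a manual
      repack; a hedged usage sketch using the one-shot -c form:
      
        $ git -c gc.aggressiveDepth=10 gc --aggressive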
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  17. 19 Mar 2014, 1 commit
  18. 11 Feb 2014, 1 commit
  19. 01 Feb 2014, 1 commit
  20. 03 Jan 2014, 1 commit
      gc: notice gc processes run by other users · ed7eda8b
      Kyle J. McKay committed
      Since 64a99eb4, git gc refuses to run without the --force option if
      another gc process on the same repository is already running.
      
      However, if the repository is shared, user A runs git gc on it, and
      user B then runs git gc on the same repository while A's gc is still
      running, the gc process run by user A will not be noticed and the gc
      run by user B will go ahead and run.
      
      The problem is that the kill(pid, 0) test fails with an EPERM error
      since user B is not allowed to signal processes owned by user A
      (unless user B is root).
      
      Update the test to recognize an EPERM error as meaning the process
      exists and another gc should not be run (unless --force is given).
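      
      The same probe can be sketched in shell, where kill -0 conflates
      "no such process" with "not permitted" and ps tells the two apart
      (a hedged analogue of the C-level errno check, not the actual code):
      
        if kill -0 "$pid" 2>/dev/null; then
          echo "gc $pid is running; skip"            # our own process
        elif ps -p "$pid" >/dev/null 2>&1; then
          echo "gc $pid runs as another user; skip"  # the EPERM case
        else
          echo "stale pid file"                      # the ESRCH case
        fi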
      Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  21. 11 Dec 2013, 1 commit
  22. 19 Oct 2013, 1 commit
      gc: remove gc.pid file at end of execution · 4c5baf02
      Jonathan Nieder committed
      This file isn't really harmful, but it isn't useful either, and it can
      create minor annoyances for the user:
      
      * It's confusing, as the presence of a *.pid file often implies that a
        process is currently running. A user running "ls .git/" and finding
        this file may incorrectly guess that a "git gc" is currently running.
      
      * Leaving this file means that a "git gc" in an already gc-ed repo is
        no longer a no-op. A user running "git gc" in a set of repositories,
        and then synchronizing this set (e.g. rsync -av, unison, ...) will see
        all the gc.pid files as changed, which creates useless noise.
      
      This patch unlinks the file after the garbage collection is done, so that
      gc.pid is actually present only during execution.
      
      Future versions of Git may want to use the information left in the gc.pid
      file (e.g. for policies like "don't attempt to run a gc if one has
      already been run less than X hours ago"). If so, this patch can safely be
      reverted. For now, let's not bother the users.
      Explained-by: Matthieu Moy <Matthieu.Moy@imag.fr>
      Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
      Improved-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  23. 10 Aug 2013, 1 commit
  24. 06 Aug 2013, 1 commit
  25. 28 Sep 2012, 1 commit
  26. 21 Aug 2012, 1 commit
  27. 19 Apr 2012, 1 commit
      gc: use argv-array for sub-commands · 234587fc
      Jeff King committed
      git-gc executes many sub-commands. The argument list for
      some of these is constant, but for others we add more
      arguments at runtime. The latter is implemented by allocating
      a constant extra number of NULLs, and either using a custom
      append function, or just referencing unused slots by number.
      
      As of commit 7e52f566, which added two new arguments, it is
      possible to exceed the constant number of slots for "repack"
      by running "git gc --aggressive", causing "git gc" to die.
      
      This patch converts all of the static argv lists to use
      argv-array. In addition to fixing the overflow caused by
      7e52f566, it has a few advantages:
      
        1. We can drop the custom append function (which,
           incidentally, had an off-by-one error exacerbating the
           static limit).
      
        2. We can drop the ugly magic numbers used when adding
           arguments to "prune".
      
        3. Adding further arguments will be easier; you can just
           add new "push" calls without worrying about increasing
           any static limits.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  28. 12 Apr 2012, 1 commit
      gc: do not explode objects which will be immediately pruned · 7e52f566
      Jeff King committed
      When we pack everything into one big pack with "git repack
      -Ad", any unreferenced objects in to-be-deleted packs are
      exploded into loose objects, with the intent that they will
      be examined and possibly cleaned up by the next run of "git
      prune".
      
      Since the exploded objects will receive the mtime of the
      pack from which they come, if the source pack is old, those
      loose objects will end up pruned immediately. In that case,
      it is much more efficient to skip the exploding step
      entirely for these objects.
      
      This patch teaches pack-objects to receive the expiration
      information and avoid writing these objects out. It also
      teaches "git gc" to pass the value of gc.pruneexpire to
      repack (which in turn learns to pass it along to
      pack-objects) so that this optimization happens
      automatically during "git gc" and "git gc --auto".
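      
      The expiration value itself comes from the existing configuration;
      a hedged example (2.weeks.ago is the documented default):
      
        $ git config gc.pruneExpire 2.weeks.ago
        $ git gc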
      Signed-off-by: Jeff King <peff@peff.net>
      Acked-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  29. 08 Nov 2011, 1 commit
  30. 20 Jun 2011, 1 commit
  31. 10 Mar 2011, 2 commits
  32. 16 Nov 2010, 2 commits
  33. 23 Oct 2010, 1 commit
  34. 23 Feb 2010, 1 commit
      Move 'builtin-*' into a 'builtin/' subdirectory · 81b50f3c
      Linus Torvalds committed
      This shrinks the top-level directory a bit, and makes it much more
      pleasant to use auto-completion on the thing. Instead of
      
      	[torvalds@nehalem git]$ em buil<tab>
      	Display all 180 possibilities? (y or n)
      	[torvalds@nehalem git]$ em builtin-sh
      	builtin-shortlog.c     builtin-show-branch.c  builtin-show-ref.c
      	builtin-shortlog.o     builtin-show-branch.o  builtin-show-ref.o
      	[torvalds@nehalem git]$ em builtin-shor<tab>
      	builtin-shortlog.c  builtin-shortlog.o
      	[torvalds@nehalem git]$ em builtin-shortlog.c
      
      you get
      
      	[torvalds@nehalem git]$ em buil<tab>		[type]
      	builtin/   builtin.h
      	[torvalds@nehalem git]$ em builtin		[auto-completes to]
      	[torvalds@nehalem git]$ em builtin/sh<tab>	[type]
      	shortlog.c     shortlog.o     show-branch.c  show-branch.o  show-ref.c     show-ref.o
      	[torvalds@nehalem git]$ em builtin/sho		[auto-completes to]
      	[torvalds@nehalem git]$ em builtin/shor<tab>	[type]
      	shortlog.c  shortlog.o
      	[torvalds@nehalem git]$ em builtin/shortlog.c
      
      which doesn't seem all that different, but not having that annoying
      break in "Display all 180 possibilities?" is quite a relief.
      
      NOTE! If you do this in a clean tree (no object files etc), or using an
      editor that has auto-completion rules that ignores '*.o' files, you
      won't see that annoying 'Display all 180 possibilities?' message - it
      will just show the choices instead.  I think bash has some cut-off
      around 100 choices or something.
      
      So the reason I see this is that I'm using an odd editor, and thus
      don't have the rules to cut down on auto-completion.  But you can
      simulate that by using 'ls' instead, or something similar.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  35. 04 Dec 2009, 1 commit