• J
    fetch: avoid quadratic loop checking for updated submodules · 6859de45
    Jeff King 提交于
    Recent versions of git can be slow to fetch repositories with a
    large number of refs (or when they already have a large
    number of refs). For example, GitHub makes pull-requests
    available as refs, which can lead to a large number of
    available refs. This slowness goes away when submodule
    recursion is turned off:
    
      $ git ls-remote git://github.com/rails/rails.git | wc -l
      3034
    
      [this takes ~10 seconds of CPU time to complete]
      git fetch --recurse-submodules=no \
        git://github.com/rails/rails.git "refs/*:refs/*"
    
      [this still isn't done after 10 _minutes_ of pegging the CPU]
      git fetch \
        git://github.com/rails/rails.git "refs/*:refs/*"
    
    You can produce a quicker and simpler test case like this:
    
      doit() {
        head=`git rev-parse HEAD`
        for i in `seq 1 $1`; do
          echo $head refs/heads/ref$i
        done >.git/packed-refs
        echo "==> $1"
        rm -rf dest
        git init -q --bare dest &&
          (cd dest && time git.compile fetch -q .. refs/*:refs/*)
      }
    
      rm -rf repo
      git init -q repo && cd repo &&
      >file && git add file && git commit -q -m one
    
      doit 100
      doit 200
      doit 400
      doit 800
      doit 1600
      doit 3200
    
    Which yields timings like:
    
      # refs  seconds of CPU
         100            0.06
         200            0.24
         400            0.95
         800            3.39
        1600           13.66
        3200           54.09
    
    Notice that although the number of refs doubles in each
    trial, the CPU time spent quadruples.
    
    The problem is that the submodule recursion code works
    something like:
    
      - for each ref we fetch
        - for each commit in git rev-list $new_sha1 --not --all
          - add modified submodules to list
      - fetch any newly referenced submodules
    
    But that means if we fetch N refs, we start N revision
    walks. Worse, because we use "--all", the number of refs we
    must process that constitute "--all" keeps growing, too. And
    you end up doing O(N^2) ref resolutions.
    
    Instead, this patch structures the code like this:
    
      - for each sha1 we already have
        - add $old_sha1 to list $old
      - for each ref we fetch
        - add $new_sha1 to list $new
      - for each commit in git rev-list $new --not $old
        - add modified submodules to list
      - fetch any newly referenced submodules
    
    This yields timings like:
    
      # refs  seconds of CPU
      100               0.00
      200               0.04
      400               0.04
      800               0.10
      1600              0.21
      3200              0.39
    
    Note that the amount of effort doubles as the number of refs
    doubles. Similarly, the fetch of rails.git takes about as
    much time as it does with --recurse-submodules=no.
    Signed-off-by: NJeff King <peff@peff.net>
    Signed-off-by: NJunio C Hamano <gitster@pobox.com>
    6859de45
submodule.c 22.8 KB