1. 19 Jan, 2007 2 commits
    • Use fixed-size integers when writing out the index in fast-import. · ebea9dd4
      Shawn O. Pearce committed
      Currently the pack .idx file format uses 32-bit unsigned integers
      for the fan-out table and the object offsets.  We had previously
      defined these as 'unsigned int', but not every system will define
      that type to be a 32 bit value.  To ensure maximum portability we
      should always use 'uint32_t'.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      ebea9dd4
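As a rough illustration of why the fixed-width type matters, here is a minimal sketch of writing a 256-entry fan-out table as 32-bit big-endian values. The helper names (`put_be32`, `build_fanout`) and the input layout are hypothetical and are not taken from fast-import.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/*
 * Fill a 256-entry fan-out table.  Entry i holds the number of
 * objects whose first name byte is <= i.  first_bytes is an assumed
 * input: the first byte of each object name, in any order.
 */
static void build_fanout(const unsigned char *first_bytes, size_t n,
			 unsigned char out[256 * 4])
{
	uint32_t counts[256] = {0};
	uint32_t running = 0;
	size_t i;

	for (i = 0; i < n; i++)
		counts[first_bytes[i]]++;
	for (i = 0; i < 256; i++) {
		running += counts[i];
		put_be32(out + i * 4, running);
	}
}
```

Because every entry is emitted byte by byte from a `uint32_t`, the on-disk width is 4 bytes regardless of how wide `unsigned int` is on the build machine.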
    • Always use struct pack_header for pack header in fast-import. · 566f4425
      Shawn O. Pearce committed
      Previously we were using 'unsigned int' to update the hdr_entries
      field of the pack header after the file had been completed and
      was being hashed.  This may not be 32 bits on all platforms.
      Instead we want to always use uint32_t.
      
      I'm actually cheating here by just using the pack_header like the
      rest of Git and letting the struct definition declare the correct
      type.  Right now that field is still 'unsigned int' (wrong) but a
      pending change submitted by Simon 'corecode' Schubert changes it
      to uint32_t.  After that change is merged in fast-import will do
      the right thing all of the time.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      566f4425
  2. 17 Jan, 2007 5 commits
    • Correct packfile edge output in fast-import. · 69e74e74
      Shawn O. Pearce committed
      Branches are only contained by a packfile if the branch actually
      had its most recent commit in that packfile.  So new branches are
      set to MAX_PACK_ID to ensure their commit is not listed as part of
      the first packfile when it closes out if the commit actually
      existed before fast-import started.
      
      Also corrected the type of last_commit to be uintmax_t to prevent
      overflow and wraparound on very large imports.  Though that is
      highly unlikely to occur as we're talking 4 billion commits, which
      no real project has right now.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      69e74e74
    • Declare no-arg functions as (void) in fast-import. · fd99224e
      Shawn O. Pearce committed
      Apparently the git convention is to declare any function which
      takes no arguments as taking void.  I did not do this during the
      early fast-import development, but should have.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      fd99224e
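In C before C23, an empty parameter list in a declaration leaves the arguments unspecified, so the compiler cannot reject a stray argument. A minimal sketch of the convention, using a hypothetical counter function rather than one from fast-import.c:

```c
/* Hypothetical state, for illustration only. */
static unsigned int next_pack_id;

/*
 * "(void)" makes this a full prototype: a call such as
 * alloc_pack_id(42) is rejected at compile time.  Declared as
 * "alloc_pack_id()", the same call would be accepted by pre-C23
 * compilers.
 */
static unsigned int alloc_pack_id(void)
{
	return next_pack_id++;
}
```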
    • Correct a few types to be unsigned in fast-import. · 6f64f6d9
      Shawn O. Pearce committed
      The length of an atom string cannot be negative.  So make it
      explicit and declare it as an unsigned value.
      
      The shift width in a mark table node also cannot be negative.
      I'm also moving it to after the pointer arrays to prevent any
      possible alignment problems on a 64 bit system.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      6f64f6d9
    • Corrected BNF input documentation for fast-import. · 2104838b
      Shawn O. Pearce committed
      Now that fast-import uses uintmax_t (the largest available unsigned
      integer type) for marks we don't want to say it's an unsigned 32
      bit integer in ASCII base 10 notation.  It could be much larger,
      especially on 64 bit systems, and especially if a frontend uses
      a very large number of marks (1 per file revision on a very, very
      large import).
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      2104838b
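To make the "ASCII base 10, possibly wider than 32 bits" point concrete, here is a minimal sketch of parsing a mark with `strtoumax`. It assumes a mark is written as a colon followed by decimal digits; `parse_mark` is a hypothetical helper, not the parser fast-import uses.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>

/*
 * Parse a mark reference of the assumed form ":<decimal digits>".
 * Returns 0 and stores the value in *out on success; returns -1 on
 * malformed input or when the value does not fit in uintmax_t.
 */
static int parse_mark(const char *s, uintmax_t *out)
{
	char *end;
	uintmax_t v;

	if (s[0] != ':' || s[1] < '0' || s[1] > '9')
		return -1;
	errno = 0;
	v = strtoumax(s + 1, &end, 10);
	if (errno == ERANGE || *end != '\0')
		return -1;
	*out = v;
	return 0;
}
```

A value such as `:4294967296` (2**32) parses cleanly here, whereas it would not fit in a 32-bit mark.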
    • Print out the edge commits for each packfile in fast-import. · 2369ed79
      Shawn O. Pearce committed
      To help callers repack very large repositories into a series of
      packfiles fast-import now outputs the last commits/tags it wrote to
      a packfile when it prints out the packfile name.  This information
      can be fed to pack-objects --revs to repack.  For the first pack
      of an initial import this is pretty easy (just feed those SHA1s on
      stdin) but for subsequent packs you want to feed the subsequent
      pack's final SHA1s but also all prior packs' SHA1s prefixed with
      the negation operator.  This way the prior pack's data does not
      get included into the subsequent pack.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      2369ed79
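The repacking recipe described above can be sketched as a small helper that builds the revision list to pipe into `pack-objects --revs`. The function name and its parameters are hypothetical; a caller script would collect the edge SHA1s from fast-import's output.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write the revision list for repacking one packfile: the edge
 * commits of that pack, followed by the edge commits of all earlier
 * packs, each prefixed with '^' so their history is excluded.
 */
static void write_revs(FILE *out,
		       const char *const *wanted, size_t nr_wanted,
		       const char *const *prior, size_t nr_prior)
{
	size_t i;

	for (i = 0; i < nr_wanted; i++)
		fprintf(out, "%s\n", wanted[i]);
	for (i = 0; i < nr_prior; i++)
		fprintf(out, "^%s\n", prior[i]);
}
```

For the first pack of an initial import, `nr_prior` is simply 0 and only the plain SHA1 lines are written.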
  3. 16 Jan, 2007 9 commits
    • Correct object_count type and stat output in fast-import. · a7ddc487
      Shawn O. Pearce committed
      Since object_count is limited to 'unsigned long' (really an
      unsigned 32 bit integer value) by the pack file format we may as
      well use exactly that type here in fast-import for that counter.
      An earlier change by me incorrectly made it uintmax_t.
      
      But since object_count is a counter for the current packfile only,
      we don't want to output its value at the end.  Instead we should
      sum up the individual type counters and report that total, as that
      will cover all of the packfiles.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      a7ddc487
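Reporting a total that spans every packfile amounts to summing the per-type counters at the end of the run. A minimal sketch, with a hypothetical counter layout that is not taken from fast-import.c:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-type counters, indexed by object type. */
enum { TYPE_COMMIT, TYPE_TREE, TYPE_BLOB, TYPE_TAG, NR_TYPES };

/* Sum the per-type counters to get the total across all packfiles. */
static uintmax_t total_objects(const uintmax_t by_type[NR_TYPES])
{
	uintmax_t total = 0;
	size_t i;

	for (i = 0; i < NR_TYPES; i++)
		total += by_type[i];
	return total;
}
```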
    • Correct max_packsize default in fast-import. · eec11c24
      Shawn O. Pearce committed
      Apparently amd64 has defined 'unsigned long' to be a 64 bit value,
      which means -1 was way over the 4 GiB packfile limit.  Whoops.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      eec11c24
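The pitfall is that `(unsigned long)-1` equals `ULONG_MAX`, which is 2**32 - 1 only where `long` is 32 bits wide. A minimal sketch of spelling the limit out explicitly; the function name is hypothetical and the actual fix in fast-import.c may be written differently:

```c
#include <stdint.h>

/*
 * Largest byte count under the 4 GiB packfile limit, computed in a
 * type that is at least 64 bits wide so the shift cannot overflow.
 * The result is the same whether long is 32 or 64 bits.
 */
static uintmax_t default_max_packsize(void)
{
	return (UINTMAX_C(1) << 32) - 1;
}
```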
    • Remove unnecessary pack_fd global in fast-import. · 0fcbcae7
      Shawn O. Pearce committed
      Much like the pack_sha1 the pack_fd is an unnecessary global
      variable; we already have the fd stored in our struct packed_git
      *pack_data so that the core library functions in sha1_file.c are
      able to look up and decompress object data that we have previously
      written.  Keeping an extra copy of this value in our own variable
      is just a hold-over from earlier versions of fast-import and is
      now completely unnecessary.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      0fcbcae7
    • Ensure we close the packfile after creating it in fast-import. · 12801587
      Shawn O. Pearce committed
      Because we are renaming the packfile into its final destination we
      need to be sure it's not open when the rename is called, otherwise
      some operating systems (e.g. Windows) may prevent the rename from
      occurring.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      12801587
    • Use .keep files in fast-import during processing. · 8455e484
      Shawn O. Pearce committed
      Because fast-import automatically updates all references (heads
      and tags) at the end of its run the repository is corrupt unless
      the objects are available in the .git/objects/pack directory prior
      to the refs being modified.  The easiest way to ensure that is true
      is to move the packfile and its associated index directly into the
      .git/objects/pack directory as soon as we have finished output to it.
      
      But the only safe way to do this is to create a temporary .keep
      file for that pack, so we use the same tricks that index-pack uses
      when it's being invoked by receive-pack.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      8455e484
    • Reuse sha1 in packed_git in fast-import. · 09543c96
      Shawn O. Pearce committed
      Rather than maintaining our own packfile level sha1 variable we
      can make use of the one already available in struct packed_git.
      It's meant for the SHA1 of the index but it can also hold the
      SHA1 of the packfile itself between final checksumming of the
      packfile and creation of the index.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      09543c96
    • Replace redundant yread() with read_in_full() in fast-import. · 6cf09261
      Shawn O. Pearce committed
      Prior to git having read_in_full() fast-import used its own private
      function yread to perform the header reading task.  No sense in
      keeping that around now that read_in_full is a public, stable
      function.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      6cf09261
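For readers unfamiliar with the pattern, a "read in full" helper loops over `read()` until the requested byte count arrives, end of file is reached, or an error occurs. The sketch below is a hypothetical stand-in named `read_fully` for POSIX systems; it is not Git's `read_in_full()` and its exact return conventions are an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to count bytes, retrying on short reads and EINTR.
 * Returns the number of bytes read (less than count only at end of
 * file), or -1 on error.
 */
static ssize_t read_fully(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t total = 0;

	while (total < count) {
		ssize_t n = read(fd, p + total, count - total);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break; /* end of file */
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```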
    • Use uintmax_t for marks in fast-import. · 0ea9f045
      Shawn O. Pearce committed
      If a frontend wants to use a mark per file revision and per commit
      and is doing a truly huge import (such as a 32 GiB SVN repository)
      we may need more than 2**32 unique mark values, especially if the
      frontend is unable (or unwilling) to recycle mark values.  For mark
      idnums we should use the largest unsigned integer type available,
      hoping that will be at least 64 bits when we are compiled as a 64
      bit executable.  This way we may consume huge amounts of memory
      storing our mark table, but we'll at least be able to process
      the entire import without failing.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      0ea9f045
    • Corrected buffer overflow during automatic checkpoint in fast-import. · 5d6f3ef6
      Shawn O. Pearce committed
      If we previously were using a delta but we needed to checkpoint the
      current packfile and switch to a new packfile we need to throw away
      the delta and compress the raw object by itself, as delta chains
      cannot span non-thin packfiles.  Unfortunately the output buffer
      in this case needs to grow, as the size of the compressed object
      may be quite a bit larger than the size of the compressed delta.
      
      I've also avoided recompressing the object if we are checkpointing
      and we didn't use a delta.  In this case the output buffer is the
      correct size and has already been populated with the right data;
      we just need to close out the current packfile and open a new one.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      5d6f3ef6
  4. 15 Jan, 2007 9 commits
    • Print the packfile names to stdout from fast-import. · 9d1b1b5e
      Shawn O. Pearce committed
      Caller scripts may want to know what packfiles the fast-import
      process just wrote out for them.  This is now output to stdout,
      one packfile name per line, after we checkpoint each packfile.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      9d1b1b5e
    • Implemented automatic checkpoints within fast-import. · d9ee53ce
      Shawn O. Pearce committed
      When the number of objects or number of bytes gets close to the limit
      allowed by the packfile format (or configured on the command line by
      our caller) we should automatically checkpoint the current packfile
      and start a new one before writing the object out.  This does however
      require that we abandon the delta (if we had one) as it's not valid
      in a new packfile.
      
      I also added a simple rule: if we get a delta back but the delta
      itself is the same size as or larger than the uncompressed object,
      we ignore the delta and just store the object data.  This
      should avoid some really bad behavior caused by our current delta
      strategy.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      d9ee53ce
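The two decisions described in this commit can be sketched as a pair of predicates. The struct, its fields, and both function names are hypothetical and chosen for illustration; the real checks in fast-import.c may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits and state for the packfile being written. */
struct pack_state {
	uint32_t object_count;   /* objects written to the current pack */
	uintmax_t pack_size;     /* bytes written to the current pack */
	uint32_t max_objects;    /* object-count limit */
	uintmax_t max_packsize;  /* byte limit */
};

/* Nonzero if writing next_len more bytes would exceed either limit. */
static int needs_checkpoint(const struct pack_state *p, size_t next_len)
{
	if (p->object_count >= p->max_objects)
		return 1;
	/* Written to avoid overflow in pack_size + next_len. */
	return next_len > p->max_packsize ||
	       p->pack_size > p->max_packsize - next_len;
}

/* Keep a delta only if it is strictly smaller than the raw object. */
static int worth_using_delta(size_t delta_len, size_t raw_len)
{
	return delta_len < raw_len;
}
```

When `needs_checkpoint` fires, any delta computed against the previous packfile has to be discarded, as the commit message explains.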
    • Optimize index creation on large object sets in fast-import. · 2fce1f3c
      Shawn O. Pearce committed
      When we are generating multiple packfiles at once we only need
      to scan the blocks of object_entry structs which contain objects
      for the current packfile.  Because the most recent blocks are at
      the front of the linked list, and because all new objects going
      into the current file are allocated from the front of that list,
      we can stop scanning for objects as soon as we identify one which
      doesn't belong to the current packfile.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      2fce1f3c
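The early-exit scan can be sketched as follows. The structures are simplified, hypothetical versions of the ones named in the commit message, and the sketch assumes that entries within a block are filled in order, so the newest entry of a block is its last one.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the real structures. */
struct entry {
	unsigned int pack_id;
};

struct entry_block {
	struct entry_block *next;  /* newer blocks come before older ones */
	size_t nr;                 /* entries in use in this block */
	struct entry entries[4];
};

/*
 * Count entries that belong to pack_id, walking from newest to
 * oldest and stopping at the first entry from another packfile.
 */
static size_t count_current(const struct entry_block *head,
			    unsigned int pack_id)
{
	size_t found = 0;
	const struct entry_block *b;

	for (b = head; b; b = b->next) {
		size_t i = b->nr;

		while (i-- > 0) {
			if (b->entries[i].pack_id != pack_id)
				return found;
			found++;
		}
	}
	return found;
}
```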
    • Don't create a final empty packfile in fast-import. · 3e005baf
      Shawn O. Pearce committed
      If the last packfile is going to be empty (has 0 objects) then it
      shouldn't be kept after the import has terminated, as there is no
      point to the packfile.  So rather than hashing it and making the
      index file, just delete the packfile.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      3e005baf
    • Implemented manual packfile switching in fast-import. · 7bfe6e26
      Shawn O. Pearce committed
      To help importers which are dealing with massive amounts of data
      fast-import needs to be able to close the packfile it is currently
      writing to and open a new packfile for any additional data that
      will be received.  A new 'checkpoint' command has been introduced
      which can be used by the frontend import process to force this
      to occur at any time.  This may be useful to ensure a very long
      running import doesn't lose any work due to unexpected failures.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      7bfe6e26
    • Remove unnecessary duplicate_count in fast-import. · 80144727
      Shawn O. Pearce committed
      There is little reason to be keeping a global duplicate_count
      value when we also keep it per object type.  The global counter can
      easily be computed at the end, once all processing has completed.
      This saves us a couple of machine instructions in an unimportant
      part of code.  But it looks slightly better to me to not keep
      two counters around.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      80144727
    • Restructure fast-import to support creating multiple packfiles. · f70b6534
      Shawn O. Pearce committed
      Now that we are starting to see some really large projects (such
      as KDE or a fork of FreeBSD) get imported into Git we're running
      into the upper limit on packfile object count as well as overall
      byte length.  The KDE and FreeBSD projects are both likely to
      require more than 4 GiB to store their current history, which means
      we really need multiple packfiles to handle their content.
      
      This is a fairly simple restructuring of the internal code to help
      us support creating multiple packfiles from within fast-import.
      We are now adding a 5 digit incrementing suffix to the end of the
      basename supplied to us by the caller, permitting up to 99,999
      packs to be generated in a single fast-import run.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      f70b6534
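A 5 digit incrementing suffix can be produced with a zero-padded format. The sketch below assumes the suffix is appended directly to the caller's basename with no separator; that detail, and the helper name `pack_basename`, are assumptions rather than something the commit message states.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format "<base><5-digit id>" into buf.  Returns 0 on success, or -1
 * if the id needs more than five digits or buf is too small.
 */
static int pack_basename(char *buf, size_t len, const char *base,
			 unsigned int pack_id)
{
	int n;

	if (pack_id > 99999)
		return -1;
	n = snprintf(buf, len, "%s%05u", base, pack_id);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```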
    • Misc. type cleanups within fast-import. · 03842d8e
      Shawn O. Pearce committed
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      03842d8e
    • Improve reuse of sha1_file library within fast-import. · d489bc14
      Shawn O. Pearce committed
      Now that the sha1_file.c library routines use the sliding mmap
      routines to perform efficient access to portions of a packfile
      I can remove that code from fast-import.c and just invoke it.
      One benefit is we now have reloading support for any packfile which
      uses OBJ_OFS_DELTA.  Another is we have significantly less code
      to maintain.
      
      This code reuse change *requires* that fast-import generate only
      an OBJ_OFS_DELTA format packfile, as there is absolutely no index
      available to perform OBJ_REF_DELTA lookup in while unpacking
      an object.  This is probably reasonable to require as the delta
      offsets result in smaller packfiles and are faster to unpack,
      as no index searching is required.  It's also only a temporary
      requirement as users could always repack without offsets before
      making the import available to older versions of Git.
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      d489bc14
  5. 14 Jan, 2007 15 commits